Towards Workload-aware Efficient Machine Learning Systems
dc.contributor.author | Khan, Redwan Ibne Seraj | en |
dc.contributor.committeechair | Butt, Ali | en |
dc.contributor.committeemember | Cameron, Kirk W. | en |
dc.contributor.committeemember | Ji, Bo | en |
dc.contributor.committeemember | Jian, Xun | en |
dc.contributor.committeemember | Cheng, Yue | en |
dc.contributor.department | Computer Science & Applications | en |
dc.date.accessioned | 2025-03-04T09:00:13Z | en |
dc.date.available | 2025-03-04T09:00:13Z | en |
dc.date.issued | 2025-03-03 | en |
dc.description.abstract | Machine learning (ML) is transforming many aspects of our lives, driving the need for computing systems that efficiently support large-scale ML workloads. As models grow in size and complexity, existing systems struggle to adapt, limiting both performance and flexibility. Additionally, ML techniques can enhance traditional computing tasks, but current systems lack the adaptability to integrate these advancements effectively. Building systems to run machine learning workloads, and using machine learning to run workloads, both require a careful understanding of the nature of the systems and the ML models involved. In this dissertation, we design and develop a series of novel storage and scheduling solutions for ML systems by bringing attention to the unique characteristics of workloads and the underlying system. We find that by designing ML systems that are finely tuned to workload characteristics and the underlying infrastructure, we can significantly enhance application performance and maximize resource utilization. In the first part of this dissertation (Chapter 3), we analyze popular ML models and datasets, uncovering insights that inspired SHADE, a data-importance-aware caching solution for ML. The second part (Chapter 4) leverages the system characteristics of hundreds of client devices, along with the characteristics of the samples within those clients, to design novel sampling, caching, and client scheduling mechanisms that tackle data and system heterogeneity among client devices and thereby fundamentally improve the performance of federated learning on edge devices in the cloud. The third part (Chapter 5) leverages the characteristics of multi-agent LLM applications and user requests to design an efficient request scheduling mechanism that serves clients in multi-tenant environments fairly and efficiently while preventing abuse. This dissertation demonstrates that workload-aware strategies can significantly enhance the efficiency (e.g., reduced training time, increased throughput, lower latency) and flexibility (e.g., improved ease of use, deployment, and programmability) of machine learning systems. By accounting for workload dynamicity and heterogeneity, these principles can guide the design of next-generation ML systems, ensuring adaptability to emerging models and evolving hardware technologies. | en |
dc.description.abstractgeneral | Machine learning (ML) has become an integral part of our daily lives, powering applications from virtual assistants to medical diagnostics. As ML models grow larger and more complex, the systems that run them must evolve to keep pace. This dissertation explores how we can build more efficient and adaptable computing systems to support large-scale ML workloads. Traditional computing systems often struggle to accommodate the ever-changing demands of ML applications. Similarly, ML techniques can be leveraged to improve the performance of non-ML workloads, but existing systems lack the flexibility to integrate these advancements seamlessly. This research tackles both challenges: designing systems optimized for ML workloads and enhancing traditional systems using ML-driven insights. By designing intelligent, workload-aware strategies, this research demonstrates substantial improvements in the speed, efficiency, and flexibility of ML systems. These principles will help shape the next generation of computing infrastructure, ensuring that future ML models and applications can be deployed smoothly, regardless of scale or complexity. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:42574 | en |
dc.identifier.uri | https://hdl.handle.net/10919/124762 | en |
dc.language.iso | en | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Machine Learning | en |
dc.subject | Deep Learning | en |
dc.subject | Federated Learning | en |
dc.subject | High Performance Computing | en |
dc.subject | Cloud Computing | en |
dc.subject | Storage Systems | en |
dc.subject | Data Storage | en |
dc.subject | Data Management | en |
dc.subject | Machine Learning Systems | en |
dc.subject | Job Scheduling | en |
dc.subject | Resource Management | en |
dc.subject | MLSys | en |
dc.subject | SysML | en |
dc.subject | Efficiency | en |
dc.subject | Flexibility | en |
dc.title | Towards Workload-aware Efficient Machine Learning Systems | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Computer Science & Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |