
Breaking Privacy in Model-Heterogeneous Federated Learning

dc.contributor.author: Haldankar, Atharva Amit
dc.contributor.committeechair: Hoang, Thang
dc.contributor.committeemember: Cho, Jin-Hee
dc.contributor.committeemember: Viswanath, Bimal
dc.contributor.department: Computer Science & Applications
dc.date.accessioned: 2024-05-15T08:01:54Z
dc.date.available: 2024-05-15T08:01:54Z
dc.date.issued: 2024-05-14
dc.description.abstract: Federated learning (FL) is a communication protocol that allows multiple distrustful clients to collaboratively train a machine learning model. In FL, data never leaves client devices; instead, clients only share locally computed gradients or model parameters with a central server. Because individual gradients may leak information about a given client's dataset, secure aggregation was proposed. With secure aggregation, the server only receives the aggregate gradient update from the set of all sampled clients and cannot access any individual gradient. One challenge in FL is the systems-level heterogeneity that is often present among client devices. Specifically, clients in the FL protocol may have varying levels of compute power, on-device memory, and communication bandwidth. These limitations are addressed by model-heterogeneous FL schemes, in which clients train on subsets of the global model. Despite the benefits of model-heterogeneous schemes in addressing systems-level challenges, their implications for client privacy have not been thoroughly investigated. In this thesis, we investigate whether the nature of model distribution and the computational heterogeneity among client devices in model-heterogeneous FL schemes may allow the server to recover sensitive information from target clients. To this end, we propose two novel attacks in the model-heterogeneous setting, even with secure aggregation in place. We call these attacks the Convergence Rate Attack and the Rolling Model Attack. The Convergence Rate Attack targets schemes where clients train on the same subset of the global model, while the Rolling Model Attack targets schemes where model parameters are dynamically updated each round. We show that a malicious adversary is able to compromise the model and data confidentiality of a target group of clients. We evaluate our attacks on the MNIST dataset and show that, using our techniques, an adversary can reconstruct data samples with high fidelity.
dc.description.abstractgeneral: Federated learning (FL) is a communication protocol that allows multiple distrustful users to collaboratively train a machine learning model. In FL, data never leaves user devices; instead, users only share locally computed gradients or model parameters (e.g., weight and bias values) with an aggregation server. Because individual gradients may leak information about a given user's dataset, secure aggregation was proposed. Secure aggregation is a protocol run jointly by the users and the server in which the server receives only the aggregate gradient update from the set of all sampled users rather than each individual user's update. In FL, users often have varying levels of compute power, on-device memory, and communication bandwidth. These differences between users are collectively referred to as systems-level (or system) heterogeneity. While there are a number of techniques to address system heterogeneity, one popular approach is to have users train on different subsets of the global model; this approach is known as model-heterogeneous FL. Despite the benefits of model-heterogeneous FL schemes in addressing systems-level challenges, their implications for user privacy have not been thoroughly investigated. In this thesis, we investigate whether the nature of model distribution and the differences in compute power between user devices in model-heterogeneous FL schemes may result in the server being able to recover sensitive information. To this end, we propose two novel attacks in the model-heterogeneous setting with secure aggregation in place. We call these attacks the Convergence Rate Attack and the Rolling Model Attack. The Convergence Rate Attack targets schemes where users train on the same subset of the global model, while the Rolling Model Attack targets schemes where model parameters may change each round. We first show that a malicious server is able to obtain individual user updates despite secure aggregation being in place. We then demonstrate how an adversary can use those updates to reverse engineer data samples from users. We evaluate our attacks on the MNIST dataset, a commonly used dataset of handwritten digits and their labels, and show that by running our attacks, an adversary can accurately identify which images a user trained on.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:40365
dc.identifier.uri: https://hdl.handle.net/10919/118984
dc.language.iso: en
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Model-Heterogeneous FL
dc.subject: Secure Aggregation
dc.subject: Privacy
dc.subject: Confidentiality
dc.subject: Gradient Inversion
dc.title: Breaking Privacy in Model-Heterogeneous Federated Learning
dc.type: Thesis
thesis.degree.discipline: Computer Science & Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
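
To make the secure-aggregation property described in the abstracts concrete, the following is a minimal sketch of pairwise-masking secure aggregation (in the spirit of Bonawitz et al.); it is illustrative only and is not the scheme analyzed or attacked in the thesis. All names and parameters (num_clients, dim, the use of NumPy) are assumptions made for the example.

```python
# Illustrative sketch of secure aggregation via pairwise masking.
# Each pair of clients (i, j) shares a random mask; client i adds it,
# client j subtracts it, so the masks cancel in the server's sum.
import numpy as np

rng = np.random.default_rng(0)
num_clients = 4   # number of sampled clients (illustrative)
dim = 8           # length of a flattened gradient vector (illustrative)

# Each client holds a local gradient update.
updates = [rng.normal(size=dim) for _ in range(num_clients)]

# Pairwise masks agreed on by clients i < j.
masks = {(i, j): rng.normal(size=dim)
         for i in range(num_clients) for j in range(i + 1, num_clients)}

def masked_update(i):
    """What client i actually sends to the server."""
    y = updates[i].copy()
    for (a, b), m in masks.items():
        if a == i:
            y += m
        elif b == i:
            y -= m
    return y

# The server only sees masked updates; each looks random on its own,
# but the pairwise masks cancel once all of them are summed.
aggregate = sum(masked_update(i) for i in range(num_clients))
assert np.allclose(aggregate, sum(updates))
```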

Files

Original bundle
Name: Haldankar_AA_T_2024.pdf
Size: 714.79 KB
Format: Adobe Portable Document Format

Collections