MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models

Abstract

Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data private, making it ideal for collaborations among healthcare institutions on sensitive datasets. However, in this paper, we demonstrate a novel privacy attack called MedLeak, which allows a malicious participant who initiates the FL task as the server to recover high-quality, site-specific private medical images and text records from the model updates uploaded by clients. In MedLeak, the malicious server introduces an adversarially crafted model during the FL training process. Honest clients, unaware of the insidious changes in the published model, continue to send back their updates per the standard FL training protocol. Leveraging a novel analytical method, MedLeak can efficiently recover private client data from the aggregated parameter updates. This recovery scheme is significantly more efficient than state-of-the-art solutions, as it avoids a costly optimization process. Moreover, the scheme relies solely on the aggregated updates, rendering secure aggregation protocols ineffective: such protocols depend on randomizing intermediate results for security while leaving the final aggregated results unaltered.

We implement MedLeak on the medical image datasets MedMNIST, COVIDx CXR-4, and the Kaggle Brain Tumor MRI dataset, as well as the medical text dataset MedAbstract. The results demonstrate that the proposed privacy attack is highly effective on both image and text data, achieving high recovery rates and strong quantitative scores. We also thoroughly evaluate MedLeak across different attack parameters, providing insights into the key factors that influence attack performance and into potential defenses. Furthermore, we perform downstream tasks, such as disease classification, on the recovered data, showing no significant performance degradation compared to the original training samples. Our findings underscore the need for enhanced privacy measures in federated learning systems, particularly for safeguarding sensitive medical data against powerful model inversion attacks.

Description

Keywords

Citation