MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models

dc.contributor.author: Shi, Shanghao
dc.contributor.author: Haque, Md Shahedul
dc.contributor.author: Parida, Abhijeet
dc.contributor.author: Zhang, Chaoyu
dc.contributor.author: Linguraru, Marius George
dc.contributor.author: Hou, Y. Thomas
dc.contributor.author: Anwar, Syed Muhammad
dc.contributor.author: Lou, Wenjing
dc.date.accessioned: 2025-12-03T14:42:21Z
dc.date.available: 2025-12-03T14:42:21Z
dc.date.issued: 2025-06-24
dc.date.updated: 2025-12-01T08:46:15Z
dc.description.abstract: Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data private, making it ideal for collaborations among healthcare institutions on sensitive datasets. However, in this paper, we demonstrate a novel privacy attack called MedLeak, which allows a malicious participant who initiates the FL task as the server to recover high-quality site-specific private medical images and text records from the model updates uploaded by clients. In MedLeak, a malicious server introduces an adversarially crafted model during the FL training process. Honest clients, unaware of the insidious changes in the published model, continue to send back their updates as per the standard FL training protocol. Leveraging a novel analytical method, MedLeak can efficiently recover private client data from the aggregated parameter updates. This recovery scheme is significantly more efficient than state-of-the-art solutions, as it avoids a costly optimization process. Additionally, the scheme relies solely on the aggregated updates, rendering secure aggregation protocols ineffective, as they depend on the randomization of intermediate results for security while leaving the final aggregated result unaltered. We implement MedLeak on the MedMNIST, COVIDx CXR-4, and Kaggle Brain Tumor MRI medical image datasets, as well as the MedAbstract medical text dataset. The results demonstrate that the proposed privacy attack is highly effective on both image and text data, achieving high recovery rates and strong quantitative scores. We also thoroughly evaluate MedLeak across different attack parameters, providing insights into the key factors that influence attack performance and potential defenses. Furthermore, we perform downstream tasks, such as disease classification, on the recovered data, observing no significant performance degradation compared to the original training samples. Our findings validate the need for enhanced privacy measures in federated learning systems, particularly for safeguarding sensitive medical data against powerful model inversion attacks.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1145/3721201.3721375
dc.identifier.uri: https://hdl.handle.net/10919/139815
dc.language.iso: en
dc.publisher: ACM
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.holder: The author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models
dc.type: Article - Refereed
dc.type.dcmitype: Text
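As an illustration of the abstract's point that secure aggregation leaves the final sum unaltered — a minimal sketch, not the paper's protocol, assuming a simple pairwise-masking scheme with toy two-dimensional updates — each client adds random masks that cancel across clients, so the server never sees individual updates yet still obtains the exact aggregate, which is all an aggregate-level attack like MedLeak requires:

```python
import random

random.seed(0)
n_clients = 3
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy per-client model updates

# Pairwise masks satisfying masks[i][j] = -masks[j][i], so they cancel in the sum.
masks = [[None] * n_clients for _ in range(n_clients)]
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        m = [random.uniform(-10, 10) for _ in updates[0]]
        masks[i][j] = m
        masks[j][i] = [-x for x in m]

def masked_update(i):
    """Client i uploads its update plus all masks it shares with peers."""
    out = list(updates[i])
    for j in range(n_clients):
        if j != i:
            out = [a + b for a, b in zip(out, masks[i][j])]
    return out

# The server only observes masked (randomized) individual uploads...
masked = [masked_update(i) for i in range(n_clients)]
# ...but summing them recovers the true aggregate, since the masks cancel.
aggregate = [sum(col) for col in zip(*masked)]
true_sum = [sum(col) for col in zip(*updates)]
```

Here `aggregate` matches `true_sum` (up to floating-point rounding) even though every individual masked upload is randomized — exactly why a recovery scheme that operates only on the aggregate sidesteps this defense.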

Files

Original bundle
Name: 3721201.3721375.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format
Description: Published version
License bundle

Name: license.txt
Size: 1.5 KB
Description: Item-specific license agreed upon at submission