MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models

dc.contributor.author: Shi, Shanghao
dc.contributor.author: Haque, Md Shahedul
dc.contributor.author: Parida, Abhijeet
dc.contributor.author: Zhang, Chaoyu
dc.contributor.author: Linguraru, Marius George
dc.contributor.author: Hou, Y. Thomas
dc.contributor.author: Anwar, Syed Muhammad
dc.contributor.author: Lou, Wenjing
dc.date.accessioned: 2025-12-03T14:42:21Z
dc.date.available: 2025-12-03T14:42:21Z
dc.date.issued: 2025-06-24
dc.date.updated: 2025-12-01T08:46:15Z
dc.description.abstract: Federated learning (FL) allows participants to collaboratively train machine learning models while keeping their data private, making it ideal for collaborations among healthcare institutions on sensitive datasets. However, in this paper, we demonstrate a novel privacy attack called MedLeak, which allows a malicious participant who initiates the FL task as the server to recover high-quality site-specific private medical images and text records from the model updates uploaded by clients. In MedLeak, a malicious server introduces an adversarially crafted model during the FL training process. Honest clients, unaware of the insidious changes in the published model, continue to send back their updates as per the standard FL training protocol. Leveraging a novel analytical method, MedLeak can efficiently recover private client data from the aggregated parameter updates. This recovery scheme is significantly more efficient than state-of-the-art solutions, as it avoids a costly optimization process. Additionally, the scheme relies solely on the aggregated updates, rendering secure aggregation protocols ineffective, as they depend on the randomization of intermediate results for security while leaving the final aggregated result unaltered. We implement MedLeak on the MedMNIST, COVIDx CXR-4, and Kaggle Brain Tumor MRI medical image datasets, as well as the MedAbstract medical text dataset. The results demonstrate that the proposed privacy attack is highly effective on both image and text data, achieving high recovery rates and strong quantitative scores. We also thoroughly evaluate MedLeak across different attack parameters, providing insights into the key factors that influence attack performance and potential defenses. Furthermore, we perform downstream tasks, such as disease classification, on the recovered data, observing no significant performance degradation compared to the original training samples. Our findings validate the need for enhanced privacy measures in federated learning systems, particularly for safeguarding sensitive medical data against powerful model inversion attacks.
dc.description.version: Published version
dc.format.mimetype: application/pdf
dc.identifier.doi: https://doi.org/10.1145/3721201.3721375
dc.identifier.uri: https://hdl.handle.net/10919/139815
dc.language.iso: en
dc.publisher: ACM
dc.rights: Creative Commons Attribution 4.0 International
dc.rights.holder: The author(s)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: MedLeak: Multimodal Medical Data Leakage in Secure Federated Learning with Crafted Models
dc.type: Article - Refereed
dc.type.dcmitype: Text
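As an illustration of the abstract's point that secure aggregation leaves the final sum unaltered — a minimal sketch, not the paper's protocol, assuming a simple pairwise-masking scheme with toy two-dimensional updates — each client adds random masks that cancel across clients, so the server never sees individual updates yet still obtains the exact aggregate, which is all an aggregate-level attack like MedLeak requires:

```python
import random

random.seed(0)
n_clients = 3
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy per-client model updates

# Pairwise masks satisfying masks[i][j] = -masks[j][i], so they cancel in the sum.
masks = [[None] * n_clients for _ in range(n_clients)]
for i in range(n_clients):
    for j in range(i + 1, n_clients):
        m = [random.uniform(-10, 10) for _ in updates[0]]
        masks[i][j] = m
        masks[j][i] = [-x for x in m]

def masked_update(i):
    """Client i uploads its update plus all masks it shares with peers."""
    out = list(updates[i])
    for j in range(n_clients):
        if j != i:
            out = [a + b for a, b in zip(out, masks[i][j])]
    return out

# The server only observes masked (randomized) individual uploads...
masked = [masked_update(i) for i in range(n_clients)]
# ...but summing them recovers the true aggregate, since the masks cancel.
aggregate = [sum(col) for col in zip(*masked)]
true_sum = [sum(col) for col in zip(*updates)]
```

Here `aggregate` matches `true_sum` (up to floating-point rounding) even though every individual masked upload is randomized — exactly why a recovery scheme that operates only on the aggregate sidesteps this defense.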

Files

Original bundle
Name: 3721201.3721375.pdf
Size: 2.27 MB
Format: Adobe Portable Document Format
Description: Published version
License bundle

Name: license.txt
Size: 1.5 KB
Description: Item-specific license agreed upon at submission