FLARE: Defending Federated Learning against Model Poisoning Attacks via Latent Space Representations

Federated learning (FL) has been shown to be vulnerable to a new class of adversarial attacks, known as model poisoning attacks (MPAs), in which one or more malicious clients try to poison the global model by sending carefully crafted local model updates to the central parameter server. Existing defenses, which focus on analyzing model parameters, show limited effectiveness in detecting such carefully crafted poisonous models. In this work, we propose FLARE, a robust model aggregation mechanism for FL that is resilient against state-of-the-art MPAs. Instead of depending solely on model parameters, FLARE leverages the penultimate layer representations (PLRs) of the model to characterize the adversarial influence on each local model update. PLRs differentiate malicious models from benign ones better than model parameter-based solutions do. We further propose a trust evaluation method that estimates a trust score for each model update based on pairwise PLR discrepancies among all model updates. Under the assumption that honest clients make up the majority, FLARE assigns trust scores such that model updates far from the benign cluster receive low scores. FLARE then aggregates the model updates weighted by their trust scores and finally updates the global model. Extensive experimental results demonstrate the effectiveness of FLARE in defending FL against various MPAs, including semantic backdoor attacks, trojan backdoor attacks, and untargeted attacks, and in safeguarding the accuracy of FL.
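
The trust-weighted aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the discrepancy measure, the scoring rule, or the normalization, so everything below is an assumption made for illustration. The sketch takes the Euclidean distance between mean PLR vectors as the pairwise discrepancy, lets each update vote for its nearest half of the other updates, and turns vote counts into trust scores with a softmax. The function names `trust_scores` and `aggregate` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def trust_scores(plrs, temperature=1.0):
    """Estimate one trust score per client update from PLRs.

    plrs: array of shape (n_clients, n_samples, dim) holding each local
    model's penultimate layer outputs on a shared set of inputs.
    ASSUMPTION: discrepancy = Euclidean distance between mean PLR vectors.
    """
    plrs = np.asarray(plrs, dtype=float)
    n = plrs.shape[0]
    means = plrs.mean(axis=1)                      # (n_clients, dim)
    diffs = means[:, None, :] - means[None, :, :]  # pairwise differences
    dist = np.linalg.norm(diffs, axis=-1)          # (n_clients, n_clients)

    # ASSUMPTION: each update votes for its k nearest other updates,
    # with k set to half the clients (honest-majority assumption).
    k = n // 2
    votes = np.zeros(n)
    for i in range(n):
        order = np.argsort(dist[i])
        neighbors = order[order != i][:k]
        votes[neighbors] += 1

    # ASSUMPTION: softmax over vote counts yields scores that sum to 1.
    logits = votes / temperature
    logits = logits - logits.max()                 # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def aggregate(global_params, updates, scores):
    """Add the trust-weighted sum of client updates to the global model."""
    updates = np.asarray(updates, dtype=float)     # (n_clients, n_params)
    return global_params + np.tensordot(scores, updates, axes=1)


if __name__ == "__main__":
    # Four benign clients with similar PLRs and one outlier.
    centers = np.array([0.0, 0.1, 0.2, 0.3, 10.0])
    plrs = np.ones((5, 3, 2)) * centers[:, None, None]
    updates = np.ones((5, 3))
    updates[4] = -100.0                            # poisoned update
    scores = trust_scores(plrs)
    print("trust scores:", scores)
    print("new global model:", aggregate(np.zeros(3), updates, scores))
```

In this toy setup the outlier receives no votes and therefore the smallest trust score, so its update contributes far less than it would under plain averaging. A softmax never assigns exactly zero weight, so a large poisoned update still has some effect; lowering `temperature` in this sketch sharpens the scores further.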