Interpretability and Debugging for Distributed Privacy Preserving Machine Learning


Date

2026-01-07


Publisher

Virginia Tech

Abstract

Machine learning systems increasingly rely on privacy-preserving distributed training to leverage sensitive data across multiple organizations without centralization. Federated Learning (FL), a distributed privacy-preserving machine learning paradigm, enables hospitals, devices, and enterprises to collaboratively train models without accessing raw client data (e.g., Siri, Alexa, and healthcare applications). Centralized machine learning benefits from rich debugging and interpretability techniques enabled by transparent access to training data. FL, however, removes this transparency, rendering traditional techniques ineffective and making debugging and interpretability a challenging open problem. This thesis addresses this challenge by asking: How can we design automated debugging and interpretability methods for federated learning that effectively localize faults and attribute global model predictions without degrading performance or violating FL's core privacy principles? The central insight is that effective debugging and interpretability can be achieved by analyzing model parameters, activations, and gradients, information already shared or derivable in standard FL protocols (e.g., FedAvg). We present three contributions. First, toward fault localization, we redesign traditional differential testing to operate on neuron activations produced by auto-generated inputs, exploiting the fact that faulty clients produce models with divergent activations. Second, we introduce neuron provenance, which decouples data-influence tracking from data access: it identifies influential neurons via gradient-based weighting and decomposes them into client-specific origins, yielding ranked lists of responsible clients across CNNs and Transformers. Third, we extend neuron provenance to federated LLMs, where autoregressive generation and billion-parameter scale make naive tracking infeasible. This extension introduces token-level provenance at targeted transformer layers, achieving high attribution accuracy across multiple LLM architectures. In each case, the solution operates entirely on information available at the aggregator, requiring no client-side instrumentation. Collectively, these contributions culminate in practical tools that integrate seamlessly with existing distributed ML workflows, enabling real-time debugging and transparent model insights for both classification models and LLMs in FL.
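To make the first contribution concrete, the following is a minimal illustrative sketch (not the thesis's actual algorithm) of activation-based fault localization: each client's model is probed with the same auto-generated input, and the client whose activation vector diverges most from the element-wise median across clients is flagged. All names, shapes, and the synthetic data are hypothetical.

```python
import numpy as np

def localize_faulty_client(activations):
    """activations: dict mapping client_id -> 1-D activation vector recorded
    on a shared probe input. Returns the client whose activations diverge
    most from the element-wise median (consensus) profile."""
    ids = list(activations)
    mat = np.stack([activations[c] for c in ids])   # shape: (clients, neurons)
    consensus = np.median(mat, axis=0)              # robust per-neuron profile
    dists = np.linalg.norm(mat - consensus, axis=1) # divergence per client
    return ids[int(np.argmax(dists))]

# Synthetic example: three similar clients, one with strongly divergent activations.
rng = np.random.default_rng(0)
base = rng.normal(size=16)
acts = {f"client{i}": base + rng.normal(scale=0.05, size=16) for i in range(3)}
acts["client3"] = base + rng.normal(scale=2.0, size=16)  # the injected "faulty" client
print(localize_faulty_client(acts))  # → client3
```

The median is used rather than the mean so that a single aberrant client cannot shift the consensus profile toward itself; this matches the abstract's premise that faulty clients stand out by activation divergence.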
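The gradient-based neuron weighting underlying the second contribution can be sketched in the same spirit, here using a standard gradient-times-activation score to rank neurons by influence; the function name, inputs, and top-k choice are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def influential_neurons(activations, gradients, top_k=3):
    """Score each neuron by |activation * gradient| (gradient-times-input
    attribution) and return the indices of the top_k highest-scoring
    neurons, i.e., those whose perturbation most moves the output."""
    scores = np.abs(activations * gradients)
    return [int(i) for i in np.argsort(scores)[::-1][:top_k]]

# Toy layer: per-neuron activations and output gradients.
acts = np.array([0.1, 2.0, 0.0, 1.5, 0.3])
grads = np.array([1.0, 0.5, 3.0, -2.0, 0.1])
print(influential_neurons(acts, grads))  # scores 0.1, 1.0, 0.0, 3.0, 0.03 → [3, 1, 0]
```

Note that neuron 2 has the largest gradient but zero activation, so it scores zero: influence requires both sensitivity and actual firing, which is why a product rather than the gradient alone is used.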

Keywords

Federated Learning, Privacy, Interpretability, Debugging, LLMs, CNNs
