Balancing Privacy Protection and Interpretability in Federated Learning

Zhe Li Honglong Chen Zhichen Ni Huajie Shao

Abstract

Federated learning (FL) aims to collaboratively train the global model in a distributed manner by sharing the model parameters from local clients to a central server, thereby potentially protecting users’ private information. Nevertheless, recent studies have illustrated that FL still suffers from information leakage as adversaries try to recover the training data by analyzing shared parameters from local clients. To deal with this issue, differential privacy (DP) is adopted to add noise to the gradients of local models before aggregation. It, however, results in the poor performance of gradient-based interpretability methods, since some weights capturing the salient region in feature map will be perturbed. To overcome this problem, we propose a simple yet effective adaptive differential privacy (ADP) mechanism that selectively adds noisy perturbations to the gradients of client models in FL. We also theoretically analyze the impact of gradient perturbation on the model interpretability. Finally, extensive experiments on both IID and Non-IID data demonstrate that the proposed ADP can achieve a good trade-off between privacy and interpretability in FL.

1 Introduction

Federated learning (FL) [23] is an emerging privacy-preserving mechanism that collaboratively trains a global model without exchanging data with different clients [20, 19, 16, 9]. Since it has the potential to protect the private information, FL has been widely applied to a variety of application domains, such as healthcare [28, 10], insurance industry [21], and Internet of Things (IoTs) [34, 31, 12]. Recent works have pointed out that FL is vulnerable to gradient leakage attacks (GLA) that try to reconstruct the training data from the publicly shared gradients with a central server [35, 30]. To deal with this problem, one of commonly used defense strategies is differential privacy (DP) [11], which injects noise to the model parameters (weights or gradients) before they are uploaded to a central server [33]. However, DP protects private information of users at the cost of model accuracy in FL. Hence, some methods have been developed to achieve a good trade-off between privacy preservation and accuracy. Existing works mainly pay attention to accuracy but neglect the model interpretability in FL. In reality, injecting noise to model parameters using differential privacy will hurt the gradient-based interpretability, as illustrated in Figure 1. As a result, the lack of interpretability will hinder the deployment of FL in safety-critical domains, such as healthcare and autonomous driving. While there exist very few studies [25, 13] on the trade-offs between interpretability and privacy protection in general machine learning (not federated learning), they mainly focus on perturbation-based interpretation by modifying the input features. To our best knowledge, none of prior works has explored the trade-off between privacy protection and gradient-based interpretability in FL.

Refer to caption — Figure 1: Visualizations of different methods for model inerpretability using Grad-CAM [29]. Red regions denote higher scores for classification. We can see that the regular DP will lead to poor model interpretability while our ADP can still retain the performance of interpretability. More importantly, it can still be robust to gradient leakage attack and guarantee privacy protection.

In this paper, we will study how to trade off privacy protection and interpretability in the context of privacy-preserving federated learning. The key challenge lies in how to smartly add noise to the model parameters while trying to retain its interpretability. To address this challenge, we first theoretically analyze the impact of injected noise on gradient-based model interpretability. This work uses the well-known Grad-CAM [29] as a representative. Based on the analysis results, we develop a novel adaptive differential privacy (ADP) approach that selectively adds noisy perturbations to the weights of local clients. More specifically, we do not add noise to the important (large) weights that capture the salient regions of feature map but inject noise to small weights that may not affect the model interpretability too much. Figure 1 shows an illustration example of the proposed ADP for ensuring privacy protection and interpretability. Finally, we evaluate the performance of our ADP on multiple benchmark datasets. Evaluation results demonstrate that our approach is simple yet effective, which can improve the model interpretability while guaranteeing privacy in FL. Importantly, it is robust to gradient leakage attack.

In sum, the main contributions of this work include: 1) to our best knowledge, we are the first to explore the trade-offs between privacy protection and model interpretability in federated learning; 2) we provide theoretical analysis on the impact of injected noise on gradient-based interpretability methods; 3) a novel adaptive differential privacy (ADP) approach is proposed to inject noise to model parameters smartly; and 4) experimental results demonstrate the proposed approach can achieve a good trade-off between privacy protection and model interpretability in FL.

2 Preliminaries

In this section, we introduce some basic knowledge about gradient-based interpretability, federated learning, and differential privacy that will be used in the paper.

2.1 Grad-CAM

As a typical gradient-based interpretability approach, Gradient-weighted Class Activation Mapping (Grad-CAM) [29] aims to provide visual explanations on the tasks of computer vision by using the gradients to capture the salient regions of feature map in the last convolutional layer of deep neural networks.

In this paper, we use image classification as a running example to introduce the basic idea of Grad-CAM. Let $y^{c}$ be the prediction score (before softmax activation) of class $c$ and $Z^{k}$ denote the feature map of a certain (e.g., last) convolutional layer. The gradient of $y^{c}$ with respect to $Z^{k}$ is denoted by $\frac{\partial y^{c}}{\partial Z^{k}}$ . Then we can leverage the gradients via back-propagation to get the following neuron importance weights as:

\displaystyle\alpha_{k}^{c}=\frac{1}{M}\sum_{i}\sum_{j}\frac{\partial y^{c}}{\partial Z_{ij}^{k}},

where $k$ represents the $k$ -th channel of feature map $Z$ , $M$ represents the product of the width and height of the feature map, $\alpha_{k}^{c}$ denotes the importance of feature map $Z^{k}$ for class $c$ , and $Z_{ij}^{k}$ is the data of feature map $Z$ at the coordinate $(i,j)$ in channel $k$ .

Next, we can use weighed sum of forward activation maps and the ReLU activation function to obtain the localization map Grad-CAM $L_{Grad-CAM}^{c}$ as follows.

\displaystyle L_{Grad-CAM}^{c}=ReLU\Big{(}\sum_{k}\alpha_{k}^{c}Z^{k}\Big{)}.

Note that, ReLU is used to remain features that have a positive impact on class $c$ .

2.2 Federated learning

Federated learning is an effective privacy-preserving technique that collaboratively trains a global model without sharing data with each client. The detailed implementation of FL can be summarized as follows. First, a central server sends a global model with weight initialization $w_{0}$ to local clients. Then $K$ selected clients will train the model with local data, and then each of them, e.g., client $k$ , will upload the model parameter $w_{k}^{t}$ to the central server at the $t$ -th round of communication. On server side, it will update $w_{t}$ by aggregating the weights from selected clients, yielding

w_{t}\leftarrow\text{Aggregate}(w_{1}^{t},\ldots,w_{k}^{t},\ldots,w_{K}^{t}).

After that, the server will broadcast the global parameter to local clients for the next round of training. In local training, the weights of each client can be updated with

w_{k}^{t+1}\leftarrow w_{t}-\eta\nabla F_{k}(w_{k}^{t+1}),

where $\eta$ is the learning rate and $\nabla F_{k}(w_{k}^{t+1})$ is the gradient of client $k$ .

2.3 Differential privacy

As a privacy-preserving method, differential privacy (DP) attempts to maximize the useful information retrieved from a database while minimizing the expenditure of privacy resources [4]. In recent years, DP has been widely used in federated learning to defend against gradient leakage attack. One of the most popular DP techniques is called ( $\epsilon$ , $\delta$ )-differential privacy [11], given by

\displaystyle{\rm Pr}\big{[}M(D_{i})\in S\big{]}\leq e^{\epsilon}\,{\rm Pr}\big{[}M(D^{{}^{\prime}}_{i})\in S\big{]}+\delta,

where $D_{i}$ and $D^{{}^{\prime}}_{i}$ are two neighboring datasets and only one single entry of data differs between them. $\epsilon$ is the privacy budget. The smaller $\epsilon$ , the higher privacy protection. In addition, the parameter $\delta$ denotes the upper bound of the output probability that can be changed by adding or removing a record in the dataset, i.e., the privacy protection standard.

In the applications of federated learning, we often achieve differential privacy by adding a moderate amount of Gaussian noise and clipping the gradient.

3 Effect of DP on interpretability

In this section, we provide theoretical analysis about the impact of DP on gradient-based interpretability in FL.

We take an $L$ -layer deep neural network (DNN) where the last two layers are composed of one convolutional layer (the second to last layer) and one fully connected layer as an example. The detailed network architecture is given by

	$\displaystyle Z^{(1)}=W^{(1)}X+b^{(1)},$
	$\displaystyle H^{(1)}=\sigma\big{(}Z^{(1)}\big{)},$
	$\displaystyle\ \ \ \ \ \ \ \ \ \ \ \vdots$
	$\displaystyle Z^{(l-1)}=Conv\big{(}H^{(l-2)},W^{(l-1)}\big{)}+b^{(l-1)},$
	$\displaystyle H^{(l-1)}=\sigma\big{(}Z^{(l-1)}\big{)},$
	$\displaystyle Y=W^{(l)}H^{(l-1)}+b^{(l)},$

where $Y$ is the output logits before softmax layer and $\sigma(\cdot)$ is the activation function. In addition, $Conv()$ is the second to last convoluational layer after $L-2$ fully connected layers, and the last layer is fully connected.

In the following, we will investigate how the injected noise added to the model parameters influences the gradient-based interpretability in privacy-preserving FL. In this work, we mainly focus on the widely used GradCAM for model interpretability.

3.1 Feedforward propagation

Since the GradCAM approach mainly relies on the feature map of the last convolutional layer for model interpretability, for simplicity, we will focus on the last two layers of the DNN, including one convolutional layer and one fully connected layer. Specifically, let $H^{(l-2)}$ and $W^{(l-1)}$ denote the input of the convolutional layer and the convolution kernel, respectively. Then we add the independent identical distribution (IID) Gaussian noise, $n_{\epsilon}\sim N(0,\zeta^{2})$ , to the convolution kernel $W^{(l-1)}$ and the last fully connected layer, yielding

	$\displaystyle\widetilde{W}^{(l-1)}=W^{(l-1)}+n_{\epsilon},$
	$\displaystyle\widetilde{W}^{(l)}=W^{(l)}+n_{\epsilon}.$

After injecting noise, the last two layers of the DNN can be written as:

	$\displaystyle Z^{(l-1)}=\text{Conv}\big{(}H^{(l-2)},\widetilde{W}^{(l-1)}\big{)}+b^{(l-1)},$
	$\displaystyle H^{(l-1)}=\sigma\big{(}Z^{(l-1)}\big{)},$
	$\displaystyle Y=\widetilde{W}^{(l)}H^{(l-1)}+b^{(l)}.$

Note that for simplicity, we do not show the injected noise to the bias term in the derivation, but will add the noise to all the parameters in the DNN. Next, we still study how the noise impacts the weights during backpropagation.

3.2 Backpropagation

Assume that the loss function is denoted as $L$ and the prediction (before softmax) for class $c$ as $Y_{c}$ . To obtain the gradient of $Y_{c}$ with respect to the feature map in the $i$ -th channel $\frac{\partial Y_{c}}{\partial Z_{i}^{(l-1)}}$ , we need to compute the backpropagation in the following steps.

	$\displaystyle\frac{\partial Y_{c}}{\partial H_{i}^{(l-1)}}=\widetilde{W}_{c,i}^{(l)},$
	$\displaystyle\frac{\partial H_{i}^{(l-1)}}{\partial Z_{i}^{(l-1)}}=\sigma^{\prime}(Z_{i}^{(l-1)}).$

Based on the above derivation, we can obtain the following expression for $\frac{\partial Y_{c}}{\partial Z_{i}^{(l-1)}}$ :

		$\displaystyle\frac{\partial Y_{c}}{\partial Z_{i}^{(l-1)}}=\frac{\partial Y_{c}}{\partial H_{i}^{(l-1)}}\frac{\partial H_{i}^{(l-1)}}{\partial Z_{i}^{(l-1)}}=\widetilde{W}_{c,i}^{(l)}\sigma^{\prime}(Z_{i}^{(l-1)})$
		$\displaystyle=\big{(}W_{c,i}^{(l)}+n_{\epsilon}\big{)}\sigma^{\prime}\Big{(}\text{Conv}_{i}\Big{(}H^{(l-2)},\big{(}W^{(l-1)}+n_{\epsilon}\big{)}\Big{)}+b_{i}^{(l-1)}\Big{)}.$

The above equation is composed of two terms: the first term is the derivative of the last fully connected layer and the second term is the derivative of convolutional layer. Next, we briefly analyze the influence of noise on the feature map using gradient based interpretability, as introduced in Section 2.1. We can see that if we only add noise to the fully connected layer, it will affect the feature map used for model interpretability. While if we only inject noise to the convolutional layer, it will change the output of activation function, thereby changing the feature map.

The above simple analysis suggests that the regular DP will affect the gradient-based interpretability in federated learning. It motivates us to develop a new differential privacy method to improve the model interpretability in the following section.

4 Proposed method

In this section, we are going to design a new adaptive differential privacy (ADP) mechanism that selectively injects noise into the model parameters to improve the model interpretability while retaining privacy protection in federated learning.

4.1 DP-based federated learning

We first briefly present the core idea of DP-based Federated Learning. As described in Section 2.2, federated learning collaboratively trains a global model in a central server by aggregating the weights or gradients from selected clients and then sends them back to the clients after aggregation. Different from the regular FL, DP-based federated learning attempts to add Gaussian noise into the weights before aggregation. Then the weights with noisy perturbation in local clients will be uploaded to the central server. As discussed above, injecting noise will negatively affect the gradient-based interpretability. As a result, the lack of model interpretability hinders the widespread deployment of FL in safety-critical applications, such as healthcare and autonomous driving.

4.2 Adaptive differential privacy mechanism

To deal with the challenge above, we will explore how to improve model interpretability while preserving privacy in the context of federated learning. As we know, gradient-based interpretability, such as Grad-CAM, will target the salient regions (high weights) in the feature map to interpret the important pixel-level features. The key insight is that when the noise with negative value is injected to the large weights, it will counteract the neuron important weights, leading to poor interpretability of FL. Conversely, if we add large positive noise to less important weights, the insignificant regions become salient ones, misleading the interpretability of prediction results. Inspired by this, we propose to design a simple yet effective adaptive differential privacy (ADP) mechanism that selectively injects noise into model parameters. The main idea is that we only inject noise into the less important weights, such that their values should not be over the neuron important weights. To achieve this, we need to sort the weights in a descending order, and then add noise to the bottom $r$ % weights, $\widetilde{W}_{r}$ . If $\widetilde{W}_{r}$ exceeds the smallest value of the top $r$ % weights, it should be clipped.

4.3 Application of ADP to FL

We apply the proposed ADP to improve the interpretability of DNN in federated learning. In this work, we mainly focus on improving the trade-off between privacy and interpretability of image classification using ResNet [14] consisting of multiple convolutional layers. In general, the shallow layers of ResNet contain high-resolution features but with low-level semantic information. The deep layers extract some more complex and abstract features with high-level semantics, but at the same time the feature resolution decreases. Considering both “shallow local features” and “deep global features”, we choose to selectively add noise to the weights of the last convolutional layer of each block of ResNet (8 parameters for ResNet18). We will conduct experiments to validate the good performance of the proposed strategy.

Algorithm 1 Adaptive differential privacy mechanism

r\in(0,1)

: ratio of noise addition,

\epsilon

: privacy budget of differential privacy,

\delta

: relaxation item of differential privacy,

\Delta s

: sensitivity of differential privacy,

target\_layers

: selected convolutional layers,

local\_parameters

: local parameters trained on clients,

target\_parameters

: parameters of

target\_layers

2:Local parameters with adaptive noise.

3:for each

layer

target\_layers

W_{cam}

\leftarrow

GradCAM(target_layer=

layer

)

m

\leftarrow

multiply

r

with number of elements in

W_{cam}

e

\leftarrow

sort

W_{cam}

and take out the

(m-1)

-th element

7: for each weight

w

W_{cam}

8: if

w\geq e

then

9: Set

w

to 0;

10: else

11: Set

w

to 1;

12: end if

13: Noise mask

M

\leftarrow

W_{cam}

;

14: end for

15:end for

16:for each

p

local\_parameters

17:

N

\leftarrow

Gaussian_Simple(

\epsilon

\delta

\Delta s

);

18: for each

q

target\_parameters

19: if

p=q

then

20:

N

\leftarrow

N*M

;

21: end if

22: end for

23:

p

\leftarrow

p

N

;

24:end for

The proposed ADP mechanism is summarized in Algorithm 1. The names of the selected convolutional layers and the parameter of each layer are stored in “target layers” and “target parameters”, respectively. Line 2 shows the weights of feature maps of each convolutional layer obtained by GradCAM. Line 4 sorts the weights of each target layer and Lines 14-21 selectively add noise to the less important weights for interpretability. In order to easily add noise to the weights, we create a mask matrix that can generate binary value (0 and 1) by setting a threshold to judge whether perturbing weights or not. This mask matrix can help adaptively add noise to the model parameters in “target parameters”. Note that, the proposed mechanism is able to flexibly choose the ratio of injected noise to the weights in each target layer. It is simple, easy to use, yet effective to achieve a good trade-off between interpretability and privacy.

5 Evaluation

Table 1: Accuracy comparison of different methods on IID data using 3 random seeds.

Dataset	No DP	DP	Random DP			ADP
Dataset	0% noise	100% noise	75% noise	50% noise	25% noise	75% noise	50% noise	25% noise
MNIST	97.71%	86.08%	86.93%	89.77%	91.39%	87.81%	89.90%	91.77%
Blood	92.00%	50.77%	55.47%	60.97%	62.62%	65.81%	67.01%	69.67%
Animals	69.92%	53.10%	54.14%	56.55%	58.43%	54.56%	57.04%	59.07%

Table 2: Accuracy comparison of different methods on Non-IID data using 3 random seeds.

Dataset	No DP	DP	Random DP			ADP
Dataset	0% noise	100% noise	75% noise	50% noise	25% noise	75% noise	50% noise	25% noise
MNIST	96.96%	71.45%	74.10%	75.04%	81.82%	74.34%	77.63%	82.19%
Blood	89.25%	45.36%	50.20%	52.08%	54.82%	50.73%	54.42%	57.04%
Animals	58.10%	39.02%	40.87%	41.74%	44.90%	41.27%	42.62%	45.48%

In this section, we carry out extensive experiments to evaluate the performance of proposed ADP in FL. First of all, we study the effect of ADP on classification accuracy. Then we exlore the impact of ADP on gradient-based interpretability. Finally, we leverage gradient leakage attack to measure the robustness of the proposed method.

5.1 Datasets

We evaluate the proposed ADP on three benchmark datasets, MNIST [2], Blood [3], and Animals [1], as described below.

MNIST. This dataset contains a total of 70,000 grayscale images of size 28×28, divided into a training set of 60,000 images (and labels) and a test set of 10,000 images (and labels).

Blood. This dataset contains 8-class categories of microscopic peripheral blood cell images. It contains a total of $17092$ RGB images of individual blood cells, which are split into 13600 and 3492 images for training and testing. In our experiment, we resize the images from the original 3 × 360 × 363 pixels to 3 × 64 × 64 pixels.

Animals. This dataset contains 10 animal species, commonly used in the field of image classification. Then we split the total 26,000 images into 20,800 and 5,200 for training and testing. To keep the same image size, we uniformly resize them to 3 × 64 × 64 pixels.

5.2 Model configuration and parameter settings

We use ResNet18 with 17 convolutional layers and 1 fully connected layer for image classification, and choose SGD as the optimizer with a learning rate of 0.001. We set a total of 100 clients for federated training, with 0.1-fraction clients chosen in each communication round. We set the number of communication rounds to 1000, and the local training batch to 100. At each round, we set the epoch to 1 for MNIST and Blood, and 5 for Animals. We implement experiments on both IID and Non-IID scenarios.

In the setting of differential privacy, we follow the common setting of $\delta$ to $10^{-5}$ , test the effect of $\epsilon$ on accuracy by setting $\epsilon$ to 100, 10, 5, 1, 0.1, respectively and the effect of gradient clipping parameter $c$ on accuracy by setting $c$ to 5, 10, 20, 40, 80, respectively. Note that the parameter $\epsilon$ is the total privacy budget, while each client’s budget in each communication round is the total privacy budget divided by the product of the fraction and the number of communication rounds.

5.3 Baselines

Since we are the first to study the trade-off between privacy and gradient-based interpretability, it is hard to find appropriate benchmark baselines in the prior works. Nevertheless, we compare the proposed ADP method with the following baselines

No DP. “No DP” means that no differential privacy is applied to the federated averaging model, which will serve as an essential baseline to provide a comparison for DP-based models.

DP. “DP” stands for applying differential privacy on the basis of No DP, which adds randomly 100% of noise into the model parameters.

Random DP. For a fair comparison, we choose a baseline called “Random DP”, which will add the same ratio of noise to the weights as the proposed ADP method. The only difference between this method and ADP is that it randomly adds noise to the weights while ADP selectively adds noises to the less important weights.

5.4 Impact of ADP on accuracy

Before studying the impact of ADP on model interpretability, we first would like to check whether the proposed method can improve accuracy. In this experiment, we choose to inject 25%, 50%, and 75% noise to model parameters of Random DP and ADP. In addition, we observe that accuracy decreases as the clipping parameter $c$ rises. In order to not affect the accuracy too much, we set parameter $c$ to 5. Regarding privacy budget $\epsilon$ , we set it to $1$ , $5$ , and $5$ for MNIST and the other two datasets by considering the trade-off between accuracy and privacy protection. We compare our methods to the baselines under the IID and Non-IID scenarios.

Tables 1 and 2 illustrate the comparison results of different methods under different ratio of injected noise to model parameters. First of all, it can be seen that as the amount of injected noise increases from 25% to 75%, the classification accuracy drops. This validates that there exists a trade-off between accuracy and privacy. Besides, we can observe that our ADP has a higher accuracy than the DP and Random DP on Blood dataset while achieving comparable accuracy to the other datasets. It suggests that the proposed ADP can improve the classification accuracy to some extent under privacy protection.

5.5 Impact of ADP on interpretability

Next, we explore the influence of our ADP on the interpretability of classification results using three benchmark datasets. Figures 2 and 3 illustrate the visual explanation of different methods using heatmap. We first take a look at the visualizations on MNIST. We can observe that DP no longer localizes the critical parts of handwritten numbers, leading to poor interpretability. In contrast, our ADP still has the ability to localize the most identifiable parts of each number, indicating that it can achieve higher interpretability and improve the recognition ability. In addition, the localizations of random DP are not as good as our method. We will further compare and explain their visualizations on the Blood dataset.

As shown in Figure 2(b), we can observe from the heatmap that the proposed ADP can better localize the salient regions of feature map compared to Random DP and No DP under IID data. It means that our method helps localize the lesions more precisely in medical examinations, thereby facilitating the understanding of the predicted results by doctors and patients. It thus can promote the deployment of FL in safety-critical applications.

Next, we compare the visual interpretation of our method and the baselines on more complex Animals dataset. We can see from Figure 2 (c) that our proposed method can capture the whole facial features and body features of animals, which is in line with human visual understanding. In contrast, No DP can hardly localize facial features and body features of animals while Random DP can only capture part of farcical features. We can conclude that our method can better explain the classification results on complex data.

What is more, we illustrate the visualizations of heatmap for different methods under the Non-IID scenario, as illustrated in Figure 3. Compared to IID, all the methods will reduce their ability to localize the salient regions of feature map. Nevertheless, we can still observe from Figure 3 that our method can better interpret the prediction results.

The above extensive experiments illustrate that our proposed method can not only enhance interpretability but also improve classification accuracy. Therefore, the proposed ADP can help improve the trade-off between privacy, accuracy, and interpretability.

5.6 Robust to gradient leakage attack

Finally, we also show the robustness of the proposed ADP to gradient leakage attack. We adopt the well-known deep leakage from gradients (DLG) algorithm [35] to attack our method. We want to verify whether the DLG method can reconstruct the input data based on the shared weights from local clients. Figures 4 and 5 show the reconstruction of input data by the attacker under different training iterations. We can observe that DLG fails to recover the input data from the share weights under ADP. The main reason is that DLG is not very effective when the noise is large as mentioned in [35]. In the proposed ADP, we add sufficient Gaussian noise with a variance ranging from $10^{-4}$ to $10^{-3}$ to the model parameters, which matches the range of noise scale to defend against gradient leakage attack as discussed in the prior work. Based on these experimental results, we can conclude that the proposed ADP is robust to the gradient leakage attack, thus is able to protect private data in federated learning.

6 Related Work

We review the related work on privacy-preserving federated learning and the trade-offs between privacy protection, accuracy, and interpretability.

6.1 Privacy-preserving federated learning

In order to protect the sensitive information of users, researchers adopt different privacy-protecting methods to prevent gradient leakage attacks (GLK). These approaches can be categorized into two main groups: secure multiparty computing (SMC) [17, 18, 6] and differential privacy (DP) [11]. SMC mainly adopts cryptographic techniques to protect the private information of input, making attackers hard to see the model updates. For instance, Kanagavelu et al. [17] proposed a two-phase SMC method to protect data privacy in federated learning. However, SMC is computational expensive, limiting its application to energy-aware IoT devices. DP aims to inject noise to the model parameters to defend against GLK. Wei et al. [32] proposed a privacy-protecting federated learning framework that adds noisy perturbation to the gradients of local clients before sending them to the server. While DP is very simple yet effective method for privacy protection, it will lead to an accuracy drop of federated learning [8].

6.2 Trade-off between privacy and accuracy

Some studies [13, 7] try to improve the trade-off between privacy and accuracy. For example, Luo et al. [22] combined transfer learning and sparse network finetuning to improve the privacy-utility trade-off in federated learning. Hu et al. [15] developed Fed-SMP, a new differentially private FL framework for ensuring the client-level DP while retaining the model accuracy. In addition, some researchers [16, 5] integrated secure aggregation and distributed DP to improve privacy-accuracy trade-off. These existing works can achieve remarkable performance on the trade-offs between privacy and model accuracy, they do not take into account the interpretability. In real-world applications, such as healthcare, it is crucial to explain and understand the diagnosis results of disease using federated learning [24].

6.3 Trade-off between interpretability and privacy

Some recent works focus on the trade-off between accuracy and interpretability in classical machine learning (not federated learning) [27, 26, 24]. This is because trustworthy machine learning system needs to ensure both the privacy protection and interpretability. Harder et al. [13] attempted to produce accurate prediction while interpreting the classification results using several locally linear maps (LLM) per class. However, the proposed approach tries to explain the results based on input features using some simple datasets. Patel et al. [25] designed a privacy-preserving framework to study the minimum privacy budget required for feature-based model explanations. In summary, existing works focus on the trade-off between privacy and feature-based interpretation rather than gradient-based interpretation in regular machine learning. None of existing research works has explored the trade-off between privacy preservation and gradient-based interpretation in federated learning. To our best knowledge, the proposed adaptive differential privacy mechanism in this paper is the first work to fill the gap between them in federated learning.

7 Conclusion

This paper developed a simple yet effective adaptive differential privacy (ADP) method to overcome the trade-off between privacy and model interpretability. Specifically, we selectively inject noise into the model parameters based on the importance weights in the feature map. Evaluation results on multiple benchmark datasets illustrated that the proposed ADP can not only improve model interpretability but also guarantee privacy protection in the context of federated learning. In addition, our method is resistant to the well-known gradient leakage attack.

References

[1] Corrado Alessio, Animals-10 Dataset, Animal Pictures of 10 Different Categories Taken From Google Images. Accessed on: Dec. 20, 2020, [Online]. Available: https://www.kaggle.com/alessiocorrado99/animals10.
[2] Y. LeCun, The mnist database of handwritten digits. Accessed on: 1998, [Online]. Available: http://yann.lecun.com/exdb/mnist/.
[3] Andrea Acevedo, Anna Merino, Santiago Alférez, et al. A dataset of microscopic peripheral blood cell images for development of automatic recognition systems. Data in Brief, 30, 2020.
[4] Mohammed Adnan, Shivam Kalra, Jesse C Cresswell, et al. Federated learning and differential privacy for medical image analysis. Scientific Reports, 12(1):1–10, 2022.
[5] Naman Agarwal, Peter Kairouz, and Ziyu Liu. The Skellam Mechanism for Differentially Private Federated Learning. Advances in Neural Information Processing Systems, 34:5052–5064, 2021.
[6] Yoshinori Aono, Takuya Hayashi, Lihua Wang, et al. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security, 13(5):1333–1345, 2017.
[7] Alberto Bietti, Chen-Yu Wei, Miroslav Dudik, et al. Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning. In Proc. of International Conference on Machine Learning, pages 1945–1962, 2022.
[8] Ittai Dayan, Holger R Roth, Aoxiao Zhong, et al. Federated learning for predicting clinical outcomes in patients with COVID-19. Nature Medicine, 27(10):1735–1743, 2021.
[9] Yongheng Deng, Feng Lyu, Ju Ren, et al. AUCTION: Automated and Quality-Aware Client Selection Framework for Efficient Federated Learning. IEEE Transactions on Parallel and Distributed Systems, 33(8):1996–2009, 2022.
[10] Qi Dou, Tiffany Y So, Meirui Jiang, et al. Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study. NPJ Digital Medicine, 4(1):1–11, 2021.
[11] Cynthia Dwork, Aaron Roth, et al. The Algorithmic Foundations of Differential Privacy. Foundations and Trends® in Theoretical Computer Science, 9(3–4):211–407, 2014.
[12] Bimal Ghimire and Danda B. Rawa. Recent Advances on Federated Learning for Cybersecurity and Cybersecurity for Federated Learning for Internet of Things. IEEE Internet of Things Journal, 9(11):8229–8249, 2022.
[13] Frederik Harder, Matthias Bauer, and Mijung Park. Interpretable and Differentially Private Predictions. In Proc. of AAAI Conference on Artificial Intelligence, volume 34, pages 4083–4090, 2020.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al. Deep residual learning for image recognition. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] Rui Hu, Yanmin Gong, and Yuanxiong Guo. Federated Learning with Sparsified Model Perturbation: Improving Accuracy under Client-Level Differential Privacy. arXiv preprint arXiv:2202.07178, 2022.
[16] Peter Kairouz, Ziyu Liu, and Thomas Steinke. The Distributed Discrete Gaussian Mechanism for Federated Learning with Secure Aggregation. In Proc. of International Conference on Machine Learning, pages 5201–5212, 2021.
[17] Renuga Kanagavelu, Zengxiang Li, Juniarto Samsudi, et al. Two-phase multi-party computation enabled privacy-preserving federated learning. In Proc. of IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, pages 410–419, 2020.
[18] Brian Knott, Shobha Venkataraman, Awni Hannun, et al. Crypten: Secure multi-party computation meets machine learning. Advances in Neural Information Processing Systems, 34:4961–4973, 2021.
[19] Qinbin Li, Bingsheng He, and Dawn Song. Model-contrastive federated learning. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10713–10722, 2021.
[20] Tao Lin, Lingjing Kong, Sebastian U Stich, et al. Ensemble Distillation For Robust Model Fusion In Federated Learning. Advances in Neural Information Processing Systems, 33:2351–2363, 2020.
[21] Fenglin Liu, Xian Wu, Shen Ge, et al. Federated Learning for Vision-and-Language Grounding Problems. In Proc. of AAAI Conference on Artificial Intelligence, volume 34, pages 11572–11579, 04 2020.
[22] Zelun Luo, Daniel J Wu, Ehsan Adeli, et al. Scalable Differential Privacy with Sparse Network Finetuning. In Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5059–5068, 2021.
[23] Brendan McMahan, Eider Moore, Daniel Ramage, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proc. of International Conference on Artificial Intelligence and Statistics, volume 54, pages 1273–1282, 2017.
[24] Chuizheng Meng, Loc Trinh, Nan Xu, et al. Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset. Scientific Reports, 12(1):1–28, 2022.
[25] Neel Patel, Reza Shokri, and Yair Zick. Model Explanations with Differential Privacy. In Proc. of ACM Conference on Fairness, Accountability, and Transparency, pages 1895–1904, 2022.
[26] Vipin Pillai and Hamed Pirsiavash. Explainable Models with Consistent Interpretations. In Proc. of AAAI Conference on Artificial Intelligence, volume 35, pages 2431–2439, 2021.
[27] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why Should I Trust You?” Explaining the Predictions of Any Classifier. In Proc. of ACM International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
[28] Nicola Rieke, Jonny Hancox, Wenqi Li, et al. The future of digital health with federated learning. NPJ Digital Medicine, 3(1):1–7, 2020.
[29] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. In Proc. of IEEE International Conference on Computer Vision, pages 618–626, 2017.
[30] Junxiao Wang, Song Guo, Xin Xie, et al. Protect Privacy from Gradient Leakage Attack in Federated Learning. In Proc. of IEEE Conference on Computer Communications, pages 580–589, 2022.
[31] Pengfei Wang, Yian Zhao, Mohammad S. Obaidat, et al. Blockchain-enhanced Federated Learning Market with Social Internet of Things. IEEE Journal on Selected Areas in Communications, pages 1–17, 2022.
[32] Kang Wei, Jun Li, Ming Ding, et al. Federated Learning with Differential Privacy: Algorithms and Performance Analysis. IEEE Transactions on Information Forensics and Security, 15:3454–3469, 2020.
[33] Xuefei Yin, Yanming Zhu, and Jiankun Hu. A Comprehensive Survey of Privacy-preserving Federated Learning: A Taxonomy, Review, and Future Directions. ACM Computing Surveys, 54(6):1–36, 2021.
[34] Weiting Zhang, Dong Yang, Wen Wu, et al. Optimizing Federated Learning in Distributed Industrial IoT: A Multi-Agent Approach. IEEE Journal on Selected Areas in Communications, 39(12):3688–3703, 2021.
[35] Ligeng Zhu, Zhijian Liu, and Song Han. Deep Leakage from Gradients. Advances in Neural Information Processing Systems, 32, 2019.