Efficient Client Contribution Evaluation
for Horizontal Federated Learning

Abstract

In federated learning (FL), fair and accurate measurement of the contribution of each federated participant is of great significance. The level of contribution not only provides a rational metric for distributing financial benefits among federated participants, but also helps to discover malicious participants that try to poison the FL framework. Previous methods for contribution measurement were based on enumeration over possible combination of federated participants. Their computation costs increase drastically with the number of participants or feature dimensions, making them inapplicable in practical situations. In this paper an efficient method is proposed to evaluate the contributions of federated participants. This paper focuses on the horizontal FL framework, where client servers calculate parameter gradients over their local data, and upload the gradients to the central server. Before aggregating the client gradients, the central server train a data value estimator of the gradients using reinforcement learning techniques. As shown by experimental results, the proposed method consistently outperforms the conventional leave-one-out method in terms of valuation authenticity as well as time complexity.

Index Terms— Federated learning, reinforcement learning, machine learning, contribution evaluation, big data

1 Introduction

In recent years, federated learning (FL) [1, 2, 3] has received increasing attention in the machine learning society. In various machine learning tasks, FL allows the use of isolated data from multiple resources without violating the privacy protection policy [4, 5, 6]. At their basis, FL systems rely on the participation of individual data holders [7, 8]. Computer scientists as well as economists are working closely to promote motivations for data holders to join in broader FL applications [9, 10]. On one hand, economists provide pricing strategies for data contributors based on their gain and loss in the FL ecosystems. On the other hand, computer scientists strive to achieve effective evaluation of the contributions from each data source [11]. Therefore, a fair and accurate evaluation of the contributions of federated participants is important for completing a FL ecosystem [12].

The quality and quantity of data is centric to all machine learning algorithms. It has been shown that the performance of a neural network scales sublinearly with the size of the training set [13]. Series of methods have been proposed in the literature to determine the value of each data point in the training set. Delete-and-retrain methods, such as leave-one-out (LOO) [14] and Data Shapley [15] provided brute-force solutions to the valuation problem. To ameliorate the computational expense, approximated LOO [14] was proposed for faster but more constrained value estimation. Data valuation using Reinforcement Learning (DVRL) [16], which formulated data valuation as a meta learning framework, was recently proposed by Yoon et al. In their experiments, DVRL outperformed LOO and Data Shapley algorithms in spotting corrupted data points.

In FL systems, Wang et al [11] adopted the deletion diagnostic methods, LOO and Data Shapley, to measure the participant contributions. As the number of participants or the number of features increased, the computational cost increased drastically in Wang et al.’s methods. These methods are hardly applicable in any practical configurations.

In this paper, an integrated client contribution evaluation method for the horizontal FL systems is proposed. The proposed method is based on reinforcement learning (RL) and named Federated REINFORCE Client Contribution Evaluation (F-RCCE). The reinforcement signal in F-RCCE is based on the performance of a small privately-held validation dataset of central server. F-RCCE can fairly and cost-effectively measure the contribution to a federated model by each client, in a privacy-preserving manner. The main contributions of this study are:

1.

Propose a novel participant contribution evaluation method for FL systems, that uses RL to evaluate contributions fairly;
2.

Verify the effectiveness and the time cost of the proposed method on the horizontal FL scenario;
3.

Investigation of the performance of FL systems under imbalanced data distributions using the proposed method.

2 Proposed method

2.1 Overall structure

In horizontal FL, the feature space is shared among all client datasets $\mathcal{D}_{i}=\{(\textbf{x}_{k}^{i},y_{k}^{i})\}_{k=1}^{m_{i}}$ , $i=1,...,N$ . The local model owned by each participant has the same structure $f$ as the global model. In a communication round $t$ , first, each client optimizes its local model parameters $\bm{\theta}_{i}^{t}$ with $\mathcal{D}_{i}$ and sends the parameter gradients $\nabla\bm{\theta}_{i}^{t}$ (or $\bm{\theta}_{i}^{t}$ , the two are equivalent in FL) to the central server. The central server then collects and aggregates $\bm{\theta}_{i}^{t}$ ’s to update the global model by

\displaystyle\bm{\theta}_{G}^{t+1}\leftarrow\bm{\theta}_{G}^{t}-\frac{\alpha_{\bm{\theta}}}{N}\sum_{i=1}^{N}\nabla\bm{\theta}_{i}^{t},

(1)

where $\alpha_{\bm{\theta}}$ is the predefined learning rate. Finally, the renewed model parameters $\bm{\theta}_{G}^{t+1}$ are broadcast to all participants. This is the workflow of the basic FederatedAverage (FedAvg) algorithm. In this process, the central server cannot access the raw data held by each client. Therefore, for clients, the value of data is reflected by the gradients they upload to central server. If the contribution of each gradient to the global model can be evaluated, it will indirectly reflect the client’s contribution.

To achieve this goal, a novel method is proposed, named Federated REINFORCE client contribution evaluation (F-RCCE). The F-RCCE is an integrated evaluation algorithm with the horizontal FL system (Fig.1). In the proposed framework, an evaluator module $g_{\bm{\phi}}$ is implemented on the central server. Before aggregating client updates, the central server runs the evaluator to obtain an estimate of the values of the gradients. The gradients are then selected or discarded as decided by the evaluator. Only the selected client gradients are aggregated to renew the global model. In this framework, the evaluator is trained alongside the global task model $f_{\bm{\theta}}$ , using a held-out validation set $\mathcal{D}_{v}=\{(\textbf{x}_{k}^{v},y_{k}^{v})\}_{k=1}^{m_{v}}$ owned by the central server. Thus the evaluator is able to perform task-specific evaluation that serves designated data distribution.

Optimization of the evaluator defines a non-differentiable sequential decision-making task. It is difficult to be achieved by end-to-end gradient descent. This problem is modeled as a RL problem for decision-making [17, 18, 19]. Details on the proposed formulation is explained in Section 2.2.

In this paper, we assume a modest security environment for the FL system. That is, all participants of the system are honest-but-curious. Client datasets $\mathcal{D}_{i}$ are private to the respective participants. The validation set $\mathcal{D}_{v}$ is selected to the need of the task model owner and private to the central server. All communications to and from the central server should be encrypted. In the experiments the encryption / decryption is omitted because it is independent of the proposed methods.

2.2 F-RCCE

Refer to caption — Fig. 1: The block diagram of F-RCCE method.

In F-RCCE, the evaluator module is formulated as an RL agent that collects client updates $\bm{\theta}_{i}^{t}$ , and outputs a selection probability $\bm{\omega}^{t}$ , that is, $g_{\bm{\phi}}(\bm{\theta}^{t})=\bm{\omega^{t}}$ . The evaluator then samples a selection vector $\textbf{S}^{t}=[s_{1}^{t},...,s_{n}^{t}]$ , where $s_{i}^{t}\in\{0,1\}$ represents $\bm{\theta_{i}^{t}}$ being included or discarded in the model aggregation. $P(s_{i}^{t}=1)=\bm{\omega}_{i}^{t}$ is the probability of $\bm{\theta_{i}^{t}}$ being included. After model aggregation, a reward signal $r$ is calculated based on the global model’s performance on $\mathcal{D}_{v}$ . In summary, the proposed RL problem is:

•

State space: The feasible set of $\bm{\theta}_{G}$ .
•

Action space: $\textbf{S}^{t}$ .
•

Reward function: $r(\textbf{S}^{t})$ .

In this paper, the reward function is defined to be directly related with the global model’s performance on the validation set. Specifically,

\displaystyle r(\textbf{S}^{t})=\frac{1}{m_{v}}\sum_{k=1}^{m_{v}}L_{v}\left(f_{\bm{\theta}_{G}}(\textbf{x}_{k}^{v}),y_{k}^{v}\right)-\delta,

(2)

where $L_{v}$ is the loss function of global model on validation set $\mathcal{D}_{v}$ , and $\delta$ is a baseline calculated by moving average of previous loss $L_{v}$ with moving averaging window $T>0$ .

As opposed to FedAvg, the global model parameters are updated as

\displaystyle\bm{\theta}_{G}^{t+1}\leftarrow\bm{\theta}_{G}^{t}-\frac{\alpha_{\bm{\theta}}}{\sum_{i=1}^{N}s_{i}^{t}}\sum_{i=1}^{N}s_{i}^{t}\nabla\bm{\theta}_{i}^{t}.

(3)

To optimize the evaluator, the objective function of $g_{\bm{\phi}}$ is defined as follows:

\displaystyle J(\bm{\phi})

\displaystyle=\mathbb{E}_{\textbf{S}^{t}\sim\pi_{\bm{\phi}}(\bm{\theta}^{t},\cdot)}r(\textbf{S}^{t}),

(4)

where $\pi_{\bm{\phi}}(\bm{\theta}^{t},\cdot)$ is a stochastic, parameterized policy defined by $\bm{\phi}$ . According to the policy gradient theorem [20], we can obtain the probability of $\textbf{S}^{t}$ by $p(\textbf{S}^{t}|\bm{\phi})=\prod_{i=1}^{n}[{\bm{\omega}_{i}^{t}}^{s_{i}^{t}}\cdot(1-{\bm{\omega}_{i}^{t}})^{1-s_{i}^{t}}]$ . With the log-derivative trick, we have

\displaystyle\nabla_{\bm{\phi}}p(\textbf{S}^{t}|\bm{\phi})=p(\textbf{S}^{t}|\bm{\phi})\nabla_{\bm{\phi}}\log p(\textbf{S}^{t}|\bm{\phi}),

(5)

where

\displaystyle\nabla_{\bm{\phi}}\log p(\textbf{S}^{t}|\bm{\phi})=\sum_{i=1}^{n}s_{i}^{t}\nabla_{\bm{\phi}}\log\bm{\omega}^{t}_{i}+(1-s_{i}^{t})\nabla_{\bm{\phi}}\log(1-\bm{\omega}^{t}_{i}).

(6)

Putting it all together, the following policy gradient can be given as

\displaystyle\nabla_{\bm{\phi}}J(\bm{\phi})=\mathbb{E}_{\textbf{S}^{t}\sim\pi_{\bm{\phi}}(\bm{\theta}^{t},\cdot)}\nabla_{\bm{\phi}}\log p(\textbf{S}^{t}|\bm{\phi})r(\textbf{S}^{t}).

(7)

Then, the evaluator’s model parameters $\bm{\phi}$ can be optimized by gradient ascent method with learning rate $\alpha$ :

\displaystyle\bm{\phi}^{t+1}\leftarrow\bm{\phi}^{t}+\alpha_{\bm{\phi}}\sum_{i=1}^{n}r(\bm{S}^{t})\nabla_{\bm{\phi}}\log p(\textbf{S}^{t}|\bm{\phi})|_{\bm{\phi}^{t}}.

(8)

The pseudo-code for the proposed F-RCCE algorithm is illustrated in Algorithm 1.

Inputs: Learning rate

\alpha_{\bm{\theta}},\alpha_{\bm{\phi}}>0

, moving average window

T>0

Initialize: Parameters

\bm{\theta_{G}^{0}},\bm{\phi^{0}}

, moving average

\delta

3 for round $t=1,2,...$ do

4 Clients execute:

5 for client $c=1,2,...,N$ do

6 Client

c

copies

\bm{\theta}^{t-1}_{G}

as local model

\bm{\theta}^{t-1}_{c}

;

7 Client

c

updates local model

\bm{\theta_{c}^{t}}

;

8 end for

9 Server executes:

10 Collect

\bm{\theta}^{t}=[\bm{\theta}_{1}^{t},...,\bm{\theta}_{N}^{t}]

;

11 Calculate selection probabilities

\bm{\omega}^{t}=g_{\bm{\phi}}(\bm{\theta}^{t})

;

12 Sample selection vector

\textbf{S}^{t}\sim Ber(\bm{\omega}^{t})

;

13 Update

\bm{\theta_{G}^{t-1}}

with Eq.(3);

14 Calculate reward signal

r(\textbf{S}^{t})

with Eq.(2);

15 Calculate gradient of

\bm{\phi}^{t}

with Eq.(7);

16 Update

\bm{\phi}^{t}

with Eq.(8);

17 Update

\delta\leftarrow\frac{T-1}{T}\delta+\frac{1}{T}L_{v}(f_{\bm{\theta}_{G}}(\textbf{x}^{v}),y^{v})

18 end for

Algorithm 1 F-RCCE Algorithm.

3 Experiments

3.1 Experimental settings

In this section, a Horizontal FL environment with a large number of clients ( $\geq 50$ ) is simulated to test the proposed F-RCCE method on the SMS Spam dataset [21]. The SMS Spam data set contains 5572 samples, among which 5000 samples are divided into 50 groups according to Dirichlet distribution and distributed to each client, and the remaining 572 samples are stored as a validation set in the central server. The task model $f_{\bm{\theta}}$ classifies SMS as ham or spam. In data preprocessing, the data is tokenized and the text is converted to sequences using Keras with $max\_words=1000$ and $max\_len=150$ . After that, data was standardized to have zero mean and one standard deviation using scikit-learn’s Standard Nomalizer. Labels of categorical data are encoded in one-hot embedding.

On the model selection, logistic regression classifier is used as the task model and a four-layer multi-layer-perceptron as evaluator model. Adam optimizer with learning rate $\alpha_{\bm{\phi}}=10^{-5}$ is used to optimize evaluator model and SGD optimizer with learning rate $\alpha_{\bm{\theta}}=0.1$ is used to optimize the local model which each client copied by global model.

To verify the authenticity of the reported contribution factor, the remove-and-retrain scheme is adopted which proposed by Yoon et al [16]. Specifically, given an estimate of contribution values, the lowest- or highest-valued datapoints are discarded. The task model are then re-trained with the remaining data. When the value estimation is correct, removing high-valued datapoints should lead to a decrease in the task model accuracy, vice versa.

In order to reduce the uncertainty caused by random initialization, each experiment is repeated five times. The experimental data is averaged as the presented results. The baseline task model is trained by conventional FedAvg, without any data value estimation. The convergence curves of the FedAvg baseline as well as the co-trained F-RCCE model are presented in Fig.2(a). The two curves are almost identical, demonstrating that the inclusion of the evaluator $\bm{\phi}$ does not have a noticeable impact on the federated model.

3.2 Comparison with LOO

As aforementioned, LOO[11] cannot directly measure the value of gradients uploaded by clients for each communication round, a specific client is deleted in each re-training iteration to evaluate the contribution of individual client. Afterwards, part of the highest-/lowest-contribution clients are removed following removing rate in re-training stage.

For F-RCCE, the evaluator and the global task model are trained for 1,000 communication rounds. Then the task model is re-trained with the evaluator fixed. In the re-training process, the selection probability output by the evaluator is taken as the contribution value of the client update. At each round, 30% of the client gradients are removed with the highest (F-RCCE Removing Highest) or lowest (F-RCCE Removing Lowest) contribution values. The task model is updated with the remaining gradients. The validation accuracy of the task model is recorded for each round.

As shown in Fig.2 (a), removing the highest contribution gradients makes the model converge more slowly than the baseline. Removing the lowest contribution gradients, on the other hand, slightly improves the validation accuracy. In comparison, the difference between removing highest or lowest valued clients by the LOO method is not significant.

3.3 Experiments with corrupted data

To verify the ability of F-RCCE to identify data abnormality in horizontal FL, a training set is generated with randomly corrupted data. Noises is added to 20% of the SMS samples by deleting 20% of the words in each text string. The subsequent data preprocessing is same as the previous experiment.

Experimental results are shown as Fig.2 (b). The LOO method again fail to distinguish the highest-/lowest-contribution clients. The F-RCCE method performs significantly better than LOO. It is more sensitive to the removal of high/low contribution gradients. After removing the gradients with highest contribution calculated by F-RCCE, the performance of the global model has an obvious decline. And the performance of the global model is gratifying after removing the gradients with lowest contribution calculated by F-RCCE.

3.4 Impact of the number of clients

In a business-to-client model, horizontal FL often has a large number of clients. The following experiment investigates the consistency and time complexity of F-RCCE verses increasing number of clients. Horizontal FL systems are simulated with 50 to 500 clients. Considering the convergence of the Evaluator requires a number of iterations, each global model in F-RCCE iterates for 1000 rounds. As a control, each global model iterates for 50 rounds under LOO. The running time of the two algorithms are recorded, as shown in Table 1. With the number of clients increasing from 100 to 500, the time cost of F-RCCE only increases by 6%. However, the time cost of LOO increased almost linearly. Therefore, F-RCCE is more applicable in in practical situations.

The validation accuracies with highest-/lowest-contribution gradients verses different numbers of clients are plotted on Fig.2 (c). It shows that, removing the highest-contribution gradients always degrades task model performance. Removing the lowest-contribution gradients can sometimes benefit the validation accuracy, but sometimes causes destructive consequences. This phenomenon is due to the fact that the datasets are randomly generated. Some of the removed gradients still have positive effects on the model, although its contribution ranks low compared to others. There is no obvious trend in the validation accuracy with the increased number of clients. Result support that F-RCCE performs consistently in both small and large numbers of clients.

Clients number	100	200	300	400	500
LOO (s)	249.5	464.6	699.6	806.1	1135.9
F-RCCE (s)	62.6	63.0	64.2	64.9	66.9

Table 1: Time cost of LOO and F-RCCE on different number of clients.

3.5 Client contribution using F-RCCE

A intuitive idea is proposed to measure the clients contribution using F-RCCE. That is, the selection probability calculated by F-RCCE in each iteration is taken as the contribution of the iteration, and the contributions in all iterations are summed and scaled as the client contribution. For clients, when their amount of clean data is large, the calculated gradient is relatively stable, and it is easier to eliminate the influence of outlier. So the amount of clean data is considered to be positively correlated with the client’s contribution. In the experiment, a trained evaluator is used to measure the clients contribution for 50 communication rounds. Clients are sorted by the number of samples they held, and the contributions are plot in Fig. 2 (d). With the increase of the number of samples, the contribution of client also showed an upward trend. Experimental result shows that F-RCCE has the ability to evaluate the contribution of clients.

4 CONCLUSION

Fair and accurate measurement of the contribution of each parcitipant is important for the FL society. In this work, a client contribution evaluation method named F-RCCE is proposed which accomplished integrated contribution evaluation using RL method. Experimental results strongly support that the proposed method can accurately evaluate the contribution of gradients provided by each client. Its time cost remained almost constant with increased number of clients, so it is suitable for applications with large business-to-client models. Future research may consider how to effectively distribute benefits to all participants using the clients’ contribution obtained by F-RCCE. This is one of the key components in attempts to instantiate the incentive design for the FL ecosystem.

5 ACKNOWLEDGEMENTS

This paper is supported by National Key Research and Development Program of China under grant No. 2018YFB1003500, No. 2018YFB0204400 and No. 2017YFB1401202. Corresponding author is Jianzong Wang from Ping An Technology (Shenzhen) Co., Ltd.

References

[1] Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon, “Federated learning: Strategies for improving communication efficiency,” in NIPS Workshop on Private Multi-Party Machine Learning, 2016.
[2] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al., “Advances and open problems in federated learning,” arXiv preprint arXiv:1912.04977, 2019.
[3] Hongyu Li, Dan Meng, Hong Wang, and Xiaolin Li, “Knowledge federation: A unified and hierarchical privacy-preserving ai framework,” in 2020 IEEE International Conference on Knowledge Graph (ICKG). IEEE, 2020, pp. 84–91.
[4] Xinghua Zhu, Jianzong Wang, Zhenhou Hong, Tian Xia, and Jing Xiao, “Federated learning of unsegmented chinese text recognition model,” in 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2019, pp. 1341–1345.
[5] Lingwei Kong, Hengtao Tao, Jianzong Wang, zhangcheng Huang, and Jing Xiao, “Network coding for federated learning systems,” in International Conference on Neural Information Processing. Springer, 2020.
[6] Xinghua Zhu, Jianzong Wang, Zhenhou Hong, and Jing Xiao, “Empirical studies of institutional federated learning for natural language processing,” in Findings of EMNLP. ACL, 2020.
[7] Anxun He, Jianzong Wang, Zhangcheng Huang, and Jing Xiao, “Fedsmart: An auto updating federated learning optimization mechanism,” arXiv preprint arXiv:2009.07455, 2020.
[8] Jakub Konečnỳ, Brendan McMahan, and Daniel Ramage, “Federated optimization: Distributed optimization beyond the datacenter,” arXiv preprint arXiv:1511.03575, 2015.
[9] Yuan Liu, Shuai Sun, Zhengpeng Ai, Shuangfeng Zhang, Zelei Liu, and Han Yu, “Fedcoin: A peer-to-peer payment system for federated learning,” arXiv preprint arXiv:2002.11711, 2020.
[10] Yufeng Zhan, Peng Li, Zhihao Qu, Deze Zeng, and Song Guo, “A learning-based incentive mechanism for federated learning,” IEEE Internet of Things Journal, 2020.
[11] Guan Wang, Charlie Xiaoqian Dang, and Ziye Zhou, “Measure contribution of participants in federated learning,” in 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 2597–2604.
[12] Jingfeng Zhang, Cheng Li, Antonio Robles-Kelly, and Mohan Kankanhalli, “Hierarchically fair federated learning,” arXiv preprint arXiv:2004.10386, 2020.
[13] Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Patwary, Mostofa Ali, Yang Yang, and Yanqi Zhou, “Deep learning scaling is predictable, empirically,” arXiv preprint arXiv:1712.00409, 2017.
[14] Pang Wei Koh and Percy Liang, “Understanding black-box predictions via influence functions,” in Proceedings of the 34th International Conference on Machine Learning. ICML, 2017, pp. 1885–1894.
[15] Amirata Ghorbani and James Zou, “Data shapley: Equitable valuation of data for machine learning,” in Proceedings of the 36th International Conference on Machine Learning. ICML, 2019, pp. 2242–2251.
[16] Jinsung Yoon, Sercan O Arik, and Tomas Pfister, “Data valuation using reinforcement learning,” in International Conference on Machine Learning. ICML, 2020.
[17] Richard S Sutton and Andrew G Barto, Reinforcement learning: An introduction, MIT press, 2018.
[18] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al., “Human-level control through deep reinforcement learning,” nature, vol. 518, no. 7540, pp. 529–533, 2015.
[19] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, pp. 484–489, 2016.
[20] Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Advances in neural information processing systems, 2000, pp. 1057–1063.
[21] Tiago A Almeida, José María G Hidalgo, and Akebo Yamakami, “Contributions to the study of sms spam filtering: new collection and results,” in Proceedings of the 11th ACM symposium on Document engineering, 2011, pp. 259–262.

Efficient Client Contribution Evaluation for Horizontal Federated Learning