
CAT: Collaborative Adversarial Training

Xingbin Liu1  Huafeng Kuang1  Xianming Lin1  Yongjian Wu2  Rongrong Ji1
1 Media Analytics and Computing Lab, Department of Artificial Intelligence,
School of Informatics, Xiamen University, 361005, China.
2 Tencent Youtu Lab, Shanghai, China
Abstract

Adversarial training can improve the robustness of neural networks. Previous methods typically focus on a single adversarial training strategy and do not consider the properties of models trained with different strategies. By revisiting previous methods, we find that models trained with different adversarial training strategies exhibit distinct robustness on individual sample instances. For example, a sample instance can be correctly classified by a model trained using standard adversarial training (AT) but not by a model trained using TRADES, and vice versa. Based on this observation, we propose a collaborative adversarial training framework to improve the robustness of neural networks. Specifically, we use different adversarial training methods to train robust models and let the models exchange their knowledge during training. Collaborative Adversarial Training (CAT) improves both robustness and accuracy. Extensive experiments on various networks and datasets validate the effectiveness of our method. CAT achieves state-of-the-art adversarial robustness without using any additional data on CIFAR-10 under the Auto-Attack benchmark. Code is available at https://github.com/liuxingbin/CAT.

1 Introduction

With the development of deep learning, Deep Neural Networks (DNNs) have been applied to various visual tasks, such as image classification [10], object detection [30], and semantic segmentation [24], achieving state-of-the-art performance. However, recent research has found that DNNs are vulnerable to adversarial perturbations [9]. A finely crafted adversarial perturbation by a malicious agent can easily fool a neural network. This phenomenon raises security concerns about the deployment of neural networks in safety-critical areas such as autonomous driving [3] and medical diagnostics [16].

To cope with the vulnerability of DNNs, different types of methods have been proposed to improve the robustness of neural networks, including adversarial training [21], defensive distillation [27], feature denoising [35], and neural network pruning [20]. Among them, Adversarial Training (AT) is the most effective method to improve adversarial robustness. AT can be regarded as a data augmentation strategy that trains neural networks on adversarial examples crafted from natural examples. AT is usually formulated as a min-max optimization problem, where the inner maximization generates adversarial examples, while the outer minimization optimizes the model parameters on the adversarial examples produced by the inner maximization.
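To make the min-max structure concrete, the sketch below shows an $l_\infty$ PGD attack for the inner maximization in PyTorch-style code; the function name, arguments, and defaults are our own illustration and not code from the released repository.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, steps=10):
    """Inner maximization: find an l_inf-bounded perturbation that
    (approximately) maximizes the cross-entropy loss."""
    # random start inside the l_inf ball
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # gradient ascent step followed by projection back into the l_inf ball
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

The outer minimization is then an ordinary optimizer step on the loss computed over these adversarial examples.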

Refer to caption
Figure 1: Confusion matrices of models trained by different methods with ResNet-18 on the CIFAR-10 test dataset (best viewed in color). We set the diagonal values to 0 for better illustration. Confusion exists between models trained by any two methods, especially for the blocks from class 3 to class 7. The prediction discrepancies are 18.98%, 22.54%, and 21.05%, respectively.

Previous methods have focused on how to improve the model’s adversarial accuracy, attending only to the numerical improvement and not to the characteristics of the different methods. By revisiting the process of adversarial defense, a question comes to mind: do models trained by different adversarial training methods behave the same on individual sample instances? We analyze different adversarial training methods and find that classification confusion exists between models adversarially trained by different methods, as illustrated in Fig. 1. Specifically, taking AT [21] and TRADES [37] as an example, for the same adversarial example, the network trained by AT can classify it correctly while the network trained by TRADES misclassifies it, and vice versa. We conclude that although AT and TRADES have similar numerical adversarial accuracy, they behave differently on individual sample instances, i.e., models trained by different methods master different knowledge. This raises a question:

Do two networks learn better if they collaborate?

Based on this observation, we propose a Collaborative Adversarial Training (CAT) framework to improve the robustness of neural networks. Our framework is shown in Fig. 2. Specifically, we simultaneously train two deep neural networks using different adversarial training methods. The adversarial examples generated by each network are fed into the peer network to obtain the corresponding logits, which are then used, together with the network’s own adversarial training objective, to guide its learning. We expect to improve the robustness of neural networks by letting peers learn from each other in this collaborative way. Extensive experiments on different neural networks (VGG, MobileNet, ResNet) and different datasets (CIFAR, Tiny-ImageNet) demonstrate the effectiveness of our approach. CAT achieves new state-of-the-art robustness without any additional synthetic or real data on CIFAR-10 under the Auto-Attack benchmark. Furthermore, we provide a property analysis of CAT for a better understanding.

In summary, our contributions are threefold:

  • We find that the models obtained using different adversarial training methods have different representations for individual sample instances.

  • We propose a novel adversarial training framework: Collaborative Adversarial Training. CAT simultaneously trains neural networks from scratch using different adversarial training methods and allows them to collaborate to improve the robustness of the model.

  • We conduct extensive experiments on a variety of datasets and networks and evaluate the trained models against state-of-the-art attacks. We demonstrate that CAT substantially improves the robustness of neural networks and obtains new state-of-the-art performance without any additional data.

2 Related Work

2.1 Adversarial Attack

Since Szegedy [34] discovered that DNNs are vulnerable to adversarial examples, a large number of works have been proposed to craft them. Based on the attacker’s access to knowledge of the target model, attacks can be divided into white-box and black-box attacks. White-box attacks craft adversarial examples using knowledge of the target model, while black-box attacks are agnostic to it.

White-box Attack: Goodfellow [9] proposes FGSM, which efficiently crafts adversarial examples in a single step. Madry [21] proposes PGD, which is among the strongest attacks exploiting first-order information about the network. MI-FGSM [7] incorporates momentum into the iterative process to help the attack escape local optima, and the adversarial examples it generates are also more transferable. Boundary-based attacks such as DeepFool [22] and CW [2] pose additional challenges to models. Recently, Auto-Attack, an ensemble of diverse attack methods consisting of APGD-CE [5], APGD-DLR [5], FAB [4], and Square Attack [1], has become a benchmark for testing model robustness.

Black-box Attack: Black-box attacks can be categorized into transfer-based and query-based attacks. Transfer-based methods attack the target model by exploiting the transferability of adversarial examples, i.e., adversarial examples generated on a surrogate model can be transferred to fool the target model. Many works explore this transferability for black-box attacks. Dong [7] combines momentum with an iterative approach to obtain better transferability. Scale-invariance [19] boosts the transferability of adversarial examples by transforming the inputs at multiple scales. Square Attack [1], which approximates the model’s decision boundary with a randomized search scheme, is among the most efficient query-based attack methods.

2.2 Adversarial Robustness

Adversarial attacks pose a significant threat to DNNs. For this reason, many methods have been proposed to defend against adversarial examples, including denoising [35], adversarial training [21], data augmentation [29], and input purification [23]. ANP [20] identifies the vulnerability of latent features and uses pruning to improve robustness. Madry [21] uses PGD to generate adversarial examples for adversarial training, which remains the most effective way to defend against adversarial examples. A large body of work uses new regularizers or objective functions to improve the effectiveness of standard adversarial training. Adversarial logit pairing [15] improves robustness by encouraging the logits of natural and adversarial examples to be close together. TRADES [37] uses a KL divergence to regularize the outputs on natural and adversarial examples.

2.3 Knowledge Distillation

Knowledge distillation (KD) is commonly used for model compression and was first used by Hinton [11] to distill knowledge from a well-trained teacher network to a student network. KD can significantly improve the accuracy of student models. There have been many later works to improve the effectiveness of KD [32]. In recent years, KD has been extended to other areas. Goldblum [8] analyzes the application of knowledge distillation to adversarial robustness and proposes ARD to transfer knowledge from a large teacher model with better robustness to a small student model. ARD can produce a student network with better robustness than training from scratch. In this paper, we propose a more effective collaborative training framework to improve the robustness of the network.

3 Proposed Method

3.1 Motivation

We investigate the characteristics of robust models obtained by different adversarial training methods on individual sample instances. We find that different models perform differently on sample instances: for some samples, the model trained by AT [21] classifies correctly while the model trained by TRADES [37] misclassifies, and vice versa. Confusion exists between the methods. A straightforward conclusion is that networks trained by different methods master different knowledge, although their accuracy values are about the same. Can we then use the knowledge learned by these two networks to improve the robustness of neural networks? A simple idea is to let two networks that master different knowledge learn collaboratively. For this purpose, we propose collaborative adversarial training. CAT improves the robustness of neural networks by letting the knowledge of the two networks interact during training. The framework is illustrated in Fig. 2.

Refer to caption
Figure 2: The framework of CAT, performing adversarial training collaboratively. Given a batch of natural samples, the two networks $f$ and $g$ are attacked separately to generate adversarial examples $u$ and $v$. Then $u$ and $v$ are fed into both networks to obtain the corresponding logits. We then use the logits obtained from the peer network to guide the learning of each network, i.e., $g_{u}\rightarrow f_{u}$, $f_{v}\rightarrow g_{v}$. This process is called knowledge interaction.

3.2 Preliminaries

We take AT and TRADES as examples to introduce collaborative adversarial training. We first briefly introduce the training objectives of AT and TRADES and then describe CAT in detail.

Adversarial training is defined as a min-max optimization problem. PGD is used to generate adversarial examples in the inner maximization, while the outer minimization uses the PGD-generated adversarial examples and the ground-truth label $y$ to optimize the model parameters. AT is formulated as:

\min_{\theta}\ \mathbb{E}_{(x,y)\in D_{data}}\big[\max_{\delta}\ L(f^{AT}_{\theta}(x^{adv}_{AT}),\,y)\big], (1)
x^{adv}_{AT} = x + \delta. (2)

where $D_{data}$ is the training data distribution, and $x$ and $y$ are a training sample and its corresponding label drawn from $D_{data}$. $f_{\theta}$ is a neural network parameterized by $\theta$. $L$ is the standard cross-entropy loss used in image classification. $\delta$ is the adversarial perturbation generated by PGD. Following previous studies, $\delta$ is bounded in the $l_{\infty}$ norm.

Neural networks trained by AT obtain a certain level of robustness, at the cost of accuracy on natural samples. To address this, TRADES uses a new training objective for adversarial training, formulated as:

\min_{\theta^{\prime}}\ \mathbb{E}_{(x,y)\in D_{data}}\ L(g^{TRADES}_{\theta^{\prime}}(x),\,y) + \lambda\, D_{KL}(g^{TRADES}_{\theta^{\prime}}(x),\,g^{TRADES}_{\theta^{\prime}}(x^{adv}_{TRADES})), (3)

where $x^{adv}$ is the adversarial example corresponding to the natural example $x$ and $y$ is the true label. $L$ is the cross-entropy loss for classification. $D_{KL}$ is the KL divergence that pushes the natural and adversarial logits together. The two terms are balanced by a trade-off parameter $\lambda$.
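As a minimal sketch of Eq. 3, the following function computes the TRADES objective for a batch, assuming the adversarial examples have already been crafted (TRADES does so by maximizing the same KL term); the function name and the default $\lambda$ are illustrative only.

```python
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, lam=6.0):
    """Eq. 3: cross-entropy on natural data plus a KL term that keeps the
    adversarial output distribution close to the natural one."""
    logits_nat = model(x)
    logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_nat, y)
    # KL(natural || adversarial), following the usual TRADES implementation convention
    kl = F.kl_div(F.log_softmax(logits_adv, dim=1),
                  F.softmax(logits_nat, dim=1),
                  reduction='batchmean')
    return ce + lam * kl
```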

3.3 Collaborative Adversarial Training (CAT)

CAT aims to improve robustness by letting neural networks trained by different methods interact with each other’s knowledge, i.e., collaborative adversarial learning. As illustrated in Fig. 2, we use the logits of the peer network to guide the learning of each network. Specifically, we feed the adversarial examples crafted by the AT-trained network into the TRADES-trained network to obtain the corresponding logits, which are then used to guide the training of the AT-trained network. The formulation is:

L_{1} = D_{KL}(f^{AT}(x^{adv}_{AT}),\,\hat{g}^{TRADES}(x^{adv}_{AT})), (4)

where $f^{AT}$ is the network trained with AT and $g^{TRADES}$ is the network trained with TRADES. $\hat{g}^{TRADES}(x^{adv}_{AT})$ denotes that the logits produced by the TRADES-trained network are treated as constant. $x^{adv}_{AT}$ is the adversarial example generated by $f^{AT}$ with PGD.

Similarly, to make the two networks learn collaboratively, we feed the adversarial examples generated by the TRADES-trained network into the AT-trained network to obtain the corresponding logits, which are then used to guide the training of the TRADES-trained network. The loss is formulated as:

L_{2} = D_{KL}(g^{TRADES}(x^{adv}_{TRADES}),\,\hat{f}^{AT}(x^{adv}_{TRADES})). (5)

Here, $x^{adv}_{TRADES}$ is the adversarial example crafted by the TRADES-trained network using the KL divergence, and $\hat{f}^{AT}(x^{adv}_{TRADES})$ denotes that the logits produced by the AT-trained network are treated as constant.
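A minimal sketch of the two knowledge-interaction terms in Eqs. 4 and 5 is given below; detaching the peer logits corresponds to treating $\hat{g}^{TRADES}$ and $\hat{f}^{AT}$ as constants, and we adopt a TRADES-style KL convention (learner log-probabilities against peer probabilities). The names are our own illustration.

```python
import torch.nn.functional as F

def collab_kl(learner_logits, peer_logits):
    """KL term that pulls the learner's output distribution toward the
    (detached) peer distribution on the same adversarial examples."""
    return F.kl_div(F.log_softmax(learner_logits, dim=1),
                    F.softmax(peer_logits.detach(), dim=1),
                    reduction='batchmean')

# Eq. 4: L1 = collab_kl(f(u), g(u))  with u = adversarial examples crafted on f
# Eq. 5: L2 = collab_kl(g(v), f(v))  with v = adversarial examples crafted on g
```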

Empirically, models trained purely with the collaborative loss collapse. An intuitive explanation is that the purpose of the collaborative loss is to exchange knowledge, but there is no knowledge to exchange when the models lack supervision. It is not enough to let the two networks learn from each other in this way; true class labels are needed to guide them. We therefore introduce supervision by combining each network’s own training objective with the collaborative objective to jointly guide learning. The training objective of collaborative adversarial training based on AT and TRADES is:

L_{total} = \alpha L_{TRADES} + (1-\alpha)L_{2} + \alpha L_{AT} + (1-\alpha)L_{1}, (6)

where $\alpha$ is a trade-off parameter that balances the guidance from the peer network’s knowledge and the original objective. $L_{TRADES}$ is the TRADES objective defined in Eq. 3, and $L_{AT}$ is the AT objective defined in Eq. 1. Since the peer logits are treated as constant, the first two terms in Eq. 6 are used to train model $g$ and the last two terms are used to train model $f$.
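Putting Eqs. 1-6 together, one CAT update could look like the sketch below. It reuses the `pgd_attack`, `trades_loss`, and `collab_kl` helpers sketched earlier, assumes a hypothetical `pgd_attack_kl` that crafts the TRADES adversarial examples by ascending the KL term (as in the original TRADES procedure), and uses the default $\alpha = 0.05$ from Sec. 4.1.1; the function is an illustration of the training logic, not the authors’ released implementation.

```python
import torch.nn.functional as F

def cat_step(f, g, opt_f, opt_g, x, y, alpha=0.05, lam=6.0,
             eps=8/255, step_size=2/255, steps=10):
    """One collaborative update of the AT branch f and the TRADES branch g (Eq. 6)."""
    # Inner maximization for each branch (u and v in Fig. 2).
    u = pgd_attack(f, x, y, eps, step_size, steps)     # CE-maximizing PGD for AT
    v = pgd_attack_kl(g, x, eps, step_size, steps)     # KL-maximizing PGD, as in TRADES (not shown)

    # Branch f: its own AT objective plus guidance from the detached peer g on u.
    logits_f_u = f(u)
    loss_at = F.cross_entropy(logits_f_u, y)           # L_AT, Eq. 1
    l1 = collab_kl(logits_f_u, g(u))                   # L_1, Eq. 4
    loss_f = alpha * loss_at + (1 - alpha) * l1        # last two terms of Eq. 6

    # Branch g: its own TRADES objective plus guidance from the detached peer f on v.
    loss_tr = trades_loss(g, x, v, y, lam)             # L_TRADES, Eq. 3
    l2 = collab_kl(g(v), f(v))                         # L_2, Eq. 5
    loss_g = alpha * loss_tr + (1 - alpha) * l2        # first two terms of Eq. 6

    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_f.item(), loss_g.item()
```

Because the peer logits are detached inside `collab_kl`, each optimizer only updates its own branch, matching the constant treatment of $\hat{f}$ and $\hat{g}$ in Eqs. 4 and 5.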

The decision boundaries learned by different adversarial training methods are different. Under the guidance of peer network knowledge, i.e., Eq. 4 and Eq. 5, the two networks trained by different methods continuously optimize the classification decision boundaries in the process of collaborative learning. Finally, both networks learn better decision boundaries than learning alone to obtain better adversarial robustness.

Our collaborative adversarial learning is a generalized adversarial training framework that can be instantiated with any two adversarial training methods. More generally, CAT can use any number of different adversarial training methods for collaborative learning. Results of CAT with three adversarial training methods are deferred to Sec. 5.3.

Difference with ensemble methods: The main difference between collaborative adversarial learning and ensemble methods is that collaborative learning involves multiple models that learn from each other, while ensemble methods combine multiple models to produce a single output. During testing, each model trained by collaborative learning predicts on its own, since the models have already exchanged knowledge during training. Refer to Appendix B for details.

4 Experiment Results

In this section, we conduct extensive experiments on popular benchmark datasets to demonstrate the effectiveness of CAT. First, we briefly introduce the experimental setup and implementation details of CAT. Then, we conduct ablation studies to choose the best hyperparameters and CAT combination. Finally, using the best CAT combination, we report the white-box and black-box adversarial robustness on two popular benchmark datasets.

Datasets: We use three benchmark datasets: CIFAR-10 [17], CIFAR-100 [18], and Tiny-ImageNet. CIFAR-10 has 10 classes, with 5000 training images and 1000 test images per class. CIFAR-100 has 100 classes, with 500 training images and 100 test images per class. The image size for CIFAR-10 and CIFAR-100 is 32x32. Tiny-ImageNet contains 100000 images from 200 classes downsized to 64x64 color images, with 500 training images, 50 validation images, and 50 test images per class. All three datasets are widely used for training and evaluating adversarial robustness.

Refer to caption
Figure 3: Adversarial robustness using different hyperparameters of CAT with TRADES and AT as collaborative methods. From left to right, the results of clean accuracy, FGSM accuracy, PGD accuracy, and AA accuracy are shown. Models $f$ and $g$ denote the results of using TRADES and AT in the CAT training framework, respectively.
Method Best Checkpoint Last Checkpoint
Clean FGSM PGD20 CW AA Clean FGSM PGD20 CW AA
CATAT-TRADES 83.74 59.69 54.44 52.60 50.52 84.45 60.03 53.01 52.01 49.30
83.55 59.78 54.52 52.58 50.86 84.12 59.69 52.82 51.88 49.39
CATAT-ALP 84.66 59.94 53.11 51.90 49.74 84.71 59.84 50.77 50.53 47.80
85.21 60.21 53.02 52.13 49.96 85.27 59.75 51.10 50.69 47.91
CATTRADES-ALP 83.91 59.76 54.44 52.56 51.02 84.67 59.85 52.51 51.43 49.31
84.75 59.76 54.17 52.72 50.85 85.27 59.82 52.56 51.83 49.64
Table 1: The white-box robustness of different CAT methods on CIFAR-10. We report the results of the best checkpoint and last checkpoint. The best results are marked using boldface. ResNet-18 is the basic network in our CAT framework.

Training setup: Our overall training parameters follow [20]. Specifically, we use SGD (momentum 0.9, batch size 128) to train ResNet-18 for 200 epochs on the CIFAR-10 dataset with weight decay 5e-4 and an initial learning rate of 0.1, which is divided by 10 at the 100-th and 150-th epochs. For the inner maximization, we use a 10-step PGD attack (PGD10) with a random start, step size 2.0/255, and perturbation size 8.0/255. The experimental settings for ResNet-18 on CIFAR-100 and for WideResNet-34-10 on CIFAR-10 and CIFAR-100 are the same as described above.
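For reference, this schedule corresponds to a standard PyTorch configuration along the following lines (a sketch with illustrative names, not the released training script):

```python
import torch

def make_optimizer_and_scheduler(model):
    # SGD with momentum 0.9 and weight decay 5e-4; batch size 128 is set in the DataLoader
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    # initial learning rate 0.1, divided by 10 at epochs 100 and 150 (200 epochs in total)
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[100, 150], gamma=0.1)
    return opt, sched
```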

Evaluation setup: We report the clean accuracy on natural examples and the adversarial accuracy on adversarial examples. For adversarial accuracy, we report both white-box and black-box results, following the widely used protocols in adversarial robustness research. For white-box attacks, we consider three basic attack methods, FGSM [9], PGD [21], and CW [2] optimized by PGD20, and a stronger ensemble attack named AutoAttack (AA) [5]. For black-box attacks, we consider both transfer-based and query-based attacks.

4.1 Ablation Study

4.1.1 Hyperparameter:

CAT improves adversarial robustness through collaborative learning, which requires both the knowledge of the peer network and the guidance of the ground-truth label. The balance between these two terms is controlled by a hyperparameter $\alpha$. We perform collaborative training with TRADES and AT as the base methods and experiment with different trade-off parameters, testing $\alpha$ values ranging from 1/50 to 1/5. The experimental results are illustrated in Fig. 3. From the figure, we conclude that if $\alpha$ is too high, i.e., little knowledge is extracted from the peer network, the effect is about the same as training with AT or TRADES alone. If $\alpha$ is too small, i.e., training is overly focused on the knowledge from the peer network, the network is also not very robust, and it collapses when $\alpha = 0$ (not shown in the figure). Since Auto-Attack is currently the most powerful ensemble attack method, we choose the hyperparameter $\alpha$ primarily based on the robustness of the network against AA. In the following experiments, we set $\alpha = 0.05$ by default.

Dataset Method Best Checkpoint Last Checkpoint
Clean FGSM PGD20 CW AA Clean FGSM PGD20 CW AA
CIFAR-10 Natural 94.65 19.26 0.0 0.0 0.0 94.65 19.26 0.0 0.0 0.0
AT 82.82 57.57 51.76 50.05 47.55 84.53 53.90 43.56 44.19 41.57
TRADES 83.17 59.22 52.63 50.79 49.21 83.04 57.46 49.81 49.01 47.03
ALP 83.85 57.20 51.88 50.11 48.48 84.64 55.35 44.96 44.54 42.62
CAT 83.91 59.76 54.44 52.56 51.02 84.67 59.85 52.51 51.43 49.31
84.75 59.76 54.17 52.72 50.85 85.27 59.82 52.56 51.83 49.64
CIFAR-100 Natural 75.55 9.48 0.0 0.0 0.0 75.39 9.57 0.0 0.0 0.0
AT 57.42 31.90 28.78 27.27 24.88 57.34 26.77 21.24 21.50 19.59
TRADES 56.98 31.72 29.04 25.30 24.23 55.08 30.40 26.81 24.78 23.68
ALP 61.01 31.41 26.78 25.68 23.51 58.4 27.97 22.63 21.87 20.42
CAT 61.31 35.83 33.09 29.17 27.17 61.78 35.84 32.76 29.48 27.29
62.53 36.05 32.92 29.16 26.90 62.52 35.79 32.51 29.24 26.73
Table 2: The white-box robustness of CAT on CIFAR-10 and CIFAR-100. We report the results of the best checkpoint and last checkpoint. ResNet-18 is the basic network in our CAT framework.
Method CIFAR-10 CIFAR-100
FGSM PGD20 PGD40 CW Square FGSM PGD20 PGD40 CW Square
AT 64.54 61.70 61.57 61.42 56.16 39.15 37.56 37.46 38.85 30.11
TRADES 65.63 63.57 63.57 63.23 55.97 39.06 37.73 37.79 38.86 28.72
ALP 64.95 62.38 62.32 61.78 55.78 40.29 38.97 38.85 40.03 29.85
CAT 65.73 63.65 63.78 63.24 57.55 42.26 40.76 40.76 41.78 33.04
66.06 63.91 63.88 63.26 57.95 42.81 41.55 41.42 42.42 33.30
Table 3: The black-box robustness of CAT on CIFAR-10 and CIFAR-100. We only report the results of the best checkpoint. ResNet-18 is the basic network in our CAT framework.

4.1.2 Different CAT methods:

As described in Sec. 3.3, any two adversarial training methods can be incorporated into the CAT framework and learned collaboratively. Considering that different adversarial training methods have distinct properties, the performance of different CAT combinations may also vary. We therefore consider three collaborative adversarial training combinations: AT-TRADES, AT-ALP, and TRADES-ALP. Collaborative adversarial training using TRADES and ALP is denoted CATTRADES-ALP; the other two combinations are denoted analogously. Since CAT trains two models collaboratively, we report the results for both networks. Tab. 1 shows the performance of CAT using different adversarial training methods. CAT achieves good robustness against the four attack methods in all settings. We again mainly consider the performance against AA and choose TRADES-ALP as the base combination for CAT. We further analyze the correlation between method discrepancy and performance after collaborative learning in Sec. 5.2. Unless stated otherwise, CAT refers to CATTRADES-ALP in the following sections. CAT in other settings is deferred to Secs. A.4 and A.5.

4.2 Adversarial Robustness

4.2.1 White-box Robustness

For FGSM, PGD, CW, and AA, the attack perturbation is 8.0/255, and the step size for PGD and CW is 2.0/255 with 20 iterations. Following the reporting convention of previous papers, we report both the best checkpoint and the last checkpoint of the training phase. The best checkpoint is selected based on the model’s PGD robustness on the test dataset (attack step size 2.0/255, 10 iterations, perturbation size 8.0/255).
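Robust accuracy under any of these attacks, as well as the PGD-based checkpoint selection above, can be measured with a loop along the following lines (our own illustration, with `attack` standing for any attack function such as the PGD sketch from Sec. 1):

```python
import torch

def robust_accuracy(model, loader, attack, device='cuda'):
    """Fraction of test examples that remain correctly classified after the attack."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)   # e.g. pgd_attack with steps=10 for checkpoint selection
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```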

Tab. 2 shows the adversarial accuracy of the networks trained by different methods on CIFAR-10 and CIFAR-100 against the four attacks, together with the accuracy on natural examples. From the table, we draw the following conclusions: (1) Our method obtains good robustness against all four attacks on both datasets; for example, against the strongest attack, AA, CAT obtains about a 2% improvement. (2) Our method obtains high adversarial robustness while maintaining accuracy on natural examples. Although there is still a large gap to the 94.65% of the standard training strategy, natural accuracy improves by nearly 1% over the other three methods. (3) The robustness of both networks is significantly improved in the CAT training framework, exceeding that of separately trained models. (4) The accuracy difference between the two networks trained in the CAT framework is smaller than between separately trained ones, which shows that the two networks collaborate well. For example, on CIFAR-10, the difference in robustness against AA between the two TRADES-ALP networks in the CAT framework is 0.17%, compared with 0.73% under separate training. The same conclusion can be drawn from the results on the CIFAR-100 dataset.

To investigate the generalizability of CAT across architectures, we conduct experiments with VGG-16 and MobileNet on CIFAR-10. The results are deferred to Secs. A.1 and A.2. Further, we conduct adversarial training with ResNet-18 on Tiny-ImageNet to explore CAT on a larger dataset, deferred to Sec. A.3. The robustness improvement holds for all experiments.

4.2.2 Black-box Robustness

For black-box attacks, we consider both transfer-based and query-based attacks. For transfer-based attacks, we use a standard adversarially trained ResNet-34 as the surrogate model, trained with the same parameters as described in the training setup. We first attack the surrogate model to generate adversarial examples and then transfer them to the target network to measure its robustness. Here, we consider four attacks: FGSM, PGD20, PGD40, and CW, with the same attack parameters as in Sec. 4.2.1. For query-based attacks, we consider the Square attack, an efficient query-based black-box attack. Tab. 3 shows the results. CAT brings 1.79% and 3.19% robustness improvements against the Square attack on CIFAR-10 and CIFAR-100, respectively. As before, the improvement on CIFAR-100 is more significant than on CIFAR-10.

Method Clean AA
Bag of Tricks for AT [25] 86.28 53.84
HE* [26] 85.14 53.74
Overfitting in AT* [31] 85.34 53.42
Overfitting in AT [31] 85.18 53.14
Self-Adaptive Training [13] 83.48 53.34
FAT [38] 84.52 53.51
TRADES [37] 84.92 53.08
LLR [28] 86.28 52.84
LBGAT+TRADES (α=0)* [6] 88.70 53.57
LBGAT+TRADES (α=0) [6] 88.22 52.86
LBGAT+TRADES (α=6) [6] 81.98 53.14
LAS-AT [14] 86.23 53.58
LAS-AWP [14] 87.74 55.52
CAT 86.22 54.11
86.51 54.20
CAT+AWP 86.74 56.43
87.01 56.61
Table 4: Quantitative comparison with state-of-the-art adversarial training methods. WideResNet-34-10 is the basic network in our CAT framework. * denotes the WideResNet-34-20 network, and † denotes the WideResNet-40-8 network. AWP is equipped with adversarial training methods to obtain better results.

4.3 Comparison to SOTA

We use WideResNet-34-10 [36] networks for collaborative adversarial training to compare with previous state-of-the-art methods. Tab. 4 shows the accuracy of the different methods on natural examples and their robustness against Auto-Attack. From the table, we conclude that the robustness of both networks trained with CAT outperforms the previous methods, demonstrating the state-of-the-art performance of CAT. AWP further boosts the robustness of CAT by 2.41%.

4.4 Comparison to KD-AT

In general, the robustness of large models is higher than that of small models under the same training settings. For example, WideResNet-34-10 [36] trained by TRADES achieves 53.08% robustness against AA, while ResNet-18 reaches only 49.21%. Researchers have used knowledge distillation to transfer the robustness of large models to small models with good results; we call these methods KD-AT. Considering that CAT also involves the collaborative training of two models, we compare CAT with KD-AT methods. For a fair comparison, we use two networks of different sizes for CAT training, matching the teacher and student networks used in KD-AT. Note that, unlike KD methods where the teacher is trained in advance, CAT trains the large and small models simultaneously, so there is no notion of teacher and student. In other words, we extend previous offline distillation (two stages) to an online scheme (one stage) and achieve better performance with lower computational cost. An illustrative comparison is shown in Appendix B.

Tab. 5 shows the results of KD-AT methods and CAT, where ARD [8], IAD [39], and RSLAD [40] are trained by KD-AT using a TRADES-trained WideResNet-34-10 network as the teacher. CAT is collaboratively trained using two networks of different sizes. Our method obtains high adversarial robustness as well as high clean accuracy. More importantly, the robustness of CAT is higher than that of RSLAD equipped with AWP.

Method Stage Time Clean AA
ARD [8] 2 2720 83.93 49.19
IAD [39] 2 2723 83.24 49.10
RSLAD [40] 2 2723 83.38 51.49
RSLAD [40]+AWP 2 - 81.26 51.62
CAT 1 2516 84.39 51.72
Table 5: Quantitative comparison with the state-of-the-art KD-AT methods. A WideResNet-34-10 and a ResNet-18 network are used in our CAT to have a fair comparison with distillation methods. Time denotes training time (s) per epoch.

5 Property Analysis

5.1 Alleviate Overfitting

Overfitting in adversarial training was first identified by [31], who show that test robustness decreases after its peak; it is one of the most concerning problems in adversarial training. Here, we investigate the overfitting problem of CAT with VGG-16. The results are illustrated in Fig. 4. CAT alleviates the overfitting problem that widely occurs in previous adversarial training methods. Moreover, the performance of CAT has not saturated, and higher performance can be expected with longer training.

Refer to caption
Figure 4: Robust accuracy of AT, ALP, TRADES, and CAT with VGG-16 network on CIFAR-10 dataset during the adversarial training process. CAT can alleviate the problem of overfitting.

5.2 Correlation of discrepancy and CAT

We analyze the correlation between the discrepancy of different adversarial training methods and their adversarial robustness after CAT. First, we compute the prediction intersection between different methods, formulated as:

intersection = \frac{1}{N}\sum_{x_{i}\in D}\mathbb{I}(f^{AT}(x_{i}),\,g^{TRADES}(x_{i})), (7)

where $D$ is the dataset and $\mathbb{I}$ is the indicator function, which is 1 when $f^{AT}(x_{i}) = g^{TRADES}(x_{i})$ and 0 otherwise. The prediction discrepancy equals 1 minus the intersection; the larger this value, the greater the discrepancy. We then report the adversarial robustness of CAT trained in different settings. The results are reported in Tab. 6. We conclude that the greater the discrepancy between the methods, the higher the adversarial robustness after CAT.
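A minimal sketch of Eq. 7 is given below; the names are illustrative, and the loader may yield either natural or adversarial examples depending on which discrepancy is being measured:

```python
import torch

@torch.no_grad()
def prediction_discrepancy(f, g, loader, device='cuda'):
    """1 - intersection (Eq. 7): the fraction of samples on which the two models disagree."""
    f.eval(); g.eval()
    agree, total = 0, 0
    for x, _ in loader:
        x = x.to(device)
        agree += (f(x).argmax(dim=1) == g(x).argmax(dim=1)).sum().item()
        total += x.size(0)
    return 1.0 - agree / total
```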

Method PGD20 Prediction discrepancy
CATAT-ALP 53.11 18.98%
CATTRADES-ALP 54.44 21.05%
CATAT-TRADES 54.52 22.54%
Table 6: The correlation between white-box robustness after CAT and prediction discrepancy of different methods on CIFAR-10. ResNet-18 networks are used in our CAT.
Method Clean FGSM PGD20 CW AA
CATT-A 83.91 59.76 54.44 52.56 51.02
84.75 59.76 54.17 52.72 50.85
CATA-A-T 84.50 60.17 54.64 52.98 51.28
84.62 60.25 54.87 53.04 51.42
84.29 60.24 55.04 53.38 51.74
Table 7: The white-box robustness of CAT on CIFAR-10. We report the results of the best checkpoint. ResNet-18 is the basic network of CAT. T-A is short for TRADES-ALP, denoting collaboration between TRADES and ALP. A-A-T is short for AT-ALP-TRADES, denoting collaboration between AT, ALP, and TRADES.

5.3 CAT of Three models with three methods

Our CAT is a generalized method that can use any number of different adversarial training methods for collaborative learning. We conduct an experiment in which three adversarial training methods collaborate in CAT. The results are reported in Tab. 7. The robustness improvement is more significant than for CAT trained with two adversarial training methods, which shows the generalizability of CAT: collaborating three methods brings a further 0.7% improvement against Auto-Attack.

6 Conclusion

In this paper, we first analyze the properties of different adversarial training methods and find that networks trained by different methods perform differently on individual sample instances, i.e., one network can correctly classify samples that are misclassified by another. Based on this observation, we propose a collaborative adversarial training framework to improve the robustness of both networks. CAT guides network learning using true label supervision together with the knowledge mastered by the peer network, which differs from the network’s own knowledge. Extensive experiments on different datasets and networks demonstrate the effectiveness of our approach, and state-of-the-art performance is achieved. Furthermore, a property analysis is conducted to better understand CAT. More broadly, CAT can easily be extended to collaborative adversarial training with multiple networks. We hope that CAT brings a new perspective to the study of adversarial training.

References

  • [1] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. In ECCV, 2020.
  • [2] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 ieee symposium on security and privacy (sp), 2017.
  • [3] Jianyu Chen, Bodi Yuan, and Masayoshi Tomizuka. Model-free deep reinforcement learning for urban autonomous driving. arXiv preprint arXiv:1904.09503, 2019.
  • [4] Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. In ICML, 2020.
  • [5] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML, 2020.
  • [6] Jiequan Cui, Shu Liu, Liwei Wang, and Jiaya Jia. Learnable boundary guided adversarial training. In ICCV, 2021.
  • [7] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In CVPR, 2018.
  • [8] Micah Goldblum, Liam Fowl, Soheil Feizi, and Tom Goldstein. Adversarially robust distillation. In AAAI, 2020.
  • [9] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
  • [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
  • [11] Geoffrey Hinton, Oriol Vinyals, Jeff Dean, et al. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • [12] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
  • [13] Lang Huang, Chao Zhang, and Hongyang Zhang. Self-adaptive training: beyond empirical risk minimization. NeurIPS, 2020.
  • [14] Xiaojun Jia, Yong Zhang, Baoyuan Wu, Ke Ma, Jue Wang, and Xiaochun Cao. Las-at: Adversarial training with learnable attack strategy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13398–13408, 2022.
  • [15] Harini Kannan, Alexey Kurakin, and Ian Goodfellow. Adversarial logit pairing. arXiv preprint arXiv:1803.06373, 2018.
  • [16] Bin Kong, Xin Wang, Zhongyu Li, Qi Song, and Shaoting Zhang. Cancer metastasis detection via spatially structured deep network. In IPMI, 2017.
  • [17] Alex Krizhevsky et al. Learning multiple layers of features from tiny images. Tech Report, 2009.
  • [18] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, 2012.
  • [19] Jiadong Lin, Chuanbiao Song, Kun He, Liwei Wang, and John E Hopcroft. Nesterov accelerated gradient and scale invariance for adversarial attacks. In ICLR, 2019.
  • [20] Divyam Madaan, Jinwoo Shin, and Sung Ju Hwang. Adversarial neural pruning with latent vulnerability suppression. In ICML, 2020.
  • [21] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • [22] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR, 2016.
  • [23] Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli. A self-supervised approach for adversarial robustness. In CVPR, 2020.
  • [24] Nikhil R Pal and Sankar K Pal. A review on image segmentation techniques. PR, 1993.
  • [25] Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, and Jun Zhu. Bag of tricks for adversarial training. In ICLR, 2020.
  • [26] Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Jun Zhu, and Hang Su. Boosting adversarial training with hypersphere embedding. NeurIPS, 2020.
  • [27] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE symposium on security and privacy (SP), 2016.
  • [28] Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. NeurIPS, 2019.
  • [29] Sylvestre-Alvise Rebuffi, Sven Gowal, Dan Andrei Calian, Florian Stimberg, Olivia Wiles, and Timothy A Mann. Data augmentation can improve robustness. Advances in Neural Information Processing Systems, 2021.
  • [30] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.
  • [31] Leslie Rice, Eric Wong, and Zico Kolter. Overfitting in adversarially robust deep learning. In ICML, 2020.
  • [32] Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
  • [33] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  • [34] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
  • [35] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan L Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In CVPR, 2019.
  • [36] Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
  • [37] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, 2019.
  • [38] Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, and Mohan Kankanhalli. Attacks which do not kill training make adversarial learning stronger. In ICML, 2020.
  • [39] Jianing Zhu, Jiangchao Yao, Bo Han, Jingfeng Zhang, Tongliang Liu, Gang Niu, Jingren Zhou, Jianliang Xu, and Hongxia Yang. Reliable adversarial distillation with unreliable teachers. arXiv preprint arXiv:2106.04928, 2021.
  • [40] Bojia Zi, Shihao Zhao, Xingjun Ma, and Yu-Gang Jiang. Revisiting adversarial robustness distillation: Robust soft labels make student better. In ICCV, 2021.

Appendix A More experimental results

A.1 VGG-16 results on CIFAR-10

The white-box robustness of VGG-16 [33] models trained using AT, ALP, TRADES, and CAT is reported in Tab. A1. The settings for VGG-16 are the same as for the ResNet-18 models, i.e., $\alpha = 1.0/20$ and $\beta = 1.0/20$. The improvement of CAT with VGG-16 models is consistent with that of the ResNet-18 models; CAT boosts the model’s robustness under AutoAttack by 2.0 points.

A.2 Mobilenet results on CIFAR-10

Similar to the VGG-16 models above, we report the white-box robustness of MobileNet [12] on CIFAR-10 under various attacks in Tab. A2. The experimental setup is the same as before. Our CAT brings a 1.0-point improvement for MobileNet under AutoAttack, the most powerful adversarial attack method.

A.3 ResNet-18 results on Tiny-ImageNet

For the large-scale ImageNet dataset, just as none of the baseline methods report results, we are also unable to evaluate on ImageNet due to the very high training cost. To investigate the performance of CAT on larger datasets, we conduct white-box robustness experiments with ResNet-18 on Tiny-ImageNet, which is also a widely used dataset in adversarial training. The results are shown in Tab. A3. CAT shows impressive robustness on this larger dataset; the improvement is as significant as that of ResNet-18 on smaller datasets such as CIFAR-10 and CIFAR-100.

A.4 CAT of One model with various attacks

In our CAT method, we use two networks, each trained with a different adversarial training method. An interesting baseline is one network with two different attack methods. Therefore, we use PGD and CW as the attack methods and a single ResNet-18 as the network. The results are reported in Tab. A4 (CATP-C entry). The improvement in this setting is not as significant as in the previous setting, but it still boosts the model’s robustness against all four attacks.

A.5 CAT of Two models with same methods

Another interesting baseline is two networks trained by the same adversarial training method, i.e., two ResNet-18 networks both trained by TRADES. We denote this setting CATT-T. The results are reported in Tab. A4. The improvement in this setting is not as significant as in the previous setting, but it still boosts the model’s robustness against all four attacks, and the improvement is larger than when using only one network. We conclude that using two networks is important for CAT to achieve better adversarial robustness.

Method Clean FGSM PGD20 CW AA
AT 78.31 53.11 48.39 46.32 43.69
TRADES 79.11 53.75 48.28 45.93 44.63
ALP 80.23 52.18 47.30 45.23 43.68
CAT 79.23 54.47 49.43 47.19 45.48
80.12 54.48 48.30 47.23 45.33
Table A1: The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked using boldface. Two VGG-16 networks are used in our CAT framework.
Method Clean FGSM PGD20 CW AA
AT 76.24 50.27 44.99 43.03 40.10
TRADES 75.84 49.65 45.26 42.04 41.08
ALP 79.46 50.14 43.95 42.08 40.01
CAT 80.14 51.25 46.38 44.24 42.20
79.86 51.28 46.22 44.05 42.16
Table A2: The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked using boldface. Two MobileNet networks are used in our CAT framework.
Method Clean PGD50 CW AA
AT 43.98 19.98 17.60 13.78
TRADES 39.16 15.74 12.92 12.32
ALP 39.85 17.28 15.34 12.98
CAT 44.35 20.86 19.43 14.96
44.76 21.02 19.64 15.63
Table A3: The white-box robustness results (accuracy (%)) of CAT on Tiny-ImageNet. We report the results of the best checkpoint. The best results are marked using boldface. Two ResNet-18 networks are used in our CAT framework.
Method Best Checkpoint
Clean FGSM PGD20 CW AA
AT 82.82 57.57 51.76 50.05 47.55
CATT-A 83.91 59.76 54.44 52.56 51.02
84.75 59.76 54.17 52.72 50.85
CATP-C 82.09 56.48 52.48 49.28 48.06
CATT-T 81.94 58.85 54.19 51.52 50.30
82.13 58.77 54.02 51.56 50.14
Table A4: The white-box robustness results (accuracy (%)) of CAT on CIFAR-10. We report the results of the best checkpoint. The best results are marked using boldface. P-C denotes one network trained by PGD and CW. T-A is short for TRADES-ALP, denoting two networks with TRADES and ALP. T-T is short for TRADES-TRADES, denoting two networks with TRADES and TRADES.

Appendix B Discussion

Refer to caption
Figure B1: Three types of distillation. (a) displays traditional knowledge distillation, which involves two-stage optimization and a large-scale teacher model. (b) and (c) illustrate online learning, i.e., collaborative learning and ensemble learning, which do not involve teacher models.

In this section, we illustrate three types of distillation methods, shown in Fig. B1. Traditional knowledge distillation involves two-stage optimization: pre-training the large-scale teacher model in the first stage and distilling the student with the pre-trained teacher in the second. RSLAD [40] follows this paradigm. Two-stage optimization incurs a large computational cost. Compared to RSLAD, our CAT is based on collaborative learning and needs only one-stage optimization with two student models.