
AugRmixAT: A Data Processing and Training Method for Improving Multiple Robustness and Generalization Performance

Abstract

Deep neural networks are powerful, but they also have shortcomings such as their sensitivity to adversarial examples, noise, blur, and occlusion. Ensuring the reliability and robustness of deep neural network models is crucial for their application in safety-critical areas. Much previous work has been devoted to improving one specific form of robustness. However, we find that a specific robustness is often improved at the expense of other forms of robustness or of the generalization ability of the model. In particular, adversarial training methods significantly hurt generalization performance on unperturbed data while improving adversarial robustness. In this paper, we propose a new data processing and training method, called AugRmixAT, which can simultaneously improve the generalization ability and multiple robustness properties of neural network models. Finally, we validate the effectiveness of AugRmixAT on the CIFAR-10/100 and Tiny-ImageNet datasets. The experiments demonstrate that AugRmixAT improves the model's generalization performance while enhancing white-box robustness, black-box robustness, common corruption robustness, and partial occlusion robustness.

Index Terms—  Deep neural networks, robustness, data processing, generalization ability

1 Introduction

Deep neural networks have achieved remarkable success in a variety of fields and are widely used in areas where reliability and security are critical, such as medical image processing [1], autonomous driving [2], and face recognition [3]. Unfortunately, recent studies [4, 5] have shown that adding an imperceptible adversarial perturbation to the input image can significantly degrade the recognition ability of a neural network or steer it toward a specific wrong prediction. Beyond artificially crafted adversarial examples, various common corruptions [6] and occlusions [7] in real-world environments also affect the robustness and reliability of neural networks.

Many methods [4, 5, 8, 9, 10] have recently been proposed for defending against adversarial attacks. Among them, adversarial training has proven to be one of the most promising [5, 9, 10]. However, adversarial training has a major drawback: it drastically reduces the generalization ability of neural networks on the original data [5, 9]. Furthermore, we find that adversarial training is similarly detrimental to occlusion robustness. Previous studies [11, 12] have also shown that improving one specific form of robustness is not necessarily beneficial, and may even be harmful, to another. In our experiments, we likewise observe that CutMix [13] effectively enhances occlusion robustness but is detrimental to noise robustness and adversarial robustness. For security-sensitive applications in practice, however, we cannot consider only a single form of robustness; we must account for multiple forms of robustness as well as the generalization performance of the model.

To address the above issues, we propose AugRmixAT, a new data processing and training method that can simultaneously improve multiple forms of robustness and the generalization performance of neural network models. AugRmixAT utilizes traditional data augmentation, mixed data augmentation between different samples [14, 13, 15, 16], and data augmentation with added adversarial perturbations to generate multiple different sets of augmented data. To preserve generalization performance on clean data (standard test data), AugRmixAT trains on the multiple sets of augmented data in a surrogate manner using both a soft cross-entropy loss and a Jensen-Shannon divergence consistency loss [17]. Finally, experiments on CIFAR-10/100 and Tiny-ImageNet [18] show that AugRmixAT can simultaneously improve white-box robustness, black-box robustness, robustness to 19 common corruptions on CIFAR-10/100 and 15 common corruptions on Tiny-ImageNet, partial occlusion robustness, and generalization performance on clean data.

Fig. 1: An example of AugRmixAT.

2 Related Work

Data Augmentation. Data augmentation is a practical and powerful technique for increasing the diversity of training datasets, enhancing the generalization ability of neural networks, and preventing overfitting [19]. For instance, some of the most commonly used data augmentations in computer vision are geometric transformations, flipping, color modification, cropping, rotation, translation, noise injection, and random erasing [20]. Recently, mixed sample data augmentation has gained tremendous attention and a series of such methods [14, 13, 15, 16] have been proposed. Mixup [14] is the first mixed sample data augmentation method; it combines two different samples in a convex combination to generate a new training sample and corresponding label. Combining the ideas of Mixup and Cutout [7], CutMix [13] cuts and pastes patches between training images, and the ground-truth labels are mixed in proportion to the patch areas. FMix [15] improves the shape of the CutMix patches by using a random binary mask obtained by thresholding low-frequency images sampled from Fourier space. To solve the problems of label misallocation and missing object information in CutMix, ResizeMix [16] mixes training data by directly resizing the source image to a small patch and pasting it onto another image. AugMix [21] improves both generalization performance and corruption robustness by chaining and mixing common data augmentation operations.

Adversarial Training (AT). Adversarial training, which augments the training dataset with adversarial examples, is one of the most effective methods of defending against adversarial attacks [4, 5, 9, 10]. It can therefore also be viewed as a data augmentation technique. Goodfellow et al. [4] proposed the Fast Gradient Sign Method (FGSM), a simple and fast way to generate adversarial examples for adversarial training. Projected Gradient Descent (PGD) [5] adversarial training leverages the PGD attack to generate adversarial examples and trains only on those examples. Zhang et al. [9] proposed TRADES to explicitly optimize the trade-off between adversarial robustness and standard accuracy. Lamb et al. [10] proposed Interpolated Adversarial Training (IAT), which trains on interpolations of adversarial examples along with interpolations of unperturbed examples and improves adversarial robustness without sacrificing too much standard accuracy.

3 AugRmixAT

Previous data augmentation [14, 13, 21] and adversarial training [5, 9, 10] methods can effectively improve a specific form of robustness or the generalization performance, but it is difficult for them to improve multiple forms of robustness and generalization simultaneously. In particular, most adversarial training methods [5, 9, 10] tend to sacrifice standard accuracy when enhancing adversarial robustness. AugRmixAT is an image data processing and training method that can simultaneously improve multiple forms of robustness and the generalization performance of models, and it is easy to slot into existing training pipelines. Figure 1 shows an example of AugRmixAT. First, a batch of input images X is used to generate augmented data X̄ via the "Augment And Mix" operation and adversarial examples X̂ by adding adversarial perturbations. Next, X, X̄, and X̂ are processed with the same mixed sample data augmentation to generate X′, X̄′, and X̂′; the corresponding mixed labels Y′ are generated from the labels Y. Finally, we train on X′, X̄′, and X̂′ using a soft cross-entropy loss ℒ_ce and a Jensen-Shannon divergence consistency loss ℒ_js.

Augment And Mix. We use the same "Augment And Mix" operation as AugMix [21]. The operation starts by randomly selecting multiple augmentations from the base augmentation set to form multiple augmentation chains and producing multiple augmented samples through these chains. The augmented samples are then mixed through a random convex combination whose weights are sampled from a Dirichlet(α, ..., α) distribution. Finally, this mixture is combined with the original sample through a second random convex combination whose coefficient is sampled from a Beta(α, α) distribution. In the experiments, we set α to 1 and the number of augmentation chains to 3. Each augmentation chain consists of 1 to 3 random base augmentation operations. Our base augmentation set contains autocontrast, equalize, rotate, solarize, shear, and translate.
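For concreteness, a minimal sketch of this operation is given below. It follows the parameters stated above (α = 1, three chains, 1 to 3 operations per chain), but the concrete PIL-based implementations of the base operations and their magnitude ranges are illustrative assumptions rather than the exact AugMix code.

```python
import random
import numpy as np
from PIL import Image, ImageOps

def _rotate(img):
    return img.rotate(random.uniform(-30, 30))

def _shear_x(img):
    return img.transform(img.size, Image.AFFINE,
                         (1, random.uniform(-0.3, 0.3), 0, 0, 1, 0))

def _translate_x(img):
    return img.transform(img.size, Image.AFFINE,
                         (1, 0, random.uniform(-0.3, 0.3) * img.size[0], 0, 1, 0))

# base operation set from the text; magnitudes above are assumptions
BASE_OPS = [ImageOps.autocontrast, ImageOps.equalize,
            lambda im: ImageOps.solarize(im, 128), _rotate, _shear_x, _translate_x]

def augment_and_mix(pil_img, alpha=1.0, num_chains=3):
    """Produce x_bar: a convex mix of several augmentation chains, re-mixed with the original."""
    w = np.random.dirichlet([alpha] * num_chains)   # chain weights ~ Dirichlet(alpha, ..., alpha)
    m = np.random.beta(alpha, alpha)                # original/augmented mix ~ Beta(alpha, alpha)
    orig = np.asarray(pil_img, dtype=np.float32)
    mix = np.zeros_like(orig)
    for i in range(num_chains):
        img_aug = pil_img.copy()
        for _ in range(random.randint(1, 3)):       # 1 to 3 random base operations per chain
            img_aug = random.choice(BASE_OPS)(img_aug)
        mix += w[i] * np.asarray(img_aug, dtype=np.float32)
    mixed = (1 - m) * orig + m * mix
    return Image.fromarray(np.uint8(np.clip(mixed, 0, 255)))
```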

Adversarial examples. We apply PGD [5] adversarial attacks to generate adversarial examples, which can be expressed as

\begin{aligned}
\hat{x}^{0} &= x + 0.001\cdot\mathcal{N}(\mathbf{0},\mathbf{I}), \\
\hat{x}^{t+1} &= \prod_{\mathcal{B}_{\infty}(x,\epsilon)}\big(\eta\,\mathbf{sign}(\nabla_{\hat{x}^{t}}\mathcal{L}_{kl}(f_{\theta}(x),f_{\theta}(\hat{x}^{t}))) + \hat{x}^{t}\big),
\end{aligned} \qquad (1)

where 𝒩(0, I) is the Gaussian distribution with zero mean and identity covariance, ε is the adversarial perturbation budget, η is the perturbation step size, B_∞(x, ε) = {x_adv : ||x − x_adv||_∞ ≤ ε} is the ℓ_∞ neighborhood of x, sign(·) is the sign function, ℒ_kl(·) is the KL divergence loss, and f_θ denotes the neural network with parameters θ.
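A minimal PyTorch sketch of this attack is shown below. It assumes inputs in [0, 1] and drives the attack with the KL divergence between the clean and perturbed predictions, as in Eq. (1); the function and variable names are ours, and ε = 0.031, η = 0.007, T = 10 follow the settings reported later.

```python
import torch
import torch.nn.functional as F

def pgd_kl(model, x, epsilon=0.031, eta=0.007, steps=10):
    """L_inf PGD around x that maximizes KL(f(x) || f(x_adv)), roughly matching Eq. (1)."""
    model.eval()
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)          # reference prediction on the clean input
    x_adv = x + 0.001 * torch.randn_like(x)           # small Gaussian start, first line of Eq. (1)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction='batchmean')
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + eta * grad.sign()     # ascend the KL loss
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project onto the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0)           # keep a valid pixel range (assumes [0,1] inputs)
    return x_adv.detach()
```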

Mixed Sample Data Augmentation. To obtain more diverse training data, we simultaneously perform the same mixed sample data augmentation operation on the original images X, the "Augment And Mix" enhanced images X̄, and the adversarial examples X̂. In our work, we integrate multiple mixed sample data augmentation methods [14, 13, 15, 16] in a randomly chosen manner. Our mixed sample data augmentation operates as follows,

\begin{aligned}
rand\_index &= \mathrm{Randperm}(batch\_size), \\
Mix &= \mathrm{Random.Choices}(\mathrm{Mixup}, \mathrm{CutMix}, \mathrm{ResizeMix}, \mathrm{FMix}), \\
X' &= Mix(X, X[rand\_index]), \\
\bar{X}' &= Mix(\bar{X}, \bar{X}[rand\_index]), \\
\hat{X}' &= Mix(\hat{X}, \hat{X}[rand\_index]), \\
Y' &= \gamma Y + (1-\gamma)\,Y[rand\_index],
\end{aligned} \qquad (2)

where Randperm(·) is the random permutation function, Random.Choices(·) is the random choice function, and γ is the corresponding mixing ratio.
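The following sketch illustrates this operation in PyTorch for one-hot labels, with the same permutation and ratio γ shared across X, X̄, and X̂. Only Mixup and CutMix are implemented for brevity; ResizeMix and FMix would be dispatched in the same way, and the helper names are ours.

```python
import random
import numpy as np
import torch

def rmix(X, X_bar, X_hat, Y_onehot, alpha=1.0):
    """Apply one randomly chosen mixing operation, with shared permutation and ratio, to all three batches."""
    idx = torch.randperm(X.size(0), device=X.device)   # rand_index in Eq. (2)
    gamma = np.random.beta(alpha, alpha)                # mixing ratio
    mix = random.choice(['mixup', 'cutmix'])            # ResizeMix / FMix omitted for brevity
    if mix == 'mixup':
        out = [gamma * Z + (1 - gamma) * Z[idx] for Z in (X, X_bar, X_hat)]
    else:  # cutmix: paste the same random box from the shuffled batch into each set
        H, W = X.shape[-2:]
        rh, rw = int(H * np.sqrt(1 - gamma)), int(W * np.sqrt(1 - gamma))
        cy, cx = np.random.randint(H), np.random.randint(W)
        y1, y2 = np.clip([cy - rh // 2, cy + rh // 2], 0, H)
        x1, x2 = np.clip([cx - rw // 2, cx + rw // 2], 0, W)
        out = []
        for Z in (X, X_bar, X_hat):
            Zp = Z.clone()
            Zp[..., y1:y2, x1:x2] = Z[idx][..., y1:y2, x1:x2]
            out.append(Zp)
        gamma = 1 - (y2 - y1) * (x2 - x1) / (H * W)      # adjust gamma to the true unoccluded area ratio
    Y_mixed = gamma * Y_onehot + (1 - gamma) * Y_onehot[idx]
    return out[0], out[1], out[2], Y_mixed
```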

Loss Function. To ensure the generalization ability of the model and to improve its robustness, we use the Jensen-Shannon divergence consistency loss to train the mixed "Augment And Mix" enhanced data X̄′ and the mixed adversarial examples X̂′ in a surrogate manner. Our loss function is defined as

\mathcal{L} = \mathcal{L}_{ce}(f_{\theta}(X'),Y') + \lambda_{1}\mathcal{L}_{js}(f_{\theta}(X'),f_{\theta}(\bar{X}')) + \lambda_{2}\mathcal{L}_{js}(f_{\theta}(X'),f_{\theta}(\hat{X}')), \qquad (3)

where λ1 and λ2 are two regularization hyperparameters. The detailed algorithm is described in Algorithm 1.
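A minimal PyTorch sketch of this loss is given below, assuming the targets Y′ are mixed one-hot vectors and implementing the Jensen-Shannon term pairwise as written in Eq. (3); the helper names are ours.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against mixed (soft) one-hot targets."""
    return torch.mean(torch.sum(-soft_targets * F.log_softmax(logits, dim=1), dim=1))

def js_divergence(logits_p, logits_q):
    """Pairwise Jensen-Shannon divergence between two predicted distributions."""
    p, q = F.softmax(logits_p, dim=1), F.softmax(logits_q, dim=1)
    log_m = (0.5 * (p + q)).clamp(min=1e-7).log()
    return 0.5 * (F.kl_div(log_m, p, reduction='batchmean') +
                  F.kl_div(log_m, q, reduction='batchmean'))

def augrmixat_loss(model, X_mix, Xbar_mix, Xhat_mix, Y_mix, lam1=1.0, lam2=1.0):
    """Eq. (3): soft cross-entropy on X' plus JS consistency terms toward X-bar' and X-hat'."""
    logits = model(X_mix)
    logits_bar = model(Xbar_mix)
    logits_hat = model(Xhat_mix)
    return (soft_cross_entropy(logits, Y_mix)
            + lam1 * js_divergence(logits, logits_bar)
            + lam2 * js_divergence(logits, logits_hat))
```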

Algorithm 1 AugRmixAT algorithm
Require: Training dataset D, perturbation budget ε, perturbation step size η, number of iterations T, neural network f_θ with parameters θ
Require: AugmentAndMix is the data augmentation and mixing function applied to individual samples, RMix is the mixed data augmentation function between different samples, 𝒩(0, I) is the Gaussian distribution with zero mean and identity covariance
Require: KL divergence loss ℒ_kl, Jensen-Shannon divergence loss ℒ_js, soft cross-entropy loss ℒ_ce
1:  repeat
2:     Read mini-batch (X, Y) = {(x_1, y_1), ..., (x_m, y_m)} from training dataset D
3:     for i = 1, ..., m (in parallel) do
4:        x̄_i ← AugmentAndMix(x_i)
5:        x̂_i ← x_i + 0.001·𝒩(0, I)
6:        for t = 1, ..., T do
7:           x̂_i ← Π_{B_∞(x_i, ε)}(η·sign(∇_{x̂_i} ℒ_kl(f_θ(x_i), f_θ(x̂_i))) + x̂_i)
8:        end for
9:     end for
10:    X′, X̄′, X̂′, Y′ ← RMix(X, X̄, X̂, Y), where X̄ = {x̄_1, ..., x̄_m}, X̂ = {x̂_1, ..., x̂_m}
11:    ℒ = (1/m) Σ_{i=1}^{m} [ ℒ_ce(f_θ(x′_i), y′_i) + λ1·ℒ_js(f_θ(x′_i), f_θ(x̄′_i)) + λ2·ℒ_js(f_θ(x′_i), f_θ(x̂′_i)) ]
12:    θ ← θ − β·∇_θ ℒ
13:  until training converged
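Putting the pieces together, one training step of Algorithm 1 could look roughly as follows. It reuses the pgd_kl, rmix, and augrmixat_loss sketches above and assumes the AugmentAndMix preprocessing is applied in the data pipeline, so each step receives both X and X̄.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, X, X_bar, Y, num_classes,
               epsilon=0.031, eta=0.007, steps=10, lam1=1.0, lam2=1.0):
    """One AugRmixAT update: craft adversarial batch, apply shared RMix, optimize Eq. (3)."""
    Y_onehot = F.one_hot(Y, num_classes).float()
    X_hat = pgd_kl(model, X, epsilon, eta, steps)            # adversarial batch (lines 5-8)
    model.train()
    X_mix, Xbar_mix, Xhat_mix, Y_mix = rmix(X, X_bar, X_hat, Y_onehot)   # line 10
    loss = augrmixat_loss(model, X_mix, Xbar_mix, Xhat_mix, Y_mix, lam1, lam2)  # line 11
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                         # line 12
    return loss.item()
```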
Table 1: Corruption Error (CE,%), and mCE (%) values on CIFAR-10-C. The mCE value is calculated by averaging all 19 CE values.
Noise Blur Weather Digital
Model Gauss. Shot Impulse Speckle Defocus Glass Motion Zoom Gauss. Snow Frost Fog Spatter Bright Contrast Elastic Pixel Saturate JPEG mCE
Standard 52.64 39.49 48.86 35.44 14.13 43.36 18.18 16.10 20.71 14.29 16.96 9.13 14.24 5.11 18.80 14.00 22.41 6.73 20.83 22.71
Mixup 32.40 29.68 33.39 27.11 11.03 32.4 13.61 14.13 22.27 8.16 7.52 7.83 6.53 4.06 21.97 11.38 22.01 5.56 17.23 17.65
CutMix 79.40 68.68 61.24 66.91 15.87 44.70 19.06 20.45 28.56 12.25 18.48 8.16 7.85 4.84 13.36 14.74 30.25 7.04 29.93 29.04
AugMix 16.38 11.83 9.83 10.41 4.17 19.17 5.46 5.22 4.63 7.67 7.66 5.65 5.18 3.85 5.81 8.08 10.29 5.54 11.50 8.33
PGDAT 18.29 16.98 30.43 17.06 18.49 20.33 22.84 19.59 21.06 19.46 22.85 41.16 17.97 16.50 55.51 19.44 15.34 16.81 15.41 22.40
TRADES (1/λ=1) 17.75 16.60 27.94 16.73 18.15 20.35 22.38 19.40 20.59 19.44 23.75 40.06 17.53 16.57 54.54 19.35 15.10 16.63 15.04 22.00
TRADES (1/λ=6) 19.56 18.62 29.35 19.04 19.82 21.51 24.30 20.81 22.00 21.08 25.46 40.96 19.44 18.41 56.47 20.91 16.84 18.41 16.85 23.68
IAT 15.48 13.78 27.68 14.08 8.73 15.18 10.90 10.33 13.61 10.05 9.23 10.29 7.59 6.63 19.30 9.67 11.73 7.59 11.32 12.27
AugRmixAT-1-1 13.71 11.19 7.33 10.54 3.10 18.56 5.45 4.11 3.82 5.45 6.18 4.26 3.03 2.97 8.05 5.98 15.26 3.93 10.73 7.56
AugRmixAT-1-32 14.69 13.67 17.46 13.76 15.09 17.87 18.56 16.11 16.91 16.70 19.46 32.36 14.39 14.07 49.29 16.46 12.97 14.39 13.21 18.25
Table 2: White-box Top1 robust accuracy (%) and Clean Top1 accuracy (%) on CIFAR-10.
White-box attacks
Model Clean FGSM PGD10 PGD20 CW20
Standard 96.06 56.30 5.10 0.76 0.22
Mixup 97.12 65.71 14.21 2.99 0.99
CutMix 96.90 41.40 4.37 1.63 0.55
AugMix 96.49 53.38 4.01 0.23 0.09
PGDAT 87.06 59.04 62.13 51.11 50.81
TRADES (1/λ=1) 87.37 59.21 61.46 50.50 50.27
TRADES (1/λ=6) 85.35 59.37 61.84 51.79 52.70
IAT 95.27 86.59 63.39 53.49 54.15
AugRmixAT-1-1 98.17 88.01 66.22 53.21 54.33
AugRmixAT-1-32 89.14 65.88 67.81 57.50 56.47
Table 3: Black-box Top1 robust accuracy (%) on CIFAR-10.
Defense Attack model
model Standard PGDAT TRADES IAT
PGDAT 86.40 - 69.38 77.44
TRADES (1/λ=1) 86.60 68.69 - 76.87
IAT 89.81 77.01 76.70 -
AugRmixAT-1-1 91.96 87.37 87.53 83.57
AugRmixAT-1-32 88.49 75.20 74.69 79.60
Table 4: Top1 robust accuracy (%) of untargeted partial occlusion and Top2 robust accuracy (%) of targeted partial occlusion on CIFAR-10.
Model Untargeted Targeted Mean
Standard 77.58 76.41 77.00
Mixup 81.24 82.69 81.97
CutMix 90.95 92.59 91.77
AugMix 78.47 77.60 78.03
PGDAT 56.49 66.22 61.36
TRADES (1/λ=1) 57.77 63.14 58.95
TRADES (1/λ=6) 54.77 63.14 58.95
IAT 69.63 78.87 74.25
AugRmixAT-1-1 91.04 93.66 92.35
AugRmixAT-1-32 75.44 80.79 78.12

4 Experiments

4.1 Implementation Details

We use the same neural network architectures as in previous works [5, 9], i.e., WideResNet-34-10 [22] for the experiments on CIFAR-10/100 and PreAct-ResNet18 [23] for the experiments on Tiny-ImageNet [18]. Apart from the network architecture, all other settings and hyperparameters are identical across datasets. We apply the momentum stochastic gradient descent optimizer on both CIFAR-10/100 and Tiny-ImageNet. The initial learning rate is set to 0.1 and decays with a cosine annealing schedule [24]. We set the momentum to 0.9 and use a weight decay of 5×10^{-4}. The training batch size is set to 128×4 and the maximum number of epochs to 200. The settings of our main comparison methods in the experiments are as follows.

Standard: The model trained on the original data without any data augmentation method.

Mixup, CutMix, AugMix: The models trained using data augmentation methods Mixup [14], CutMix [13] and AugMix [21] respectively.

PGDAT, TRADES, IAT: The models trained using PGD Adversarial Training (PGDAT) [5], TRADES [9], and Interpolated Adversarial Training (IAT) [10], respectively, where the perturbation budget is set to 0.031, the perturbation step size to 0.007, and the number of iterations to 10. IAT combines examples using Mixup.

AugRmixAT-1-1, AugRmixAT-1-32: The models trained using our proposed method, in which the perturbation budget ε, the perturbation step size η, and the number of iterations T are set the same as in PGDAT, TRADES, and IAT. “-1-1” means λ1=1, λ2=1. “-1-32” means λ1=1, λ2=32.

Additionally, all experiments were implemented and evaluated on the PyTorch [25] platform with four NVIDIA Tesla V100 GPUs.
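For reference, the optimization setup described above (SGD with momentum 0.9, weight decay 5×10^{-4}, initial learning rate 0.1, cosine annealing over 200 epochs) can be sketched as follows; the ResNet-18 backbone here is only a placeholder for the WideResNet-34-10 / PreAct-ResNet18 models actually used, and the inner training loop is assumed to call a step like train_step above.

```python
import torch
from torchvision.models import resnet18  # placeholder backbone, not WideResNet-34-10

model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for epoch in range(200):
    # ... one pass over the training set, e.g. calling train_step(...) per mini-batch ...
    scheduler.step()
```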

4.2 CIFAR-10

Evaluation on White-box Robustness. The results of the white-box robustness evaluation on CIFAR-10 are shown in Table 2. We evaluate the robustness of all models against three types of white-box attacks on CIFAR-10, i.e., FGSM [4], PGD [5], and CW [26] (PGD with the CW loss). For FGSM, we set the perturbation budget to 0.031. For PGD10, PGD20, and CW20, we set the perturbation budget to 0.031 and the perturbation step size to 0.003; PGD10 uses 10 attack iterations, while PGD20 and CW20 use 20.

We can see from Table 2 that all the compared adversarial training methods reduce the clean accuracy, whereas the AugRmixAT-1-1 model trained by our method improves it. Moreover, its clean accuracy is 1.05% higher than Mixup's. Under the FGSM attack, the AugRmixAT-1-1 model achieves the best robust accuracy. Under the PGD10, PGD20, and CW20 attacks, the AugRmixAT-1-32 model achieves the best robust accuracy. In particular, the PGD20 robust accuracy of the AugRmixAT-1-32 model is 6.39% higher than PGDAT, 5.71% higher than TRADES (1/λ=6), and 4.01% higher than IAT.

Evaluation on Black-box Robustness. We use transfer-based black-box attacks [27] to evaluate the black-box robustness of the model. We first use each trained model to construct adversarial examples by PGD and then apply these adversarial examples to other models and evaluate their performance. We set the perturbation budget as 0.031, the perturbation step size as 0.003, and the number of iterations as 10. The results of the black-box robustness on CIFAR-10 are reported in Table 3. Again, the AugRmixAT-1-1 model trained by our method achieves higher robustness than the other models.
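The transfer evaluation described above can be sketched as follows: a PGD attack is run against the source ("attack") model and the resulting examples are scored on the defense model. The use of a cross-entropy PGD for crafting is an assumption on our part; the budget (0.031), step size (0.003), and 10 iterations follow the text, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def transfer_attack_accuracy(source_model, defense_model, x, y,
                             epsilon=0.031, eta=0.003, steps=10):
    """Craft PGD examples against source_model, then report defense_model's accuracy on them."""
    source_model.eval(); defense_model.eval()
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(source_model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + eta * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    with torch.no_grad():
        preds = defense_model(x_adv).argmax(dim=1)
    return (preds == y).float().mean().item()
```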

Evaluation on Common Corruption Robustness. We evaluate robustness to various common corruptions on CIFAR-10-C [6], which consists of 19 corruption types, each with 5 severity levels.

Following prior works [6, 21], we adopt the Corruption Error (CE) [6] to measure common corruption robustness, and mCE denotes the mean Corruption Error over the 19 corruptions. As shown in Table 1, the AugRmixAT-1-1 model trained by our proposed method achieves the lowest CE on 15 of the 19 common corruptions. Moreover, the mCE of AugRmixAT-1-1 is also the lowest: 0.77% lower than AugMix, 15.15% lower than Standard, 14.84% lower than PGDAT, and 4.71% lower than IAT.
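The metric itself is simple to compute; the sketch below assumes the unnormalized variant commonly used on CIFAR-10-C, where the CE of each corruption type is the top-1 error averaged over its five severity levels and mCE averages the CEs over all corruption types.

```python
def corruption_errors(err):
    """err[corruption][severity] holds top-1 error in %; returns per-corruption CE and the mCE."""
    ce = {c: sum(sev) / len(sev) for c, sev in err.items()}   # average over the 5 severities
    mce = sum(ce.values()) / len(ce)                          # average over corruption types
    return ce, mce
```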

Evaluation on Partial Occlusion Robustness. Compared with corruptions and adversarial attacks, partial occlusion is arguably more common. We use untargeted random partial occlusion and targeted random partial occlusion to evaluate the robustness of the models under partial occlusion attacks. Untargeted occlusion blocks are filled with 0, while targeted occlusion blocks are taken from other objects. For untargeted partial occlusion we report the Top1 robust accuracy, and for targeted partial occlusion we report the Top2 robust accuracy. From Table 4, we can see that the previous adversarial training methods PGDAT, TRADES, and IAT struggle to defend against partial occlusion attacks and are even detrimental to partial occlusion robustness. In contrast, our method not only effectively improves both targeted and untargeted occlusion robust accuracy, but also exceeds CutMix by 0.09% on untargeted occlusion and by 1.07% on targeted occlusion. Furthermore, AugRmixAT-1-1 achieves the best performance under partial occlusion attacks and far outperforms the models trained by other adversarial training methods.
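The two occlusion perturbations used above can be sketched as follows; the occlusion block size is an assumption on our part, since it is not specified here, and in the targeted case the occluding patch is taken from another image in the batch.

```python
import torch

def occlude(X, block=16, targeted=False):
    """Occlude a random square block in each image: zeros (untargeted) or patches from other objects (targeted)."""
    B, _, H, W = X.shape
    Xo = X.clone()
    y0 = torch.randint(0, H - block + 1, (1,)).item()
    x0 = torch.randint(0, W - block + 1, (1,)).item()
    if targeted:
        src = X[torch.randperm(B, device=X.device)]           # occluders come from other images
        Xo[:, :, y0:y0 + block, x0:x0 + block] = src[:, :, y0:y0 + block, x0:x0 + block]
    else:
        Xo[:, :, y0:y0 + block, x0:x0 + block] = 0.0           # untargeted: fill with zeros
    return Xo
```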

Table 5: Clean Top1 accuracy (%) and robust accuracy (%) under various attacks on CIFAR-100 and Tiny-ImageNet.
CIFAR-100
White-box attacks Black-box attacks
Model Clean FGSM PGD10 PGD20 CW20 Standard PGDAT Corr Occ
Standard 80.12 24.26 2.09 0.55 0.25 - 62.38 51.53 54.64
Mixup 82.53 40.19 0.52 0.05 0.00 56.44 68.03 58.49 60.73
CutMix 82.38 19.73 0.66 0.12 0.00 34.46 61.47 48.20 71.95
AugMix 80.21 19.54 1.52 0.38 0.14 68.42 68.63 68.75 54.65
PGDAT 61.16 30.76 33.95 26.48 25.43 60.79 - 48.72 35.12
TRADES (1/λ=1) 60.60 30.83 33.74 26.67 25.99 60.01 43.22 48.50 33.61
TRADES (1/λ=6) 56.30 29.63 32.64 25.78 25.53 55.75 41.83 45.63 30.79
IAT 75.79 58.07 33.79 24.95 20.27 68.94 53.55 61.49 50.15
AugRmixAT-1-1 84.83 66.50 36.51 25.94 22.15 76.89 71.55 71.31 74.21
AugRmixAT-1-32 65.93 37.59 38.49 30.18 28.50 65.05 49.58 55.20 56.38
Tiny-ImageNet
White-box attacks Black-box attacks
Model Clean FGSM PGD10 PGD20 CW20 Standard PGDAT Corr Occ
Standard 63.90 1.07 0.10 0.00 0.00 - 48.45 24.31 49.34
Mixup 64.66 0.94 0.00 0.00 0.00 20.68 51.07 27.84 50.62
CutMix 67.78 2.75 0.02 0.00 0.00 13.80 53.44 24.63 62.26
AugMix 62.46 2.83 0.16 0.03 0.00 30.69 47.19 33.57 45.35
PGDAT 45.86 15.60 18.78 12.64 12.55 44.08 - 18.24 28.06
TRADES (1/λ=1) 47.60 14.93 18.36 11.95 11.63 41.39 32.47 18.74 30.02
TRADES (1/λ=6) 43.08 19.65 23.14 17.69 14.89 44.09 31.56 18.09 26.93
IAT 54.20 17.37 20.64 13.00 10.99 41.40 38.05 24.39 34.67
AugRmixAT-1-1 69.64 42.79 14.42 7.13 3.48 44.91 54.14 35.93 62.66
AugRmixAT-1-32 52.90 26.27 28.70 22.20 16.96 50.24 39.22 24.56 42.91
Table 6: Sensitivity of hyperparameters λ1 and λ2.
White-box attacks
λ1 λ2 Clean FGSM PGD10 PGD20 CW20 Corr Occ
1 1 96.91 79.81 62.43 49.23 47.12 89.28 90.16
2 1 96.89 80.66 61.20 46.95 44.41 90.34 89.01
3 1 96.95 80.42 58.84 40.69 36.88 90.27 89.85
4 1 96.94 80.63 57.99 40.33 36.74 90.64 88.65
5 1 96.82 79.88 57.43 39.00 34.69 91.05 89.33
1 2 96.34 79.68 65.20 52.78 50.78 89.46 88.98
1 4 94.33 77.26 65.12 52.59 50.21 87.55 85.87
1 8 89.94 59.73 63.16 50.70 48.19 82.50 78.10
1 16 88.05 60.03 64.18 53.17 50.06 79.93 73.88
1 32 84.81 59.61 63.66 54.86 51.83 76.41 69.52

4.3 CIFAR-100 and Tiny-ImageNet

We also verify the effectiveness of our method on CIFAR-100 and Tiny-ImageNet. The results are presented in Table 5. In Table 5, "Corr" denotes common corruption robustness, evaluated using the mean corruption accuracy (mCA = 1 − mCE), and "Occ" denotes partial occlusion robustness, evaluated as the mean of the Top1 untargeted occlusion robust accuracy and the Top2 targeted occlusion robust accuracy. The other settings are the same as for CIFAR-10.

4.4 Sensitivity of hyperparameters λ1 and λ2

We use PreAct-ResNet18 [23] to run sensitivity experiments for the regularization hyperparameters λ1 and λ2 on CIFAR-10. The other settings are the same as in the experiments above. We can observe from Table 6 that, as λ1 increases, the common corruption robust accuracy increases while the adversarial robust accuracy decreases. Moreover, as λ2 increases, the clean accuracy, common corruption robust accuracy, and partial occlusion robust accuracy decrease while the adversarial robust accuracy increases. This again confirms that improving only one specific form of robustness is often detrimental to one or more others. In practical applications, we recommend setting both regularization hyperparameters λ1 and λ2 to 1, which effectively improves the generalization performance and multiple robustness properties of the model.

5 Conclusion

We propose AugRmixAT, a new image data processing and training method. Unlike previous data augmentation and adversarial training methods, ours not only improves the generalization performance of neural network models but also improves multiple forms of robustness, including white-box robustness, black-box robustness, common corruption robustness, and partial occlusion robustness. Moreover, AugRmixAT can be easily inserted into existing training pipelines, and we believe it can make neural networks used in real-world applications more reliable and secure.

References

  • [1] Dinggang Shen, Guorong Wu, and Heung-Il Suk, “Deep learning in medical image analysis,” Annual review of biomedical engineering, vol. 19, pp. 221–248, 2017.
  • [2] Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu, “A survey of deep learning techniques for autonomous driving,” Journal of Field Robotics, vol. 37, no. 3, pp. 362–386, 2020.
  • [3] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K Reiter, “Adversarial generative nets: Neural network attacks on state-of-the-art face recognition,” arXiv preprint arXiv:1801.00349, 2017.
  • [4] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy, “Explaining and harnessing adversarial examples,” arXiv preprint arXiv:1412.6572, 2014.
  • [5] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations (ICLR), 2018.
  • [6] Dan Hendrycks and Thomas Dietterich, “Benchmarking neural network robustness to common corruptions and perturbations,” International Conference on Learning Representations (ICLR), 2019.
  • [7] Terrance DeVries and Graham W Taylor, “Improved regularization of convolutional neural networks with cutout,” arXiv preprint arXiv:1708.04552, 2017.
  • [8] Ali Dabouei, Sobhan Soleymani, Fariborz Taherkhani, Jeremy Dawson, and Nasser M Nasrabadi, “Exploiting joint robustness to adversarial perturbations,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1122–1131.
  • [9] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan, “Theoretically principled trade-off between robustness and accuracy,” in International Conference on Machine Learning (ICML), 2019.
  • [10] Alex Lamb, Vikas Verma, Juho Kannala, and Yoshua Bengio, “Interpolated adversarial training: Achieving robust neural networks without sacrificing too much accuracy,” in Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, 2019.
  • [11] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry, “Adversarially robust generalization requires more data,” in Neural Information Processing Systems (NeurIPS), 2018.
  • [12] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry, “Exploring the landscape of spatial robustness,” in International Conference on Machine Learning (ICML), 2019.
  • [13] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo, “Cutmix: Regularization strategy to train strong classifiers with localizable features,” IEEE International Conference on Computer Vision (ICCV), 2019.
  • [14] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz, “mixup: Beyond empirical risk minimization,” International Conference on Learning Representations (ICLR), 2018.
  • [15] Ethan Harris, Antonia Marcu, Matthew Painter, Mahesan Niranjan, Adam Prügel-Bennett, and Jonathon Hare, “Fmix: Enhancing mixed sample data augmentation,” International Conference on Learning Representations (ICLR), 2021.
  • [16] Jie Qin, Jiemin Fang, Qian Zhang, Wenyu Liu, Xingang Wang, and Xinggang Wang, “Resizemix: Mixing data with preserved object information and true labels,” arXiv preprint arXiv:2012.11101, 2020.
  • [17] Dominik Maria Endres and Johannes E Schindelin, “A new metric for probability distributions,” IEEE Transactions on Information Theory, 2003.
  • [18] Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter, “A downsampled variant of imagenet as an alternative to the cifar datasets,” arXiv preprint arXiv:1707.08819, 2017.
  • [19] Christopher M Bishop, “Pattern Recognition and Machine Learning,” Springer, 2006.
  • [20] Connor Shorten and Taghi M Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019.
  • [21] Dan Hendrycks, Norman Mu, Ekin D. Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan, “AugMix: A simple data processing method to improve robustness and uncertainty,” International Conference on Learning Representations (ICLR), 2020.
  • [22] Sergey Zagoruyko and Nikos Komodakis, “Wide residual networks,” arXiv preprint arXiv:1605.07146, 2016.
  • [23] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Identity mappings in deep residual networks,” in European Conference on Computer Vision (ECCV), 2016.
  • [24] I. Loshchilov and F. Hutter, “Sgdr: Stochastic gradient descent with warm restarts,” in International Conference on Learning Representations (ICLR), 2017.
  • [25] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer, “Automatic differentiation in pytorch,” 2017.
  • [26] Nicholas Carlini and David Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symposium on Security and Privacy (SP), 2017.
  • [27] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami, “Practical black-box attacks against machine learning,” in Proceedings of the 2017 ACM on Asia conference on computer and communications security, 2017.