
AdCorDA: Classifier Refinement via Adversarial Correction and Domain Adaptation

Lulan Shen, Ali Edalati, Brett Meyer, Warren Gross, James J. Clark
McGill University
Montreal, Quebec, Canada
lulan.shen@mail.mcgill.ca, james.j.clark@mcgill.ca
Abstract

This paper describes a simple yet effective technique for refining a pretrained classifier network. The proposed AdCorDA method is based on modification of the training set and making use of the duality between network weights and layer inputs. We call this input space training. The method consists of two stages - adversarial correction followed by domain adaptation. Adversarial correction uses adversarial attacks to correct incorrect training-set classifications. The incorrectly classified samples of the training set are removed and replaced with the adversarially corrected samples to form a new training set, and then, in the second stage, domain adaptation is performed back to the original training set. Extensive experimental validations show significant accuracy boosts of over 5% on the CIFAR-100 dataset. The technique can be straightforwardly applied to refinement of weight-quantized neural networks, where experiments show substantial enhancement in performance over the baseline. The adversarial correction technique also results in enhanced robustness to adversarial attacks.

1 Introduction - Input Space Training

In this paper, we present an alternative to standard neural network training methods. Rather than modifying network weights driven by errors relative to a training dataset, we consider input space training, which is driven by the effect of changing the inputs on the loss, rather than (only) the effect of changing the network weights on the loss. As noted by Feng and Tu [8], there is a duality between neural network layer inputs (activations) and weights with respect to the loss function $L$, due to the mathematical form of the standard single-layer perceptron, $y=f(w^{T}x)$. For a given change in the loss function value due to a small change in the weights $w$, one can get an equivalent change in the loss by changing the input activations $x$ instead. That is, $\Delta L(x_{0},w_{0})=L(x_{0}+\delta x,w_{0})-L(x_{0},w_{0})=L(x_{0},w_{0}+\delta w)-L(x_{0},w_{0})$. This observation of activity-weight duality leads to the following learning procedure, based on manipulation of the inputs, which we call input space training (IST):

  1. Make a small change in the inputs $x$ that results in a reduction of the loss.

  2. Use the duality between weights and activations to determine a change in the weights that gives a reduction in the loss for the original inputs.

To carry out this procedure, we need to accomplish two tasks - find a change in inputs that reduces the loss, and find an equivalent change in weights that reduces the loss on the original inputs. The paper by Feng and Tu [8] provides a method for accomplishing the second task. They note that there are, in general, more weight parameters than inputs, and so there are effectively unlimited possible weight changes that are equivalent to a given change in the inputs. They provide a minimum weight change solution to identify a unique weight update, as a linear combination of the current weight values.

Feng and Tu use the activity-weight duality principle for the purposes of quantifying how networks generalize, but do not use it to train a network. One could derive a training scheme from their equations but it would be slow and computationally expensive. In this paper, rather than training a network from scratch using input space training, we propose a method which takes a classifier network pretrained using standard back-propagation methods and then refines it using a one-shot (non-iterative) IST method. Inspired by activity-weight duality, our IST refinement method has two stages: first we perturb the training set to reduce the loss, and second, we adjust the weights by performing a domain adaptation step that adapts from the perturbed dataset to the original dataset.
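
To make the duality concrete, the following minimal numeric sketch (ours, not code from [8]) perturbs the input of a single sigmoid unit to lower a squared-error loss, and then constructs a weight perturbation that reproduces exactly the same pre-activation, and hence the same loss, on the original input.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8)          # input activations
w = torch.randn(8)          # weights
y_target = torch.tensor(1.0)

def loss_fn(x, w):
    # Squared-error loss of a single sigmoid unit y = f(w^T x).
    return (torch.sigmoid(w @ x) - y_target) ** 2

# 1) A small input change delta_x that reduces the loss (one gradient step on x).
x_var = x.clone().requires_grad_(True)
loss_fn(x_var, w).backward()
delta_x = -1e-2 * x_var.grad

# 2) An equivalent weight change: the unit sees x and w only through w^T x, so any
#    delta_w with delta_w^T x = w^T delta_x reproduces the same pre-activation, e.g.:
delta_w = (w @ delta_x / (x @ x)) * x

print(loss_fn(x, w).item())              # original loss
print(loss_fn(x + delta_x, w).item())    # loss after the input change
print(loss_fn(x, w + delta_w).item())    # the same loss, from the weight change
```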

2 Curriculum Learning

The first approach we propose for altering the training set to reduce loss is based on curriculum learning. Curriculum learning, first proposed by Bengio et al. [3], aims to improve the speed and accuracy of network training by presenting data samples from the training set in an ordered fashion. Typically, easier samples are presented before difficult samples as the training progresses.

It is not obvious how to properly define the notions of “easy” and “hard”, however, and indeed many different definitions exist. Some of these definitions are based solely on the structure of the input examples, without consideration of the network being trained. Table 2 in the survey paper of Wang et al. [28] lists no fewer than nineteen different types of pre-defined input difficulty measures that have been used to guide curriculum learning. But the difficulty of an input can also depend on the network being trained: problems that some networks find difficult may be easy for other networks, and vice versa. So-called Self-Paced Learning (SPL) methods, such as that proposed by Kumar et al. [13], use dynamic measures of problem difficulty that are provided by the network itself as it trains. In the SPL method, easy problems are defined as those for which the network’s training loss is less than a (dynamically changing) threshold value.

We propose to use this curriculum separation of the training set into easy and hard problems, as defined by a training-loss threshold, for our IST approach. Over the original training set, our pre-trained network achieves a particular loss value. If we remove the training-set samples for which the loss is above a threshold, we are left with a (modified) training set whose average (and maximum) loss is less than that of the original training set. To avoid having to set a suitable threshold value, we use the pre-trained network to define easy vs. hard with the simple expedient of considering easy problems to be the ones the network classifies correctly. This naturally results in a separation of input samples based on loss. We use this procedure to satisfy the first step of our IST process, altering the inputs to reduce the loss. Even though we are not altering individual samples in this method, the training set as a whole is being altered.
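
As a concrete illustration, here is a minimal sketch (ours, not the authors' code) of this correctness-based split, assuming a PyTorch classifier and a map-style dataset:

```python
import torch
from torch.utils.data import DataLoader, Subset

@torch.no_grad()
def split_by_correctness(model, dataset, device="cuda", batch_size=256):
    """Return (easy, hard) subsets: samples the model gets right vs. wrong."""
    model.eval().to(device)
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    correct_idx, wrong_idx = [], []
    offset = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        mask = preds.eq(labels)
        idx = torch.arange(offset, offset + len(labels))
        correct_idx += idx[mask].tolist()
        wrong_idx += idx[~mask].tolist()
        offset += len(labels)
    return Subset(dataset, correct_idx), Subset(dataset, wrong_idx)
```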

3 Adversarial Correction

We can take the curriculum approach outlined in the previous section a step further by performing what we call adversarial correction to further modify the training set. This yields a larger set of inputs with reduced loss than the curriculum approach alone.

The concept of adversarial attack is well known in the machine learning community [15]. Given a classifier network trained on a particular dataset, an adversary can modify an input slightly in such a way that the network gives a different classification output. One thing to keep in mind, however, especially for smaller networks, is that networks frequently give wrong answers even in the absence of adversarial attacks. For example, as seen in Tab. 1, the baseline accuracy of small ResNet networks on the CIFAR-100 dataset [12] is around 80%, meaning that, even without considering possible adversarial attacks, such networks give the wrong output 20% of the time on the CIFAR-100 test set. Because of this relatively frequent failure, when such networks are deployed, especially on edge devices or embedded systems, their outputs are often combined with measures of uncertainty or confidence, allowing the user to judge whether a particular output can be trusted. Although this check is not perfect, it does mitigate the impact of adversarial attacks. In addition, many training methods [17, 24] have been developed that increase the robustness of networks to adversarial attacks.

In this paper, rather than focusing on correct outputs being changed by adversarial attacks, we look at the effect of adversarial attacks on the outputs that the network already gets wrong. In such a situation, things cannot get any worse, as the network is already wrong, but they could get better if the adversarial perturbation of the input actually causes the network to provide the correct answer. We can help the process by using targeted attacks, where the target of the adversarial attack in this case is the correct output. But even non-targeted attacks may help by weakening support for the incorrect label relative to the true label. We will refer to this as adversarial correction, as opposed to adversarial attack.

It should be noted that adversarial correction is well suited to working with quantized networks, as some adversarial attacks do not need to compute gradients with respect to the weights. However, many attacks do need gradient information, and deep domain adaptation techniques generally require gradient-based optimization (gradients with respect to the weights) to adapt models effectively across domains. Thus, in this paper, we focus on post-training quantization methods [11], and we apply the adversarial correction to the samples that the quantized network gets wrong, rather than to those of the full-precision network.

4 Domain Adaptation

At this point in the method we have a modified dataset consisting either of only the samples that the original network classifies correctly, or of those samples augmented with adversarially corrected samples. Either way, our original trained network has an accuracy of 100% on this modified dataset. But how does this help us? After all, what we really want is to increase accuracy (reduce the loss) on the original dataset, not on some other dataset. This is the goal of the second stage of input space training: finding a set of network weights that results in a lower loss on the original training set, starting from the modified training set.

Denote the original training set by $T$, and consider the altered training set $T^{\prime}$ as our starting point for the second stage of the IST process. The original training set can be thought of as a distribution shift of the altered training set. How can we deal with this distribution shift, where we go from a distribution on which the network does well (perfectly, in fact) to a distribution on which the network performs less well? There is substantial literature addressing this very problem: domain adaptation. Domain adaptation methods aim to transfer knowledge about one domain (the source domain) into a second, similar, domain (the target domain) [30]. All domain adaptation methods have the goal of increasing performance on the target domain, starting from a network that does well on the source domain. Shen et al. [25] showed that applying domain adaptation from easy to hard after the early stages of curriculum learning speeds up training. Motivated by these considerations, we choose the final step in our IST method to be a domain adaptation from $T^{\prime}$ to $T$.

5 AdCorDA

Figure 1: Overview of the proposed AdCorDA classifier refinement method. $T$ is the original training set; $T_{c}$ is the subset of $T$ that the pretrained network labels correctly, and $T_{w}$ the subset that is labeled incorrectly; $T_{a}$ is the set of samples that have been adversarially corrected; $T^{\prime}$ is the union of $T_{c}$ and $T_{a}$. The network is adapted from $T^{\prime}$ as the source domain back to $T$ as the target domain.

Putting together the two stages of the input space training method as detailed above, we arrive at what we call the AdCorDA (Adversarial Correction and Domain Adaptation) method. The AdCorDA method proceeds as depicted in Fig. 1, with the following steps:

  • Step 1: Train a network to solve a classification problem using standard training techniques on a training set $T$.

  • Step 2: Separate the original set of training samples $T$ into two subsets, $T_{c}$ and $T_{w}$, where $T_{c}$ contains the training samples for which the trained network gets the correct answer, and $T_{w}$ contains the training samples that the network gets wrong.

  • Step 3: For each sample in $T_{w}$, use adversarial attack techniques to create adversarial inputs, where in this case we wish to perturb the input such that the network gives the class provided by the training label (the true label). Note that typically not all attacks will successfully coax the network into outputting the true label. Let the set of successfully perturbed samples be denoted $T_{a}$; this set may be smaller than $T_{w}$.

  • Step 4: Merge the subsets $T_{c}$ and $T_{a}$ into one new training set, $T^{\prime}$. The samples for which the adversarial correction failed have been removed, so the accuracy of the network on $T^{\prime}$ is 100%, and the number of elements in $T^{\prime}$ may be less than that of the original dataset $T$.

  • Step 5: Seeing that $T$ and $T^{\prime}$ represent two (overlapping) domains, perform domain adaptation of the trained network, adapting from the corrected dataset $T^{\prime}$ as the source domain back to the original dataset $T$ as the target domain.

In the experiments described in the next section, we examine the effectiveness of the AdCorDA method, as well as an ablation case where we omit steps 3 and 4, using only the curriculum subset as $T^{\prime}$.
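
A high-level sketch of the five steps is given below; split_by_correctness, adversarially_correct, and adapt_coral are hypothetical helper names standing in for the components described in Secs. 2-4 and detailed in Sec. 6, not functions from any library.

```python
from torch.utils.data import ConcatDataset

def adcorda(model, train_set):
    # Step 1: `model` is assumed to be already trained on train_set (T).
    # Step 2: split T into correctly (T_c) and incorrectly (T_w) classified subsets.
    T_c, T_w = split_by_correctness(model, train_set)
    # Step 3: adversarially perturb each sample in T_w towards its true label and
    # keep only the samples that the network now classifies correctly (T_a).
    T_a = adversarially_correct(model, T_w)
    # Step 4: merge into the corrected training set T' (accuracy on T' is 100%).
    T_prime = ConcatDataset([T_c, T_a])
    # Step 5: domain-adapt from T' (source) back to T (target), e.g. with Deep CORAL.
    return adapt_coral(model, source=T_prime, target=train_set)
```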

6 Experimental Setup

6.1 Datasets, Networks and Training Details

We validated our approach through experiments on the CIFAR-10 and CIFAR-100 datasets, each containing 50K images, which we randomly split into 45K training and 5K validation images. Each dataset has a separate test set of 10K images. We first initialize ResNets [10] of different sizes (i.e., ResNet-18, ResNet-34, ResNet-50) and EfficientNetV2-M [27] with parameters pre-trained on the ImageNet dataset [7] from PyTorch [18], and then fine-tune [29] on the CIFAR training sets to obtain the corresponding baseline models. Input images are resized to 224×224 and use the same data transform as the pre-trained models (i.e., normalization using the mean and standard deviation of the ImageNet data). During fine-tuning, we use a stochastic gradient descent (SGD) optimizer [4] with a momentum of 0.9, a weight decay of 1e-4, a batch size of 128 for the ResNets and of 64 for EfficientNetV2-M (due to limitations in computing resources), and a fixed learning rate of 1e-4, and we train for a total of 100 epochs on both CIFAR datasets. We define the fine-tuned models with the best validation accuracy as our baseline models.
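
A minimal sketch of this fine-tuning setup (assumed, not the authors' script; the torchvision weights identifier and the transform values are the standard ImageNet settings):

```python
import torch
import torchvision
from torchvision import transforms

# ImageNet-pretrained ResNet-18 with a new CIFAR-100 classification head.
model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 100)

# Inputs resized to 224x224 and normalized with the ImageNet statistics.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# SGD with momentum 0.9, weight decay 1e-4, and a fixed learning rate of 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=1e-4)
```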

We use PyTorch and Nvidia V100L and P100L GPUs for all implementations. We split the training and validation datasets using three random seeds: 1, 2, and 5. We find that using random seeds 3 and 4 does not allow some attacks (e.g., BI and BIH, described in the next subsection) to successfully attack any incorrect images in the majority of instances. Therefore, we present the averaged results of the main experiments based on the three chosen random seeds. To determine the optimal hyper-parameters for our model, we perform a basic parameter grid search for the batch size, base learning rate, and weight decay of the SGD optimizer.

6.2 Adversarial Attack Methods

To apply adversarial attacks to misclassified images of the training domain, we use a selection of methods, including three major types of gradient-based attacks (the basic iterative method [14] and its variants, the iterative least-likely class method [14], and decoupled direction and norm [21]), as well as a non-gradient-based salt-and-pepper noise attack, all briefly described below.

  • Untargeted Basic Iterative (BI) [14]: This extends the “fast” method [9] by generating adversarial images through an iterative process using a small step size $\alpha$, clipping the pixel values of intermediate results at each step to ensure that they remain within an $\epsilon$-neighbourhood of the source image [14]:

    $\boldsymbol{X}_{N+1}^{BI} = Clip_{X,\epsilon}\{\boldsymbol{X}_{N}^{BI} + \alpha\,\text{sign}(\nabla_{X}J(\boldsymbol{X}_{N}^{BI}, y_{true}))\},$  (1)
    $\boldsymbol{X}_{0}^{BI} = \boldsymbol{X},$  (2)

    where $\boldsymbol{X}$ represents an image, $y_{true}$ denotes the true class for the image $\boldsymbol{X}$, $J(\boldsymbol{X},y)$ is the cross-entropy cost function of the neural network, and $Clip_{X,\epsilon}\{\boldsymbol{X}^{\prime}\}$ is the per-pixel clipping function applied to the image $\boldsymbol{X}^{\prime}$ to ensure it falls within an $L_{\infty}$ $\epsilon$-neighbourhood of the original image $\boldsymbol{X}$.

  • Basic Iterative method with Highest probability class (BIH): When attacking a correctly classified image, BI uses the gradient for the true class, which is also the highest-probability class. When targeting an incorrectly classified image, however, the highest-probability class no longer corresponds to the truth. We therefore adapt BI to use the gradient of the highest-probability class, so as to weaken the network's confidence in the incorrect output:

    $\boldsymbol{X}_{N+1}^{BIH} = Clip_{X,\epsilon}\{\boldsymbol{X}_{N}^{BIH} + \alpha\,\text{sign}(\nabla_{X}J(\boldsymbol{X}_{N}^{BIH}, y_{H}))\}$  (3)
    $y_{H} = \underset{y}{\mathrm{argmax}}\,\{p(y|\boldsymbol{X})\}.$  (4)
  • Targeted Variant of Basic Iterative (VBI): In addition to the standard untargeted BI method, we created a targeted variant called VBI. Unlike BI (Eq. (1)), which moves away from the true label, VBI (Eq. (5)) operates in the opposite direction, moving towards the true label by reversing the sign of the gradient step (a minimal sketch of this update is given after this list):

    $\boldsymbol{X}_{N+1}^{VBI} = Clip_{X,\epsilon}\{\boldsymbol{X}_{N}^{VBI} - \alpha\,\text{sign}(\nabla_{X}J(\boldsymbol{X}_{N}^{VBI}, y_{true}))\}$  (5)

    VBI$_{\text{iter1}}$ is a fast version of VBI that takes just one step towards the target.

  • Iterative Least-Likely class (LL) [14]: This method generates an attack targeting the least-likely class, as predicted by the trained model on the source image:

    $\boldsymbol{X}_{N+1}^{LL} = Clip_{X,\epsilon}\{\boldsymbol{X}_{N}^{LL} - \alpha\,\text{sign}(\nabla_{X}J(\boldsymbol{X}_{N}^{LL}, y_{LL}))\}$  (6)
    $y_{LL} = \underset{y}{\mathrm{argmin}}\,\{p(y|\boldsymbol{X})\}$  (7)

    The LL method moves the input towards the least-probable class. While this may lower the probability of the true class, in some cases it also lowers the probability of the maximum-probability (incorrect) class by a larger amount, potentially resulting in a correction of the output label.

  • Decoupled Direction and Norm (DDN) [21]: This attack is an iterative approach that refines the noise added to the input image in each iteration to make it adversarial. At iteration $i$, the adversarial input image $x_{i}$ is generated as $x_{i}=x+\eta_{i}$, where $\eta_{i}$ is the noise with a norm of $\sigma_{i}$. If $x_{i}$ is adversarial, the norm of the next iteration's noise is decreased ($\sigma_{i+1}=\sigma_{i}(1-\epsilon)$); otherwise, the norm of the next noise is increased ($\sigma_{i+1}=\sigma_{i}(1+\epsilon)$). This process repeats until the minimum required perturbation is found [21]. The DDN method is a targeted attack that moves the network output towards the true label.

  • Salt and Pepper noise (SP): A non-gradient-based attack that repeatedly adds Salt & Pepper noise to the input to fool the model.
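
As a concrete illustration of the corrective use of a targeted attack, the following is a minimal sketch (ours, not the authors' implementation) of a single VBI step (Eq. (5)); model, x_adv, x_orig, and y_true are assumed to be a PyTorch classifier, image tensors in [0, 1], and integer label tensors.

```python
import torch
import torch.nn.functional as F

def vbi_step(model, x_adv, x_orig, y_true, alpha=1/255, eps=8/255):
    """One targeted step towards the true label, clipped to an L-inf eps-ball."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)
    grad, = torch.autograd.grad(loss, x_adv)
    x_next = x_adv.detach() - alpha * grad.sign()                       # descend on the true-label loss
    x_next = torch.min(torch.max(x_next, x_orig - eps), x_orig + eps)   # Clip_{X,eps}
    return x_next.clamp(0.0, 1.0)                                       # keep a valid pixel range
```

Iterating this step up to five times, and keeping only the samples whose prediction flips to the true label, corresponds to the VBI correction reported in Tab. 1; a single step corresponds to VBI$_{\text{iter1}}$.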

For the DDN and SP attacks, we use the default hyper-parameters provided by the Foolbox framework [19, 20]. Note that the input images are subject to the ImageNet transformation with a lower and upper bound of 0 and 1, respectively. The BI and LL attacks are applied according to the experimental setting outlined in [14]. In Tab. 1, we have applied VBI with one iteration (referred to as VBI$_{\text{iter1}}$) and with five iterations (referred to as VBI). The maximum iteration limit for VBI is set to 5, which is sufficient for effectively correcting the vast majority of erroneous samples on the CIFAR datasets.
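
A sketch of how such a Foolbox-based correction might be invoked (assumed Foolbox 3 usage, not the authors' script); model, wrong_images, and true_labels are assumed variables:

```python
import foolbox as fb

fmodel = fb.PyTorchModel(model.eval(), bounds=(0, 1))   # inputs scaled to [0, 1]
attack = fb.attacks.DDNAttack()                         # SP would use fb.attacks.SaltAndPepperNoiseAttack()
# Target the attack at the true labels of the misclassified images.
criterion = fb.criteria.TargetedMisclassification(true_labels)
raw, corrected, success = attack(fmodel, wrong_images, criterion, epsilons=None)
# success[i] is True when the perturbed image is now classified as its true
# label; only those samples are kept in T_a.
```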

We desire adversarial attack methods that are fast. The adversarial correction process may be quite computationally expensive if many iterations are needed per element of $T_{w}$. For example, a small network trained on CIFAR-100, which has 50,000 samples in its training set, might have a training accuracy of 95%; in this case there would be 2,500 incorrect samples needing adversarial correction, which could take a long time if the method requires many expensive iterations. We therefore prefer attack methods that use only a few iterations, possibly at the cost of not being able to correct all samples. Note that we do not necessarily require that the corrections be imperceptible, and so we can use relatively large perturbations of the input.

To investigate the effect of our proposed method on the adversarial robustness of the corrected models, we evaluated the models against AutoAttack [5] on the CIFAR-10 and CIFAR-100 test sets. AutoAttack is a well-known, powerful, and diverse ensemble of parameter-free attacks, composed of four attacks that differ from those used for correction in our experiments. We applied the standard version of AutoAttack: APGD$_{\text{CE}}$, targeted APGD$_{\text{DLR}}$ [5], targeted FAB [6], and Square Attack [2] with the $\ell_{\infty}$-norm. The attacks were applied sequentially. We set $\epsilon$ to 5e-4 for all of the AutoAttack experiments. Other AutoAttack parameters, such as the number of iterations and restarts, are identical to those used in the standard version. The batch size used for the ResNet-18, ResNet-34, and EfficientNetV2-M experiments is 512, 512, and 100, respectively. We report the average accuracy obtained across three different random seeds: 1, 2, and 5.
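
A sketch of this evaluation using the public AutoAttack package (assumed usage; x_test and y_test are the CIFAR test images and labels as tensors):

```python
from autoattack import AutoAttack

# Standard ensemble: APGD-CE, targeted APGD-DLR, targeted FAB, Square, run sequentially.
adversary = AutoAttack(model.eval(), norm='Linf', eps=5e-4, version='standard')
# `bs` is the evaluation batch size (512 for the ResNet experiments); the call
# reports the robust accuracy after each attack in the ensemble.
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=512)
```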

6.3 Domain Adaptation Method

In the domain adaptation stage, we utilize Deep CORAL [26], which aligns the second-order (covariance) statistics of a source domain and a target domain through the CORAL loss. This alignment helps to bridge the distribution gap between the domains and improve the model's performance on the target domain. Following the original implementation, the CORAL loss is applied only to the last classification layer of the networks. The total loss is the sum of the classification loss and the CORAL loss, defined as

$\mathcal{L}_{loss} = \mathcal{L}_{class} + \lambda\,\mathcal{L}_{coral},$  (8)

where $\lambda$ is a weight balancing the classification and CORAL losses. Its value is 1/750 for CIFAR-10 and 1/25 for CIFAR-100, chosen so that the classification loss and the CORAL loss are nearly equal at the end of the training process. The CORAL loss term is given by the following equation [26]:

$\mathcal{L}_{coral} = \frac{1}{4d^{2}}\,\lVert C_{S}-C_{T}\rVert^{2}_{F},$  (9)

where $C_{S}$ and $C_{T}$ are the covariance matrices of the features induced by samples from the source domain and target domain, respectively, $d$ is the feature dimension, and the norm is the squared matrix Frobenius norm. In our application, the source domain is the adversarially corrected training dataset ($T^{\prime}$) and the target domain is the original training dataset ($T$).
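
A minimal PyTorch sketch of Eqs. (8)-(9) (ours, following the Deep CORAL formulation rather than the authors' code); f_src and f_tgt are the activations of the layer to which the CORAL loss is applied, with shape (batch, d) and batch size greater than one:

```python
import torch

def coral_loss(f_src, f_tgt):
    # Eq. (9): squared Frobenius distance between feature covariances, / (4 d^2).
    d = f_src.size(1)
    def cov(f):
        f = f - f.mean(dim=0, keepdim=True)
        return f.t() @ f / (f.size(0) - 1)
    return ((cov(f_src) - cov(f_tgt)) ** 2).sum() / (4 * d * d)

def total_loss(logits_src, labels_src, f_src, f_tgt, lam):
    # Eq. (8): classification loss on the labelled source batch + weighted CORAL loss.
    return torch.nn.functional.cross_entropy(logits_src, labels_src) + lam * coral_loss(f_src, f_tgt)
```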

Our experimental setup closely adheres to the guidelines in [26]. However, we deviate by using batch sizes of 16 for the ResNets on CIFAR-10 and CIFAR-100, and of 16/32 for EfficientNetV2-M on CIFAR-10/100, differing from the original paper's settings. Note that we shuffle the dataset $T^{\prime}$ before conducting domain adaptation training. This ensures that batches contain a mixture of training samples: those from the original dataset for which the trained network gets the correct answers, and the successfully perturbed (formerly incorrect) samples. We then train for 20 epochs on the CIFAR datasets. We also initialize the model with the pre-trained weights of the baseline models rather than using the ImageNet pre-trained model from PyTorch. These adjustments ensure a fair comparison with the baseline models. When applying domain adaptation to quantized models, we enable back-propagation by approximating their gradients with the gradients of the corresponding full-precision models. We define the best adapted model as the one that achieves the highest validation accuracy on the target domain, the original dataset $T$.
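
A sketch of one adaptation epoch under the assumptions above (the classification loss uses source labels from $T^{\prime}$; as in standard Deep CORAL, target labels are assumed not to enter the adaptation loss; total_loss is the helper sketched after Eq. (9)):

```python
def adapt_epoch(model, src_loader, tgt_loader, optimizer, lam):
    # One epoch of Deep CORAL adaptation from T' (source) to T (target).
    model.train()
    for (x_s, y_s), (x_t, _) in zip(src_loader, tgt_loader):
        logits_s = model(x_s)            # source batch from the shuffled T'
        logits_t = model(x_t)            # target batch from the original T
        # Classification loss on source labels plus CORAL between the outputs
        # of the last classification layer, weighted by lambda (Eq. (8)).
        loss = total_loss(logits_s, y_s, logits_s, logits_t, lam)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```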

6.4 Network Quantization Method

We also test the effectiveness of the AdCorDA method on network quantization, which reduces the precision of computations and weight storage by using lower bit-widths instead of floating-point precision. In our experiments, we choose post-training static quantization (PTSQ) [11], which is one of the most common and fastest quantization techniques in practice. This technique determines the scales and zero-points prior to inference. Specifically, we quantize the full-precision 32-bit (FP32) weights (e.g., $w\in[\alpha,\beta]$) and activations of the trained baseline models to 8-bit integer (Int8) values (e.g., $w_{q}\in[\alpha_{q},\beta_{q}]$). The quantization process is defined as

$w_{q} = \text{round}\left(\frac{1}{s}\,w + z\right),$  (10)

where $s$ is the scale and $z$ is the zero-point, defined as

$s = \frac{\beta-\alpha}{\beta_{q}-\alpha_{q}}, \quad z = \text{round}\left(\frac{\beta\alpha_{q}-\alpha\beta_{q}}{\beta-\alpha}\right).$  (11)
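
For illustration, a direct transcription of Eqs. (10)-(11) for a single tensor (a sketch only; in practice PyTorch's PTSQ machinery computes these quantities per layer during calibration):

```python
import torch

def affine_quantize(w, alpha, beta, alpha_q=-128, beta_q=127):
    """Quantize a float tensor w with range [alpha, beta] to Int8 values."""
    s = (beta - alpha) / (beta_q - alpha_q)                        # scale, Eq. (11)
    z = round((beta * alpha_q - alpha * beta_q) / (beta - alpha))  # zero-point, Eq. (11)
    w_q = torch.round(w / s + z).clamp(alpha_q, beta_q).to(torch.int8)  # Eq. (10)
    return w_q, s, z
```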

To obtain quantized models, we compress the baseline models using post-training static quantization (PTSQ) [11]. We use the built-in quantization modules provided by PyTorch. These modules facilitate the fusion of different model components, calibration of the model using training data to determine suitable scale factors, and the actual quantization of weights and activations in the model. Note that we perform the adversarial correction on the training samples that the quantized network gets wrong, not the ones that the full precision network gets wrong.
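
A sketch of this PTSQ workflow with PyTorch's eager-mode quantization utilities (assumed usage; the model is taken to be quantization-ready, e.g. with QuantStub/DeQuantStub inserted, and calib_loader is assumed to yield training images for calibration):

```python
import torch

model.eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
# Example fusion of the first conv/bn/relu; the full fusion list is model-specific.
torch.ao.quantization.fuse_modules(model, [["conv1", "bn1", "relu"]], inplace=True)
prepared = torch.ao.quantization.prepare(model)
with torch.no_grad():
    for images, _ in calib_loader:       # calibration pass: observers record activation ranges
        prepared(images)
int8_model = torch.ao.quantization.convert(prepared)   # Int8 weights and activations
```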

7 Results and Discussion

Model Approach Attack CIFAR-10 CIFAR-100
Corr. rate T′ Train Valid Test Δ Acc Corr. rate T′ Train Valid Test Δ Acc
ResNet-18 (11.19M) BL - - - 99.61 ± 0.56 93.73 ± 0.43 93.29 ± 0.37 - - - 99.00 ± 1.19 76.84 ± 0.12 77.04 ± 0.08 -
BL-IST None - 99.96 ± 0.04 99.69 ± 0.36 95.91 ± 0.20 95.57 ± 0.13 +2.28 - 98.86 ± 0.94 98.84 ± 0.98 80.14 ± 0.47 80.27 ± 0.74 +3.23
BL-IST LL 55/176 100.00 99.80 ± 0.28 96.17 ± 0.08 95.93 ± 0.15 +2.64 70/451 100.00 99.20 ± 0.92 80.99 ± 0.21 80.93 ± 0.46 +3.90
BL-IST BIH 99/176 100.00 99.86 ± 0.19 96.16 ± 0.28 95.87 ± 0.24 +2.58 51/451 100.00 99.36 ± 0.72 80.75 ± 0.54 80.99 ± 0.45 +3.96
BL-IST VBI_iter1 121/176 100.00 99.59 ± 0.46 96.09 ± 0.22 95.97 ± 0.12 +2.68 226/451 100.00 99.47 ± 0.59 80.81 ± 0.18 80.92 ± 0.56 +3.89
BL-IST VBI 175/176 100.00 99.97 ± 0.04 96.19 ± 0.15 95.77 ± 0.06 +2.48 446/451 100.00 99.80 ± 0.20 80.37 ± 0.98 80.54 ± 0.80 +3.50
BL-IST DDN 176/176 100.00 100.00 96.21 ± 0.28 95.84 ± 0.07 +2.55 451/451 99.98 ± 0.01 99.98 ± 0.01 80.79 ± 0.45 80.82 ± 0.35 +3.79
BL-IST SP 45/176 100.00 99.79 ± 0.29 96.17 ± 0.12 95.80 ± 0.08 +2.51 43/451 100.00 99.16 ± 1.00 80.63 ± 0.54 80.89 ± 0.61 +3.86
ResNet-34 (21.30M) BL - - - 99.43 ± 0.67 94.71 ± 0.05 94.22 ± 0.06 - - - 94.36 ± 2.24 78.12 ± 0.79 78.41 ± 0.10 -
BL-IST None - 99.92 ± 0.03 99.81 ± 0.10 96.78 ± 0.08 96.40 ± 0.05 +2.18 - 95.38 ± 1.56 95.26 ± 1.64 82.99 ± 0.48 82.98 ± 0.07 +4.57
BL-IST LL 25/80 99.98 ± 0.02 99.89 ± 0.07 96.53 ± 0.16 96.31 ± 0.12 +2.09 370/2538 100.00 96.05 ± 1.39 83.13 ± 0.08 82.69 ± 0.12 +4.28
BL-IST BIH 46/80 99.99 ± 0.01 99.94 ± 0.06 96.53 ± 0.23 96.36 ± 0.07 +2.14 655/2538 100.00 97.31 ± 1.19 83.04 ± 1.19 83.31 ± 0.06 +4.90
BL-IST VBI_iter1 53/80 99.99 99.97 ± 0.01 96.62 ± 0.07 96.26 ± 0.12 +2.04 1207/2538 99.99 97.40 ± 0.92 83.39 ± 0.44 83.11 ± 0.23 +4.70
BL-IST VBI 80/80 100.00 100.00 96.71 ± 0.22 96.26 ± 0.12 +2.04 2490/2538 100.00 99.21 ± 0.12 83.34 ± 0.36 83.26 ± 0.45 +4.85
BL-IST DDN 80/80 100.00 100.00 96.71 ± 0.22 96.71 ± 0.05 +2.49 2538/2538 99.98 ± 0.01 99.97 ± 0.01 83.55 ± 0.53 83.64 ± 0.06 +5.23
BL-IST SP 23/80 99.98 ± 0.01 99.90 ± 0.09 96.52 ± 0.12 96.22 ± 0.05 +2.00 118/2538 100.00 95.74 ± 1.48 83.33 ± 0.37 83.25 ± 0.29 +4.84
ResNet-50 (23.57M) BL - - - 99.81 ± 0.14 95.36 ± 0.36 94.32 ± 0.59 - - - 98.81 ± 0.73 80.01 ± 0.65 79.74 ± 0.19 -
BL-IST None - 99.92 ± 0.03 99.78 ± 0.04 96.65 ± 0.16 96.61 ± 0.12 +2.29 - 99.84 ± 0.01 98.34 ± 0.38 83.70 ± 0.14 83.89 ± 0.22 +4.15
BL-IST LL 46/131 99.96 ± 0.01 99.84 ± 0.01 96.57 ± 0.19 96.31 ± 0.11 +1.99 60/775 99.99 ± 0.01 98.58 ± 0.36 83.29 ± 0.43 83.11 ± 0.48 +3.37
BL-IST BIH 69/131 99.95 ± 0.03 99.85 ± 0.02 96.41 ± 0.11 96.11 ± 0.16 +1.79 261/775 99.98 ± 0.02 98.69 ± 0.26 82.86 ± 0.50 83.03 ± 0.43 +3.29
BL-IST VBI_iter1 79/131 99.99 ± 0.01 99.89 ± 0.02 96.61 ± 0.10 96.18 ± 0.26 +1.86 304/775 99.99 99.02 ± 0.21 83.59 ± 0.46 83.00 ± 0.17 +3.26
BL-IST VBI 130/131 99.99 ± 0.01 99.96 ± 0.01 96.56 ± 0.12 96.50 ± 0.18 +2.18 741/775 99.98 ± 0.02 99.57 ± 0.14 82.96 ± 0.23 82.87 ± 0.07 +3.13
BL-IST DDN 131/131 99.97 ± 0.04 99.97 ± 0.04 96.61 ± 0.27 96.35 ± 0.12 +2.03 775/775 99.98 ± 0.01 99.98 ± 0.01 83.29 ± 0.42 83.03 ± 0.07 +3.29
BL-IST SP 17/131 99.97 ± 0.01 99.82 ± 0.01 96.61 ± 0.22 96.30 ± 0.15 +1.98 45/775 99.99 ± 0.01 98.58 ± 0.35 83.00 ± 0.19 83.25 ± 0.32 +3.51
EfficientNetV2-M (52.99M) BL - - - 99.96 ± 0.06 97.66 ± 0.13 97.15 ± 0.14 - - - 99.88 ± 0.08 86.63 ± 0.73 86.88 ± 0.46 -
BL-IST None - 99.97 ± 0.01 99.95 ± 0.01 98.21 ± 0.08 97.76 ± 0.14 +0.61 - 99.72 ± 0.11 99.62 ± 0.14 87.73 ± 0.59 87.36 ± 0.57 +0.48
BL-IST LL 3/9 100.00 99.98 98.14 ± 0.09 97.82 ± 0.08 +0.67 17/54 99.95 ± 0.05 99.87 ± 0.01 88.05 ± 0.22 87.52 ± 0.45 +0.64
BL-IST BIH 6/9 100.00 99.99 ± 0.01 98.20 ± 0.09 97.82 ± 0.09 +0.68 23/54 99.97 ± 0.05 99.91 ± 0.09 87.96 ± 0.32 88.00 ± 0.10 +1.12
BL-IST VBI_iter1 7/9 99.99 ± 0.01 99.99 ± 0.01 98.18 ± 0.11 97.82 ± 0.12 +0.67 29/54 99.94 99.87 ± 0.05 88.00 ± 0.03 87.77 ± 0.06 +0.89
BL-IST VBI 8/9 99.99 ± 0.01 99.99 ± 0.01 98.13 ± 0.12 97.80 ± 0.04 +0.65 46/54 99.95 ± 0.01 99.88 ± 0.04 88.09 ± 0.18 87.76 ± 0.16 +0.88
BL-IST DDN 9/9 100.00 100.00 98.18 ± 0.09 97.86 ± 0.06 +0.71 54/54 99.92 ± 0.04 99.92 ± 0.04 87.98 ± 0.18 87.81 ± 0.10 +0.93
BL-IST SP 4/9 99.99 99.98 98.13 ± 0.05 97.70 ± 0.12 +0.55 18/54 99.95 ± 0.03 99.87 ± 0.07 87.85 ± 0.04 87.89 ± 0.19 +1.01
Table 1: Accuracy (%) of the FP32 baseline models (BL), which are fine-tuned on the CIFAR training domains, and accuracy of the baselines after applying our approach (denoted BL-IST) using different attacks to generate the adversarial domains. The data are reported as an average over three seeds.

7.1 Adversarial Correction of FP32 Models

The training, validation, and test accuracies of the various networks obtained by applying AdCorDA with different attack methods on CIFAR-10 and CIFAR-100 are shown in Tab. 1. The “None” attack case corresponds to the situation where we do not apply any adversarial correction, relying only on the curriculum modification of the training set. Our approach enhances model performance by as much as 2.68% and 5.23% on CIFAR-10 and CIFAR-100, respectively, when utilizing ResNets of various sizes. For EfficientNet, we observe an improvement of about 0.7-1.1% on the CIFAR datasets. More specifically, the ResNet-34 baseline model, operating at full precision, achieved a test accuracy of 78.41% on CIFAR-100; our adversarial correction method, using the DDN adversarial attack, improves the test accuracy to 83.64%, a notable increase of 5.23%. Upon incorporating adversarial correction using the LL adversarial attack on the training set, we observed a decrease in the initial training loss from 0.254 (on the original training set $T$) to 0.173 (on the corrected training set $T^{\prime}$) on CIFAR-100. This shows that the adversarial correction does indeed reduce the training loss. In Fig. 2(b) and Fig. 2(d) we can see that both targeted (VBI) and untargeted (LL) adversarial attacks successfully reduce the logit of the initially maximum-probability incorrect label relative to the logit of the true label, resulting in correction.

(a) Incorrect samples, LL. (b) Corrected samples, LL. (c) Incorrect samples, VBI. (d) Corrected samples, VBI.
Figure 2: The change in the incorrect-class (max) and true-class logits for uncorrected (a,c) and corrected (b,d) samples of CIFAR-100 after applying the corrective LL (a,b) and VBI (c,d) attacks to ResNet-34. The vertical dashed lines indicate the mean changes of the incorrect-class (max) and true-class logits.

7.2 Adversarial Correction of Quantized Models

Table 2 shows that our method also improves the baseline performance of quantized networks. For example, the full-precision baseline ResNet-34 achieves a test accuracy of 78.41% on CIFAR-100, and the Int8 quantized baseline ResNet-34 has a test accuracy of 77.13%. When applying our method using the BIH adversarial attack on the full-precision baseline, we achieve a test accuracy of 82.99%, an improvement of +4.58%. After Int8 PTSQ quantization, the modified FP network achieves a test accuracy of 82.18% - an improvement of +5.05% over the original quantized network (and an improvement of +3.77% over the original full-precision network!).

Model Approach Attack CIFAR-10 CIFAR-100
Corr. rate T′ Train Valid Test Δ Acc Corr. rate T′ Train Valid Test Δ Acc
ResNet-18 BL - - - 99.61 ± 0.56 93.73 ± 0.43 93.29 ± 0.37 - - - 99.00 ± 1.19 76.84 ± 0.12 77.04 ± 0.08 -
PTSQ - - - 98.08 ± 0.71 93.01 ± 0.56 92.42 ± 0.17 - - - 96.74 ± 3.02 75.45 ± 1.29 76.06 ± 0.94 -
PTSQ-IST (bef. qt) None - 99.96 ± 0.04 99.43 ± 0.02 95.88 ± 0.26 95.59 ± 0.07 - - 99.80 ± 0.12 97.13 ± 0.57 80.34 ± 0.59 80.18 ± 0.39 -
PTSQ-IST (aft. qt) None - 99.93 ± 0.01 99.23 ± 0.05 95.30 ± 0.39 95.18 ± 0.09 +2.76 - 99.52 ± 0.14 96.67 ± 0.56 79.15 ± 0.53 79.15 ± 0.26 +3.09
PTSQ-IST (bef. qt) BIH 158/736 100.00 99.53 96.07 ± 0.06 95.85 ± 0.19 - 300/1966 100.00 97.55 ± 0.50 80.61 ± 0.16 80.82 ± 0.30 -
PTSQ-IST (aft. qt) BIH - 99.99 ± 0.01 99.46 ± 0.04 95.53 ± 0.29 95.48 ± 0.18 +3.07 - 99.98 ± 0.02 97.36 ± 0.55 79.25 ± 0.30 79.53 ± 0.58 +3.47
PTSQ-IST (bef. qt) SP 128/736 100.00 99.54 ± 0.03 96.17 ± 0.13 95.72 ± 0.22 - 189/1966 100.00 97.17 ± 0.62 80.40 ± 0.45 81.07 ± 0.23 -
PTSQ-IST (aft. qt) SP - 100.00 99.48 ± 0.02 95.46 ± 0.20 95.29 ± 0.06 +2.93 - 99.98 ± 0.01 97.31 ± 0.66 79.27 ± 0.73 79.79 ± 0.49 +3.73
ResNet-34 BL - - - 99.43 ± 0.67 94.71 ± 0.05 94.22 ± 0.06 - - - 94.36 ± 2.24 78.12 ± 0.79 78.41 ± 0.10 -
PTSQ - - - 98.16 ± 0.35 93.63 ± 0.10 93.36 ± 0.09 - - - 90.32 ± 2.30 76.20 ± 0.39 77.13 ± 0.45 -
PTSQ-IST (bef. qt) None - 99.97 ± 0.02 99.44 ± 0.15 96.57 ± 0.15 96.28 ± 0.13 - - 99.09 ± 0.10 93.03 ± 0.29 83.08 ± 0.23 82.94 ± 0.29 -
PTSQ-IST (aft. qt) None - 99.92 ± 0.04 99.31 ± 0.17 96.19 ± 0.10 96.08 ± 0.20 +2.72 - 99.44 ± 0.55 92.59 ± 0.33 81.89 ± 0.63 81.94 ± 0.45 +4.81
PTSQ-IST (bef. qt) BIH 250/771 100.00 99.45 ± 0.11 96.61 ± 0.19 96.33 ± 0.15 - 689/4607 99.96 ± 0.04 93.97 ± 0.31 83.20 ± 0.24 82.99 ± 0.15 -
PTSQ-IST (aft. qt) BIH - 99.97 ± 0.02 99.39 ± 0.12 96.29 ± 0.08 96.05 ± 0.07 +2.69 - 99.99 ± 0.01 93.77 ± 0.33 82.19 ± 0.14 82.18 ± 0.20 +5.05
PTSQ-IST (bef. qt) SP 272/771 99.96 ± 0.04 99.51 ± 0.04 96.58 ± 0.14 96.12 ± 0.23 - 479/4607 100.00 93.80 ± 0.52 83.13 ± 0.54 82.90 ± 0.19 -
PTSQ-IST (aft. qt) SP - 99.95 ± 0.05 99.45 ± 0.05 96.25 ± 0.06 95.83 ± 0.19 +2.47 - 100.00 93.55 ± 0.53 81.99 ± 0.24 82.12 ± 0.20 +4.99
Table 2: Accuracy (%) of quantized (Int8) ResNets of various sizes obtained after applying PTSQ to their baselines, and the accuracy of the Int8 ResNets using our approach.

It is also worth noting in Tab. 2 that the decrease in accuracy of ResNet-34 after quantization using our method is only 0.81% (i.e., from 82.99% to 82.18%), whereas the drop in performance of the original network after quantization is 1.28% (i.e., from 78.41% to 77.13%). This comparison shows that our adversarial correction method makes the full-precision network less sensitive to quantization.

The quantized ResNet-34 network after using our adversarial correction technique achieves a higher accuracy (82.18%) than even that of a normally trained full-precision ResNet-152 baseline model (81.52%), while significantly reducing the model size (20.76MB vs 223.49MB).

7.3 Adversarial Perturbation vs. Correction

In the Feng and Tu theory, all that is needed in the first step of the IST is to perturb the input so as to reduce the loss. It is not necessary to actually change the input so as to have the network give the correct answer; all that is required is that the loss be reduced.

In the experiments shown in Tab. 1, we defined $T_{a}$ as the set of successfully corrected samples in step 3 of our adversarial correction approach. If we instead let $T_{a}$ include all perturbed samples, whether their outputs are corrected or not, $T^{\prime}$ will have the same size as the original training dataset. We refer to the network adapted using this variation as BL-IST-A. In our original approach, the accuracy of the original network on $T^{\prime}$ reaches 100% because we consider only the successfully perturbed samples and the originally correctly classified samples. Inspired by [25], we can think of $T^{\prime}$ as an easy dataset, given its 100% accuracy, while considering $T$ as a hard dataset. In Table 3 we observe a drop in performance improvement for BL-IST-A as compared to our first approach. This could be attributed to the adversarial perturbations increasing the loss rather than decreasing it, as compared with the baseline, for the uncorrected inputs. We conclude that we should retain only the corrected input samples.

Model Approach CIFAR-10 CIFAR-100
# T′ Test Δ Acc # T′ Test Δ Acc
ResNet-18 BL - 93.32 - - 77.09 -
BL-IST 44,972 95.77 +2.45 44,879 80.48 +3.39
BL-IST-A 45,000 95.51 +2.19 45,000 79.56 +2.47
ResNet-34 BL - 94.24 - - 78.53 -
BL-IST 44,993 96.36 +2.12 42,903 82.76 +4.23
BL-IST-A 45,000 96.18 +1.94 45,000 80.81 +2.28
Table 3: Accuracy (%) of the ResNet FP32 baselines after applying our approach using the LL attack to generate adversarial domains for the CIFAR datasets. Note that BL-IST-A is a variant in which $T_{a}$ in Step 3 incorporates all perturbed samples of $T_{w}$.

7.4 Grad-CAM visualization of Adversarial Correction

To help visualize the impact of the adversarial correction technique on misclassified images, we employ Gradient-weighted Class Activation Mapping (Grad-CAM) [23] to provide visual explanations. Grad-CAM utilizes gradient-based localization to identify the regions of an image that contribute most to the model's prediction. In our study, Fig. 3(a) shows an example initially misclassified as an ‘automobile’ by ResNet-34. However, after applying the DDN attack, the image is correctly identified as a ‘horse’. To better understand the differences between the Grad-CAM of the original image (Fig. 3(c)) and that of its corrected counterpart (Fig. 3(d)), we present their difference in Fig. 3(b). This visualization clearly illustrates that the incorrect detection was primarily influenced by the surrounding contextual information rather than by the object itself. This demonstrates that by modifying the surrounding contextual information of the image using the adversarial attack, correct classification becomes possible.

Figure 3: Evaluation of ResNet-34 on the CIFAR-10 dataset. (a) the misclassified image, (b) the difference between the Grad-CAM images for the original and adversarially corrected inputs using the DDN attack, illustrating the shift in focus of the network between the two images, (c) the Grad-CAM image for the original incorrect image, (d) the Grad-CAM image for the adversarially corrected image.
Model Approach Attack CIFAR-10 CIFAR-100
Clean AutoAttack Clean AutoAttack
ResNet-18 BL - 93.29 ± 0.37 15.92 ± 1.67 77.04 ± 0.08 7.56 ± 1.14
BL-IST None 95.57 ± 0.13 47.63 ± 1.74 80.27 ± 0.74 20.65 ± 0.82
BL-IST DDN 95.84 ± 0.07 47.97 ± 0.10 80.82 ± 0.35 21.66 ± 0.94
BL-IST SP 95.80 ± 0.08 50.97 ± 0.72 80.89 ± 0.61 21.80 ± 1.87
ResNet-34 BL - 94.22 ± 0.06 13.80 ± 1.05 78.41 ± 0.10 7.90 ± 0.62
BL-IST None 96.40 ± 0.05 50.54 ± 2.90 82.98 ± 0.07 22.37 ± 1.39
BL-IST DDN 96.71 ± 0.05 51.03 ± 2.89 83.64 ± 0.06 20.68 ± 2.19
BL-IST SP 96.22 ± 0.05 50.13 ± 2.41 83.25 ± 0.29 24.47 ± 0.31
EfficientNetV2-M BL - 97.15 ± 0.14 15.07 ± 0.78 86.88 ± 0.46 11.16 ± 0.45
BL-IST None 97.76 ± 0.14 52.68 ± 3.20 87.36 ± 0.57 23.61 ± 3.08
BL-IST DDN 97.86 ± 0.06 42.42 ± 2.66 87.81 ± 0.10 25.72 ± 2.15
BL-IST SP 97.70 ± 0.12 39.02 ± 2.36 87.89 ± 0.19 25.84 ± 1.79
Table 4: Accuracy (%) of FP32 baselines and adapted models using our approach on the clean and adversarially perturbed CIFAR test sets. AutoAttack is used to generate the adversarial samples.

7.5 Enhanced Robustness to Adversarial Attacks

Our adversarial correction technique has many similarities to adversarial training methods for enhancing robustness to adversarial attacks. Such methods generate adversarial examples, for which the network gives the wrong answer, and add these as augmentations of the original dataset; fine-tuning on the augmented dataset then leads to enhanced robustness against adversarial attacks [16]. Our approach is similar in that we create new images via adversarial attacks and use them in concert with images from the original dataset in further training. There are significant differences, however, between our method and standard adversarial training. First, we do not augment the original dataset, but instead replace some of its samples with the adversarial examples. Second, the adversarial attacks are applied only to samples that the network gets wrong, rather than to samples that the network gets right, and we keep only the adversarial examples that are corrective, i.e., those that the network now gets right. Finally, rather than fine-tuning with standard training on the augmented training set, we perform domain adaptation from the adversarially corrected training set to the original training set.

We tested the robustness of the ResNets and EfficientNetV2-M to the AutoAttack suite of attacks [5]. As seen in Tab. 4, our method provides significant robustness to adversarial attacks. For CIFAR-10 with ResNet-18, we see an improvement from 15.92% on the baseline model to 50.97% on the adversarially corrected model with the SP correction method. On CIFAR-100 with ResNet-18, we see an improvement from 7.56% to 21.80%. Note that using only curriculum domain adaptation (the “None” case) also gives significant robustness. While current state-of-the-art robust network techniques achieve higher accuracies under attack than ours (e.g., 27.67% on CIFAR-100 by [1] and 55.54% on CIFAR-10 by [22], both with ResNet-18), our focus is on attaining higher clean (before-attack) accuracies, and the enhanced robustness is a welcome byproduct. Jointly optimizing both clean accuracy and adversarial robustness is an interesting avenue for future work.

8 Conclusion

In this work, we present a new method for enhancing the performance of trained image classifier networks. The method has two stages: first, the training-set samples for which the network gives incorrect answers are modified via corrective adversarial attacks so that the network now gives the correct answers; in the second stage, the network is refined via domain adaptation, using Deep CORAL, from the modified dataset to the original dataset. Experiments show substantial enhancements in performance, of up to about 2.7% on CIFAR-10 and over 5% on CIFAR-100.

One could argue that in doing adversarial correction we are performing a type of dataset augmentation, by creating new samples with known labels. However, we are not training on this augmented dataset in a standard manner. Instead, the removal of the incorrect samples and the addition of the corrected samples provides a more pure representation of the domain that the initial network does well on, thereby enhancing the effectiveness of the subsequent domain adaptation step. Indeed, even just removing the incorrect samples, without adding the adversarial corrections, provides a significant benefit to the domain adaptation step.

Our experiments show that the adversarial correction approach is effective for refining quantized networks. We also observe that adversarial correction enhances robustness to adversarial attacks.

References

  • Addepalli et al. [2022] Sravanti Addepalli, Samyak Jain, et al. Efficient and effective augmentation strategy for adversarial training. Advances in Neural Information Processing Systems, 35:1488–1501, 2022.
  • Andriushchenko et al. [2020] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: A query-efficient black-box adversarial attack via random search. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXIII, pages 484–501. Springer, 2020.
  • Bengio et al. [2009] Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In International Conference on Machine Learning, pages 41–48, 2009.
  • Bottou [2010] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings in Computational Statistics, pages 177–186. Physica-Verlag HD, 2010.
  • Croce and Hein [2020a] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In International Conference on Machine Learning, pages 2206–2216. PMLR, 2020a.
  • Croce and Hein [2020b] Francesco Croce and Matthias Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. In International Conference on Machine Learning, pages 2196–2205. PMLR, 2020b.
  • Deng et al. [2009] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
  • Feng and Tu [2022] Yu Feng and Yuhai Tu. The activity-weight duality in feed forward neural networks: The geometric determinants of generalization. arXiv preprint arXiv:2203.10736, 2022.
  • Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
  • Jacob et al. [2018] Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2704–2713, 2018.
  • Krizhevsky and Hinton [2009] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • Kumar et al. [2010] M Kumar, Benjamin Packer, and Daphne Koller. Self-paced learning for latent variable models. Advances in neural information processing systems, 23, 2010.
  • Kurakin et al. [2017] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations. OpenReview.net, 2017.
  • Li et al. [2022] Yao Li, Minhao Cheng, Cho-Jui Hsieh, and Thomas C. M. Lee. A review of adversarial attack and defense for classification methods. The American Statistician, 76(4):329–345, 2022.
  • Madry et al. [2017] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
  • Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, et al. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2019.
  • Rauber et al. [2017] Jonas Rauber, Wieland Brendel, and Matthias Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. In International Conference on Machine Learning Workshop, 2017.
  • Rauber et al. [2020] Jonas Rauber, Roland Zimmermann, Matthias Bethge, and Wieland Brendel. Foolbox native: Fast adversarial attacks to benchmark the robustness of machine learning models in PyTorch, TensorFlow, and JAX. Journal of Open Source Software, 5(53):2607, 2020.
  • Rony et al. [2019] Jérôme Rony, Luiz G Hafemann, Luiz S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4322–4330, 2019.
  • Sehwag et al. [2021] Vikash Sehwag, Saeed Mahloujifar, Tinashe Handina, Sihui Dai, Chong Xiang, Mung Chiang, and Prateek Mittal. Robust learning meets generative models: Can proxy distributions improve adversarial robustness? arXiv preprint arXiv:2104.09425, 2021.
  • Selvaraju et al. [2017] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision, pages 618–626, 2017.
  • Shafahi et al. [2019] Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John P. Dickerson, Christoph Studer, et al. Adversarial training for free! In Advances in Neural Information Processing Systems, pages 3353–3364, 2019.
  • Shen et al. [2023] Lulan Shen, Ibtihel Amara, Ruofeng Li, Brett Meyer, Warren Gross, and James J. Clark. Fast fine-tuning using curriculum domain adaptation. In Conference on Robots and Vision, 2023.
  • Sun and Saenko [2016] Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In European Conference on Computer Vision Workshop, 2016.
  • Tan and Le [2021] Mingxing Tan and Quoc V. Le. EfficientNetV2: Smaller models and faster training. In International Conference on Machine Learning, pages 10096–10106. PMLR, 2021.
  • Wang et al. [2021] Xin Wang, Yudong Chen, and Wenwu Zhu. A survey on curriculum learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9):4555–4576, 2021.
  • Yosinski et al. [2014] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27, 2014.
  • Zhang [2021] Youshan Zhang. A survey of unsupervised domain adaptation for visual recognition. arXiv preprint arXiv:2112.06745, 2021.