
Towards Visual Distortion in Black-Box Attacks

Nannan Li, School of Remote Sensing and Information Engineering, Wuhan University
Zhenzhong Chen (zzchen@ieee.org), School of Remote Sensing and Information Engineering, Wuhan University
Abstract

Constructing adversarial examples in a black-box threat model degrades the original images by introducing visual distortion. In this paper, we propose a novel black-box attack approach that can directly minimize the induced distortion by learning the noise distribution of the adversarial example, assuming only loss-oracle access to the black-box network. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced in our loss, whilst the gradient of the corresponding non-differentiable loss function is approximated by sampling noise from the learned noise distribution. We validate the effectiveness of our attack on ImageNet. Our attack results in much lower distortion than state-of-the-art black-box attacks and achieves a 100% success rate on InceptionV3, ResNet50 and VGG16bn. The code is available at https://github.com/Alina-1997/visual-distortion-in-attack.

1 Introduction

Adversarial attack has become a well-recognized threat to existing Deep Neural Network (DNN) based applications. It injects a small amount of noise into a sample (e.g., image, speech, language) yet degrades the model performance drastically [1, 2, 3]. With the continuous improvement of DNNs, such attacks could cause serious consequences in practical settings where DNNs are deployed. According to [4, 5], adversarial attack has become a practical concern in real-world problems, ranging from cell-phone camera attacks to attacks on self-driving cars.

According to the information that an adversary has about the target network, existing attacks roughly fall into two categories: white-box attacks, which know all the parameters of the target network, and black-box attacks, which have limited access to the target network. Each category can be further divided into several subcategories depending on the adversarial strength [6]. The attack proposed in this paper belongs to loss-oracle based black-box attacks, where the adversary can obtain the output loss for supplied inputs. In real-world scenarios, it is sometimes difficult or even impossible to have full access to certain networks, which makes black-box attacks practical and increasingly studied.

A black-box attack has very limited or no information about the target network and is thus more challenging to perform. In the $l_p$-bounded setting, a black-box attack is usually evaluated on two aspects: the number of queries and the success rate. In addition, recent work [7] shows that visual distortion in the adversarial examples is also an important criterion in practice. Even under a small $l_\infty$ bound, perturbing pixels in the image without considering the visual impact could make the distorted image very annoying. As shown in Fig. 1, an attack [8] under a small noise level ($l_\infty \leq 0.05$) causes relatively large visual distortion, and the perturbed image is more distinguishable from the original one. Therefore, under the assumption that the visual distortion caused by the noise is related to the spatial distribution of the perturbed pixels, we take a different view from previous work and focus on explicitly learning a noise distribution based on its corresponding visual distortion.

In this paper, we propose a novel black-box attack that can directly minimize the induced visual distortion by learning the noise distribution of the adversarial example, assuming only loss-oracle access to the black-box network. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced in our loss, where the gradient of the corresponding non-differentiable loss function is approximated by sampling noise from the learned noise distribution. The proposed attack can achieve a trade-off between visual distortion and query efficiency by introducing the weighted perceptual distance metric in addition to the original loss. Theoretically, we prove the convergence of our model under a convex or non-convex loss function. The experiments demonstrate the effectiveness of our attack on ImageNet. Our attack results in much lower distortion than the other attacks and achieves a 100% success rate on InceptionV3, ResNet50 and VGG16bn. In addition, it is shown that our attack is valid even when it is only allowed to perturb pixels outside the target object in a given image.

Our contributions are as follows:

  • We are the first to introduce perceptual loss in a non-differentiable way for the generation of less-distorted adversarial examples. The proposed method can also achieve a trade-off between visual distortion and query efficiency by using the weighted perceptual distance metric in addition to the original loss.

  • Theoretically, we prove the convergence of our model.

  • Through extensive experiments, we show that our attack results in much lower distortion than the other attacks.

Figure 1: Adversarial examples on ImageNet with bounded noise $||\delta||_\infty \leq 0.05$. The first image is the original unperturbed image. The following examples are from [8] and our method, respectively. Higher Structural SIMilarity (SSIM) and lower Learned Perceptual Image Patch Similarity (LPIPS) indicate less visual distortion.

2 Related Work

Recent research on adversarial attack [9, 10, 11] has made substantial progress in developing strong and computationally efficient adversaries. In the following, we briefly introduce existing attack techniques in both the white-box and black-box settings.

2.1 White-box Attack

In white-box attacks, the adversary knows the details of a network, including the network structure and its parameter values. Goodfellow et al. [12] proposed the fast gradient sign method to generate adversarial examples. It is computationally efficient and serves as a baseline for attacks with additive noise. In [13], a functional adversarial attack is introduced that applies functional noise, instead of additive noise, to the image. Recently, Jordan et al. [7] stressed quantifying the perceptual distortion of adversarial examples by leveraging perceptual metrics to define an adversary. Different from our method, which directly optimizes the metric, their model conducts a search over the parameters of several composed attacks. There are also attacks that sample noise from a noise distribution [14, 15], on the condition that gradients from the white-box network are accessible. Specifically, [14] utilizes particle approximation to optimize a convex energy function, and [15] formulates the attack problem as generating a sequence of adversarial examples in a Hamiltonian Monte Carlo framework.

In summary, white-box attacks are hard to detect or defend against [16]. In the meantime, however, they suffer from the label-leaking and gradient-masking problems [2]. The former causes adversarially trained models to perform better on adversarial examples than on original images, and the latter neutralizes the useful gradient for adversaries. The prerequisite of acquiring full access to a network in white-box attacks is also sometimes difficult to satisfy in real-world scenarios.

2.2 Black-box Attack

Black-box attacks consider the target network as a black box and have limited access to it. We discuss loss-oracle based attacks here, where the adversary assumes only loss-oracle access to the black-box network.

Query Efficient Attacks.

Attacks of this kind roughly fall into three categories: 1) Methods that estimate the gradient of the black-box. Some methods estimate the gradient by sampling around a certain point, which formulates the task as a continuous optimization problem. Tu et al. [17] searched for perturbations in the latent space of an auto-encoder. [18] utilizes feedback knowledge to alter the search directions for an efficient attack. Ilyas et al. [8] exploited prior information about the gradient. Al-Dujaili and O'Reilly [9] reduced query complexity by estimating just the sign of the gradient. In [19, 20], the proposed methods perform the search in a constructed low-dimensional space. [21] shares similarity with our method as it also explicitly defines a noise distribution. However, the distribution in [21] is assumed to be an isometric normal distribution without considering visual distortion, whilst our method does not assume the distribution to be of a specific form. We compare with their method in detail in the experiments. Other approaches in this category develop a substitute model [3, 22, 23] to approximate the behavior of the black-box. By exploiting the transferability of adversarial attacks [12], the white-box attack technique applied to the substitute model can be transferred to the black-box. These approaches assume only label-oracle access to the target network, whereas training the substitute model requires either access to the training dataset of the black-box or the collection of a new dataset. 2) Methods based on discrete optimization. In [24, 9], an image is divided into regular grids and the attack is performed and refined on each grid. Meunier et al. [25] adopted the tiling trick by adding the same noise to small square tiles in the image. 3) Methods that leverage evolutionary strategies or random search [25, 26]. In [26], the noise value is updated by a square-shaped random search at each query. Meunier et al. [25] developed a set of attacks based on evolutionary algorithms, combining both continuous and discrete optimization.

Attacks that Consider Visual Impact.

Query-efficient black-box attacks usually do not consider the visual impact of the induced noise, for which the adversarial example could suffer from significant visual distortion. Similar to our work, there is research that addresses the perceptual distance between the adversarial example and the original image. [27, 28] introduce Generative Adversarial Network (GAN) based adversaries, where the gradient of the perceptual distance in the generator is computed through backpropagation. [29, 30] also require the adopted perceptual distance metric to be differentiable. Computing the gradients of a complex perceptual metric at each query might be computationally expensive [31], and is not possible for some rank-based metrics [32]. Different from these methods, our approach treats the perceptual distance metric as a black-box, saving the effort of computing its gradients, and minimizes the distance by sampling from a learned noise distribution. On the other hand, [33, 34] present semantic perturbations for adversarial attacks. The produced noise map is semantically meaningful to humans, whilst the image content of the adversarial example is distinct from that of the original image. Different from [33, 34], which focus on semantic distortion, our method addresses visual distortion and aims to generate adversarial examples that are visually indistinguishable from the original image.

3 Method

Input: image $x$, maximum norm $\epsilon$, proportion $q$ of the resampled noise
Output: adversarial example $x+\delta$
Initialize the noise distribution $p_{\theta_0} = \text{softmax}(\theta_0)$ and the noise $\delta_0$
for step $t$ in $\{1,\dots,n\}$ do
  $T^* = \text{argmin}_{T=0,1,\dots,t-1} L(x, x+\delta_T)$
  Compute the baseline $b = L(x, x+\delta_{T^*})$
  Update $\theta$ using $\nabla F$ from Eq. (5): $\theta_t \leftarrow \theta_{t-1} - \nabla F(\theta_{t-1})$
  Sample $\delta_t$: $\delta_t \leftarrow \text{resample}(\delta_{T^*}, q;\ \delta_{t-1})_{\delta_{t-1}\sim p_{\theta_{t-1}}}$
  if successful_attack($x$, $x+\delta_t$) then
    return $x+\delta_t$

def successful_attack($x$, $x+\delta_t$):
  if $\text{argmax}_{k_1} f(x+\delta_t)_{k_1} \neq \text{argmax}_{k_2} f(x)_{k_2}$ then
    return True
  else
    return False

Algorithm 1: Our Algorithm

3.1 Learning Noise Distribution Based on Visual Distortion

An attack model is an adversary that constructs adversarial examples against a certain network. Let $f: x \to f(x)$ be the target network that accepts an input $x \in \mathbb{R}^n$ and produces an output $f(x) \in \mathbb{R}^m$. $f(x)$ is a vector and $f(x)_k$ denotes its $k$-th entry, the score of the $k$-th class; $y = \text{argmax}_k f(x)_k$ is the predicted class. Given a valid input $x$ and the corresponding predicted class $y$, an adversarial example [35] $x'$ is similar to $x$ yet results in an incorrect prediction $\text{argmax}_k f(x')_k \neq y$. In an additive attack, the adversarial example $x'$ is a perturbed input with additive noise $\delta$ such that $x' = x + \delta$. Generating an adversarial example is therefore equivalent to producing a noise map $\delta$ that causes a wrong prediction for the perturbed input, i.e., finding $\delta$ such that $\text{argmax}_k f(x+\delta)_k \neq y$. Since this constraint is highly non-linear, the loss function is usually rephrased in a different form [5]:

L(x,x+\delta) = \max\big(0,\ f(x+\delta)_{y} - \max_{k\neq y} f(x+\delta)_{k}\big) \quad (1)

The attack is successful when $L=0$. Note that such a loss does not take the visual impact into consideration, for which the adversarial example could suffer from significant visual distortion. In order to constrain the visual distortion caused by the difference between $x$ and $x+\delta$, we add a perceptual distance metric $d(x,x+\delta)$ to the loss function with a predefined hyperparameter $\lambda$:

L(x,x+\delta) = \max\big(0,\ f(x+\delta)_{y} - \max_{k\neq y} f(x+\delta)_{k}\big) + \lambda\, d(x,x+\delta) \quad (2)
Figure 2: Framework of the proposed attack.

where smaller $d(x,x+\delta)$ indicates less visual distortion. $d$ can be any form of metric that measures the perceptual distance between $x$ and $x+\delta$, such as the well-established $1-\text{SSIM}$ [36] or LPIPS [37]. $\lambda$ manages the trade-off between a successful attack and the visual distortion caused by the attack. The effects of $\lambda$ will be further discussed in Section 4.1.
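To make Eq. (2) concrete, the following is a minimal sketch of evaluating the loss for a PyTorch classifier with $1-\text{SSIM}$ as the perceptual term; the helper name `attack_loss`, the use of `scikit-image`, and the default $\lambda$ are illustrative assumptions rather than the exact implementation used in the paper.

```python
import torch
from skimage.metrics import structural_similarity as ssim  # requires scikit-image >= 0.19

def attack_loss(model, x, delta, y, lam=10.0):
    """Eq. (2): margin loss plus a weighted perceptual distance (illustrative sketch).
    x, delta: tensors of shape (1, 3, H, W) in [0, 1]; y: original predicted class."""
    x_adv = (x + delta).clamp(0.0, 1.0)
    with torch.no_grad():                       # black-box: only the loss value is used
        logits = model(x_adv).squeeze(0)
    other = logits[torch.arange(logits.numel()) != y].max()
    margin = torch.clamp(logits[y] - other, min=0.0).item()
    # d(x, x + delta) = 1 - SSIM, computed on H x W x 3 numpy arrays in [0, 1]
    to_np = lambda t: t.squeeze(0).permute(1, 2, 0).cpu().numpy()
    d = 1.0 - ssim(to_np(x), to_np(x_adv), channel_axis=2, data_range=1.0)
    return margin + lam * d
```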

Minimizing the above loss function faces the challenge that $L$ is not differentiable: the black-box adversary has no access to the gradients of $L$, and the metric $d(x,x+\delta)$ might be computed in a non-differentiable way. To address this problem, we explicitly assume a flexible noise distribution of $\delta$ in a discrete space, in the sense that the noise values and their probabilities are discrete, and estimate the gradient of $L$ by sampling from this distribution. Suppose that $\delta$ follows a distribution $p_\theta$ parameterized by $\theta$, i.e., $\delta \sim p_\theta$. For the $j$-th pixel in an image, its noise distribution is $p_{\theta^j} = \text{softmax}(\theta^j)$, where $\theta^j$ is a vector and each element of $\text{softmax}(\theta^j)$ is a probability value. By sampling noise from the distribution $p_\theta$, $\theta$ can be learned to minimize the expectation of the above loss such that the attack is successful (i.e., alters the predicted label) and the produced adversarial example is less distorted (i.e., small $d$):

\text{minimize }\ \mathbb{E}_{\delta\sim p_{\theta}}[L(x,x+\delta)] \quad (3)

For the $j$-th pixel, we define the noise sample space to be a set of discrete values ranging from $-\epsilon$ to $\epsilon$: $\delta^j \in \{\epsilon,\ \epsilon-\frac{\epsilon}{N},\ \epsilon-\frac{2\epsilon}{N},\ \dots,\ 0,\ \dots,\ -\epsilon\}$, where $N$ is the sampling frequency and $\frac{\epsilon}{N}$ is the sampling interval. The noise value $\delta^j$ of the $j$-th pixel is sampled from this sample space following $p_{\theta^j}$, with $p_{\theta^j} \in \mathbb{R}^{2N+1}$.
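For instance, with the $\epsilon=0.05$ and $N=1$ used later in the experiments, each pixel's sample space contains only the three values below (a small illustrative check):

```python
import numpy as np

eps, N = 0.05, 1
# 2N+1 evenly spaced noise values from +eps down to -eps
sample_space = np.linspace(eps, -eps, 2 * N + 1)
print(sample_space)   # [ 0.05  0.   -0.05]
```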

Given $W$ and $H$, the width and height of an image, respectively, since each pixel has its own noise distribution $p_{\theta^j}$ of length $2N+1$, the number of parameters for the entire image is $(2N+1)WH$. Note that we do not distinguish between color channels, in order to reduce the size of the sample space; otherwise the number of parameters would be tripled. Thus, the same noise value is sampled for all RGB channels of a pixel. To estimate $\theta$, we adopt policy gradient [38] to make the above expectation differentiable with respect to $\theta$. Using REINFORCE, we have the differentiable loss function $F(\theta)$:

F(\theta) = \mathbb{E}_{\delta\sim p_{\theta}}[L(x,x+\delta) - b] = (L(x,x+\delta) - b)\log(p_{\theta}(\delta)) \quad (4)

\nabla F(\theta) = \nabla_{\theta}\, \mathbb{E}_{\delta\sim p_{\theta}}[L(x,x+\delta) - b] = (L(x,x+\delta) - b)(1 - p_{\theta}(\delta)) \quad (5)

where $b$ is introduced as a baseline in the expectation, with the following effect: 1) when $L(x,x+\delta) < b$, the sampled noise map $\delta$ yields a low $L$, and its probability $p_{\theta}(\delta)$ increases through gradient descent; 2) when $L(x,x+\delta) = b$, $\nabla F(\theta)=0$ and $p_{\theta}(\delta)$ remains unchanged; 3) when $L(x,x+\delta) > b$, the sampled noise map $\delta$ yields a high $L$, and its probability $p_{\theta}(\delta)$ decreases through gradient descent. In short, $L(x,x+\delta)$ is forced to improve over $b$. At iteration $t$, we choose $b = \min_{T=0,1,\dots,t-1} L(x, x+\delta_T)$ so that $L$ improves over the minimal loss obtained so far.
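A minimal sketch of the update implied by Eqs. (4)-(5), assuming $\theta$ is stored as a $(WH)\times(2N+1)$ matrix of logits and that only the logits of the sampled noise values are adjusted; the function name and shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def reinforce_update(theta, sampled_idx, loss_value, baseline, lr=0.01):
    """theta: (WH, 2N+1) logits; sampled_idx: (WH,) indices of the sampled noise values.
    Applies the gradient (L - b) * (1 - p_theta(delta)) of Eq. (5) to the sampled logits."""
    rows = torch.arange(theta.size(0))
    p_sampled = F.softmax(theta, dim=1)[rows, sampled_idx]   # p_theta(delta) per pixel
    grad = (loss_value - baseline) * (1.0 - p_sampled)       # Eq. (5)
    theta[rows, sampled_idx] -= lr * grad                    # gradient-descent step
    return theta
```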

The above expectation is estimated with a single Monte Carlo sample at each iteration, so the sampling of the noise map $\delta$ is critical. Simply resampling $\delta_t$ over the entire image at iteration $t$ might cause a large variance in the norm of the noise change, i.e., $||\delta_t - \delta_{t-1}||_2$. Therefore, to ensure a small variance, with $T^* = \text{argmin}_{T=0,1,\dots,t-1} L(x, x+\delta_T)$, only $qWH$ pixels' noise values are resampled in $\delta_{T^*}$, while the remaining $(1-q)WH$ pixels' noise values stay unchanged:

\delta_{t+1} \leftarrow \text{resample}(\delta_{T^*}, q;\ \delta_{t})_{\delta_{t}\sim p_{\theta_{t}}} \quad (6)

The above equation replaces $qWH$ pixels' noise values in the noise map $\delta_{T^*}$ with those in $\delta_t$, which are sampled from the distribution $p_{\theta_t}$. In other words, if $q=0.01$, only a random 1% of $\delta_{T^*}$ is updated at each iteration. As shown in Fig. 2, after sampling $\delta_t$, the feedback $L(x, x+\delta_t)$ from the black-box and the perceptual distance metric determine the update of the distribution $p_{\theta_t}$. The iteration stops when the attack is successful, i.e., $\max(0,\ f(x+\delta_t)_y - \max_{k\neq y} f(x+\delta_t)_k) = 0$.
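The resampling step of Eq. (6) can be sketched as below, where `values` is the discrete sample space defined earlier and a random $q$-fraction of pixels in the best noise map so far is redrawn from $p_{\theta_t}$; the function name and shapes are assumptions for illustration. In the experiments the resampled pixels are actually chosen as a square region rather than a fully random subset (see Section 4).

```python
import torch

def resample(delta_best, theta, values, q=0.01):
    """delta_best: (WH,) best noise map so far; theta: (WH, 2N+1) logits;
    values: (2N+1,) tensor of discrete noise values. Redraws a q-fraction of pixels (Eq. (6))."""
    num_pixels = theta.size(0)
    k = max(1, int(q * num_pixels))
    idx = torch.randperm(num_pixels)[:k]                     # pixels to resample
    probs = torch.softmax(theta[idx], dim=1)
    sampled_idx = torch.multinomial(probs, 1).squeeze(1)     # one categorical draw per pixel
    delta_new = delta_best.clone()
    delta_new[idx] = values[sampled_idx]
    return delta_new, idx, sampled_idx
```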

3.2 Proof of Convergence

Ruan et al. [39] show that feed-forward DNNs are Lipschitz continuous with a Lipschitz constant $K$, for which we have

\forall t,\quad ||f(x+\delta_{t}) - f(x+\delta_{T^*})||_{2} \leq K\, ||\delta_{t} - \delta_{T^*}||_{2} \quad (7)

At iteration $t$, since only a small part of the noise map is updated, it can be assumed that

\big|\max_{k\neq y} f(x+\delta_{t})_{k} - \max_{k\neq y} f(x+\delta_{T^*})_{k}\big| \leq C \quad (8)

where $C$ is a constant. Suppose that the perceptual distance metric $d$ is normalized to $[0,1]$. Substituting inequalities (7) and (8) into our definition of $L$ in Eq. (2) gives:

|L(x,x+\delta_{t}) - L(x,x+\delta_{T^*})| \leq K||\delta_{t} - \delta_{T^*}||_{2} + C + \lambda \leq 2KWH\epsilon cq + C + \lambda \quad (9)

Ideally, $L(x,x+\delta_{t}) - L(x,x+\delta_{T^*})$ accurately quantifies the difference in the perturbed image even when only one noise value of a single pixel at iteration $t$ differs from that at $T^*$. Let $\delta^{ij}$ denote a special noise map whose $j$-th pixel's noise value is the $i$-th element of its sample space and whose other pixels' noise values are 0. Note that the length of the sample space for each pixel is $2N+1$. Similarly, $p_{\theta_{t}}(\delta^{ij})$ denotes the probability of the $i$-th element in the sample space of the $j$-th pixel. By sampling every element in the sample space of the $j$-th pixel, we define the vectors $l_{t}^{j}$ and $p_{\theta_{t}^{j}}$:

\forall j\in\{1,2,\dots,WH\},\quad l_{t}^{j} = \text{vector}\big[L(x,x+\delta^{ij}) - L(x,x+\delta_{T^*})\big],\quad i=1,2,\dots,2N+1 \quad (10)

\forall j\in\{1,2,\dots,WH\},\quad p_{\theta_{t}^{j}} = \text{vector}\big[p_{\theta_{t}}(\delta^{ij})\big],\quad i=1,2,\dots,2N+1 \quad (11)

Although the above equations are only meaningful in the ideal situation where $L$ can quantify the difference of just one perturbed pixel, we use them for a theoretical proof of convergence. In the ideal situation, the gradient of the $j$-th pixel's parameters can be calculated exactly as

\nabla F(\theta_{t}^{j}) = l_{t}^{j}\cdot(\mathbf{1} - p_{\theta_{t}^{j}}) \quad (12)

According to Eq. (9), when the number of resampled pixels $qWH = 1$, we have

|L(x,x+\delta^{ij}) - L(x,x+\delta_{T^*})| \leq 2K\epsilon c + C + \lambda \quad (13)

Note that for all $t_{1}, t_{2}$ that share the same $T^*$, $l_{t_{1}}^{j}$ is equal to $l_{t_{2}}^{j}$. Thus, using Eq. (13), we have

||\nabla F(\theta_{t_{1}}^{j}) - \nabla F(\theta_{t_{2}}^{j})||_{2} \leq (2N+1)(2K\epsilon c + C + \lambda)\, ||\text{softmax}(\theta_{t_{1}}^{j}) - \text{softmax}(\theta_{t_{2}}^{j})||_{2} \quad (14)

In practice, we use a single Monte Carlo sample instead of sampling every noise value for every pixel, for which $2N+1$ should be replaced by $1$ in the above inequality. Inequality (14) thus becomes:

||\nabla F(\theta_{t_{1}}^{j}) - \nabla F(\theta_{t_{2}}^{j})||_{2} \leq (2K\epsilon c + C + \lambda)\, ||\text{softmax}(\theta_{t_{1}}^{j}) - \text{softmax}(\theta_{t_{2}}^{j})||_{2} \leq (2K\epsilon c + C + \lambda)\, ||\theta_{t_{1}}^{j} - \theta_{t_{2}}^{j}||_{2} \quad (15)

The softmax function disappears because it is Lipschitz continuous with Lipschitz constant $1$ [40]. Finally, we have the inequality for $||\nabla F(\theta_{t_{1}}) - \nabla F(\theta_{t_{2}})||_{2}$:

||\nabla F(\theta_{t_{1}}) - \nabla F(\theta_{t_{2}})||_{2} \leq (2K\epsilon c + C + \lambda)\, ||\theta_{t_{1}} - \theta_{t_{2}}||_{2} \quad (16)

The above inequality proves that $F(\theta)$ is $L$-smooth with Lipschitz constant $2K\epsilon c + C + \lambda$. If $F(\theta)$ is convex, the exact number of steps that Stochastic Gradient Descent (SGD) takes to converge is $\frac{(2K\epsilon c + C + \lambda)\,||\theta_{0} - \theta^{*}||_{2}^{2}}{\xi}$, where $\xi > 0$ is an arbitrarily small tolerable error. However, since the deep-network loss $L$ is usually highly non-convex, we also need to consider the situation where $F(\theta)$ is non-convex.

Let the SGD update be

\theta_{t+1} = \theta_{t} + \eta_{t}\, g(\theta_{t}) \quad (17)

where $\eta_{t}$ is the learning rate and $g(\theta_{t})$ is the stochastic gradient. We assume that the variance of the stochastic gradient is upper bounded by $\sigma^{2}$:

\mathbb{E}\big[||\nabla F(\theta) - g(\theta)||_{2}^{2}\big] \leq \sigma^{2} < \infty \quad (18)
Table 1: Ablation results of the perceptual distance metric, $\lambda$ and the sampling frequency $N$. Smaller $1-\text{SSIM}$, LPIPS and CIEDE2000 indicate less visual distortion.

| Sampling Frequency | Perceptual Metric | $\lambda$ | Success Rate | $1-\text{SSIM}$ | LPIPS | CIEDE2000 | Avg. Queries |
|---|---|---|---|---|---|---|---|
| $N=1$ | - | 0 | 100% | 0.091 | 0.099 | 0.941 | 356 |
| $N=1$ | $1-\text{SSIM}$ | 10 | 100% | 0.076 | 0.081 | 0.741 | 401 |
| $N=1$ | $1-\text{SSIM}$ | 100 | 97.4% | 0.036 | 0.051 | 0.703 | 1395 |
| $N=1$ | $1-\text{SSIM}$ | 200 | 92.2% | 0.025 | 0.040 | 0.622 | 2534 |
| $N=1$ | $1-\text{SSIM}$ | dynamic | 100% | 0.009 | 0.009 | 0.204 | 7678 |
| $N=1$ | LPIPS | 10 | 100% | 0.080 | 0.078 | 0.762 | 450 |
| $N=1$ | LPIPS | 100 | 98.1% | 0.049 | 0.052 | 0.711 | 1174 |
| $N=1$ | LPIPS | 200 | 95.1% | 0.038 | 0.045 | 0.635 | 1928 |
| $N=1$ | LPIPS | dynamic | 100% | 0.015 | 0.005 | 0.277 | 6694 |
| None | $1-\text{SSIM}$ | 10 | 100% | 0.118 | 0.142 | 5.936 | 426 |
| $N=2$ | $1-\text{SSIM}$ | 10 | 99.7% | 0.071 | 0.074 | 0.846 | 520 |
| $N=5$ | $1-\text{SSIM}$ | 10 | 99.5% | 0.069 | 0.070 | 0.877 | 665 |
| $N=10$ | $1-\text{SSIM}$ | 10 | 98.7% | 0.062 | 0.075 | 0.879 | 669 |
| $N=12$ | $1-\text{SSIM}$ | 10 | 98.7% | 0.071 | 0.075 | 0.882 | 673 |

We select $\eta_{t}$ to satisfy

\sum_{t=1}^{\infty}\eta_{t} = \infty \quad\text{and}\quad \sum_{t=1}^{\infty}\eta_{t}^{2} < \infty \quad (19)

Condition (19) can be easily satisfied with a decaying learning rate, e.g., $\eta_{t} = \frac{1}{\sqrt{t}\,\ln(t+1)}$. According to Lemma 1 and Theorem 2 in [41], using the $L$-smooth property of $F(\theta)$, $||\nabla F(\theta_{t})||_{2}$ goes to 0 with probability 1. This means that, with probability 1, for any $\xi > 0$ there exists $N_{\xi}$ such that $||\nabla F(\theta_{t})||_{2} \leq \xi$ for $t \geq N_{\xi}$. Unfortunately, unlike in the convex case, we do not know the exact number of steps that SGD takes to converge.
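As a quick check of condition (19) for this learning rate (a standard comparison and integral-test argument, added here for completeness):

```latex
\sum_{t=1}^{\infty}\frac{1}{\sqrt{t}\,\ln(t+1)}
  \;\ge\; \sum_{t=1}^{\infty}\frac{1}{t} \;=\; \infty
  \quad\text{since } \ln(t+1)\le\sqrt{t}\ \text{for } t\ge 1,
\qquad
\sum_{t=1}^{\infty}\frac{1}{t\,\ln^{2}(t+1)} \;<\; \infty
  \quad\text{by the integral test, since } \int_{1}^{\infty}\frac{dx}{x\,\ln^{2}(x+1)}<\infty .
```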

The above proof simply aims to show theoretically that the proposed method converges in a finite number of steps, although possibly at a rather slow rate. From the "Avg. Queries" column in the following experiments, we can see that the actual computational cost is affordable and comparable to some of the query-efficient attacks.

Figure 3: Adversarial examples under different sampling frequencies. From left to right: the original image and the adversarial examples for $N=1,2,5,10,12$, respectively.

4 Experiments

Following previous work [25, 8], we validate the effectiveness of our model on the large-scale ImageNet [42] dataset. We use three pretrained classification networks from PyTorch as the black-box networks: InceptionV3 [43], ResNet50 [44] and VGG16bn [45]. The attack is performed on images that are correctly classified by the pretrained network. We randomly select 1000 images from the validation set for testing, and all images are normalized to $[0,1]$. We quantify our success in terms of the perceptual distance ($1-\text{SSIM}$, LPIPS and CIEDE2000), as we address the visual distortion caused by the attack. Among these metrics, $1-\text{SSIM}$ [36] measures the degradation of structural information in the adversarial example. LPIPS [37] evaluates the perceptual similarity of two images via the normalized distance between their deep features. CIEDE2000 [46] measures perceptual color distance and was developed by the CIE (International Commission on Illumination). Smaller values of these metrics denote less visual distortion. In addition to $1-\text{SSIM}$, LPIPS and CIEDE2000, the success rate and the average number of queries are also reported, as in most previous work. The average number of queries refers to the average number of requests to the output of the black-box network.
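For reference, a sketch of how the three perceptual scores can be computed with common open-source implementations (we assume the `scikit-image` and `lpips` packages; the exact preprocessing may differ from the evaluation in the paper):

```python
import torch
import lpips                                          # pip install lpips (assumed dependency)
from skimage.metrics import structural_similarity as ssim
from skimage.color import rgb2lab, deltaE_ciede2000

lpips_fn = lpips.LPIPS(net="alex")                    # deep-feature perceptual distance

def distortion_scores(x, x_adv):
    """x, x_adv: float numpy arrays of shape (H, W, 3) in [0, 1]. Illustrative helper."""
    one_minus_ssim = 1.0 - ssim(x, x_adv, channel_axis=2, data_range=1.0)
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None].float() * 2.0 - 1.0
    lpips_score = lpips_fn(to_tensor(x), to_tensor(x_adv)).item()   # expects inputs in [-1, 1]
    ciede = deltaE_ciede2000(rgb2lab(x), rgb2lab(x_adv)).mean()     # mean per-pixel color difference
    return one_minus_ssim, lpips_score, ciede
```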

We initialize the noise distribution $p_{\theta}$ to be a uniform distribution and the noise $\delta_{0}$ to be 0. The learning rate is 0.01 and $q$ is set to 0.01. In addition, we specify the shape of the resampled noise at each iteration to be a square [25, 24, 26], and adopt the tiling trick [8, 25] with tile size $=2$. The upper bound $\epsilon$ of our attack is set to 0.05 as in previous work.
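One possible way to realize the square-shaped resampling region with the tiling trick is sketched below; the helper and the exact way the square is sized and placed are assumptions, kept only to illustrate the setup described above.

```python
import torch

def square_region_mask(H, W, q=0.01, tile=2):
    """Boolean (H, W) mask selecting a random square of roughly q*H*W pixels,
    sized as a multiple of `tile` so that noise can be shared within each tile."""
    side = max(tile, int(round((q * H * W) ** 0.5 / tile)) * tile)
    side = min(side, H, W)
    top = torch.randint(0, H - side + 1, (1,)).item()
    left = torch.randint(0, W - side + 1, (1,)).item()
    mask = torch.zeros(H, W, dtype=torch.bool)
    mask[top:top + side, left:left + side] = True
    return mask
```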

4.1 Ablation Studies

Figure 4: Visualized examples of the proposed attack. From left to right: the original image and the adversarial examples for $\lambda=0$, $\lambda=10$, $\lambda=100$, $\lambda=200$ and dynamic $\lambda$, respectively.

In the ablation studies, the maximum number of queries is set to 10,000. The results are averaged over 1000 test images. In the following, we discuss the trade-off between visual distortion and query efficiency, the effects of using different perceptual distance metrics in the loss function, the results for different sampling frequencies, and the influence of predefining a specific form of noise distribution.

Trade-off between visual distortion and query efficiency.

Under the same $l_{\infty}$ ball, a query-efficient way to produce an adversarial example is to perturb most pixels with the maximum noise values $\pm\epsilon$ [24, 26]. However, such an attack introduces large visual distortion, which could make the distorted image very annoying. To constrain the visual distortion, the perturbed pixels should be those that cause a smaller visual difference while still yielding a valid attack, which takes extra queries to find. This brings the trade-off between visual distortion and query efficiency, which can be controlled by $\lambda$ in our loss function. As shown in Table 1, when $N=1$ and $\lambda=0$, the adversary does not consider visual distortion at all, and perturbs each pixel that is helpful for misclassification until the attack is successful. Thus, it causes the largest perceptual distance (0.091, 0.099 and 0.941) with the least number of queries (356). As $\lambda$ increases to 200, all the perceptual metrics decrease at the cost of more queries and a lower success rate. The maximum $\lambda$ in Table 1 is 200, since further increasing it causes the success rate to drop below 90%. In addition, as in [17], we perform a dynamic line search on the choice of $\lambda$ to see the best perceptual scores the adversary can achieve, where $\lambda\in[0,1000]$. Compared with fixed $\lambda$ values, using dynamic values of $\lambda$ greatly boosts the performance on the perceptual metrics with a 100% attack success rate, at the cost of dozens of times the number of queries. Fig. 4 gives several visualized examples for different $\lambda$, where adversarial examples with larger $\lambda$ suffer from less visual distortion.

Ablation studies on the perceptual distance metric.

The perceptual distance metric $d$ in the loss function is predefined to measure the visual distortion between the adversarial example and the original image. We adopt $1-\text{SSIM}$ and LPIPS as the perceptual distance metric to optimize, respectively, and report their results in Table 1. When $\lambda=10$, optimizing $1-\text{SSIM}$ shows a better score on $1-\text{SSIM}$ (0.076 vs. 0.080) and CIEDE2000 (0.721 vs. 0.742), whilst optimizing LPIPS has better performance on LPIPS (0.078 vs. 0.081). However, when $\lambda$ increases to 100 and 200, optimizing $1-\text{SSIM}$ gives better scores on both $1-\text{SSIM}$ and LPIPS. Therefore, we set the perceptual distance metric to $1-\text{SSIM}$ in the following experiments.

Figure 5: Visualized adversarial examples in out-of-object attack. The red bounding box locates the target object in the original image. In out-of-object attack, the adversary is only allowed to perturb pixels that are out of the object bounding box. In image attack, the adversary can perturb any pixel in the image.
Table 2: Results of the out-of-object attack on ImageNet when $\lambda=10$, $N=1$ and the perceptual distance metric is $1-\text{SSIM}$. I, R and V represent InceptionV3, ResNet50 and VGG16bn, respectively.

| Attacked Range | Success Rate (I / R / V) | $1-\text{SSIM}$ (I / R / V) | LPIPS (I / R / V) | CIEDE2000 (I / R / V) | Avg. Queries (I / R / V) |
|---|---|---|---|---|---|
| Image | 100% / 100% / 100% | 0.078 / 0.076 / 0.072 | 0.096 / 0.081 / 0.079 | 0.692 / 0.741 / 0.699 | 845 / 401 / 251 |
| Out-of-object | 90.1% / 93.8% / 94.7% | 0.071 / 0.069 / 0.074 | 0.081 / 0.065 / 0.070 | 0.678 / 0.805 / 0.687 | 4275 / 3775 / 3104 |

Sampling frequency.

The sampling frequency decides the size of the sample space of $\delta$. Setting a higher frequency means there are more noise values to explore through sampling. In Table 1, increasing the sampling frequency from $N=1$ to $N=2$ reduces the perceptual distance to some extent at the cost of a lower success rate. On the other hand, further increasing $N$ to 12 does not essentially reduce the distortion yet lowers the success rate. We set the sampling frequency to $N=1$ in the following experiments. Note that the maximum sampling frequency is $N=12$ because the sampling interval in RGB color space (i.e., $255 \times 0.05 / N$) would be less than 1 if $N>12$. See Fig. 3 for a few adversarial examples from different sampling frequencies.
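The constraint on $N$ follows directly from the sampling interval expressed in 8-bit RGB units, as the small check below illustrates:

```python
# Sampling interval in 8-bit RGB units for the noise bound eps = 0.05:
# 255 * eps / N must stay >= 1, which limits N to at most 12.
for N in (1, 2, 5, 10, 12, 13):
    print(N, 255 * 0.05 / N)   # 12.75, 6.375, 2.55, 1.275, 1.0625, 0.98...
```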

Noise Distribution.

In the proposed algorithm, we adopt a flexible noise distribution instead of predefining it to be of a specific form. Therefore, we conducted an ablation study that assumes the distribution to be of a regular form as in NAttack [21]. Specifically, we let the noise distribution be an isometric normal distribution with $\lambda=10$ in the loss function, and perform attacks by estimating the mean and variance as in Eq. (10) of [21]. As reported in the tenth row of Table 1, under the same experimental setting, it is clear that fixing the noise distribution to a specific isometric normal distribution degrades the overall performance. We think this is because the distribution that minimizes the perceptual distance is unknown, and might not follow a Gaussian distribution or another regular form of distribution. To approximate an unknown distribution, it is better to allow the noise distribution to be of a free form, as in the proposed approach, and let it be learned by minimizing the perceptual distance.

4.2 Out-of-Object Attack

Most existing classification networks [44, 47] are based on Convolutional Neural Networks (CNNs), which gradually aggregate contextual information in deeper layers. Therefore, it is possible to fool the classifier by attacking only the "context", i.e., the background outside the target object. Attacking just the out-of-object pixels constrains the number and the positions of pixels that can be perturbed, which might further reduce the visual distortion caused by the noise. To locate the object in a given image, we exploit the object bounding box provided by ImageNet. An out-of-object mask is then created according to the bounding box such that the model is only allowed to attack pixels that are outside the object, as shown in Fig. 5. In Table 2, we report results for InceptionV3, ResNet50 and VGG16bn with the maximum number of queries set to 40,000. The attack is performed on images whose masks cover at least 10% of the image area. The results show that attacking just the out-of-object pixels can also cause misclassification of the object with an over 90% success rate. Compared with the image attack, the out-of-object attack is more difficult for the adversary in that it requires more queries (4275/3775/3104) and has a lower success rate (90.1%/93.8%/94.7%). On the other hand, the out-of-object attack indeed reduces the visual distortion of the adversarial examples on the three networks.
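A sketch of how the out-of-object constraint can be imposed, assuming the bounding box is given in pixel coordinates; the mask simply zeroes any noise that falls inside the box (names and box format are illustrative):

```python
import torch

def out_of_object_mask(H, W, box):
    """box = (xmin, ymin, xmax, ymax) in pixel coordinates (assumed format).
    Returns a boolean (H, W) mask that is True only outside the object box."""
    xmin, ymin, xmax, ymax = box
    mask = torch.ones(H, W, dtype=torch.bool)
    mask[ymin:ymax, xmin:xmax] = False
    return mask

# Usage: restrict the sampled noise to the background before querying the model.
# mask = out_of_object_mask(H, W, box)
# delta = delta * mask[None, None].float()   # broadcast over the (1, 3, H, W) noise map
```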

Table 3: Comparison of the undefended (v3) and defended (v3$_{\text{adv-ens4}}$) InceptionV3. The defended InceptionV3 adopts ensemble adversarial training.

| Network | Clean Accuracy | Accuracy After Attack | $1-\text{SSIM}$ | LPIPS | CIEDE2000 | Avg. Queries |
|---|---|---|---|---|---|---|
| v3 | 75.8% | 0.8% | 0.096 | 0.149 | 0.862 | 531 |
| v3$_{\text{adv-ens4}}$ | 73.4% | 1.8% | 0.103 | 0.154 | 0.979 | 777 |

4.3 Attack Effectiveness on Defended Network

Figure 6: Adversarial examples from different attacks with perceptual distance scores.
Table 4: Results of different attacks on ImageNet. I, R and V represent InceptionV3, ResNet50 and VGG16bn, respectively.
| Attack | Success Rate (I / R / V) | $1-\text{SSIM}$ (I / R / V) | LPIPS (I / R / V) | CIEDE2000 (I / R / V) | Avg. Queries (I / R / V) |
|---|---|---|---|---|---|
| SignHunter [9] | 98.4% / - / - | 0.157 / - / - | 0.117 / - / - | 3.837 / - / - | 450 / - / - |
| NAttack [21] | 99.5% / - / - | 0.133 / - / - | 0.212 / - / - | 5.478 / - / - | 524 / - / - |
| AutoZOOM [17] | 100% / - / - | 0.038 / - / - | 0.059 / - / - | 3.33 / - / - | 1010 / - / - |
| Bandits [8] | 96.5% / 98.8% / 98.2% | 0.343 / 0.307 / 0.282 | 0.201 / 0.157 / 0.140 | 8.383 / 8.552 / 8.194 | 935 / 705 / 388 |
| Square Attack [26] | 99.7% / 100% / 100% | 0.280 / 0.279 / 0.299 | 0.265 / 0.243 / 0.247 | 9.329 / 9.425 / 9.429 | 237 / 62 / 30 |
| TREMBA [20] | 99.0% / 100% / 99.8% | 0.161 / 0.161 / 0.160 | 0.188 / 0.189 / 0.187 | 4.413 / 4.400 / 4.421 | - / - / - |
| SignHunter-SSIM | 97.6% / - / - | 0.220 / - / - | 0.157 / - / - | 3.832 / - / - | 642 / - / - |
| NAttack-SSIM | 97.3% / - / - | 0.128 / - / - | 0.210 / - / - | 5.021 / - / - | 666 / - / - |
| AutoZOOM-SSIM | 100% / - / - | 0.028 / - / - | 0.048 / - / - | 2.98 / - / - | 2245 / - / - |
| Bandits-SSIM | 80.0% / 89.3% / 89.7% | 0.333 / 0.303 / 0.275 | 0.200 / 0.163 / 0.135 | 8.838 / 8.666 / 8.194 | 1318 / 1020 / 793 |
| Square Attack-SSIM | 99.2% / 100% / 100% | 0.260 / 0.268 / 0.292 | 0.256 / 0.238 / 0.245 | 9.301 / 9.462 / 9.451 | 278 / 65 / 30 |
| TREMBA-SSIM | 98.5% / 100% / 99.8% | 0.160 / 0.160 / 0.159 | 0.185 / 0.186 / 0.183 | 4.410 / 4.396 / 4.421 | - / - / - |
| Ours | 98.7% / 100% / 100% | 0.075 / 0.076 / 0.072 | 0.094 / 0.081 / 0.079 | 0.692 / 0.741 / 0.699 | 731 / 401 / 251 |
| Ours ($\lambda_{dynamic}$) | 100% / 100% / 100% | 0.016 / 0.009 / 0.006 | 0.023 / 0.009 / 0.005 | 0.215 / 0.204 / 0.155 | 7311 / 7678 / 7620 |

In the above experiments, we show that our black-box model can attack undefended networks with a high success rate. To evaluate the strength of the proposed attack against a defended network, we further attack an InceptionV3 network trained with ensemble adversarial training (i.e., v3$_{\text{adv-ens4}}$). Following [48], we set $\epsilon=0.0625$ and randomly select 10,000 images from the ImageNet validation set for testing. The maximum number of queries is 10,000. The performance of the attacked network is reported in Table 3, where the clean accuracy is the classification accuracy before the attack. Note that v3 is slightly different from the InceptionV3 in Table 1 in that the pretrained model of v3 comes from TensorFlow, the same platform as the pretrained v3$_{\text{adv-ens4}}$ model. Compared with the undefended network, attacking the defended one causes larger visual distortion. However, the proposed attack can still reduce the classification accuracy from 73.4% to 1.8%, which demonstrates its effectiveness against a defended network.

Figure 7: More visualized adversarial examples from different attacks.

4.4 Comparison with Other Attacks

Since our approach addresses improving the visual similarity between the adversarial example and the original image, it might cost more queries to construct a less distorted adversarial example. To show that such costs are affordable, we compare our attack to recently proposed black-box attacks: SignHunter [9], NAttack [21], AutoZOOM [17], Bandits [8], Square Attack [26] and TREMBA [20]. For a fair comparison, in Table 4, methods marked with -SSIM and Ours introduce $\lambda\cdot(1-\text{SSIM})$ into the loss function with $\lambda=10$. Note that AutoZOOM performs a line search on the choice of $\lambda$, for which we adopt the same strategy and denote this variant of our method as Ours ($\lambda_{dynamic}$). The results of the above methods are reproduced using the official code provided by the authors. We use the default parameter settings of the corresponding attack, and set the maximum number of queries to 10,000. See Table 5 for the experimental settings of the different methods. In Table 4, comparing approaches that use a fixed $\lambda$ value (i.e., SignHunter-SSIM, NAttack-SSIM, Bandits-SSIM, Square Attack-SSIM, TREMBA-SSIM and Ours), we can see that the proposed method outperforms the other attacks in reducing the perceptual distance, while the average number of queries is comparable to Bandits. On the other hand, Ours ($\lambda_{dynamic}$) achieves state-of-the-art performance on $1-\text{SSIM}$, LPIPS and CIEDE2000 when compared with methods that perform a line search over $\lambda$ (i.e., AutoZOOM and AutoZOOM-SSIM). In general, except for SignHunter, introducing the perceptual distance metric into the objective function helps reduce visual distortion in the other attacks. Visualized adversarial examples from the different attacks are given in Fig. 6, which shows that our model produces less distorted adversarial examples. More examples can be found in Fig. 7.

Figure 8: An example of the pictures shown to the evaluators. One of (a) and (b) is produced by our model and the other is from one of the other attacks in Table 4.

We noticed that adversarial examples from SignHunter have horizontally striped noise and that Square Attack generates adversarial examples with vertically striped noise. Striped noise is helpful for improving query efficiency since the classification network is quite sensitive to such noise [26]. However, from the perspective of visual distortion, such noise greatly degrades the image quality. The adversarial examples of Bandits are relatively perception-friendly, but the perturbation affects most pixels in the image, which causes visually "noisy" effects, especially in a monocolor background. The noise maps from NAttack and AutoZOOM appear as regular color patches all over the image due to the large tile size used in these methods.

Table 5: Experimental settings.
| Method | $\lambda$ | Max. Iterations |
|---|---|---|
| SignHunter-SSIM | 10 | 10,000 |
| NAttack-SSIM | 10 | 10,000 |
| AutoZOOM-SSIM | dynamic, $\lambda\in[0,1000]$ | 10,000 |
| Bandits-SSIM | 10 | 10,000 |
| Square Attack-SSIM | 10 | 10,000 |
| TREMBA-SSIM | 10 | - |
| Ours | 10 | 10,000 |
| Ours ($\lambda_{dynamic}$) | dynamic, $\lambda\in[0,1000]$ | 10,000 |

We also conducted a subjective study for further validation. Specifically, we randomly chose two adversarial examples, one generated by our approach (Ours ($\lambda_{dynamic}$)) and the other by one of the other attacks in Table 4. We showed each human evaluator the two adversarial examples and asked which one is less distorted compared with the original image. Figure 8 gives an example of the pictures shown to the evaluators. Note that the order of the two adversarial examples in the triplet is randomly permuted. We asked 10 human evaluators in total, each making judgements over 100 triplets of images. As a result, adversarial examples generated by our method were judged to have less noticeable noise 82.1% of the time, while 10.0% of the time the evaluators thought both examples were distorted to the same degree. The subjective results therefore further confirm that the proposed method effectively reduces visual distortion in adversarial examples.

Table 6: Results of other $l_p$ attacks on ResNet50 when $\lambda=10$. The raw $l_0$ and $l_1$ scores are of a much higher order of magnitude than the other metrics, and thus the normalized $l_0$ and $l_1$ distances are reported.
| Distance Metric | Sampling Frequency | Success Rate | $1-\text{SSIM}$ | LPIPS | CIEDE2000 | $l_0$ | $l_1$ | $l_2$ | Avg. Queries |
|---|---|---|---|---|---|---|---|---|---|
| $l_0$ | 1 | 99.5% | 0.077 | 0.083 | 0.795 | 0.133 | 0.130 | 6.75 | 536 |
| $l_0$ | 2 | 99.2% | 0.065 | 0.069 | 0.768 | 0.159 | 0.118 | 5.88 | 679 |
| $l_0$ | 5 | 97.9% | 0.058 | 0.065 | 0.789 | 0.177 | 0.118 | 5.19 | 960 |
| $l_1$ | 1 | 99.5% | 0.077 | 0.083 | 0.795 | 0.133 | 0.130 | 6.75 | 536 |
| $l_1$ | 2 | 99.5% | 0.070 | 0.076 | 0.773 | 0.176 | 0.130 | 6.14 | 658 |
| $l_1$ | 5 | 99.2% | 0.066 | 0.070 | 0.768 | 0.218 | 0.129 | 5.74 | 800 |
| $l_2$ | 1 | 99.5% | 0.110 | 0.112 | 0.829 | 0.215 | 0.211 | 8.21 | 392 |
| $l_2$ | 2 | 99.5% | 0.092 | 0.100 | 0.803 | 0.259 | 0.191 | 7.44 | 431 |
| $l_2$ | 5 | 99.5% | 0.087 | 0.094 | 0.792 | 0.312 | 0.185 | 6.89 | 579 |

4.5 Other $l_p$ Attacks

Although our method in this paper is based on the $l_{\infty}$ attack, the perceptual distance metric $d$ in the loss function can be replaced by other $l_p$ ($p=0,1,2$) distances. We did not discuss this in the above experiments because these $l_p$ distance metrics are less accurate at measuring the perceptual distance between images than specifically designed metrics, such as the well-established $1-\text{SSIM}$ and LPIPS. Nevertheless, we still present the results of other $l_p$ ($p=0,1,2$) attacks in Table 6, where the $l_p$ distance is normalized to $[0,1]$ in the loss function. Specifically, $d(x,x+\delta)=\frac{l_p(x,x+\delta)}{\max_{\delta}(l_p(x,x+\delta))}$, where $l_p(x,x+\delta)$ is the $l_p$ distance between the original image $x$ and the perturbed image $x+\delta$. As in the previous experiments, we set $\lambda=10$, $\epsilon=0.05$ and the maximum number of queries to 10,000. We find that the raw $l_0$ and $l_1$ scores are of a much higher order of magnitude than the other metrics, and thus the normalized $l_0$ and $l_1$ distances are reported in Table 6. Note that when the sampling frequency $N=1$, the $l_0$ distance is equivalent to the $l_1$ distance in that

\frac{l_{1}(x,x+\delta)}{\max_{\delta}(l_{1}(x,x+\delta))} = \frac{mc\cdot\epsilon}{WHc\cdot\epsilon} = \frac{m}{WH} = \frac{l_{0}(x,x+\delta)}{\max_{\delta}(l_{0}(x,x+\delta))} \quad (20)

where $m$ is the number of perturbed pixels, and $W$, $H$ and $c$ are the width, height and number of channels of a given image, respectively. Table 6 shows that optimizing the $l_0$ distance gives better performance on both the perceptual distance metrics and the $l_p$ distance metrics.
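The normalized $l_p$ terms above can be computed as in the following sketch (a hedged illustration: the maximum over $\delta$ is taken as the value attained when every entry is perturbed by $\pm\epsilon$, and the $l_0$ term is reported as the fraction of changed entries):

```python
import torch

def normalized_lp(x, x_adv, p, eps=0.05):
    """Normalized l_p distance used as d(x, x + delta) in the loss (illustrative)."""
    diff = (x_adv - x).flatten()
    if p == 0:
        return (diff != 0).float().mean().item()        # l_0 / (W*H*c)
    lp = diff.abs().pow(p).sum().pow(1.0 / p)           # ||delta||_p
    lp_max = (diff.numel() ** (1.0 / p)) * eps          # ||eps * 1||_p over all entries
    return (lp / lp_max).item()
```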

4.6 Conclusion

We introduce a novel black-box attack based on the induced visual distortion in the adversarial example. The quantified visual distortion, which measures the perceptual distance between the adversarial example and the original image, is introduced in our loss where the gradient of the corresponding non-differentiable loss function is approximated by sampling from a learned noise distribution. The proposed attack can achieve a trade-off between visual distortion and query efficiency by introducing the weighted perceptual distance metric in addition to the original loss. The experiments demonstrate the effectiveness of our attack on ImageNet as our model achieves much lower distortion when compared to existing attacks. In addition, it is shown that our attack is valid even when it’s only allowed to perturb pixels that are out of the target object in a given image.

References

  • Kwon et al. [2020] H. Kwon, Y. Kim, H. Yoon, and D. Choi. Selective audio adversarial example in evasion attack on speech recognition system. IEEE Transactions on Information Forensics and Security, 15:526–538, 2020.
  • Kurakin et al. [2017] Alexey Kurakin, Ian J. Goodfellow, and Samy Bengio. Adversarial machine learning at scale. In Proc. International Conference on Learning Representations, 2017.
  • Papernot et al. [2017] Nicolas Papernot, Patrick D. McDaniel, Ian J. Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proc. ACM on Asia Conference on Computer and Communications Security, 2017.
  • Akhtar and Mian [2018] N. Akhtar and A. Mian. Threat of adversarial attacks on deep learning in computer vision: A survey. IEEE Access, 6:14410–14430, 2018.
  • Carlini and Wagner [2017a] Nicholas Carlini and David A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, 2017a.
  • Papernot et al. [2016a] Nicolas Papernot, Patrick D. McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In Pro. IEEE European Symposium on Security and Privacy, EuroS&P, 2016a.
  • Jordan et al. [2019] Matt Jordan, Naren Manoj, Surbhi Goel, and Alexandros G. Dimakis. Quantifying perceptual distortion of adversarial examples. arXiv preprint arXiv:1902.08265, 2019.
  • Ilyas et al. [2019] Andrew Ilyas, Logan Engstrom, and Aleksander Madry. Prior convictions: Black-box adversarial attacks with bandits and priors. In Proc. International Conference on Learning Representations, 2019.
  • Al-Dujaili and O’Reilly [2020] Abdullah Al-Dujaili and Una-May O’Reilly. Sign bits are all you need for black-box attacks. In Proc. International Conference on Learning Representations, 2020.
  • Zhang et al. [2020] Y. Zhang, X. Tian, Y. Li, X. Wang, and D. Tao. Principal component adversarial example. IEEE Transactions on Image Processing, 29:4804–4815, 2020.
  • Zhang et al. [2019] Q. Zhang, K. Wang, W. Zhang, and J. Hu. Attacking black-box image classifiers with particle swarm optimization. IEEE Access, 7:158051–158063, 2019.
  • Goodfellow et al. [2015] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In Proc. International Conference on Learning Representations, 2015.
  • Laidlaw and Feizi [2019] Cassidy Laidlaw and Soheil Feizi. Functional adversarial attacks. In Proc. International Conference on Neural Information Processing Systems, 2019.
  • Zheng et al. [2019] Tianhang Zheng, Changyou Chen, and Kui Ren. Distributionally adversarial attack. In Proc. AAAI Conference on Artificial Intelligence, 2019.
  • Wang et al. [2020a] Hongjun Wang, Guanbin Li, Xiaobai Liu, and Liang Lin. A hamiltonian monte carlo method for probabilistic adversarial attack and learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020a.
  • Carlini and Wagner [2017b] Nicholas Carlini and David A. Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In ACM Workshop on Artificial Intelligence and Security, 2017b.
  • Tu et al. [2019] Chun-Chen Tu, Pai-Shun Ting, Pin-Yu Chen, Sijia Liu, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, and Shin-Ming Cheng. AutoZOOM: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proc. AAAI Conference on Artificial Intelligence, 2019.
  • Zhang et al. [2020] Yonggang Zhang, Ya Li, Tongliang Liu, and Xinmei Tian. Dual-path distillation: A unified framework to improve black-box attacks. In Proc. International Conference on Machine Learning, 2020.
  • [19] Huichen Li, Linyi Li, Xiaojun Xu, Xiaolu Zhang, Shuang Yang, and Bo Li. Nonlinear gradient estimation for query efficient blackbox attack.
  • Huang and Zhang [2019] Zhichao Huang and Tong Zhang. Black-box adversarial attack with transferable model-based embedding. In Proc. International Conference on Learning Representations, 2019.
  • Li et al. [2019] Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, and Boqing Gong. NATTACK: learning the distributions of adversarial examples for an improved black-box attack on deep neural networks. In Proc. International Conference on Machine Learning, 2019.
  • Cheng et al. [2019] Shuyu Cheng, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Improving black-box adversarial attacks with a transfer-based prior. In Proc. International Conference on Neural Information Processing Systems, 2019.
  • Papernot et al. [2016b] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016b.
  • Moon et al. [2019] Seungyong Moon, Gaon An, and Hyun Oh Song. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In Proc. International Conference on Machine Learning, 2019.
  • Meunier et al. [2019] Laurent Meunier, Jamal Atif, and Olivier Teytaud. Yet another but more efficient black-box adversarial attack: tiling and evolution strategies. arXiv preprint arXiv:1910.02244, 2019.
  • Andriushchenko et al. [2019] Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049, 2019.
  • Xiao et al. [2018] Chaowei Xiao, Bo Li, Jun-Yan Zhu, Warren He, Mingyan Liu, and Dawn Song. Generating adversarial examples with adversarial networks. 2018.
  • Zhang [2019] Weijia Zhang. Generating adversarial examples in one shot with image-to-image translation gan. IEEE Access, 7:151103–151119, 2019.
  • Gragnaniello et al. [2019] Diego Gragnaniello, Francesco Marra, Giovanni Poggi, and Luisa Verdoliva. Perceptual quality-preserving black-box attack against deep learning image classifiers. arXiv preprint arXiv:1902.07776, 2019.
  • Rozsa et al. [2016] Andras Rozsa, Ethan M Rudd, and Terrance E Boult. Adversarial diversity and hard positive generation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016.
  • Gao et al. [2015] Fei Gao, Dacheng Tao, Xinbo Gao, and Xuelong Li. Learning to rank for blind image quality assessment. IEEE Transactions on Neural Networks and Learning Systems, 26(10):2275–2290, 2015.
  • Ma et al. [2016] Lin Ma, Long Xu, Yichi Zhang, Yihua Yan, and King Ngi Ngan. No-reference retargeted image quality assessment based on pairwise rank learning. IEEE Transactions on Multimedia, 18(11):2228–2237, 2016.
  • Zhao et al. [2017] Zhengli Zhao, Dheeru Dua, and Sameer Singh. Generating natural adversarial examples. In Proc. International Conference on Learning Representations, 2017.
  • Wang et al. [2020b] Hongjun Wang, Guangrun Wang, Ya Li, Dongyu Zhang, and Liang Lin. Transferable, controllable, and inconspicuous adversarial attacks on person re-identification with deep mis-ranking. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2020b.
  • Szegedy et al. [2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In Proc. International Conference on Learning Representations, 2014.
  • Zhou Wang et al. [2004] Zhou Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4):600–612, 2004.
  • Zhang et al. [2018] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • Sutton and Barto [1998] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press Cambridge, 1998.
  • Ruan et al. [2018] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In Proc. International Joint Conference on Artificial Intelligence, 2018.
  • Gao and Pavel [2017] Bolin Gao and Lacra Pavel. On the properties of the softmax function with application in game theory and reinforcement learning. arXiv preprint arXiv:1704.00805, 2017.
  • Orabona [2020] Francesco Orabona. Almost sure convergence of sgd on smooth non-convex functions. https://parameterfree.com/2020/10/05/almost-sure-convergence-of-sgd-on-smooth-non-convex-functions, 2020.
  • Russakovsky et al. [2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3):211–252, 2015.
  • Szegedy et al. [2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016.
  • Simonyan and Zisserman [2015] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. International Conference on Learning Representations, 2015.
  • Zhao et al. [2020] Zhengyu Zhao, Zhuoran Liu, and Martha Larson. Towards large yet imperceptible adversarial image perturbations with perceptual color distance. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 1039–1048, 2020.
  • Hu et al. [2018] Jie Hu, Li Shen, and Gang Sun. Squeeze-and-excitation networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2018.
  • Tramèr et al. [2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In Proc. International Conference on Learning Representations, 2018.