Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems
Abstract
Decision-based attacks construct adversarial examples against a machine learning (ML) model by making only hard-label queries. These attacks have mainly been applied directly to standalone neural networks. However, in practice, ML models are just one component of a larger learning system. We find that by adding a single preprocessor in front of a classifier, state-of-the-art query-based attacks are up to seven times less effective at attacking a prediction pipeline than at attacking the model alone. We explain this discrepancy by the fact that most preprocessors introduce some notion of invariance to the input space. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We therefore develop techniques to (i) reverse-engineer the preprocessor and then (ii) use this extracted information to attack the end-to-end system. Our preprocessor extraction method requires only a few hundred queries, and our preprocessor-aware attacks recover the same efficacy as when attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.
1 Introduction
Machine learning is widely used in security-critical systems, for example for detecting abusive, harmful or otherwise unsafe online content (Waseem et al., 2017; Clarifai; Jha & Mamidi, 2017). It is critical that such systems are robust against adversaries who seek to evade them.
Yet, an extensive body of work has shown that an adversary can fool machine learning models with adversarial examples (Biggio et al., 2013; Szegedy et al., 2014). Most prior work focuses on white-box attacks, where an adversary has perfect knowledge of the entire machine learning system (Carlini & Wagner, 2017). However, real adversaries rarely have this level of access (Tramèr et al., 2019), and must thus instead resort to black-box attacks (Chen et al., 2017). Decision-based attacks (Brendel et al., 2018a) are a particularly practical attack vector, as these attacks only require the ability to query a target model and observe its decisions.
However, existing decision-based attacks (Brendel et al., 2018b; Cheng et al., 2020a; Chen et al., 2020; Li et al., 2020) have primarily been evaluated against standalone ML models “in the lab”, thereby ignoring the components of broader learning systems that are used in practice. While some decision-based attacks have been demonstrated on production systems as a proof-of-concept (e.g., Ilyas et al. (2018); Brendel et al. (2018a); Li et al. (2020)), it is not well understood how these attacks perform on end-to-end learning systems compared to standalone models.
We show that existing decision-based attacks are significantly less effective against end-to-end systems than against standalone machine learning models. For example, a standard decision-based attack can evade a ResNet image classifier on ImageNet with a small average $\ell_2$-distortion (defined formally later). Yet, if we instead attack an end-to-end learning system that simply preprocesses the classifier's input before classifying it, e.g., by resizing or compressing the image, the attack's average distortion grows substantially (by up to 7x). We further find that extensive hyperparameter tuning and running the attacks for more iterations fail to resolve this issue. We thus argue that existing decision-based attacks have fundamental limitations that make them sub-optimal in practice.
To remedy this, we develop improved attacks that achieve the same success rate when attacking systems with unknown preprocessors as when attacking standalone models. Our attacks combine decision-based attacks with techniques developed for model extraction (Tramèr et al., 2016). They first query the system to reverse-engineer the preprocessor(s) used in the input pipeline, and then mount a modified preprocessor-aware decision-based attack. Our extraction procedure is efficient and often requires only a few hundred queries to identify commonly used preprocessors. This cost can also be amortized across many generated adversarial examples. We find that even the least efficient preprocessor-aware attack outperforms all unaware attacks. Learning the system's preprocessing pipeline is thus more important than devising an efficient standalone attack.
2 Background and Related Work
Adversarial Examples. Adversarial examples are inputs designed to fool a machine learning classifier (Biggio et al., 2013; Szegedy et al., 2014; Goodfellow et al., 2015). For a classifier $f$ and an input $x$ with label $y$, an adversarial example is a perturbed input $x' = x + \delta$ such that $f(x') \neq y$, where $\delta$ is a small perturbation under some $\ell_p$-norm, i.e., $\|\delta\|_p \leq \epsilon$. Adversarial examples can be constructed either in the white-box setting (where the adversary uses gradient descent to produce the perturbation $\delta$) (Carlini & Wagner, 2017; Madry et al., 2018), or more realistically, in the black-box setting (where the adversary has only query access to the system) (Papernot et al., 2017; Chen et al., 2017; Brendel et al., 2018a). Our paper focuses on this black-box setting with $\ell_2$-norm perturbations.
Decision-based attacks generate adversarial examples with only query access to the remote model's decisions (i.e., the output class label). These attacks typically work by finding the decision boundary between the original image and a target label of interest, and then walking along the decision boundary to reduce the total distortion (Brendel et al., 2018a; Cheng et al., 2020a; Chen et al., 2020; Li et al., 2020).
It has been shown that decision-based attacks should operate at the lowest-dimensional input space possible. For example, QEBA (Li et al., 2020) improves upon HSJA (Chen et al., 2020) by constructing adversarial examples in a lower-dimensional embedding space. This phenomenon will help explain some of the results we observe, where we find that high-dimensional images require more queries to attack.
Adversarial examples need not exploit the classifier itself. Image scaling attacks (Quiring et al., 2020) construct a high-resolution image so that, after resizing, the resulting low-resolution image is visually dissimilar to the original. As a result, any accurate classifier will (correctly) classify the high-resolution image and the low-resolution image differently. Gao et al. (2022) consider the image-scaling attack in conjunction with a classifier, similar to our setting. However, our work applies to arbitrary preprocessors, not only resizing, and we also propose an extraction attack to unveil the deployed preprocessor in the first place.
Preprocessing defenses. A number of proposed defenses against adversarial examples preprocess inputs before classification (Guo et al., 2018; Song et al., 2018). Unfortunately, these defenses are largely ineffective in a white-box setting (Athalye et al., 2018; Tramer et al., 2020; Sitawarin et al., 2022). Surprisingly, recent work has shown that defending against existing decision-based attacks with preprocessors is quite simple. Aithal & Li (2022); Qin et al. (2021) show that adding small amounts of random noise to inputs impedes all current attacks. This suggests that there may be a significant gap between the capabilities of white-box and black-box attacks when preprocessors are present.
Model Stealing Attacks. To improve the efficacy of black-box attacks, we make use of techniques from model stealing attacks (Tramèr et al., 2016). These attacks aim to create an ML model that closely mimics the behavior of a remote model (Jagielski et al., 2020). Our goal is slightly different, as we only aim to "steal" the system's preprocessor and use this knowledge to mount stronger evasion attacks. For this, we leverage techniques that have been used to extract functionally equivalent models, which exactly match the behavior of the remote model on all inputs (Milli et al., 2019; Rolnick & Kording, 2020; Carlini et al., 2020).
3 Setup and Threat Model
3.1 Notation
We denote an unperturbed input image in the original space as $x \in \mathcal{X} \subseteq [0,1]^{s}$ and a processed image in the model space as $z \in \mathcal{Z} \subseteq \mathbb{R}^{d}$. The original dimension $s$ can be the same as or different from the model-space dimension $d$. A preprocessor $t: \mathcal{X} \to \mathcal{Z}$ maps $x$ to $z = t(x)$. For instance, a resizing preprocessor that maps an image of $h \times h$ pixels to $w \times w$ pixels (with three color channels) has $s = 3h^2$ and $d = 3w^2$. As another example, an 8-bit quantization restricts $\mathcal{Z}$ to the discrete space $\{0, \tfrac{1}{255}, \ldots, 1\}^{d}$ with $s = d$. The classifier, excluding the preprocessor, is represented by $f: \mathcal{Z} \to \mathcal{Y}$ where $\mathcal{Y}$ is the hard-label space. Finally, the entire classification pipeline is denoted by $f \circ t$.
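To make this notation concrete, the minimal sketch below implements two example preprocessors $t$; the function names and the 512-to-224 sizes are our own choices for illustration.

```python
import numpy as np

def quantize_8bit(x):
    """8-bit quantization: map pixels in [0, 1] onto the grid {0, 1/255, ..., 1}."""
    return np.round(x * 255.0) / 255.0

def resize_nearest(x, out_hw):
    """Nearest-neighbor resize of an (H, W, C) image to (out_hw, out_hw, C)."""
    h, w, _ = x.shape
    rows = np.arange(out_hw) * h // out_hw
    cols = np.arange(out_hw) * w // out_hw
    return x[rows][:, cols]

# The pipeline f(t(x)) first preprocesses, then classifies.
x = np.random.rand(512, 512, 3)            # original space, s = 3 * 512 * 512
z = resize_nearest(quantize_8bit(x), 224)  # model space,    d = 3 * 224 * 224

# Both example preprocessors are idempotent: t(t(x)) == t(x).
assert np.allclose(quantize_8bit(quantize_8bit(x)), quantize_8bit(x))
```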
3.2 Threat Model
The key distinguishing factor between previous works and ours is that we consider a preprocessing pipeline as part of the victim system. In other words, the adversary cannot simply run an attack algorithm on the model input space. We thus follow in the direction of Pierazzi et al. (2020) and Gao et al. (2022) who develop attacks that work end-to-end, as opposed to just attacking a standalone model. To do this, we develop strategies to “bypass” the preprocessors (Section 4) and to reverse-engineer which preprocessors are being used (Section 6). Our threat model is:
- The adversary has black-box, query-based access to the victim model and can query it on any input $x$ and observe the output label $f(t(x))$. The adversary has a limited query budget per input. The adversary knows nothing else about the system.
- The adversary wants to misclassify as many perturbed inputs as possible (either targeted or untargeted), while minimizing the perturbation size, measured by Euclidean distance in the original input space $\mathcal{X}$.
- The victim system accepts inputs of any dimension, and the desired model input size is obtained by cropping and/or resizing as part of an image preprocessing pipeline.
4 Preprocessor-Aware Attacks
Decision-based attacks often query a model on many nearby points, e.g., to approximate the local geometry of the boundary. Since most preprocessors are not injective functions, nearby points in the original input space might map onto the same processed image. Preprocessing thus makes the model’s output invariant to some input changes. This can cause the attack to waste queries and prevent it from learning information about the target model.

4.1 Bypassing Attack
Our Bypassing Attack in Algorithm 1 avoids these invariances by circumventing the preprocessor entirely. Figure 1 illustrates our attack with a resizing preprocessor. To allow the Bypassing Attack to query the model directly, we first map the input image ($x$) to the preprocessed space ($z = t(x)$). Then, in the Attack Phase, we execute an off-the-shelf decision-based attack directly on this preprocessed image ($z$).
Finally, after completing the attack, we recover the adversarial image in the original space ($x_{\text{adv}}$) from $z_{\text{adv}}$. We call this step the Recovery Phase. It finds an adversarial example with minimal perturbation in the original space by solving the following optimization problem:
$$x_{\text{adv}} = \arg\min_{x' \in \mathcal{X}} \; \|x' - x\|_2 \quad \text{s.t.} \quad t(x') = z_{\text{adv}} \qquad (1)$$
4.1.1 Cropping
Because almost all image classifiers operate on square images (Wightman, 2019), one of the most common preprocessing operations is to first crop the image to a square. In practice, this means that any pixels on the edge of the image are completely ignored by the classifier. Our Bypassing Attack exploits this fact by removing these cropped pixels and running an off-the-shelf attack directly in the cropped space. For a more formal statement, see Section B.1.
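As a concrete illustration, here is a minimal sketch of the cropping bypass; `attack_fn` stands in for any off-the-shelf decision-based attack that queries the classifier directly in the cropped space, and all function names are ours.

```python
import numpy as np

def center_crop(x, out):
    """Center-crop an (H, W, C) image to (out, out, C)."""
    h, w, _ = x.shape
    top, left = (h - out) // 2, (w - out) // 2
    return x[top:top + out, left:left + out]

def bypass_crop_attack(x, out, attack_fn):
    """Attack phase: run any decision-based attack directly in the cropped (model) space."""
    z = center_crop(x, out)
    z_adv = attack_fn(z)                       # e.g., HSJA/QEBA querying f on cropped inputs
    # Recovery phase: pad z_adv with the original edge pixels of x.
    h, w, _ = x.shape
    top, left = (h - out) // 2, (w - out) // 2
    x_adv = x.copy()
    x_adv[top:top + out, left:left + out] = z_adv
    return x_adv
```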
4.1.2 Resizing
Image resizing is a ubiquitous preprocessing step in any vision system, as most classifiers are trained only on images of a specific size. We begin by considering the special case of resizing with "nearest-neighbor interpolation," which downsizes an image by a factor of $k$ simply by selecting one out of every $k \times k$ block of pixels. This resize operation is conceptually similar to cropping, and thus the intuition behind our attack is the same: if we know which pixels are retained by the preprocessor, we can avoid wasting perturbation budget and queries on pixels that are discarded. Other interpolation methods for resizing, e.g., bilinear or bicubic, work in a similar way, and can all be expressed as a linear transform, i.e., $z = t_{\text{res}}(x) = Mx$ for some matrix $M \in \mathbb{R}^{d \times s}$.
The attack phase for resizing is exactly the same as that of cropping. The adversary simply runs an attack algorithm of their choice in the model space $\mathcal{Z}$. The main difference comes in the recovery phase, which amounts to solving the following optimization problem:
$$x_{\text{adv}} = \arg\min_{x' \in \mathcal{X}} \; \|x' - x\|_2 \quad \text{s.t.} \quad Mx' = z_{\text{adv}} \qquad (2)$$
Quiring et al. (2020); Gao et al. (2022) solve a similar version of this problem via a gradient-based algorithm. However, we show that there exists a closed-form solution for the global optimum. Since the constraint in Eqn. (2) is an underdetermined linear system, this problem is analogous to finding a minimum-norm solution, given by:
$$x_{\text{adv}} = x + M^{\dagger}\left(z_{\text{adv}} - Mx\right) \qquad (3)$$
Here, $M^{\dagger}$ represents the Moore-Penrose pseudo-inverse of $M$. We defer the formal derivation to Section B.2.
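The following sketch illustrates the closed-form recovery of Eqn. (3) for single-channel nearest-neighbor resizing; the helper names and the tiny 8-to-4 sizes are ours, and for bilinear or bicubic resizing $M$ would instead hold the interpolation weights (see Appendix B.2.1).

```python
import numpy as np

def resize_matrix_nearest(s_hw, d_hw):
    """Build the (d_hw^2, s_hw^2) matrix M of a single-channel nearest-neighbor resize."""
    M = np.zeros((d_hw * d_hw, s_hw * s_hw))
    for i in range(d_hw):
        for j in range(d_hw):
            si, sj = i * s_hw // d_hw, j * s_hw // d_hw
            M[i * d_hw + j, si * s_hw + sj] = 1.0
    return M

def recover_resize(x, z_adv, M):
    """Closed-form recovery (Eq. 3): minimum-norm perturbation that maps to z_adv."""
    M_pinv = np.linalg.pinv(M)                # Moore-Penrose pseudo-inverse
    return x + M_pinv @ (z_adv - M @ x)

# Tiny example: 8x8 original image, 4x4 model space (flattened, one channel).
M = resize_matrix_nearest(8, 4)
x = np.random.rand(64)
z_adv = np.random.rand(16)
x_adv = recover_resize(x, z_adv, M)
assert np.allclose(M @ x_adv, z_adv)          # the recovered image resizes exactly to z_adv
```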

Limitation. We have demonstrated how to bypass two very common preprocessors, cropping and resizing, but not all preprocessors can be bypassed in this way. Our Bypassing Attack assumes (A1) that the preprocessor is idempotent, i.e., $t(t(x)) = t(x)$, and (A2) that the preprocessor's output space is continuous. Most common preprocessing functions are idempotent: e.g., quantizing an already-quantized image makes no difference. For preprocessors that do not satisfy (A2), e.g., quantization, whose output space is discrete, we propose an alternative attack in the next section.
4.2 Biased-Gradient Attacks
We now turn our attention to more general preprocessors that cannot be bypassed without modifying the search space, for example quantization, which discretizes a continuous space. Quantization is one of the most common preprocessors an adversary has to overcome since all common image formats (e.g., PNG or JPEG) discretize pixel values to 8 bits. However, prior black-box attacks ignore this fact and operate in the continuous domain.
We thus propose the Biased-Gradient Attack in Algorithm 2. Unlike the Bypassing Attack, this attack operates in the original space. Instead of applying a black-box attack as is, the Biased-Gradient Attack modifies the base attack in order to bias queries toward directions that the preprocessor is more sensitive to. The intuition is that while it is hard to completely avoid the invariance of the preprocessor, we can encourage the attack to explore directions that result in large changes in the output space of the preprocessing function.
Our Biased-Gradient Attack also consists of an attack phase and a recovery phase. The attack phase makes two modifications to an underlying gradient-approximation attack (e.g., HSJA, QEBA), which we explain below. The recovery phase simply solves Equation 1 with a gradient-based method, by relaxing the constraint using a Lagrange multiplier (since closed-form solutions do not exist in general). We defer the details to Appendix C. Figure 2 illustrates the Biased-Gradient Attack for a quantization preprocessor.
(i) Biased Gradient Approximation: We modify the gradient approximation step to account for the preprocessor. First, consider the adversary's loss function $S_y$ and its sign $\phi_y$, defined (following HSJA) as
$$\phi_y(x) := \operatorname{sign}\big(S_y(x)\big) = \begin{cases} +1 & \text{if } f(t(x)) = y \\ -1 & \text{otherwise} \end{cases} \qquad (4)$$
where $x$ is the input, and $y$ is the target label. Attacks such as HSJA and QEBA estimate the gradient of $S_y$ by applying finite differences to the quantity $\phi_y$, which can be measured by querying the model's label. The attack samples $B$ uniformly random unit vectors $u_b$, scales them by a hyperparameter $\delta$, and computes
$$\nabla_x S_y(x) \approx \frac{1}{B} \sum_{b=1}^{B} \phi_y(x + \delta u_b)\, u_b \qquad (5)$$
We then perform a change of variables to obtain a gradient estimate with respect to $z = t(x)$ instead of $x$:
$$\nabla_z S_y(z) \approx \frac{1}{B} \sum_{b=1}^{B} \phi_y(x + \delta u_b)\, \hat{v}_b \qquad (6)$$
where $v_b := t(x + \delta u_b) - t(x)$, and $\hat{v}_b := v_b / \|v_b\|_2$. Notice that $v_b$ corresponds to a random perturbation in the model space. Thus, we can "bypass" the preprocessor and approximate gradients in the model space instead by substituting $x + \delta u_b$ with $t(x + \delta u_b)$ in Equation 6.
$$\nabla_z S_y(z) \approx \frac{1}{B} \sum_{b=1}^{B} \phi_y\big(t(x + \delta u_b)\big)\, \hat{v}_b \qquad (7)$$
So instead of querying the ML system with inputs $x + \delta u_b$, we use $t(x + \delta u_b)$, which is equivalent to pre-applying the preprocessor to the queries. If the preprocessor is idempotent, the model sees the same processed input in both cases. This gradient estimator is biased because the directions $\hat{v}_b$ depend on $t$. Concretely, the distribution of $\hat{v}_b$ is concentrated around directions that "survive" the preprocessor.
(ii) Backpropagate Gradients through the Preprocessor: The gradient estimate in Eqn. (7) is w.r.t. the model space, whereas the attack operates in the original input space. Hence, we backpropagate the estimate through $t$ according to the chain rule, $\nabla_x S_y(x) = J_t(x)^{\top}\, \nabla_z S_y(z)$, where $J_t(x)$ is the Jacobian matrix of the preprocessor w.r.t. the original space. In our experiments, we use differentiable versions of quantization and JPEG compression by Shin & Song (2017) so that the Jacobian matrix exists.
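The sketch below combines the two modifications: it estimates the gradient in the model space by pre-applying the preprocessor to each query, then pulls the estimate back to the original space through a differentiable surrogate of the preprocessor. Here `pipeline(x)` is assumed to return the hard label, `t` is the true preprocessor, `t_diff` is a differentiable surrogate of it, and the normalization of the induced perturbations is our own choice.

```python
import torch

def phi(pipeline, x, y_tgt):
    """+1 if the pipeline's hard-label prediction equals the target label, else -1."""
    return 1.0 if pipeline(x) == y_tgt else -1.0

def biased_gradient(pipeline, t, t_diff, x, y_tgt, delta=0.1, num_queries=100):
    """Estimate the gradient in the model space with preprocessed queries,
    then pull it back to the original space via a vector-Jacobian product."""
    z = t(x)
    grad_z = torch.zeros_like(z)
    for _ in range(num_queries):
        u = torch.randn_like(x)
        u = u / u.norm()
        x_q = t(x + delta * u)            # pre-apply the preprocessor to the query
        v = x_q - z                       # induced perturbation in the model space
        if v.norm() > 0:
            grad_z += phi(pipeline, x_q, y_tgt) * (v / v.norm())
    grad_z /= num_queries
    # Chain rule: grad_x = J_t(x)^T grad_z, computed through the differentiable surrogate.
    _, grad_x = torch.autograd.functional.vjp(t_diff, x, grad_z)
    return grad_x
```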
5 Attack Experiments
5.1 Setup
Model. Similarly to previous works (Brendel et al., 2018a), we evaluate our attacks on a ResNet-18 (He et al., 2016) trained on the ImageNet dataset (Deng et al., 2009). The model is publicly available in the popular timm package (Wightman, 2019).
Off-the-shelf attacks. We consider four different attacks: Boundary Attack (Brendel et al., 2018a), Sign-OPT (Cheng et al., 2020a), HopSkipJump Attack (HSJA) (Chen et al., 2020), and QEBA (Li et al., 2020). The first three attacks have both targeted and untargeted versions, while QEBA is only used as a targeted attack. We also compare our attacks to the baseline preprocessor-aware attack, SNS (Gao et al., 2022). As this attack only considers resizing, we adapt it to the other preprocessors we consider.
Attack hyperparameters. As we discuss in Section 7.2 and Section E.2, a change in preprocessor has a large impact on the optimal choice of hyperparameters for each attack. We thus sweep hyperparameters for all attacks and report results for the best choice.
Metrics. We report the average perturbation size ($\ell_2$-norm) of adversarial examples found by each attack, referred to as the "adversarial distance" in short. A smaller adversarial distance means a stronger attack.
Appendix A contains full detail of all our experiments.
5.2 Bypassing Attack Results
Cropping. We consider a common operation that center-crops an image of size 256x256 pixels down to 224x224 pixels. In Table 1, our Bypassing approach improves all of the baseline preprocessor-unaware attacks. The adversarial distance found by the baseline is about 8-16% higher than that of the Bypassing Attack counterpart across all settings. This difference is very close to the fraction of border pixels that are cropped out, suggesting that the cropping-unaware attacks do waste perturbation on these invariant pixels. Our Bypassing Attack also recovers about the same mean adversarial distance as the case with no preprocessor (first row of Table 1).
| Preprocessors | Methods | Boundary (untarg.) | Sign-OPT (untarg.) | HSJA (untarg.) | Boundary (targ.) | Sign-OPT (targ.) | HSJA (targ.) | QEBA (targ.) |
|---|---|---|---|---|---|---|---|---|
| None | n/a | 4.6 | 5.7 | 3.6 | 36.7 | 45.6 | 32.2 | 19.1 |
| Crop (256 to 224) | Unaware | 5.3 | 6.5 | 4.2 | 42.8 | 52.7 | 38.2 | 22.2 |
| Crop (256 to 224) | Bypass (ours) | 4.6 | 5.8 | 3.6 | 37.3 | 46.3 | 32.9 | 19.6 |
| Resize (Nearest) | Unaware | 21.2 | 24.8 | 16.5 | 172.2 | 198.8 | 153.4 | 90.5 |
| Resize (Nearest) | Bypass (ours) | 4.7 | 5.8 | 3.7 | 37.7 | 46.3 | 33.3 | 19.4 |
| Resize (Bilinear) | Unaware | 32.7 | 38.2 | 25.5 | 198.3 | 213.0 | 188.4 | 90.3 |
| Resize (Bilinear) | Bypass (ours) | 7.4 | 9.1 | 6.0 | 58.2 | 70.9 | 50.3 | 30.0 |
| Resize (Bicubic) | Unaware | 25.7 | 29.2 | 20.6 | 184.8 | 207.3 | 171.6 | 91.2 |
| Resize (Bicubic) | Bypass (ours) | 5.8 | 7.1 | 4.5 | 46.4 | 57.7 | 40.6 | 23.8 |
Resizing. We study the three most common interpolation or resampling techniques: nearest, bilinear, and bicubic. For the original input size used in Table 1, a reasonable size for images captured by digital or phone cameras, our attack reduces the mean adversarial distance by more than a factor of four compared to the preprocessor-oblivious counterpart. For all image sizes we experiment with, including 256 and 512 pixels in Table 7, the Bypassing Attack is always preferable to the resizing-oblivious attack, both with and without hyperparameter tuning.
The improvement from the Bypassing Attack is proportional to the original input dimension. The benefit diminishes with a smaller original size because the base attack of the Bypassing Attack operates in the model space. Hence, it minimizes the adversarial distance in that space, i.e., the distance between $z_{\text{adv}}$ and $t(x)$. This distance is likely correlated with, but not necessarily the same as, the true objective distance measured in the original space, i.e., the distance between $x_{\text{adv}}$ and $x$. In these cases, it may be preferable to use the Biased-Gradient Attack instead. The results are shown in Table 2 and Section 5.3.
5.3 Biased-Gradient Attack Results
| Preprocessors | Methods | HSJA (untargeted) | HSJA (targeted) | QEBA (targeted) |
|---|---|---|---|---|
| Crop (256 to 224) | Unaware | 4.2 | 38.2 | 22.2 |
| Crop (256 to 224) | SNS | 3.7 | 35.4 | 31.5 |
| Crop (256 to 224) | Biased-Grad (ours) | 3.7 | 33.1 | 19.6 |
| Resize (Nearest) | Unaware | 16.5 | 153.4 | 90.5 |
| Resize (Nearest) | SNS | 3.9 | 112.6 | 32.2 |
| Resize (Nearest) | Biased-Grad (ours) | 3.7 | 23.5 | 19.4 |
| Quantize (4 bits) | Unaware | 9.7 | 63.7 | 56.4 |
| Quantize (4 bits) | SNS | 6.4 | 55.9 | 57.2 |
| Quantize (4 bits) | Biased-Grad (ours) | 3.1 | 39.3 | 28.8 |
| JPEG (quality 60) | Unaware | 9.2 | 63.2 | 52.7 |
| JPEG (quality 60) | SNS | 2.7 | 44.5 | 44.6 |
| JPEG (quality 60) | Biased-Grad (ours) | 1.5 | 25.1 | 21.0 |
| Neural compress (Ballé et al., 2018) (hyperprior, 8) | Unaware | 25.1 | 92.0 | 78.6 |
| Neural compress (Ballé et al., 2018) (hyperprior, 8) | SNS | 17.6 | 83.6 | 78.9 |
| Neural compress (Ballé et al., 2018) (hyperprior, 8) | Biased-Grad (ours) | 15.8 | 75.2 | 75.8 |
| Neural compress (Cheng et al., 2020b) (attention, 6) | Unaware | 33.8 | 94.1 | 86.9 |
| Neural compress (Cheng et al., 2020b) (attention, 6) | SNS | 14.3 | 80.3 | 75.5 |
| Neural compress (Cheng et al., 2020b) (attention, 6) | Biased-Grad (ours) | 12.6 | 74.8 | 77.9 |
| Neural Preprocessors | Unaware | SNS | Biased-Grad |
|---|---|---|---|
| Ballé et al. (2018) (hyperprior, 8) | 25.1 | 17.6 | 15.8 |
| Ballé et al. (2018) (hyperprior, 6) | 28.7 | 17.0 | 14.0 |
| Ballé et al. (2018) (factorized, 8) | 24.0 | 15.1 | 13.9 |
| Ballé et al. (2018) (factorized, 6) | 26.9 | 10.4 | 11.7 |
| Cheng et al. (2020b) (attention, 6) | 25.7 | 12.6 | 14.3 |
| Cheng et al. (2020b) (attention, 4) | 31.3 | 13.7 | 13.4 |
| Cheng et al. (2020b) (anchor, 6) | 27.3 | 8.7 | 7.0 |
| Cheng et al. (2020b) (anchor, 4) | 32.7 | 7.8 | 6.6 |
| SwinIR (denoise level 15) | 24.4 | 55.0 | 10.5 |
We evaluate the Biased-Gradient Attack on a broad range of preprocessors; in addition to resize and crop, we include 8/6/4-bit quantization as well as JPEG compression with quality values of 60, 80, and 100. Moreover, we experiment with neural-network-based compression methods from Ballé et al. (2018) and Cheng et al. (2020b) as well as SwinIR, a transformer-based denoiser (Liang et al., 2021). We select these methods as representatives of recent image restoration/compression models that improve upon traditional computer vision techniques (Zhang et al., 2021; Zamir et al., 2022). Importantly, these methods violate our idempotence assumption, so they also let us evaluate our attack when the assumption does not hold.
Here, we consider untargeted/targeted HSJA and targeted QEBA as they are consistently the strongest, and the other two attacks do not involve gradient approximation. From Table 2, the Biased-Gradient Attack outperforms the preprocessor-unaware counterpart as well as SNS in almost all settings. For example, with untargeted HSJA, the Biased-Gradient Attack reduces the mean adversarial distance from 9.7 to 3.1 for 4-bit quantization and from 9.2 to 1.5 for JPEG with a quality of 60. The Biased-Gradient Attack also outperforms the baselines on neural compression under varying models and compression levels as well as on the SwinIR denoiser (Table 3). We observe a recurring trend where the benefit of the Biased-Gradient Attack increases with stronger preprocessors, e.g., fewer quantization bits or lower compression quality.
6 Extracting Preprocessors
As we have seen, knowledge of the preprocessor results in much more efficient decision-based attacks. What is now left is to design a query-efficient attack that actually reverse-engineers the preprocessor used by the target system.
It should not be surprising that this task would be achievable as it is a particular instance of the more general problem of model stealing. Recent work (Milli et al., 2019; Rolnick & Kording, 2020; Carlini et al., 2020) has shown a way to completely recover a (functionally-equivalent) neural network using only query access; stealing just a specific part of the model should thus be easier. Nonetheless, our setting comes with different challenges, both of which relate to the assumed adversary’s capabilities:
1. Prior extraction attacks require high-precision access to the classifier, i.e., (64-bit) floating-point inputs and outputs. However, we can only provide valid image files (8-bit) as input and receive only a single decision label as output. This invalidates the approaches used in prior work that rely on computing finite differences with epsilon-sized input-output perturbations (Milli et al., 2019).
2. Prior attacks need far more queries than our budget allows to extract even a very simple MNIST neural network (Carlini et al., 2020), and we work with much larger models. While the up-front extraction cost can be amortized across many generated adversarial examples, for our attacks to be economically efficient, they must be effective in just a few hundred queries.
Intuition. Our extraction attack relies on a guess-and-check strategy. Given a hypothesis about the preprocessor (e.g., "the model uses bilinear resizing to 224x224 pixels"), we build a set of inputs such that the outputs let us distinguish whether the hypothesis is true or not. Then, by enumerating a set of possible preprocessors, we can use a combination of binary and exhaustive search to reduce this set down to a single preprocessor. Our attack (see Algorithm 3) consists of two main components, "unstable pairs" and "pre-images," which we describe below.
6.1 Unstable Example Pairs


The first step of our attack generates an "unstable pair." This is a pair of samples $(x_1, x_2)$ with two properties: (i) $f(t(x_1)) \neq f(t(x_2))$, and (ii) with high probability $p$, $f(t(\eta(x_1))) = f(t(\eta(x_2)))$ for a random perturbation $\eta$. Figure 3 depicts an unstable pair: the points $x_1$ and $x_2$ have opposite labels, but a small random perturbation is likely to push both points to the same side of the boundary. As the perturbation made by $\eta$ grows, $p$ should also increase.
Given two images $x_a, x_b$ such that $f(t(x_a)) \neq f(t(x_b))$, we construct an unstable pair by performing two binary searches. The first finds a new pair of images, still labeled differently, that differ in only a single pixel coordinate. Starting from this pair, the second binary search finds the unstable pair $(x_1, x_2)$ whose remaining difference is only 1/255. This gives us a pair of images that differ in one pixel, and by only 1/255 at that pixel, while also being predicted as different classes. This process uses only about 40 queries, depending on the input size. In the interest of space, we provide the details in Section D.1. The rest of the attack uses only the unstable pair; $x_a$ and $x_b$ are no longer used.
6.2 Hypothesis Testing with Pre-Images
Suppose we hypothesize that the preprocessor applied to an image is some function $\hat{t}$ (this is the "guess" piece of our guess-and-check attack). Then, given the unstable example pair $(x_1, x_2)$, we can implement the "check" piece. For clarity, we denote the actually deployed preprocessor by $t^*$.
We begin by constructing a pre-image $\tilde{x}_1$ so that $\hat{t}(\tilde{x}_1) = \hat{t}(x_1)$ while $\tilde{x}_1$ differs substantially from $x_1$, and analogously $\tilde{x}_2$ for $x_2$ and $\hat{t}$. Now if our guess is indeed correct, then it is guaranteed that $f(t^*(\tilde{x}_1)) \neq f(t^*(\tilde{x}_2))$ since $t^*(\tilde{x}_i) = t^*(x_i)$. On the other hand, if our guess is wrong, then we have $f(t^*(\tilde{x}_1)) = f(t^*(\tilde{x}_2))$ with at least some probability $p$.
Let the null hypothesis be that our guess is wrong, i.e., $\hat{t} \neq t^*$. When we observe that the predictions of the pre-images do not change, we reject the null hypothesis if the probability of this observation under the null falls below some threshold (we choose 0.01). To increase our confidence, we can do two things: (i) simply repeat multiple trials by randomly generating and testing more pre-images and only reject the null hypothesis when none of the predictions change, or (ii) increase the size of the perturbation used to construct the pre-images, which increases $p$ by definition of the unstable pair.
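A minimal sketch of this check, assuming `pipeline(x)` returns the system's hard label, `(x1, x2)` is an unstable pair, and `make_preimage(t_hat, x)` produces a large random perturbation of `x` that the guessed preprocessor `t_hat` discards (see Section D.2.1); all names are ours.

```python
def check_guess(pipeline, t_hat, x1, x2, make_preimage, num_trials=5):
    """Guess-and-check: keep t_hat only if every pre-image pair retains distinct labels."""
    assert pipeline(x1) != pipeline(x2)            # (x1, x2) is an unstable pair
    for _ in range(num_trials):
        x1_pre = make_preimage(t_hat, x1)
        x2_pre = make_preimage(t_hat, x2)
        if pipeline(x1_pre) == pipeline(x2_pre):
            return False    # labels collapsed: evidence that the guess is wrong
    return True             # predictions never changed: reject the null hypothesis

# Hypothetical usage: enumerate candidate preprocessors and keep the consistent ones.
# candidates = {"resize_224_bilinear": t1, "resize_256_bicubic": t2}
# survivors = [name for name, t_hat in candidates.items()
#              if check_guess(pipeline, t_hat, x1, x2, make_preimage)]
```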
6.3 Experiment on Real-World Applications
| Preprocessor Space | Num. Queries |
|---|---|
| Arbitrary resize (200px-800px) | 632 |
| Arbitrary center crop (0%-100%) | |
| Arbitrary JPEG compression (quality 50-100) | |
| Typical resize (see text) | 50 |
We use this attack to extract preprocessors for a wide range of models publicly hosted on HuggingFace Hub through their API (https://huggingface.co/docs/api-inference). We choose HuggingFace as the preprocessing metadata is available on most models for us to verify our extracted results. For each experiment, we randomly sample 10 models trained to predict ImageNet-1k classes. Table 4 summarizes the results.
Because our procedure is inherently guess-and-check, we first define the space of all possible preprocessors. The exact space here depends on the possible knowledge an adversary might have. In the worst case, an adversary has to enumerate over all possible image sizes ranging from the smallest size used for any image classifier (200px) to the largest size used for any image classifier (800px). This incurs a cost of 632 queries on average to extract one resizing operator (i.e., both output size and interpolation method). However, some preprocessors might be more typical than others. We call a preprocessor pipeline "typical" if it is used by at least two different models. For example, ResNet classifiers almost always first resize images to 256x256, and then center-crop the resulting image down to 224x224. Our set of typical sizes includes 224, 248, 256, 288, 299, 384, and 512 pixels, with bilinear and bicubic interpolations. With this prior knowledge, the adversary reduces the query cost by over 10 times, down to only 50 queries.
Crop extraction is particularly efficient (about 50 queries) because we can use a binary search to find the crop size instead of an exhaustive search. Any wrongly guessed crop size larger than the actual size will not change the prediction of the pre-images. This allows us to run a binary search in which the guessed crop size shrinks when the predictions do not change and grows otherwise.
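A sketch of this binary search, reusing the hypothetical `center_crop` and `check_guess` helpers from the earlier sketches:

```python
def extract_crop_size(pipeline, x1, x2, make_preimage, lo=0, hi=None):
    """Binary-search the deployed center-crop size using the unstable pair (x1, x2)."""
    hi = hi if hi is not None else x1.shape[0]     # at most the full image height
    while lo < hi:
        mid = (lo + hi) // 2
        t_hat = lambda x: center_crop(x, mid)      # candidate crop operator
        if check_guess(pipeline, t_hat, x1, x2, make_preimage):
            hi = mid        # predictions unchanged: true crop size <= mid, shrink the guess
        else:
            lo = mid + 1    # predictions changed: true crop size > mid, grow the guess
    return lo
```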
7 Discussion
7.1 Varying Number of Attack Iterations


Figure 4 plots the mean adversarial distance as a function of the number of queries for QEBA. Notice that the adversarial distance plateaus after around 10,000 queries, and the distance found by preprocessor-unaware attacks never reaches that of our preprocessor-aware attacks. This suggests that our attacks not only improve the efficiency of the algorithms but also allow them to find closer adversarial examples that would otherwise have been missed entirely. See Section E.3 for more details.
7.2 Choice of Attack Hyperparameters
Hyperparameter choice is important to the effectiveness of the attacks. In many cases, using the right hyperparameters helps more than using a stronger attack algorithm. Choosing the right hyperparameters usually improves the distance found by 1.5x and up to 15x in one case, depending on the attack algorithm. Knowledge of the deployed preprocessor helps quickly narrow down the range of good hyperparameters. In practice, an adversary who plans to generate many adversarial examples would benefit from spending some queries to learn the preprocessors as well as to tune the hyperparameters. For detailed numerical results and discussion, see Section E.2.
7.3 Varying the Target Model
We have used the public pre-trained ResNet-18 model as the target model in our experiments, but the conclusion also holds for other models. We pick two models with different architectures from ResNet-18, EfficientNetV2-Tiny (Tan & Le, 2021) and DEIT3-Small (Touvron et al., 2022), and run the attacks with the resizing (nearest, 1024 to 288) and JPEG (quality 60) preprocessors, respectively. The mean adversarial distances for EfficientNetV2-Tiny/DEIT3-Small are 134.2/95.9, 117.7/76.6, and 32.3/48.1 with the unaware, SNS, and our Biased-Gradient attacks (+ targeted QEBA), respectively. This corresponds to about a 4x and 2x improvement over the unaware attack on the two models.
7.4 Multiple Preprocessors
| List of Preprocessors | Unaware | SNS | BG |
|---|---|---|---|
| Resize (1024 to 256), Crop (224), Quantize (8 bits) | 122.4 | 144.9 | 97.9 |
| Resize (1024 to 256), Crop (224), JPEG (60) | 283.1 | 237.5 | 153.7 |
| Resize (512 to 224), Quantize (6 bits) | 89.8 | 90.9 | 49.0 |
Preprocessor-aware attacks. In practice, multiple preprocessors are used sequentially. In the case that all the preprocessors can be bypassed, e.g., resizing and cropping, we can bypass the entire pipeline by querying with an appropriate size and padding. The recovery phase can then be done in the reverse order that the preprocessors are applied. When at least one preprocessor is not bypassable, we can treat the entire pipeline as one preprocessor and apply the Biased-Gradient Attack. Table 5 shows attack results for three common combinations of preprocessors.
Extracting multiple preprocessors. With the above attack, it becomes trivial to extract multiple preprocessors by extracting each in turn. Suppose there are two preprocessors applied in sequence, $t_1$ followed by $t_2$: we can first extract $t_1$ by subsuming $t_2$ as part of the classifier, i.e., treating the pipeline as $(f \circ t_2) \circ t_1$, and then we move on to guess $t_2$, using the now-revealed $t_1$ to construct the pre-images. In practice, it is actually even easier: the two most common transformations, resizing and cropping, are almost commutative (i.e., resizing then cropping is roughly equivalent to cropping then resizing, albeit with different crop and resize parameters). This means that one could extract either cropping or resizing first and still end up with an equivalent overall preprocessor pipeline.
8 Conclusion
We have shown that decision-based attacks are sensitive to changes in preprocessors, to a surprising degree. To develop a strong attack in practice, it is more important to get the preprocessor right than to use a stronger attack! We propose an extraction attack for commonly used preprocessors and two decision-based attacks aimed to circumvent any preprocessor. Our approaches are more efficient than the prior work and yield a stronger attack. We believe that it is important for future work to carefully consider other implicit assumptions in the current adversarial ML literature that may not be true in practice. We hope that our analysis will inspire future work to further explore this direction.
Acknowledgement
The authors would like to thank David Wagner for helping with the presentation of the paper, Matthew Jagielski for wonderful discussions on the problem, and Alex Kurakin for comments on an early draft of this paper.
A majority of this research was conducted when Chawin was at Google as a student researcher. For the remaining time at UC Berkeley, Chawin was supported by the Hewlett Foundation through the Center for Long-Term Cybersecurity (CLTC), by the Berkeley Deep Drive project, and by generous gifts from Open Philanthropy.
References
- Aithal & Li (2022) Aithal, M. B. and Li, X. Mitigating black-box adversarial attacks via output noise perturbation. IEEE Access, 10:12395–12411, 2022.
- Athalye et al. (2018) Athalye, A., Carlini, N., and Wagner, D. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 274–283, Stockholmsmässan, Stockholm Sweden, July 2018. PMLR.
- Ballé et al. (2018) Ballé, J., Minnen, D., Singh, S., Hwang, S. J., and Johnston, N. Variational image compression with a scale hyperprior. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.
- Bégaint et al. (2020) Bégaint, J., Racapé, F., Feltman, S., and Pushparaja, A. CompressAI: A PyTorch library and evaluation platform for end-to-end compression research. arXiv preprint arXiv:2011.03029, 2020.
- Biggio et al. (2013) Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., and Roli, F. Evasion attacks against machine learning at test time. In Blockeel, H., Kersting, K., Nijssen, S., and Železný, F. (eds.), Machine Learning and Knowledge Discovery in Databases, pp. 387–402, Berlin, Heidelberg, 2013. Springer Berlin Heidelberg. ISBN 978-3-642-40994-3.
- Brendel et al. (2018a) Brendel, W., Rauber, J., and Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In International Conference on Learning Representations, 2018a.
- Brendel et al. (2018b) Brendel, W., Rauber, J., Kurakin, A., Papernot, N., Veliqi, B., Salathé, M., Mohanty, S. P., and Bethge, M. Adversarial vision challenge. Technical report, 2018b.
- Carlini & Wagner (2017) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57, 2017. doi: 10.1109/SP.2017.49.
- Carlini et al. (2020) Carlini, N., Jagielski, M., and Mironov, I. Cryptanalytic extraction of neural network models. In Annual International Cryptology Conference, pp. 189–218. Springer, 2020.
- Chen et al. (2020) Chen, J., Jordan, M. I., and Wainwright, M. J. HopSkipJumpAttack: A query-efficient decision-based attack. arXiv:1904.02144 [cs, math, stat], April 2020.
- Chen et al. (2017) Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, AISec ’17, pp. 15–26, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 978-1-4503-5202-4. doi: 10.1145/3128572.3140448.
- Cheng et al. (2020a) Cheng, M., Singh, S., Chen, P. H., Chen, P.-Y., Liu, S., and Hsieh, C.-J. Sign-OPT: A query-efficient hard-label adversarial attack. In International Conference on Learning Representations, 2020a.
- Cheng et al. (2020b) Cheng, Z., Sun, H., Takeuchi, M., and Katto, J. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020b.
- Clarifai. Best NSFW model for content detection using AI — clarifai. https://www.clarifai.com/models/nsfw-model-for-content-detection.
- Deng et al. (2009) Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, 2009. doi: 10.1109/CVPR.2009.5206848.
- Gao et al. (2022) Gao, Y., Shumailov, I., and Fawaz, K. Rethinking image-scaling attacks: The interplay between vulnerabilities in machine learning systems. In Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., and Sabato, S. (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 7102–7121. PMLR, July 2022.
- Goodfellow et al. (2015) Goodfellow, I., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In International Conference on Learning Representations, 2015.
- Guo et al. (2018) Guo, C., Rana, M., Cisse, M., and van der Maaten, L. Countering adversarial images using input transformations. In International Conference on Learning Representations, 2018.
- He et al. (2016) He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016. doi: 10.1109/CVPR.2016.90.
- Ilyas et al. (2018) Ilyas, A., Engstrom, L., Athalye, A., and Lin, J. Black-box adversarial attacks with limited queries and information. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 2137–2146. PMLR, July 2018.
- Jagielski et al. (2020) Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., and Papernot, N. High accuracy and high fidelity extraction of neural networks. In 29th USENIX Security Symposium (USENIX Security 20), pp. 1345–1362. USENIX Association, August 2020. ISBN 978-1-939133-17-5.
- Jha & Mamidi (2017) Jha, A. and Mamidi, R. When does a compliment become sexist? analysis and classification of ambivalent sexism using twitter data. In Proceedings of the Second Workshop on NLP and Computational Social Science, pp. 7–16, Vancouver, Canada, August 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-2902. URL https://aclanthology.org/W17-2902.
- Li et al. (2020) Li, H., Xu, X., Zhang, X., Yang, S., and Li, B. QEBA: Query-efficient boundary-based blackbox attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
- Liang et al. (2021) Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L., and Timofte, R. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, pp. 1833–1844, October 2021.
- Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- Milli et al. (2019) Milli, S., Schmidt, L., Dragan, A. D., and Hardt, M. Model reconstruction from model explanations. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 1–9, 2019.
- MMEditing Contributors (2022) MMEditing Contributors. MMEditing: OpenMMLab image and video editing toolbox, 2022.
- Papernot et al. (2017) Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pp. 506–519, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 978-1-4503-4944-4. doi: 10.1145/3052973.3053009.
- Pierazzi et al. (2020) Pierazzi, F., Pendlebury, F., Cortellazzi, J., and Cavallaro, L. Intriguing properties of adversarial ML attacks in the problem space. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 1332–1349, May 2020. doi: 10.1109/SP40000.2020.00073.
- Qin et al. (2021) Qin, Z., Fan, Y., Zha, H., and Wu, B. Random noise defense against query-based black-box attacks. Advances in Neural Information Processing Systems, 34:7650–7663, 2021.
- Quiring et al. (2020) Quiring, E., Klein, D., Arp, D., Johns, M., and Rieck, K. Adversarial preprocessing: Understanding and preventing image-scaling attacks in machine learning. In 29th USENIX Security Symposium (USENIX Security 20), pp. 1363–1380. USENIX Association, August 2020. ISBN 978-1-939133-17-5.
- Rauber et al. (2017) Rauber, J., Brendel, W., and Bethge, M. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017.
- Rolnick & Kording (2020) Rolnick, D. and Kording, K. Reverse-engineering deep relu networks. In International Conference on Machine Learning, pp. 8178–8187. PMLR, 2020.
- Shafahi et al. (2019) Shafahi, A., Huang, W. R., Studer, C., Feizi, S., and Goldstein, T. Are adversarial examples inevitable? In International Conference on Learning Representations, 2019.
- Shin & Song (2017) Shin, R. and Song, D. JPEG-resistant adversarial images. In Machine Learning and Computer Security Workshop (Co-Located with NeurIPS 2017), Long Beach, CA, USA, 2017.
- Sitawarin et al. (2022) Sitawarin, C., Golan-Strieb, Z., and Wagner, D. Demystifying the adversarial robustness of random transformation defenses. In The AAAI-22 Workshop on Adversarial Machine Learning and Beyond, 2022.
- Song et al. (2018) Song, Y., Kim, T., Nowozin, S., Ermon, S., and Kushman, N. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. arXiv:1710.10766 [cs], May 2018.
- Szegedy et al. (2014) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. In International Conference on Learning Representations, 2014.
- Tan & Le (2021) Tan, M. and Le, Q. EfficientNetV2: Smaller models and faster training. In Meila, M. and Zhang, T. (eds.), Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 10096–10106. PMLR, July 2021.
- Touvron et al. (2022) Touvron, H., Cord, M., and Jégou, H. DeiT III: Revenge of the ViT, April 2022.
- Tramèr et al. (2016) Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., and Ristenpart, T. Stealing machine learning models via prediction apis. In Proceedings of the 25th USENIX Conference on Security Symposium, SEC’16, pp. 601–618, USA, 2016. USENIX Association. ISBN 978-1-931971-32-4.
- Tramèr et al. (2019) Tramèr, F., Dupré, P., Rusak, G., Pellegrino, G., and Boneh, D. AdVersarial: Perceptual ad blocking meets adversarial machine learning. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 2005–2021, November 2019. doi: 10.1145/3319535.3354222.
- Tramer et al. (2020) Tramer, F., Carlini, N., Brendel, W., and Madry, A. On adaptive attacks to adversarial example defenses. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 1633–1645. Curran Associates, Inc., 2020.
- Waseem et al. (2017) Waseem, Z., Davidson, T., Warmsley, D., and Weber, I. Understanding abuse: A typology of abusive language detection subtasks. In Proceedings of the First Workshop on Abusive Language Online, pp. 78–84, Vancouver, BC, Canada, August 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-3012. URL https://aclanthology.org/W17-3012.
- Wightman (2019) Wightman, R. PyTorch image models. GitHub, 2019.
- Zamir et al. (2022) Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., and Yang, M.-H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5728–5739, June 2022.
- Zhang et al. (2021) Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., and Timofte, R. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6360–6376, 2021.
Appendix A Detailed Experiment Setup
We use a pre-trained ResNet-18 model from the well-known timm repository (Wightman, 2019), which is implemented in PyTorch and trained on inputs of size 224x224 pixels. This model is fixed throughout all the experiments. The experiments are run on multiple remote servers with either Nvidia Tesla A100 40GB or Nvidia V100 GPUs.
Implementations of Boundary Attack and HSJA are taken from the Foolbox package (Rauber et al., 2017).111We use code from the commit: https://github.com/bethgelab/foolbox/commit/de48acaaf46c9d5d4ea85360cadb5ab522de53bc. For Sign-OPT Attack and QEBA, we use the official, publicly available implementation.222Sign-OPT attack: https://github.com/cmhcbb/attackbox. QEBA: https://github.com/AI-secure/QEBA.
To compare the effectiveness of the attacks, we report the average perturbation size ($\ell_2$-norm) of the adversarial examples computed on 1,000 random test samples. We refer to this quantity as the adversarial distance in short. A smaller adversarial distance means a stronger attack. Unless stated otherwise, all the attacks use 5,000 queries per test sample.
The implementation of the neural compression models along with their weights is taken from Bégaint et al. (2020). These models only accept input sizes that are powers of two, so we change the input size from the default 224x224 to 256x256 here. For SwinIR, we use the publicly available code and weights from MMEditing Contributors (2022). Backpropagating gradients through a SwinIR model requires more memory than our GPUs have, so we reduce the input size to 128x128 pixels. Note that due to this mismatch in the original input sizes, it is not recommended to compare the mean adversarial distances across these preprocessors.
A.1 Hyperparameter Sweep
We find that the choice of hyperparameters of the four attack algorithms plays an important role in their effectiveness, and it is not clear how an attacker would know a priori how to choose them. In reality, the adversary would benefit from spending some queries to tune the hyperparameters on a few samples. Coming up with the most efficient tuning algorithm is outside the scope of this work. Nonetheless, we account for this effect by repeating all experiments with multiple choices of hyperparameters and reporting the results with the best set throughout the paper. We include some of the results with both the best and the default hyperparameters for comparison in Table 6, Table 9, and Table 8.
For the Boundary attack, we sweep the two step sizes: one along the direction towards the original input and the other in the orthogonal direction. We try the default values and four additional settings.
For the Sign-OPT attack, we consider the update step size and the gradient-estimate step size, again trying the default values and four additional settings.
We only tune one hyperparameter for the HSJA and QEBA attacks, but with the same number of settings (five) as the other two attacks above. For HSJA, we tune the update step size, trying the default value and four larger ones; the optimal value is always in a higher range than the default, not a smaller one. Lastly, we search over five values of the ratio that controls the latent dimension from which QEBA samples its random noise for gradient approximation.
The observed trends and the recommended hyperparameters are discussed further below in Section E.2.
Appendix B Bypassing Attacks
Here, we provide additional details on the Bypassing Attack for cropping and resizing preprocessors.
B.1 Cropping Preprocessor
Attack Phase for Cropping.
To bypass the cropping transformation, the attacker simply submits an already-cropped input and runs any query-based attack algorithm in the model space $\mathcal{Z}$ instead of the original space $\mathcal{X}$. Without any modification, the attack algorithm is able to operate directly on the model space as if there were no preprocessing.
Recovery Phase for Cropping.
In order for the adversarial example obtained from the attack phase to be useful in input space, the adversary still has to produce an adversarial example in the original space with the smallest possible Euclidean distance to the original input. It should be obvious that for cropping, this operation simply equates to padding this adversarial example with the original edge pixels.
Formal Definition of Cropping’s Recovery Phase.
We now formally describe what it means to crop an image. Given an input image of size $h \times h$ pixels, a crop operation removes the edge pixels of any image larger than a specified size $w \times w$, such that the output has size $w \times w$. Given a (flattened) input image $x \in [0,1]^{s}$ and the cropped image $z \in [0,1]^{d}$, we can write cropping as the following linear transformation when $w < h$:
$$z = t_{\text{crop}}(x) = Cx \qquad (8)$$
where $C \in \{0,1\}^{d \times s}$ is a sparse binary matrix. Each row of $C$ has exactly one entry equal to 1, at the position of the corresponding non-edge pixel, while the rest are 0. Note that we drop the color-channel dimension for simplicity since most of the preprocessors in this paper are applied channel-wise. We are only interested in the scenario where $w < h$ because otherwise the preprocessing simply becomes an identity function.
Let the adversarial example in the model space obtained from the attack phase be $z_{\text{adv}}$. The adversary can recover the corresponding adversarial example in the original space, $x_{\text{adv}}$, by padding $z_{\text{adv}}$ with the edge pixels of $x$.
It is simple to show that $x_{\text{adv}}$ is a projection of $x$ onto the set $\{x' : Cx' = z_{\text{adv}}\}$, i.e.,
$$x_{\text{adv}} = \arg\min_{x' \,:\, Cx' = z_{\text{adv}}} \; \|x' - x\|_2 \qquad (9)$$
Proof.
We can split the squared objective $\|x' - x\|_2^2$ into two terms:
$$\|x' - x\|_2^2 = \sum_{i \in E} (x'_i - x_i)^2 + \sum_{i \notin E} (x'_i - x_i)^2 \qquad (10)$$
where $E$ is the set of edge-pixel indices. The second term is fixed to $\|z_{\text{adv}} - Cx\|_2^2$ for any $x'$ satisfying the constraint $Cx' = z_{\text{adv}}$. When $x' = x_{\text{adv}}$, the first term is zero because $x_{\text{adv}}$ is obtained by padding $z_{\text{adv}}$ with the edge pixels of $x$. Since the first term is non-negative, we know that $x_{\text{adv}}$ is a unique global minimum of Eqn. (9). ∎
B.2 Resizing Preprocessor
B.2.1 Computing the Transformation Matrix
Not all image resizing operations are the same; the main step that varies between them is called the “interpolation” mode. Interpolation determines how the new pixels in the resized image depend on (multiple) pixels in the original image. Generally, resizing represents some form of a weighted average. How the weights are computed and how many of the original pixels should be used varies by specific interpolation methods.
For nearest interpolation (zeroth order), $M$ is a sparse binary matrix with exactly one 1 per row. For higher-order interpolations, a pixel in $z$ can be regarded as a weighted average of certain pixels in $x$. Here, $M$ is no longer binary, and each of its rows contains these weights, which lie between 0 and 1. For instance, since one pixel in a bilinear-resized image is a weighted average of four pixels (2x2) in the original image, $M$ for bilinear interpolation has four non-zero elements per row. On the other hand, $M$ for bicubic interpolation has 16 non-zero elements per row (4x4 pixels). $M$ is still generally sparse and becomes sparser as the downsampling ratio increases.
The matrix $M$ can be computed analytically for any given interpolation method and pair of input/output sizes. Alternatively, it can be populated programmatically, by setting each pixel in the original image to 1, one at a time, then performing the resize and gathering the output. This method is computationally more expensive but simple, applicable to any sampling order, and robust to minor differences between resizing implementations.
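A sketch of the programmatic construction, using `torch.nn.functional.interpolate` as a stand-in resizing implementation; in practice one would call whatever resizing routine the target system is believed to use.

```python
import numpy as np
import torch
import torch.nn.functional as F

def build_resize_matrix(s_hw, d_hw, mode="bilinear"):
    """Populate M column by column: resize each one-hot basis image and record the output."""
    M = np.zeros((d_hw * d_hw, s_hw * s_hw), dtype=np.float64)
    for j in range(s_hw * s_hw):
        basis = torch.zeros(1, 1, s_hw, s_hw, dtype=torch.float64)
        basis.view(-1)[j] = 1.0                   # set one original pixel to 1
        kwargs = {} if mode == "nearest" else {"align_corners": False}
        out = F.interpolate(basis, size=(d_hw, d_hw), mode=mode, **kwargs)
        M[:, j] = out.reshape(-1).numpy()         # the j-th column of M
    return M

# Example: M for an 8x8 -> 4x4 bilinear resize; each row has only a few non-zeros.
M = build_resize_matrix(8, 4, mode="bilinear")
```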
B.2.2 Recovery Phase for Resizing
The recovery phase involves some amount of linear algebra, as it is equivalent to solving the following linear system of equations
$$M x' = z_{\text{adv}} \qquad (11)$$
to find $x'$. Note that for $d < s$, this is an underdetermined system, so there exist multiple solutions. A minimum-norm solution, $x'_{\min}$, can be obtained by computing the right pseudo-inverse of $M$, given by
$$M^{\dagger} = M^{\top}\left(M M^{\top}\right)^{-1} \qquad (12)$$
$$x'_{\min} = M^{\dagger} z_{\text{adv}} \qquad (13)$$
However, the adversary does not want to find a minimum-norm original sample but rather a minimum-norm perturbation $\delta = x_{\text{adv}} - x$. This can be accomplished by modifying Eqn. (11) and Eqn. (13) slightly:
$$M(x + \delta) = z_{\text{adv}} \qquad (14)$$
$$M\delta = z_{\text{adv}} - Mx \qquad (15)$$
$$\delta_{\min} = M^{\dagger}\left(z_{\text{adv}} - Mx\right) \qquad (16)$$
$$x_{\text{adv}} = x + M^{\dagger}\left(z_{\text{adv}} - Mx\right) \qquad (17)$$
Eqn. (17) summarizes the recovery phase for resizing. By construction, it guarantees that $\delta_{\min}$ is a minimum-norm perturbation for a given $z_{\text{adv}}$, or equivalently that $x_{\text{adv}}$ is a projection of $x$ onto the set of solutions that map to $z_{\text{adv}}$ after resizing. In other words,
$$x_{\text{adv}} = \arg\min_{x'} \; \|x' - x\|_2 \qquad (18)$$
$$\text{s.t.} \quad M x' = z_{\text{adv}} \qquad (19)$$
In practice, we can compute $x_{\text{adv}}$ either by using an iterative solver on Eqn. (11) directly, or by pre-computing the pseudo-inverse in Eqn. (12). The former does not require caching any matrix but must be recomputed for every input. Caching the pseudo-inverse is more computationally expensive but is done only once. Since $M$ is sparse, both options are very efficient.
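A sketch of the iterative option using SciPy's sparse LSQR solver, which returns the minimum-norm solution for a consistent underdetermined system; the function name is ours.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def recover_resize_sparse(x, z_adv, M_dense):
    """Iterative recovery: solve M*delta = z_adv - M*x for the minimum-norm delta."""
    M = sparse.csr_matrix(M_dense)     # resize matrices are sparse
    rhs = z_adv - M @ x
    delta = lsqr(M, rhs)[0]            # minimum-norm least-squares solution
    return x + delta
```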
Appendix C Biased-Gradient Attack
C.1 Details on the Recovery Phase
We propose a recovery phase for general preprocessors, which also works for cropping and resizing, albeit less efficiently than the exact recovery used in the Bypassing Attack. Assuming that the preprocessor is differentiable or has a differentiable approximation, it is possible to replace the exact projection mechanism for finding $x_{\text{adv}}$ with an iterative method. Specifically, consider relaxing the constraint from Eqn. (1) with a Lagrange multiplier:
$$\min_{x' \in \mathcal{X}} \; \|x' - x\|_2^2 + \lambda\, \|t(x') - z_{\text{adv}}\|_2^2 \qquad (20)$$
This optimization problem can then be solved with gradient descent combined with a binary search on the Lagrange multiplier $\lambda$. We emphasize that, unlike the exact recovery for resizing or cropping, the second term does not necessarily need to be driven down to zero, i.e., we do not require $t(x') = z_{\text{adv}}$ exactly. For the Biased-Gradient Attack, $z_{\text{adv}}$ can be seen as a proxy that makes $x'$ misclassified by $f \circ t$, or as a guide for $x'$ to move towards. Specifically, we want the smallest $\lambda$ such that the solution minimizes $\|x' - x\|_2$ while also being misclassified.
To this end, we use binary search on $\lambda$, increasing it when $x'$ is correctly classified and decreasing it when $x'$ is misclassified. Throughout this paper, we use 10 binary-search steps (3 steps in the case of the neural-network-based preprocessors, as computing gradients through these models can be expensive). Each step requires exactly one query to check the predicted label at the end. In practice, we also impose a constraint that keeps $x'$ in the input domain using a change-of-variables trick inspired by the attack of Carlini & Wagner (2017).
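A sketch of this recovery, assuming `t_diff` is a differentiable surrogate of the preprocessor, `pipeline(x)` returns the hard label, and `y_src` is the original label (untargeted case); we use a simple clamp to stay in the input domain instead of the change-of-variables trick, and the geometric midpoint for the binary search is our own choice.

```python
import torch

def recover_general(x, z_adv, t_diff, pipeline, y_src, steps=200, lr=0.01,
                    lam_lo=1e-3, lam_hi=1e3, binary_steps=10):
    """Gradient-descent recovery (Eq. 20) with a binary search for the smallest lambda
    whose solution is still misclassified."""
    best = None
    for _ in range(binary_steps):
        lam = (lam_lo * lam_hi) ** 0.5                 # geometric midpoint of the search range
        x_prime = x.clone().requires_grad_(True)
        opt = torch.optim.Adam([x_prime], lr=lr)
        for _ in range(steps):
            loss = ((x_prime - x) ** 2).sum() + lam * ((t_diff(x_prime) - z_adv) ** 2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
            with torch.no_grad():
                x_prime.clamp_(0, 1)                   # stay in the valid input domain
        with torch.no_grad():
            if pipeline(x_prime) != y_src:             # misclassified: a smaller lambda may do
                best = x_prime.detach().clone()
                lam_hi = lam
            else:                                      # still correctly classified: increase lambda
                lam_lo = lam
    return best
```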
Appendix D Preprocessor Extraction Attacks
D.1 Detailed Construction of Unstable Pairs
We begin by identifying (any) two images $x_a$ and $x_b$ such that $f(t(x_a)) \neq f(t(x_b))$. This step should be easy: it suffices to identify two valid images that actually belong to different classes, or to make random (large-magnitude) modifications to one image until it switches classes and then call the perturbed image $x_b$. Intuitively, because the labels differ, if we were to interpolate between $x_a$ and $x_b$, there must be a midpoint where the decision changes. By picking $x_1$ and $x_2$ to straddle this midpoint, we obtain an unstable example pair. If the input space of the pipeline were continuous, we could generate an unstable pair, up to floating-point precision, with a single binary search. However, since we focus on real systems that accept only 8-bit images, we need to take a few extra steps to create a pair that differs by only 1/255 in a single pixel.
First, we reduce the number of pixels in which the two images differ via binary search. Construct a new image $x_m$ where each pixel is independently chosen (uniformly at random) as the pixel value either from $x_a$ or from $x_b$. This new image roughly shares half of its pixels with $x_a$ and half with $x_b$. If $f(t(x_m)) = f(t(x_a))$, replace $x_a$ with $x_m$ and repeat; if $f(t(x_m)) = f(t(x_b))$, replace $x_b$ with $x_m$ and repeat.
Next, we reduce the magnitude of the difference between these two images, again following the same binary search procedure. Let $x_{\text{mid}}$ be the element-wise midpoint of $x_1$ and $x_2$ (rounded to the 8-bit grid), and query the model to obtain $F(x_{\text{mid}})$. If $F(x_{\text{mid}}) = F(x_1)$, replace $x_1$ with $x_{\text{mid}}$ and repeat; if $F(x_{\text{mid}}) = F(x_2)$, replace $x_2$ with $x_{\text{mid}}$ and repeat. Do this until $x_1$ and $x_2$ differ from each other by at most 1/255 (the smallest nonzero difference two 8-bit images can have). This eventually gives a pair of images that differ in exactly one pixel coordinate, and in this one coordinate by exactly 1/255. By construction, these two images are also assigned different classes by the pipeline. We call them an unstable pair. Note that we have not relied on any knowledge of $t$, as we have only treated $F = f \circ t$ as a single black-box function.
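For concreteness, here is one way to implement the two binary searches. The helper name `pipeline_label(x)`, which returns the hard label of the deployed system for an image with pixel values on the 1/255 grid, and the exact termination conditions are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np


def make_unstable_pair(x1, x2, pipeline_label, rng=None):
    """Turn two differently-labeled images into a pair that differs by 1/255
    in a single pixel yet still receives two different labels."""
    rng = rng if rng is not None else np.random.default_rng(0)
    y1 = pipeline_label(x1)
    assert pipeline_label(x2) != y1
    eps = 1.0 / 255

    # Phase 1: shrink the set of differing pixels by random mixing.
    while np.count_nonzero(x1 != x2) > 1:
        mask = rng.integers(0, 2, size=x1.shape).astype(bool)
        x_mid = np.where(mask, x1, x2)
        if pipeline_label(x_mid) == y1:
            x1 = x_mid
        else:
            x2 = x_mid

    # Phase 2: shrink the magnitude of the remaining difference by averaging.
    while np.abs(x1 - x2).max() > eps + 1e-9:
        x_mid = np.round((x1 + x2) / 2 / eps) * eps       # midpoint snapped to the 8-bit grid
        if pipeline_label(x_mid) == y1:
            x1 = x_mid
        else:
            x2 = x_mid
    return x1, x2
```

Each iteration of either phase costs one query, so the whole construction typically takes on the order of a few dozen queries for ImageNet-sized inputs.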
D.2 Detailed Pre-Image Attack
Once we obtain the unstable pair $(x_1, x_2)$, the next step is to use it to generate pre-images and check our guess of the preprocessor. A pre-image $v_i$ of $x_i$ under a guessed preprocessor $g$ is an input that differs substantially from $x_i$ yet satisfies $g(v_i) = g(x_i)$. Before explaining how the pre-images are generated, we first expand on the implications of the two possible outcomes: our guess is either right or wrong.
Our guess is correct: In the case that our guess is right, i.e., $g = t$, the following equality holds for $i \in \{1, 2\}$,
(21)  $f(t(v_i)) = f(g(v_i)) = f(g(x_i)) = f(t(x_i)), \qquad i \in \{1, 2\}$
where the first equality holds by the assumption that $g = t$, the second equality holds by construction, since $v_1$ and $v_2$ are pre-images satisfying $g(v_i) = g(x_i)$, and the final equality holds again under the correctness assumption $g = t$. From here, we can conclude that $F(v_1) = F(x_1) \neq F(x_2) = F(v_2)$.
Put simply, this means that if we feed the pipeline with $v_1$ and $v_2$, and if our preprocessor guess is correct, then the pipeline will give two different answers, $F(v_1) \neq F(v_2)$.
Our guess is wrong: On the other hand, if our guess at the preprocessor is wrong, i.e., $g \neq t$, then we will, with high probability, observe the opposite outcome, $F(v_1) = F(v_2)$. Intuitively, this holds because the examples $x_1$ and $x_2$ form an unstable example pair that differs by a single bit and sits right at a decision boundary, while $v_1$ and $v_2$ are obtained from them via a large perturbation to which only the (non-identity) guessed transformation $g$, and not the true preprocessor $t$, is invariant. This residual perturbation dwarfs the one-bit difference, so the two pre-images will most likely land on the same side of the boundary and receive the same label.
By generating multiple pre-images, querying the target pipeline on them, and observing the predictions, we can thus check whether our guess of the preprocessor is correct.
D.2.1 A Greedy Pre-image Attack
The previous step requires the ability to construct pre-images for an arbitrary image $y$ and an arbitrary guessed transformation $g$. While this problem is intractable in general (e.g., a cryptographic hash function resists exactly this), common image preprocessors are not explicitly designed to be robust in this sense, so in practice it is often nearly trivial.
In practice, we implement this attack via a greedy and naive procedure that works well for any transformation operating over discrete integers, which is the case for image preprocessors, whose pixel values are multiples of 1/255 between 0 and 1.
To begin, let $y$ be the image whose pre-image we would like to compute. We then make random pixel-level perturbations to the image by randomly choosing a pixel coordinate and either increasing or decreasing its value by 1/255. We take each of these candidates and check whether it still maps to $g(y)$ under the guessed preprocessor. If so, we accept the change and let the candidate become $y_1$. We then repeat this procedure with $y_1$ to obtain a sequence of images $y_1, y_2, \dots, y_T$ such that $g(y_T) = g(y)$ and $\|y_T - y\|$ is sufficiently large. We desire a large perturbation because, intuitively, the larger the difference, the higher the probability that the desired behavior holds; that is, it is more likely that $F(v_1) = F(v_2)$ when $g \neq t$, where $v_1$ and $v_2$ are the pre-images obtained from $x_1$ and $x_2$, respectively. In practice, we only use one unstable example pair, but if more confidence is desired, an attacker could use many (at an increased query cost).
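A sketch of this greedy procedure and the resulting check is shown below. Here `g` is the guessed preprocessor, `pipeline_label` is the hard-label query interface, and the number of greedy steps is an arbitrary illustrative budget rather than a tuned value.

```python
import numpy as np


def greedy_preimage(y, g, num_steps=5000, eps=1.0 / 255, rng=None):
    """Greedily perturb y one quantization step at a time while keeping the
    guessed preprocessor's output unchanged, yielding a far-away pre-image v
    with g(v) == g(y)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    v, target = y.copy(), g(y)
    for _ in range(num_steps):
        idx = tuple(rng.integers(0, s) for s in v.shape)  # random pixel coordinate
        step = eps if rng.random() < 0.5 else -eps
        cand = v.copy()
        cand[idx] = np.clip(cand[idx] + step, 0.0, 1.0)
        if np.array_equal(g(cand), target):               # invariance preserved?
            v = cand                                      # accept the change
    return v


def guess_is_correct(x1, x2, g, pipeline_label):
    """Check a guess g against an unstable pair (x1, x2): if the guess is
    right, the two pre-images must still receive different labels."""
    v1, v2 = greedy_preimage(x1, g), greedy_preimage(x2, g)
    return pipeline_label(v1) != pipeline_label(v2)
```

Note that `greedy_preimage` never queries the target pipeline; only the final check in `guess_is_correct` spends two queries, on top of those used to build the unstable pair.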
Appendix E Additional Experiment Results
E.1 Complete Preprocessor-Aware Attack Results
Preprocessor | Method | Hparams | Boundary (untarg.) | Sign-OPT (untarg.) | HSJA (untarg.) | Boundary (targ.) | Sign-OPT (targ.) | HSJA (targ.) | QEBA (targ.)
Crop () | Unaware | Default | 11.1 | 6.7 | 4.4 | 48.6 | 50.6 | 40.9 | 24.7 |
Best | 5.3 | 6.5 | 4.2 | 42.8 | 50.4 | 38.2 | 22.2 | ||
Bypassing (ours) | Default | 9.6 | 5.9 | 3.9 | 42.3 | 46.0 | 35.1 | 21.2 | |
Best | 4.6 | 5.8 | 3.6 | 37.3 | 46.0 | 32.9 | 19.6 | ||
Preprocessor | Method | Boundary (untarg.) | Sign-OPT (untarg.) | HSJA (untarg.) | Boundary (targ.) | Sign-OPT (targ.) | HSJA (targ.) | QEBA (targ.)
Resize () (Nearest) | Unaware | 21.2 | 24.8 | 16.5 | 172.2 | 198.8 | 153.4 | 90.5 |
SNS | n/a | n/a | 3.9 | n/a | n/a | 112.6 | 32.2 | |
Bypassing (ours) | 4.7 | 5.8 | 3.7 | 37.7 | 46.3 | 33.3 | 19.4 | |
Biased-Grad (ours) | n/a | n/a | 3.6 | n/a | n/a | 32.9 | 19.6 | |
Resize () (Nearest) | Unaware | 10.3 | 12.5 | 8.1 | 84.7 | 97.8 | 74.2 | 44.5 |
SNS | n/a | n/a | 3.7 | n/a | n/a | 56.5 | 54.1 | |
Bypassing (ours) | 4.5 | 5.7 | 3.6 | 37.3 | 45.5 | 32.6 | 19.4 | |
Biased-Grad (ours) | n/a | n/a | 3.6 | n/a | n/a | 34.2 | 19.9 | |
Resize () (Nearest) | Unaware | 6.3 | 6.1 | 3.9 | 41.0 | 50.6 | 36.1 | 20.1 |
SNS | n/a | n/a | 3.4 | n/a | n/a | 34.5 | 30.0 | |
Bypassing (ours) | 7.7 | 5.4 | 3.4 | 36.0 | 44.8 | 31.3 | 17.9 | |
Biased-Grad (ours) | n/a | n/a | 3.4 | n/a | n/a | 31.4 | 17.6 | |
Resize () (Bilinear) | Unaware | 32.7 | 38.2 | 25.5 | 198.3 | 213.0 | 188.4 | 90.3 |
SNS | n/a | n/a | 5.7 | n/a | n/a | 113.7 | 111.6 | |
Bypassing (ours) | 7.4 | 9.1 | 6.0 | 58.2 | 70.9 | 50.3 | 30.0 | |
Biased-Grad (ours) | n/a | n/a | 5.6 | n/a | n/a | 56.4 | 36.4 | |
Resize () (Bilinear) | Unaware | 15.9 | 19.1 | 12.6 | 98.7 | 106.0 | 90.8 | 45.6 |
SNS | n/a | n/a | 5.5 | n/a | n/a | 65.1 | 57.0 | |
Bypassing (ours) | 7.4 | 9.2 | 5.9 | 57.7 | 70.9 | 50.2 | 30.3 | |
Biased-Grad (ours) | n/a | n/a | 5.5 | n/a | n/a | 51.1 | 30.5 | |
Resize () (Bilinear) | Unaware | 6.3 | 7.8 | 5.1 | 45.6 | 53.0 | 40.8 | 21.9 |
SNS | n/a | n/a | 3.6 | n/a | n/a | 33.9 | 29.0 | |
Bypassing (ours) | 7.7 | 9.9 | 6.1 | 45.5 | 57.8 | 46.2 | 21.5 | |
Biased-Grad (ours) | n/a | n/a | 3.4 | n/a | n/a | 34.0 | 22.0 | |
Resize () (Bicubic) | Unaware | 25.7 | 29.2 | 20.6 | 184.8 | 207.3 | 171.6 | 91.2 |
SNS | n/a | n/a | 4.5 | n/a | n/a | 108.3 | 55.7 | |
Bypassing (ours) | 5.8 | 7.1 | 4.5 | 46.4 | 57.7 | 40.6 | 23.8 | |
Biased-Grad (ours) | n/a | n/a | 4.6 | n/a | n/a | 45.8 | 28.1 | |
Resize () (Bicubic) | Unaware | 13.1 | 15.4 | 10.1 | 91.1 | 101.5 | 81.1 | 44.3 |
SNS | n/a | n/a | 4.6 | n/a | n/a | 59.9 | 55.8 | |
Bypassing (ours) | 5.8 | 7.0 | 4.5 | 46.4 | 56.6 | 40.2 | 24.4 | |
Biased-Grad (ours) | n/a | n/a | 4.5 | n/a | n/a | 42.6 | 25.6 | |
Resize () (Bicubic) | Unaware | 6.0 | 7.4 | 4.8 | 44.2 | 51.9 | 39.4 | 21.5 |
SNS | n/a | n/a | 4.0 | n/a | n/a | 74.9 | 87.6 | |
Bypassing (ours) | 5.8 | 7.3 | 4.6 | 42.5 | 52.9 | 37.6 | 21.6 | |
Biased-Grad (ours) | n/a | n/a | 3.9 | n/a | n/a | 36.4 | 21.2 |
Preprocessor | Method | Attack Hyperparameters | HSJA (untarg.) | HSJA (targ.) | QEBA (targ.)
JPEG (quality 100) | Unaware | Default | 5.7 | 35.8 | 18.8 |
Best | 3.5 | 31.9 | 18.8 | ||
SNS | Default | 3.1 | 43.0 | 24.7 | |
Best | 2.4 | 29.5 | 24.7 | ||
Biased-Grad (ours) | Default | 28.9 | 71.9 | 19.2 | |
Best | 2.8 | 32.5 | 19.2 | ||
JPEG (quality 80) | Unaware | Default | 29.6 | 85.7 | 50.7 |
Best | 8.9 | 63.2 | 43.9 | ||
SNS | Default | 13.2 | 64.3 | 47.2 | |
Best | 7.3 | 45.4 | 47.2 | ||
Biased-Grad (ours) | Default | 23.7 | 80.4 | 25.5 | |
Best | 2.3 | 29.2 | 21.9 | ||
JPEG (quality 60) | Unaware | Default | 29.2 | 86.8 | 56.1 |
Best | 9.2 | 63.2 | 52.7 | ||
SNS | Default | 11.9 | 66.1 | 44.6 | |
Best | 2.7 | 44.5 | 44.6 | ||
Biased-Grad (ours) | Default | 22.2 | 82.0 | 27.0 | |
Best | 1.5 | 25.1 | 26.1 | ||
Preprocessor | Method | Attack Hyperparameters | HSJA (untarg.) | HSJA (targ.) | QEBA (targ.)
Quantize (8 bits) | Unaware | Default | 29.1 | 83.6 | 26.5 |
Best | 5.0 | 45.6 | 26.5 | ||
SNS | Default | 6.8 | 42.8 | 35.0 | |
Best | 4.6 | 42.8 | 35.0 | ||
Biased-Grad (ours) | Default | 7.1 | 46.2 | 21.3 | |
Best | 3.9 | 33.9 | 20.6 | ||
Quantize (6 bits) | Unaware | Default | 30.4 | 86.1 | 40.6 |
Best | 7.5 | 48.2 | 39.4 | ||
SNS | Default | 17.5 | 76.2 | 43.7 | |
Best | 5.9 | 46.8 | 43.7 | ||
Biased-Grad (ours) | Default | 11.1 | 56.7 | 25.1 | |
Best | 3.9 | 34.2 | 23.3 | ||
Quantize (4 bits) | Unaware | Default | 32.3 | 88.9 | 58.4 |
Best | 9.7 | 63.7 | 56.4 | ||
SNS | Default | 22.2 | 76.5 | 57.2 | |
Best | 6.4 | 55.9 | 57.2 | ||
Biased-Grad (ours) | Default | 19.2 | 74.7 | 31.8 | |
Best | 3.1 | 39.3 | 28.8 | ||
Notice that, for nearest-neighbor resizing, our Bypassing Attack finds adversarial examples with about the same mean adversarial distance as the no-preprocessor case regardless of the input dimension (see Table 7). This may seem counter-intuitive: one might expect the $\ell_2$-norm of the adversarial perturbation to scale with the square root of the input dimension. That would be the case if a new classifier were trained on each of the different input sizes (Shafahi et al., 2019). However, our observation matches the intuition that, similar to cropping, the nearest resize operation only keeps 224×224 pixels regardless of the input dimension. Hence, only the perturbation on these pixels matters to the prediction.
To build more intuition for this phenomenon, consider a toy example of a binary classifier that classifies one-dimensional data, e.g., white and black pixels with values of 0 and 1 respectively, by thresholding at 0.5. To push a white pixel over the decision boundary (the threshold, in this case) requires a perturbation of size 0.5. Now consider a new set of inputs of size 2×2 and a nearest resize that maps each 2×2 input to a single pixel. The classifier remains unchanged. In this case, the nearest resize simply picks one pixel (say, the top left) out of the four. Which pixel is picked depends on the exact implementation but does not matter for our purpose here. To attack this classifier from a 2×2 input, the adversary still needs to change only the top-left pixel by 0.5, and thus the adversarial distance remains unchanged. Even for larger input sizes, only one pixel is ever selected. While this toy example explains resizing with nearest interpolation, it does not necessarily apply to bilinear or bicubic interpolation. Nonetheless, all of our experimental results support this hypothesis.
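The following few lines verify this toy argument numerically. The top-left pixel selection and the threshold classifier are exactly the assumptions of the example above, not a general resize implementation.

```python
import numpy as np


def nearest_resize_to_one(x):
    """Nearest-neighbor resize of a k x k image down to a single pixel:
    a single source pixel (here the top-left one) is selected."""
    return x[0, 0]


def toy_classifier(pixel):
    return int(pixel > 0.5)                  # thresholds the single remaining pixel


for k in (1, 2, 4, 8):
    x = np.zeros((k, k))                     # an all-white image (value 0)
    delta = np.zeros((k, k))
    delta[0, 0] = 0.5 + 1e-6                 # perturb only the pixel that survives resizing
    flipped = toy_classifier(nearest_resize_to_one(x + delta))
    assert flipped != toy_classifier(nearest_resize_to_one(x))
    print(k, round(float(np.linalg.norm(delta)), 3))   # the required distance stays ~0.5
```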
E.2 Attack Hyperparameter Choices
We have seen in Section 4 that fine-tuning the hyperparameters significantly improves the attacks in most cases. Here, we discuss when it is most important for the adversary to tune their attack hyperparameters. Figure 6 shows the attack success rate at varying adversarial distances for three untargeted attack algorithms. For the Boundary, HSJA, and QEBA attacks, selecting the right set of hyperparameters yields a large improvement over the defaults.
For instance, a properly tuned Boundary attack outperforms the Sign-OPT and HSJA attacks with their default hyperparameters in the majority of settings with a resizing preprocessor.
For most attacks, we do not observe a universally good set of hyperparameters across different preprocessors. However, there are two general rules of thumb for choosing them:
1. Using a larger value than the default for the HSJA attack's swept hyperparameter is almost always better. This applies to both preprocessor-aware and -unaware attacks and to all preprocessors.
2. The QEBA attack samples the noise used for gradient approximation from an image space of reduced size, obtained by scaling the original input size down by a factor smaller than 1. The default factor is tuned for the model's native input size. Consequently, for larger inputs, such as those passed to a resizing preprocessor, it is always beneficial to shrink this factor accordingly; the best factor we find decreases as the original input size grows (see Table 10). A sketch of this dimension-reduced noise sampling follows this list.
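As a rough illustration, the snippet below samples low-dimensional noise and bilinearly upsamples it to the full input resolution. The function name, the argument names, and the specific ratio are assumptions for this example; it mirrors the spirit of QEBA's resize subspace rather than copying its implementation.

```python
import torch
import torch.nn.functional as F


def qeba_noise(batch, channels, d, ratio):
    """Sample gradient-estimation noise in a (d * ratio) x (d * ratio) space
    and bilinearly upsample it to the full d x d input resolution."""
    small = max(int(d * ratio), 1)
    noise = torch.randn(batch, channels, small, small)
    return F.interpolate(noise, size=(d, d), mode="bilinear", align_corners=False)


# For a 1024 x 1024 input, a smaller ratio (e.g., 1/16) keeps the effective
# noise dimension comparable to the 224 x 224 no-preprocessor case.
noise = qeba_noise(batch=20, channels=3, d=1024, ratio=1 / 16)
print(noise.shape)   # torch.Size([20, 3, 1024, 1024])
```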
Here, we include two figures that compare the effect of tuning the attack hyperparameters in multiple settings. Figure 5 suggests that, as expected, the default hyperparameters often work well when no preprocessor is used, while there is a much larger discrepancy between the default and the best hyperparameters when preprocessors are used.
The degree to which hyperparameter tuning matters also depends on the attack algorithm. Figure 6 visually compares the effectiveness of three untargeted attacks on the resizing preprocessor: the Boundary and HSJA attacks benefit much more from a hyperparameter sweep than the Sign-OPT attack.
Preprocessor | Attack | Attack Parameters | Unaware | SNS | Ours |
Resize (nearest, 1024 → 224) | Boundary (untargeted) | | 45.4 | | 9.8 | |
56.1 | 12.4 | ||||
134.1 | 29.1 | ||||
21.2 | 4.7 | ||||
40.6 | 8.8 | ||||
Sign-OPT (untargeted) | 24.8 | 5.8 | |||
25.2 | 5.9 | ||||
25.0 | 5.8 | ||||
25.3 | 5.8 | ||||
25.4 | 5.9 | ||||
HSJA (untargeted) | 28.5 | 3.8 | |||
18.3 | 3.7 | ||||
17.4 | 12.4 | 3.7 | |||
16.8 | 4.8 | 3.8 | |||
16.5 | 3.9 | 8.0 | |||
26.2 | 6.1 | 60.6 | |||
Boundary (targeted) | 194.4 | 42.3 | |||
242.6 | 52.4 | ||||
310.2 | 67.8 | ||||
233.1 | 50.6 | ||||
172.2 | 37.7 | ||||
Sign-OPT (targeted) | 201.3 | 46.3 | |||
199.4 | 46.4 | ||||
200.2 | 46.4 | ||||
203.3 | 47.1 | ||||
202.4 | 46.4 | ||||
HSJA (targeted) | 168.3 | 35.2 | |||
160.5 | 34.0 | ||||
159.7 | 122.7 | 33.3 | |||
153.4 | 112.6 | 23.5 | |||
162.0 | 212.5 | 37.3 | |||
QEBA (targeted) | Naive, | 138.7 | 32.2 | 29.7 | |
Resize 2, | 139.1 | 21.9 | |||
Resize 4, | 124.5 | 19.4 | |||
Resize 8, | 103.7 | 19.9 | |||
Resize 16, | 92.5 | 26.3 | |||
Resize 32, | 90.5 | 42.9 |
Preprocessor | Attack | Attack Parameters | Unaware | SNS | Ours |
Crop (256 → 224) | Boundary (untargeted) | | 11.1 | | 9.6 | |
5.3 | 4.6 | ||||
32.8 | 28.5 | ||||
14.1 | 12.1 | ||||
10.1 | 8.7 | ||||
Sign-OPT (untargeted) | 6.7 | 5.9 | |||
6.8 | 5.9 | ||||
6.6 | 5.8 | ||||
6.5 | 5.9 | ||||
6.7 | 5.9 | ||||
HSJA (untargeted) | 4.4 | 3.7 | 3.6 | ||
4.2 | 3.7 | 3.6 | |||
4.2 | 3.8 | 4.9 | |||
6.0 | 6.9 | 4.9 | |||
Boundary (targeted) | 48.6 | 42.3 | |||
62.0 | 53.1 | ||||
76.9 | 66.8 | ||||
58.0 | 50.5 | ||||
42.8 | 37.3 | ||||
Sign-OPT (targeted) | 52.8 | 46.3 | |||
52.7 | 46.5 | ||||
52.9 | 46.4 | ||||
53.9 | 47.7 | ||||
53.2 | 46.7 | ||||
HSJA (targeted) | 40.4 | 34.9 | |||
38.7 | 36.4 | 33.6 | |||
38.2 | 35.4 | 32.9 | |||
47.4 | 46.3 | 44.7 | |||
104.5 | 104.2 | 92.9 | |||
QEBA (targeted) | Naive, | 34.0 | 31.5 | 29.5 | |
Resize 2, | 24.7 | 21.2 | |||
Resize 4, | 22.2 | 19.6 | |||
Resize 8, | 23.2 | 20.3 | |||
Resize 16, | 30.0 | 26.8 |
Preprocessor | Attack | Attack Parameters | Unaware | SNS | Ours |
Quantize (4 bits) | HSJA (untargeted) | 29.8 | 22.2 | 4.4 | |
22.7 | 12.3 | 3.1 | |||
11.2 | 6.4 | 10.3 | |||
9.7 | 123.3 | 42.6 | |||
HSJA (targeted) | 85.3 | 76.5 | 54.5 | ||
74.7 | 57.6 | 39.3 | |||
63.7 | 55.9 | 48.8 | |||
95.2 | 92.1 | 91.3 | |||
QEBA (targeted) | Naive, | 71.8 | 57.2 | 33.6 | |
Resize 2, | 61.2 | 31.7 | |||
Resize 4, | 58.4 | 30.4 | |||
Resize 8, | 56.4 | 28.8 | |||
Resize 16, | 60.4 | 29.2 | |||
JPEG (quality 60) | HSJA (untargeted) | 27.5 | 11.9 | 10.2 | |
21.3 | 6.6 | 5.0 | |||
10.5 | 2.7 | 2.0 | |||
9.2 | 3.4 | 1.5 | |||
HSJA (targeted) | 80.5 | 66.1 | 66.1 | ||
66.4 | 44.5 | 43.5 | |||
63.2 | 51.8 | 25.1 | |||
93.3 | 93.5 | ||||
QEBA (targeted) | Naive, | 64.9 | 44.6 | 28.3 | |
Resize 2, | 58.5 | 21.1 | |||
Resize 4, | 56.1 | 21.0 | |||
Resize 8, | 52.7 | 22.7 | |||
Resize 16, | 53.3 | 25.9 |
For the numerical comparison between the best and the default hyperparameter choices on quantization and JPEG, please refer to Table 9 and Table 8, respectively. Tables 10, 11 and 12 show the results with all the hyperparameters we sweep for resizing, cropping, and quantization/JPEG, respectively. In the most extreme case, the best hyperparameters reduce the mean adversarial distance by a factor of 15 for JPEG with quality 60 under untargeted HSJA + our Biased-Gradient attack.
E.3 Varying Number of Attack Iterations
There are two interesting properties we observe when varying the number of queries available to the adversary. So far we have considered attacks that use exactly 5,000 queries; in this section, we test attacks with 500 to 50,000 queries. Figure 4 plots the mean adversarial distance as a function of the number of queries for the QEBA attack with the best hyperparameters for each respective setting. First, the adversarial distance plateaus after around 10,000 queries, and the distance found by preprocessor-unaware attacks never reaches that of the Bypassing/Biased-Gradient Attacks. This suggests that our preprocessor-aware attacks not only improve the efficiency of the attack algorithms but also find closer adversarial examples that would otherwise have been missed entirely.
The second observation is that the improvement of the Bypassing Attack over the preprocessor-unaware attack is consistent across all query budgets. For instance, in the resizing panel of Figure 4, the Bypassing Attack reduces the mean adversarial distance by a factor of around 4.5 to 4.8 for any number of queries. This is not the case for the Biased-Gradient Attack, which is relatively more effective at larger query budgets. In the JPEG panel of Figure 4, the Biased-Gradient Attack yields an improvement of 1.1× at 500 queries and 2.5× beyond 10,000 queries.