Relevance Attack on Detectors
Abstract
This paper focuses on highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple-output characteristic and the diversity across architectures. To pursue high attack transferability, one plausible way is to find a common property across detectors, which facilitates the discovery of common weaknesses. We are the first to suggest that the relevance map from interpreters for detectors is such a property. Based on it, we design the Relevance Attack on Detectors (RAD), which achieves state-of-the-art transferability, exceeding existing results by above 20%. On MS COCO, the detection mAPs of all 8 black-box architectures are more than halved and the segmentation mAPs are also significantly influenced. Given the great transferability of RAD, we generate the first adversarial dataset for object detection and instance segmentation, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors.
keywords:
adversarial attack, attack transferability, black-box attack, relevance map, interpreters, object detection
1 Introduction
Adversarial attacks [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] have revealed the fragility of Deep Neural Networks (DNNs) by fooling them with elaborately crafted imperceptible perturbations. Among them, the black-box attack, i.e., attacking without knowledge of the victim's inner structure and weights, is much harder, more aggressive, and closer to real-world scenarios. For classifiers, there exist some promising black-box attacks [13, 14, 15, 16]. Attacking object detection [17] in a black-box manner, e.g., hiding certain objects from unknown detectors [18], is also a severe threat. Through it, life-concerning systems based on detection, such as autonomous driving and security surveillance, could be hurt when the black-box attack is conducted physically [19, 20].
To the best of our knowledge, no existing attack is specifically designed for black-box transferability across detectors, because detectors have multiple outputs and a high diversity across architectures. In such situations, adversarial samples do not transfer well [21], and most imperceptible attacks only decrease the mAP of black-box detectors by 5 to 10% [22, 23, 24]. To overcome this, inspired by our AoA attack [16], we propose a feasible way: finding common properties across detectors, which facilitates the discovery of common weaknesses. Based on them, the designed attack can threaten various victims.
In this paper, we adopt the relevance map from DNN interpreters as the common property, on which different detectors give similar interpretable results, as shown in Fig. 1 and Appendix E. Based on relevance maps, we design the Relevance Attack on Detectors (RAD). RAD focuses on suppressing the relevance map rather than directly attacking the prediction as in existing works [22, 23, 25, 26, 27]. Because the relevance maps are quite similar across models, those of black-box models are influenced and misled as well during the attack, leading to great transferability. Although some works have adopted the relevance map as an indicator or reference of successful attacks [16, 28, 29, 30], to the best of our knowledge no work directly attacks the relevance maps of detectors.

In our comprehensive evaluation, RAD achieves state-of-the-art transferability on 8 black-box models on the MS COCO dataset [31], nearly halving the detection mAP under the common $\ell_\infty$-bounded setting and impairing detectors' performance in three sub-tasks. Interestingly, the adversarial samples of RAD also greatly influence the performance of instance segmentation, even though only detectors are attacked. Given the high transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection and instance segmentation. AOCO contains 10K samples that significantly decrease the performance of black-box models for detection and segmentation. AOCO may serve as a benchmark to test the robustness of a DNN or to improve it by adversarial training. To reproduce our results and access our dataset, one could visit https://github.com/AllenChen1998/RAD.
Contributions
1. We propose a novel attack framework on relevance maps for detectors. We extend DNN interpreters to detectors, find the most suitable outputs to attack through relevance maps, and explore the best update techniques to increase the transferability.
2. We evaluate RAD on 8 black-box models and find state-of-the-art transferability, exceeding existing results by above 20% in mAP. Detection and segmentation performance is greatly impaired under various metrics, degrading state-of-the-art DNNs to the level of very rudimentary counterparts.
3. By RAD, we create the first adversarial dataset for object detection and instance segmentation, i.e., AOCO. As a potential benchmark, AOCO is generated from COCO and contains 10K highly transferable samples. AOCO helps to quickly evaluate and improve the robustness of detectors.
2 Related Work
Since [1], there have been many promising adversarial attacks [2, 3, 4]. Generally, they fix the network weights and change the input slightly to optimize an attack loss. The network then predicts incorrectly on adversarial samples with high confidence. [32] finds that adversarial samples crafted by attacking a white-box surrogate model may transfer to other black-box models as well. Input modification [14, 28, 15] and other optimization schemes [13, 15] have been validated to be effective in enhancing the transferability.
[22] extends adversarial attacks to detectors by attacking densely generated bounding boxes. After that, losses on localization and classification were designed [23] for attacking detectors. [33] and [25] propose to attack detectors in a restricted area. Existing restricted digital attacks achieve good results in white-box scenarios but are not specifically designed for transferability: the adversarial impact on black-box models is quite limited, i.e., a 5 to 10% decrease from the original mAP, even when two models differ only in the backbone [22, 23, 24]. [34] discusses black-box attacks on detectors based on queries rather than on transferability as we do. Its performance is satisfactory, but it requires over 30K queries, which is easy for the model owner to discover. Besides, physical attacks on white-box detectors are also feasible [35, 36, 37].
For great transferability, we propose to attack relevance maps, which are calculated by DNN interpreters [38, 39, 40, 41]. Interpreters were originally developed to explain how DNNs predict and to help users gain trust in them. Specifically, they display how the input contributes to a certain output in a pixel-wise manner. Typical works include Layer-wise Relevance Propagation (LRP) [42], Contrastive LRP [43], and Softmax Gradient LRP (SGLRP) [44]. These methods have encouraged the use of relevance maps as references in attacks [16, 28, 29, 30], and they also inspire us. However, none of them attacks the relevance maps of detectors.
3 Relevance Attack on Detectors
We propose an attack specifically designed for black-box transferability, named Relevance Attack on Detectors (RAD). RAD suppresses multi-node relevance maps for several bounding boxes. Since the relevance map is commonly shared by different detectors as shown in Fig. 1, attacking it in the white-box surrogate model achieves a high transferability towards black-box models. In this section, we first provide a high-level overview of RAD, and analyze the potential reasons for its transferability. Then we thoroughly discuss three crucial concrete issues in RAD.
3.1 What is RAD?
We present the framework of RAD in Fig. 2. Initialized by the original sample $x$, the adversarial sample $x_t$ in the $t$-th iteration is forward propagated through the surrogate model, yielding the prediction $f(x_t)$. Current attacks generally suppress the prediction values of all attacked output nodes in $f(x_t)$, where an output node stands for an output scalar of the detector. In contrast, RAD suppresses the corresponding relevance map $R(x_t)$. To this end, the gradients of $R(x_t)$ are back-propagated to $x_t$, which is then modified to $x_{t+1}$.
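As a sketch of this loop, one RAD iteration can be written in PyTorch-style pseudocode as below; `detector`, `multi_node_relevance`, `select_nodes`, and `update` are placeholders for the components detailed in Sections 3.3-3.5, and the actual relevance computation in our experiments relies on the iNNvestigate library.

```python
import torch

def rad_iteration(detector, multi_node_relevance, select_nodes, update, x_adv):
    """One RAD iteration (a sketch): forward pass, multi-node relevance,
    back-propagation, then the update of Section 3.5.

    All callables are placeholders for the components described in the paper.
    """
    x_adv = x_adv.clone().detach().requires_grad_(True)
    preds = detector(x_adv)                         # prediction f(x_t)
    nodes = select_nodes(preds)                     # attacked output nodes (Sec. 3.4)
    relevance = multi_node_relevance(x_adv, nodes)  # Multi-Node SGLRP map (Sec. 3.3)
    relevance.sum().backward()                      # gradient of the summed relevance
    return update(x_adv.detach(), x_adv.grad)       # move x_t to x_{t+1} (Sec. 3.5)
```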

Notably, RAD is a complete framework for attacking detectors, and its components require special design. Besides the calculation of relevance maps for detectors, other components in RAD, e.g., the attacked nodes and the update techniques, also need customized analysis. The reason is that no existing work directly attacks the relevance of detectors, and the experience in attacking predictions is not directly applicable here. For example, [17] weights the classification loss and the localization loss equally, but the former is validated to be significantly better for attacking the relevance in Section 3.4.
3.2 Why RAD Transfers?
RAD’s transferability comes from its attack goal: changing a common property, i.e., the relevance map, which is similar across different detectors because interpreters are developed to highlight the salient parts of the data, and the result is thus largely data-dependent and model-independent [47, 38, 39] for diverse well-trained models, as observed in Fig. 1 and Appendix E. As shown in Fig. 3, the relevance maps for the original sample are clear and structured for both detectors. After RAD, the relevance maps become meaningless and lose the correct focus, leading to wrong predictions, i.e., missed or false detections. Because relevance maps transfer well across models, those of black-box detectors are also significantly influenced, causing a great performance drop, which is illustrated visually in Section 4.2.

RAD also attacks quite “precisely”, i.e., the perturbation pattern is focused on distinct areas and has a clear structure, as shown in Fig. 4. That is to say, RAD accurately locates the most discriminative parts of a sample and concentrates the perturbation on them, leading to great transferability when the perturbations are equally bounded.




3.3 What is the Relevance Map for Detectors?
Having analyzed the potential of RAD above, we now make it feasible by addressing three crucial issues. To conduct the relevance attack, we first need to know the relevance maps for detectors.
Currently, there exist many interpreters that calculate relevance maps for classifiers, as described in Section 2, but none of them is directly applicable to detectors. We take SGLRP [44] as an example to introduce the relevance map for classifiers and then modify it for detectors, because it excels at discriminating the target from irrelevant regions.
For a given deep classifier, the relevance map is defined as a normalized heat map with the same dimensions as the input $x$, visualizing how $x$ contributes to a specific output node in a pixel-wise manner. We denote this map as $R(x)$, which, in SGLRP, is obtained by back-propagating the “relevance” from the output layer (layer $L$) to the input layer (layer $1$) after the forward inference. Suppose layer $l$ has $d_l$ nodes (the dimension of its features) and layer $l+1$ has $d_{l+1}$ nodes. The relevance $R_i^{(l)}$ at node $i$ in layer $l$ is defined recursively by
$$R_i^{(l)}=\sum_{j=1}^{d_{l+1}}\frac{a_i^{(l)}w_{ij}^{+}}{\sum_{i'}a_{i'}^{(l)}w_{i'j}^{+}}\,R_j^{(l+1)} \qquad (1)$$
for nodes with definite positive values (such as after ReLU), and
$$R_i^{(l)}=\sum_{j=1}^{d_{l+1}}\frac{a_i^{(l)}w_{ij}-l_i w_{ij}^{+}-h_i w_{ij}^{-}}{\sum_{i'}\big(a_{i'}^{(l)}w_{i'j}-l_{i'}w_{i'j}^{+}-h_{i'}w_{i'j}^{-}\big)}\,R_j^{(l+1)} \qquad (2)$$
for nodes that may have negative values. In the formulas above, $a_i^{(l)}$ is the post-activation output of node $i$ in layer $l$, while $a_i^{(l)}w_{ij}$ is its pre-activation contribution to node $j$ in layer $l+1$. The range $[l_i, h_i]$ stands for the minimum and maximum of $a_i^{(l)}$, and $w_{ij}^{+}=\max(w_{ij},0)$, $w_{ij}^{-}=\min(w_{ij},0)$.
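For intuition, here is a minimal NumPy sketch of the $z^{+}$ rule in (1) applied to a single fully-connected layer; the function and variable names are ours, and the full propagation through a detector is handled by the iNNvestigate library in practice.

```python
import numpy as np

def lrp_zplus_dense(a_prev, W, R_next, eps=1e-9):
    """Redistribute relevance R_next (layer l+1) to layer l via the z+ rule.

    a_prev: post-ReLU activations of layer l, shape (d_l,)
    W:      weights mapping layer l to layer l+1, shape (d_l, d_{l+1})
    R_next: relevance of layer l+1, shape (d_{l+1},)
    """
    Wp = np.maximum(W, 0.0)        # keep only positive weights
    z = a_prev @ Wp + eps          # denominators: sum_i a_i * w_ij^+
    s = R_next / z                 # relevance share per upper-layer node
    return a_prev * (Wp @ s)       # R_i = a_i * sum_j w_ij^+ * s_j
```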
According to the relevance propagation rules above, the relevance map is calculated by recursively back-propagating the relevance from the output layer $L$, whose $n$-th component is defined in SGLRP as
$$R_n^{(L)}=\begin{cases} P_t\,(1-P_t), & n=t, \\ -P_n\,P_t, & n\neq t, \end{cases} \qquad (3)$$
where $P_n$ is the predicted probability of class $n$, and $P_t$ is that of the single target class $t$.
In detectors, however, we need the pixel-wise contributions from the input to several bounding boxes. This multi-node relevance map cannot be directly calculated by (3), so we naturally modify SGLRP as
$$R_n^{(L)}=\begin{cases} P_n\,(1-P_n), & n\in\mathcal{N}, \\ -P_n\sum_{t\in\mathcal{N}}P_t, & n\notin\mathcal{N}, \end{cases} \qquad (6)$$
where $P_n$ is the predicted probability for one target output node $n$ and $\mathcal{N}$ is the set containing all target output nodes. With the iNNvestigate library [48] to implement Multi-Node SGLRP and deep learning platforms supporting automatic differentiation, the gradients from the RAD loss to the sample $x$ can be obtained according to the relevance propagation rules.
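As an illustration, the NumPy sketch below initializes the output-layer relevance for a set of attacked nodes in the spirit of (6) as reconstructed above; `probs`, `targets`, and the function name are hypothetical, and the actual propagation is performed inside iNNvestigate.

```python
import numpy as np

def multi_node_output_relevance(probs, targets):
    """Initialize output-layer relevance for a set of target nodes (a sketch).

    probs:   predicted probabilities of the attacked output nodes, shape (L,)
    targets: indices of the target output nodes (the set N)
    """
    R = np.zeros_like(probs)
    mask = np.zeros_like(probs, dtype=bool)
    mask[list(targets)] = True
    R[mask] = probs[mask] * (1.0 - probs[mask])    # positive relevance on N
    R[~mask] = -probs[~mask] * probs[mask].sum()   # negative relevance elsewhere
    return R
```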
We illustrate the difference between SGLRP and our Multi-Node SGLRP in Fig. 5. SGLRP only displays the relevance map for one bounding box, e.g., “TV”, “chair” and “bottle”. Multi-Node SGLRP, in contrast, visualizes the overall relevance.

3.4 Where to Attack?
Besides the calculation of relevance maps, it is also important to choose a proper node-set to attack. Specifically, we need to select certain bounding boxes and the corresponding output nodes for RAD.
Heuristically, the most “obvious” bounding boxes are desired to be eliminated, so we select the bounding boxes with the highest confidence, following [22]. Concretely, it is feasible to statically choose a fixed number of bounding boxes to attack in each iteration, or to dynamically attack all bounding boxes whose confidence exceeds a threshold. In our evaluation, the two strategies differ little in performance and are not sensitive to their hyper-parameters, as demonstrated in Appendix A. This shows that RAD does not require sophisticated parameter tuning, which is user-friendly. In our following experiments, we statically attack 20 nodes.
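A minimal sketch of the static selection strategy follows, assuming the surrogate's detection head exposes per-box class probabilities after non-maximum suppression; the tensor layout and helper name are our own assumptions.

```python
import torch

def select_attacked_nodes(scores, k=20):
    """Statically pick the k bounding boxes with the highest classification
    confidence; their class outputs become the attacked nodes (a sketch).

    scores: per-box class probabilities from the detection head, shape (B, C)
    returns: indices of the k selected boxes and their predicted classes
    """
    conf, cls = scores.max(dim=1)                    # confidence and class per box
    top = conf.topk(min(k, conf.numel())).indices    # the k most confident boxes
    return top, cls[top]
```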
After selecting bounding boxes, we could attack their size, leading them to shrink; their localization, leading them to shift; or their confidence, leading them to be misclassified. To adopt the best strategy, we conduct a toy experiment by attacking YOLOv3 [49], denoted as M2 (other models are specified in Table 3), following the settings later in Sec. 4. Given the results in Table 1, the classification loss induces better black-box transferability. This may be because detectors generally include a pre-trained classification network as the feature extractor, and relevance maps are believed to be an indicator of successful attacks [28, 29]. Note that our method is applicable to both one-stage and two-stage detectors because both have the classification outputs that we target.
Strategy | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|
No Attack | 29.3 | 33.4 | 38.1 | 40.7 | 42.1 | 42.5 | 45.7 | 46.9 | 53.9 |
Size | 26.0 | 14.7 | 31.9 | 32.5 | 35.6 | 35.4 | 38.6 | 40.0 | 47.8 |
Local. | 22.8 | 6.4 | 27.4 | 28.1 | 31.7 | 30.8 | 34.4 | 35.9 | 45.1 |
Class. | 18.1 | 1.2 | 19.9 | 20.5 | 24.3 | 22.6 | 26.4 | 28.2 | 39.9 |
3.5 How to Update?
With the relevance map $R_{\mathcal{N}}(x_t)$ for the set of attacked nodes $\mathcal{N}$, we are able to attack, i.e., to update the original sample $x$ into an adversarial one by suppressing the relevance map according to the attack gradient
$$g_t=\nabla_{x_t}\sum_{p}\big[R_{\mathcal{N}}(x_t)\big]_p, \qquad (7)$$
where the sum runs over all pixels $p$ of the relevance map.
Some update techniques are validated to be effective for enhancing the transferability in classification. For example, the Scale-Invariant attack (SI) [15] proposes to average the attack gradients over scale copies of the sample as
$$g_t=\frac{1}{m}\sum_{i=0}^{m-1}\nabla_{x_t}\sum_{p}\Big[R_{\mathcal{N}}\big(x_t/2^{i}\big)\Big]_p, \qquad (8)$$
where $m$ is the number of scale copies.
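A sketch of the gradient averaging in (8) is given below; `rad_loss` is a placeholder for the summed-relevance objective of (7), and the four scale copies follow the setting in Appendix C.2.

```python
import torch

def si_gradient(rad_loss, x_adv, m=4):
    """Average gradients of the RAD loss over m scale copies x / 2^i (Eq. 8)."""
    grad = torch.zeros_like(x_adv)
    for i in range(m):
        x_scaled = (x_adv / (2 ** i)).detach().requires_grad_(True)
        loss = rad_loss(x_scaled)        # relevance-suppression objective
        loss.backward()
        grad += x_scaled.grad
    return grad / m
```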
Besides SI, Diverse Input (DI) [14] and Translation-Invariant attack (TI) [28] are also promising in classification. We are curious whether they also work well in object detection. To explore this, we adopt these techniques in RAD with the settings suggested by their designers (see Sec. 4 and Appendix C). From the results in Table 2, we discover that SI is quite effective, further decreasing the mAP significantly from the baseline. Accordingly, RAD adopts (8) for the update.
Technique | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|
None | 18.1 | 1.2 | 19.9 | 20.5 | 24.3 | 22.6 | 26.4 | 28.2 | 39.9 |
DI | 18.1 | 1.0 | 19.9 | 20.5 | 23.9 | 22.4 | 26.3 | 27.9 | 39.6 |
TI | 17.0 | 2.4 | 20.8 | 20.8 | 25.2 | 23.0 | 27.9 | 29.7 | 41.5 |
SI | 14.6 | 0.7 | 16.3 | 17.0 | 20.4 | 19.1 | 22.3 | 23.8 | 35.0 |
With the calculated gradient, we update the sample like the PGD attack [4] as
$$x_{t+1}=\mathrm{clip}_{x,\,\epsilon}\!\left(x_t-\alpha\cdot\frac{g_t}{\|g_t\|_1/D}\right), \qquad (9)$$
where $\alpha$ stands for the step length. The adversarial sample is $\ell_\infty$-norm bounded by $\epsilon$ around the original sample in each iteration, as in [14, 15, 28]. The gradient is normalized by its average $\ell_1$-norm, i.e., $\|g_t\|_1/D$, to prevent numerical errors and control the degree of perturbation. $D$ is the dimension of the image, i.e., $D = H \times W \times 3$. Division by $D$ is necessary because the $\ell_1$-norm sums all components of the tensor $g_t$, which would otherwise be too large as a normalization factor. We do not adopt the mainstream sign method because it is not suitable for generating small perturbations, as shown by other attacks on detectors [22].
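The update of (9) takes only a few lines; in the sketch below, `alpha`, `eps`, and the [0, 1] image range are assumptions following the settings in Section 4, and the element-wise clipping implements the per-iteration $\ell_\infty$ bound.

```python
import torch

def rad_update(x_adv, x_orig, grad, alpha, eps):
    """PGD-style update with average-l1 gradient normalization (Eq. 9)."""
    d = grad.numel()                              # D = H * W * 3
    g = grad / (grad.abs().sum() / d)             # normalize by ||g||_1 / D
    x_next = x_adv - alpha * g                    # descend to suppress relevance
    x_next = torch.max(torch.min(x_next, x_orig + eps), x_orig - eps)
    return x_next.clamp(0.0, 1.0)                 # keep a valid image (assumed [0,1])
```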
4 Experiments
In this section, we evaluate the performance of RAD on various detectors, especially its transferability. The results are presented visually and numerically. In comprehensive evaluation, RAD achieves great transferability across models and even across tasks.
4.1 Setup
Our experiments are based on Keras [50], TensorFlow, and PyTorch [51], and run on 4 NVIDIA GeForce RTX 2080Ti GPUs. The iNNvestigate library [48] is used to implement Multi-Node SGLRP.
We conduct experiments on the MS COCO 2017 dataset [31], which is a large-scale benchmark for object detection, instance segmentation, and image captioning. For a fair evaluation, we generate adversarial samples from all 5K samples in its validation set and test several black-box models on them, reporting mAP, a standard measure in many works [52, 53].
All attacks are conducted with step length $\alpha$ for 10 iterations, and the perturbation is $\ell_\infty$-bounded by $\epsilon$ to guarantee imperceptibility as in [28], unless otherwise specified. To validate that the mAP drop comes from the attack rather than from resizing or perturbation in general, we add large Gaussian noise to the resized images and report the result as “Ablation”.
We choose 8 typical detectors ranging from the first end-to-end detector to state-of-the-art counterparts for attack and test. The variety of models guarantees the validity of the results. We specify their information in Table 3 and the corresponding pre-processing details in Appendix C.
ID | Model | Type | Backbone | mAP |
---|---|---|---|---|
M1 | SSD512 [54] | one-stage | VGG16 | 29.3 |
M2 | YOLOv3 [49] | one-stage | Darknet | 33.4 |
M3 | RetinaNet [55] | one-stage | ResNet-101 | 38.1 |
M4 | Faster R-CNN [56] | two-stage | ResNeXt-101-64*4d | 40.7 |
M5 | Mask R-CNN [52] | two-stage | ResNeXt-101-64*4d | 42.1 |
M6 | Cascade RCNN [57] | two-stage | ResNet-101 | 42.5 |
M7 | Cascade Mask R-CNN [57] | two-stage | ResNeXt-101-64*4d | 45.7 |
M8 | Hybrid Task Cascade [53] | two-stage | ResNeXt-101-64*4d | 46.9 |
M9 | EfficientDet [58] | one-stage | EfficientNet + BiFPN | 53.9 |
4.2 Visual Results of RAD
We first intuitively illustrate the attack process in Fig. 6 and the attack transferability in Fig. 7.


Under RAD, the relevance map is driven to be meaningless and loses its focus. In Fig. 6, the initial prediction is correct and the relevance map is clear. RAD gradually misleads the relevance map into an unstructured pattern without the outline of objects. Finally, all bounding boxes vanish.
In Fig. 7, we visualize several predictions on the same adversarial sample by black-box models. The objects in the image, e.g., the laptop and keyboard, are quite large and obvious to detect. However, with a small perturbation from RAD, 5 black-box models all fail to detect the laptop, keyboard, and mouse. Surprisingly, 4 of them even detect a non-existent “bed”, which is neither relevant nor similar in the image.
4.3 RAD’s Transferability in Object Detection
To evaluate the in-domain transferability of detection attacks and the cross-domain transferability of classification attacks, we test the detection mAP of 8 models on COCO adversarial samples generated under the settings stated before.
For detection attacks, adversarial samples are crafted by attacking the surrogate model M2 (YOLOv3 [49]). For classification attacks, we use the model output on the clean sample as the label. With several state-of-the-art attacks on surrogate classifiers (InceptionV3 [59] here, as in [14, 28]), adversarial samples are generated and their detection mAP is tested as a measure of transferability towards detectors. Implementation details are described in Appendix C.
We present the results in Table 4. Among the classification attacks and the detection ones, the cross-domain attack [60] is effective, but RAD is more aggressive. RAD enjoys state-of-the-art transferability towards most black-box models, outperforming other methods by above 20%. The detection mAPs are more than halved, making state-of-the-art detectors perform similarly to early elementary counterparts. Also, the adversarial samples crafted on the one-stage surrogate (M2) transfer to two-stage detectors (M4-M8, see Table 3), as is the case in attacking classifiers [13, 14, 15].
Method | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|
No Attack | 29.3 | 33.4 | 38.1 | 40.7 | 42.1 | 42.5 | 45.7 | 46.9 | 53.9 |
Ablation | 24.9 | 31.4 | 31.2 | 31.6 | 35.0 | 34.3 | 37.5 | 38.8 | 48.6 |
PGD [4] | 26.4 | 30.4 | 34.4 | 35.4 | 38.4 | 38.3 | 41.7 | 43.1 | 51.1 |
SI-PGD [15] | 27.5 | 31.6 | 36.1 | 37.1 | 40.0 | 40.1 | 43.5 | 44.8 | 52.4 |
MI-DI-PGD [13, 14] | 22.9 | 26.2 | 29.3 | 30.0 | 33.2 | 32.1 | 36.0 | 37.5 | 48.0 |
MI-TI-PGD [13, 28] | 20.1 | 23.7 | 24.9 | 25.4 | 30.1 | 27.4 | 32.8 | 34.5 | 47.1 |
CD-painting [60] | 16.4 | 20.8 | 21.3 | 22.8 | 26.6 | 24.5 | 28.9 | 29.5 | 42.3 |
CD-comics [60] | 16.6 | 21.6 | 21.7 | 22.7 | 26.8 | 24.3 | 29.1 | 42.3 | 43.7 |
Dfool [33] | 23.3 | 2.5 | 29.2 | 29.8 | 33.3 | 32.9 | 36.5 | 38.0 | 47.5 |
Loc [17] | 21.9 | 0.2 | 25.8 | 26.6 | 29.8 | 29.4 | 33.2 | 33.2 | 45.2 |
DAG [22] | 20.8 | 0.6 | 22.8 | 23.4 | 26.8 | 25.6 | 28.9 | 31.0 | 40.6 |
RAD (ours) | 14.6 | 0.7 | 16.3 | 17.0 | 20.4 | 19.1 | 22.3 | 23.8 | 35.0 |

4.4 Comprehensive Evaluations of RAD’s Transferability
Although mAP is a general metric for testing detectors under a fixed perturbation bound, it is also interesting to investigate RAD's transferability under different bounds and in different sub-tasks, i.e., the classification accuracy of bounding boxes, their shift, and their invisibility to detectors.
Here we first vary the bound $\epsilon$ of RAD and report the mAP in Fig. 8. As the bound increases, the resulting mAP greatly decreases for all black-box models, especially as $\epsilon$ grows from 8 to 12.
We further study RAD's transferability under different metrics, which is necessary because mAP measures the overall performance of detectors and thus cannot decouple RAD's effect on, e.g., shifting and hiding bounding boxes. For the subsequent evaluations on sub-tasks, we define the metrics below.
1. Accuracy for classification of bounding boxes: Without considering locations, we use the predicted box classes to calculate the classification accuracy. For each image, the $K$ predicted bounding boxes with the highest confidence (after non-maximum suppression) are considered, where $K$ is the number of ground-truth bounding boxes, for rationality. For example, if the detector predicts 3 cats, 2 dogs, and 1 car in an image that has 2 cats, 3 dogs, and 1 person, we count 4 hits and 2 misses (see the sketch after this list).
2. Intersection over Union (IoU) for shifting of bounding boxes: Focused on locations, we average the IoU, the measure of location correctness, over all predictions to quantify the shifting. Again, we select the $K$ predicted bounding boxes with the highest confidence and compute the IoU of each box with the nearest ground truth of the same class (IoU = 0 for a wrong prediction).
3. Mean Average Recall (mAR) for hiding of bounding boxes: To see how attacks hide objects, the common recall value is appropriate, and mAR is calculated from recall in the same way that mAP is from precision, as adopted in detection libraries.
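The sketch below makes the hit counting of the first metric concrete; it reproduces the example above, and the function name is our own.

```python
from collections import Counter

def box_classification_hits(pred_classes, gt_classes):
    """Count hits/misses between the top-K predicted box classes and the
    ground-truth classes of one image, where K = number of ground-truth boxes."""
    k = len(gt_classes)
    top_k = pred_classes[:k]                 # assume boxes already sorted by confidence
    hits = sum((Counter(top_k) & Counter(gt_classes)).values())
    return hits, k - hits

# Example from the text: predictions {3 cats, 2 dogs, 1 car} vs.
# ground truth {2 cats, 3 dogs, 1 person}  ->  (4, 2), i.e., 4 hits and 2 misses.
print(box_classification_hits(
    ["cat"] * 3 + ["dog"] * 2 + ["car"], ["cat"] * 2 + ["dog"] * 3 + ["person"]))
```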
To the best of our knowledge, existing works mostly focus on mAP [23, 24], and we here suggest a more comprehensive way to evaluate attacks on detectors. The results are reported in Table 5, from which one could observe that RAD also outperforms its counterparts in the different sub-tasks to a large extent, i.e., by up to 10%. The bounding boxes predicted when attacking the white-box M2 are too few to evaluate fairly, so we do not report its results.
Metric | Method | M1 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|
Acc | Dfool [33] | 68.9 | 71.7 | 72.6 | 74.8 | 73.5 | 75.9 | 76.6 | 82.9
 | Loc [17] | 67.8 | 69.8 | 70.7 | 73.0 | 71.5 | 74.0 | 74.8 | 81.6
 | DAG [22] | 64.4 | 62.4 | 63.3 | 67.2 | 65.0 | 68.5 | 69.3 | 77.1
 | RAD (ours) | 58.6 | 56.9 | 58.0 | 62.2 | 60.3 | 63.5 | 64.2 | 72.9
IoU | Dfool [33] | 43.9 | 47.9 | 48.5 | 51.3 | 50.1 | 52.6 | 53.1 | 61.5
 | Loc [17] | 42.2 | 44.6 | 45.1 | 47.9 | 46.7 | 49.3 | 49.8 | 59.6
 | DAG [22] | 40.9 | 41.1 | 41.7 | 45.3 | 43.7 | 46.8 | 46.9 | 56.3
 | RAD (ours) | 34.4 | 34.5 | 35.6 | 39.5 | 37.9 | 41.1 | 40.9 | 51.7
mAR | Dfool [33] | 36.3 | 45.9 | 43.9 | 47.1 | 45.9 | 48.8 | 54.8 | 60.7
 | Loc [17] | 34.7 | 42.3 | 40.5 | 43.7 | 42.5 | 45.4 | 51.1 | 58.3
 | DAG [22] | 32.9 | 38.5 | 35.3 | 39.2 | 36.7 | 39.4 | 47.4 | 52.8
 | RAD (ours) | 27.4 | 32.2 | 29.0 | 32.7 | 30.1 | 32.9 | 41.6 | 47.9
4.5 RAD’s Transferability to Instance Segmentation
Detection and segmentation are similar in some aspects, so they can be implemented in one network [52, 57, 53]. Also, adversarial samples for object detection tend to transfer to instance segmentation [22]. Accordingly, we evaluate this cross-task transferability of RAD with the surrogate detectors YOLOv3 ([49], M2), RetinaNet ([55], M3), and Mask R-CNN ([52], M5). From the results in Table 6, we find that RAD also greatly hurts the performance of instance segmentation, leading to a drop in mAP of over 70%. This suggests that attackers targeting segmentation could succeed indirectly by attacking detectors.
 | mAP | | | mAP50 | | | mAP75 | |
Surrogate | M5 | M7 | M8 | M5 | M7 | M8 | M5 | M7 | M8
---|---|---|---|---|---|---|---|---|---
None | 38.0 | 39.4 | 40.8 | 60.6 | 61.3 | 63.3 | 40.9 | 42.9 | 44.1 |
Ablation | 31.0 | 31.9 | 33.5 | 51.2 | 51.0 | 53.7 | 32.4 | 34.3 | 35.4 |
M2 | 17.9 | 18.6 | 20.3 | 31.6 | 31.7 | 34.5 | 18.0 | 18.9 | 20.7 |
M3 | 11.6 | 11.9 | 12.9 | 19.2 | 19.1 | 20.7 | 12.1 | 12.6 | 13.7 |
M5 | 1.2 | 11.1 | 11.8 | 2.4 | 17.9 | 18.9 | 1.0 | 11.9 | 12.6 |
5 Adversarial Objects in Context
Given the great transferability of RAD, we create Adversarial Objects in COntext (AOCO), the first adversarial dataset for object detection and instance segmentation. The AOCO dataset serves as a potential benchmark to evaluate the robustness of detectors, which is beneficial to network designers. It will also be useful for adversarial training, one of the most effective practices for improving the robustness of DNNs. Notice that there has been no other adversarial dataset for detection and segmentation so far. This is not because such a dataset is useless, but because the low transferability of existing attack methods makes the examples detector-dependent. Having achieved high transferability, we can now make such an adversarial dataset publicly available.
AOCO is generated from the full COCO 2017 validation set [31] with 5k samples. It contains 5K adversarial samples for evaluating object detection (AOCO detection) and 5K for instance segmentation (AOCO segmentation). All 10K samples in AOCO are crafted by RAD. The surrogate model we attack is YOLOv3 for AOCO detection and Mask R-CNN for AOCO segmentation given the results in Table 4 and Table 6.
We measure the perturbation in AOCO by the Root Mean Squared Error (RMSE) as in [22]. It is calculated pixel-wise as $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i^{\mathrm{adv}}-x_i)^2}$, where $N$ is the size of the image. The performance on AOCO is reported in Table 7. The RMSE of AOCO is 6.469 for detection and 6.606 for segmentation, which is lower than that in [61], and the perturbations are quite imperceptible. We show AOCO samples in Fig. 9. More details are presented in Appendix D.
 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9
---|---|---|---|---|---|---|---|---|---
COCO det. | 29.3 | 33.4 | 38.1 | 40.7 | 42.1 | 42.5 | 45.7 | 46.9 | 53.9
AOCO det. | 14.6 | 0.7 | 16.3 | 17.0 | 20.4 | 19.1 | 22.3 | 23.8 | 35.0
COCO seg. | | | | | 38.0 | | 39.4 | 40.8 |
AOCO seg. | | | | | 1.2 | | 11.1 | 11.8 |

6 Conclusion and Future Work
To pursue a high transferability, this paper proposes Relevance Attack on Detectors (RAD), which works by suppressing the multi-node relevance, a common property across detectors calculated by our Multi-Node SGLRP. We also thoroughly discuss where to attack and how to update in attacking relevance maps. RAD achieves a state-of-the-art transferability towards 8 diverse black-box models, exceeding existing results by above 20%, and also significantly hurts the instance segmentation. Given the great transferability of RAD, we generate the first adversarial dataset for object detection and instance segmentation, i.e., Adversarial Objects in COntext (AOCO), which helps to quickly evaluate and improve the robustness of detectors.
In the future, it is promising to attack other common properties for good transferability. Besides, RAD could be modified into a patch-based attack by relaxing the bound on perturbations and keeping only the regions with the largest perturbations.
Acknowledgments
The authors are grateful to the anonymous reviewers for their insightful comments. This work was partially supported by the National Key Research and Development Project (No. 2018AAA0100702), National Natural Science Foundation of China (No. 61977046), 1000-Talent Plan (Young Program), and Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102).
References
- [1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus, Intriguing properties of neural networks, in: International Conference on Learning Representations (ICLR), 2014.
- [2] I. J. Goodfellow, J. Shlens, C. Szegedy, Explaining and harnessing adversarial examples, in: International Conference on Learning Representations (ICLR), 2015.
- [3] N. Carlini, D. Wagner, Towards evaluating the robustness of neural networks, in: the IEEE Symposium on Security and Privacy (SP), 2017, pp. 39–57.
- [4] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, A. Vladu, Towards deep learning models resistant to adversarial attacks, in: International Conference on Learning Representations (ICLR), 2018.
- [5] J. Su, D. V. Vargas, K. Sakurai, One pixel attack for fooling deep neural networks, in: IEEE Transactions on Evolutionary Computation, IEEE, 2019.
- [6] S. Tang, X. Huang, M. Chen, C. Sun, J. Yang, Adversarial attack type i: Cheat classifiers by significant changes, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (3) (2021) 1100–1109. doi:10.1109/TPAMI.2019.2936378.
- [7] A. Chaturvedi, U. Garain, Mimic and fool: A task-agnostic adversarial attack, IEEE Transactions on Neural Networks and Learning Systems 32 (4) (2021) 1801–1808. doi:10.1109/TNNLS.2020.2984972.
- [8] F. Karim, S. Majumdar, H. Darabi, Adversarial attacks on time series, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (10) (2021) 3309–3320. doi:10.1109/TPAMI.2020.2986319.
- [9] A. Arnab, O. Miksik, P. H. S. Torr, On the robustness of semantic segmentation models to adversarial attacks, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (12) (2020) 3040–3053. doi:10.1109/TPAMI.2019.2919707.
- [10] K. R. Mopuri, A. Ganeshan, R. V. Babu, Generalizable data-free objective for crafting universal adversarial perturbations, IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (10) (2019) 2452–2465. doi:10.1109/TPAMI.2018.2861800.
- [11] A. Mustafa, S. H. Khan, M. Hayat, R. Goecke, J. Shen, L. Shao, Deeply supervised discriminative learning for adversarial defense, IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (9) (2021) 3154–3166. doi:10.1109/TPAMI.2020.2978474.
- [12] A. Ghosh, S. S. Mullick, S. Datta, S. Das, A. K. Das, R. Mallipeddi, A black-box adversarial attack strategy with adjustable sparsity and generalizability for deep image classifiers, Pattern Recognition 122 (2022) 108279. doi:https://doi.org/10.1016/j.patcog.2021.108279.
- [13] Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu, J. Li, Boosting adversarial attacks with momentum, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 9185–9193.
- [14] C. Xie, Z. Zhang, Y. Zhou, S. Bai, J. Wang, Z. Ren, A. L. Yuille, Improving transferability of adversarial examples with input diversity, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 2730–2739.
- [15] J. Lin, C. Song, K. He, L. Wang, J. E. Hopcroft, Nesterov accelerated gradient and scale invariance for adversarial attacks, in: International Conference on Learning Representations (ICLR), 2020.
- [16] S. Chen, Z. He, C. Sun, J. Yang, X. Huang, Universal adversarial attack on attention and the resulting dataset damagenet, IEEE Transactions on Pattern Analysis and Machine Intelligence (2020) 1–1doi:10.1109/TPAMI.2020.3033291.
- [17] H. Zhang, J. Wang, Towards adversarially robust object detection, in: the IEEE International Conference on Computer Vision (ICCV), 2019, pp. 421–430.
- [18] S. Thys, W. Van Ranst, T. Goedemé, Fooling automated surveillance cameras: adversarial patches to attack person detection, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
- [19] L. Huang, C. Gao, Y. Zhou, C. Xie, A. L. Yuille, C. Zou, N. Liu, Universal physical camouflage attacks on object detectors, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 720–729.
- [20] J. Wang, A. Liu, Z. Yin, S. Liu, S. Tang, X. Liu, Dual attention suppression attack: Generate adversarial camouflage in physical world, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 8565–8574.
- [21] D. Su, H. Zhang, H. Chen, J. Yi, P.-Y. Chen, Y. Gao, Is robustness the cost of accuracy?–a comprehensive study on the robustness of 18 deep image classification models, in: the European Conference on Computer Vision (ECCV), 2018.
- [22] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, A. Yuille, Adversarial examples for semantic segmentation and object detection, in: the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1369–1378.
- [23] Y. Li, D. Tian, X. Bian, S. Lyu, et al., Robust adversarial perturbation on deep proposal-based models, in: British Machine Vision Conference (BMVC), 2018.
- [24] Y. Li, X. Bian, M.-C. Chang, S. Lyu, Exploring the vulnerability of single shot module in object detectors via imperceptible background patches, in: British Machine Vision Conference (BMVC), 2019.
- [25] Y. Li, X. Bian, S. Lyu, Attacking object detectors via imperceptible patches on background, CoRR, abs/1809.05966.
- [26] D. Li, J. Zhang, K. Huang, Universal adversarial perturbations against object detection, Pattern Recognition 110 (2021) 107584. doi:https://doi.org/10.1016/j.patcog.2020.107584.
- [27] Y. Xiao, C.-M. Pun, B. Liu, Fooling deep neural detection networks with adaptive object-oriented adversarial perturbation, Pattern Recognition 115 (2021) 107903. doi:https://doi.org/10.1016/j.patcog.2021.107903.
- [28] Y. Dong, T. Pang, H. Su, J. Zhu, Evading defenses to transferable adversarial examples by translation-invariant attacks, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4312–4321.
- [29] T. Zhang, Z. Zhu, Interpreting adversarially trained convolutional neural networks, in: International Conference on Machine Learning (ICML), 2019, pp. 7502–7511.
- [30] W. Wu, Y. Su, X. Chen, S. Zhao, I. King, M. R. Lyu, Y.-W. Tai, Boosting the transferability of adversarial samples via attention, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1161–1170.
- [31] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C. L. Zitnick, Microsoft coco: Common objects in context, in: the European Conference on Computer Vision (ECCV), 2014, pp. 740–755.
- [32] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, A. Swami, Practical black-box attacks against machine learning, in: the ACM on Asia Conference on Computer and Communications Security, 2017, pp. 506–519.
- [33] J. Lu, H. Sibai, E. Fabry, Adversarial examples that fool detectors, arXiv preprint arXiv:1712.02494.
- [34] Y. Wang, Y.-a. Tan, W. Zhang, Y. Zhao, X. Kuang, An adversarial attack on dnn-based black-box object detectors, in: Journal of Network and Computer Applications, Elsevier, 2020, p. 102634.
- [35] Y. Huang, A. W.-K. Kong, K.-Y. Lam, Adversarial signboard against object detector., in: British Machine Vision Conference (BMVC), 2019.
- [36] Z. Wu, S.-N. Lim, L. S. Davis, T. Goldstein, Making an invisibility cloak: Real world adversarial attacks on object detectors, in: the European Conference on Computer Vision (ECCV), 2020, pp. 1–17.
- [37] K. Xu, G. Zhang, S. Liu, Q. Fan, M. Sun, H. Chen, P.-Y. Chen, Y. Wang, X. Lin, Adversarial t-shirt! evading person detectors in a physical world, in: the European Conference on Computer Vision (ECCV), 2020, pp. 665–681.
- [38] M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: the European Conference on Computer Vision (ECCV), 2014, pp. 818–833.
- [39] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, in: the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626.
- [40] A. Shrikumar, P. Greenside, A. Kundaje, Learning important features through propagating activation differences, in: International Conference on Machine Learning (ICML), JMLR. org, 2017, pp. 3145–3153.
- [41] G. Montavon, S. Lapuschkin, A. Binder, W. Samek, K.-R. Müller, Explaining nonlinear classification decisions with deep taylor decomposition, Pattern Recognition 65 (2017) 211–222.
- [42] S. Bach, A. Binder, G. Montavon, F. Klauschen, K.-R. Müller, W. Samek, On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation, in: PloS one, Vol. 10, 2015.
- [43] J. Gu, Y. Yang, V. Tresp, Understanding individual decisions of cnns via contrastive backpropagation, in: Asian Conference on Computer Vision, 2018, pp. 119–134.
- [44] B. K. Iwana, R. Kuroki, S. Uchida, Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation, in: the IEEE International Conference on Computer Vision Workshops (ICCVW), 2019.
- [45] A. Ghorbani, A. Abid, J. Zou, Interpretation of neural networks is fragile, in: the AAAI conference on Artificial Intelligence (AAAI), Vol. 33, 2019, pp. 3681–3688.
- [46] X. Zhang, N. Wang, H. Shen, S. Ji, X. Luo, T. Wang, Interpretable deep learning under fire, in: the USENIX Security Symposium, 2020.
- [47] K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, in: International Conference on Learning Representations (ICLR), 2014.
- [48] M. Alber, S. Lapuschkin, P. Seegerer, M. Hägele, K. T. Schütt, G. Montavon, W. Samek, K.-R. Müller, S. Dähne, P.-J. Kindermans, iNNvestigate neural networks!, Journal of Machine Learning Research 20 (93) (2019) 1–8.
- [49] J. Redmon, A. Farhadi, Yolov3: An incremental improvement, arXiv preprint arXiv:1804.02767.
- [50] F. Chollet, et al., Keras, https://keras.io (2015).
- [51] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 8024–8035.
- [52] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2961–2969.
- [53] K. Chen, J. Pang, J. Wang, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Shi, W. Ouyang, et al., Hybrid task cascade for instance segmentation, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4974–4983.
- [54] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A. C. Berg, Ssd: Single shot multibox detector, in: the European Conference on Computer Vision (ECCV), 2016, pp. 21–37.
- [55] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988.
- [56] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: Towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems (NeurIPS), 2015, pp. 91–99.
- [57] Z. Cai, N. Vasconcelos, Cascade r-cnn: Delving into high quality object detection, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 6154–6162.
- [58] M. Tan, R. Pang, Q. V. Le, Efficientdet: Scalable and efficient object detection, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 10781–10790.
- [59] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
- [60] M. M. Naseer, S. H. Khan, M. H. Khan, F. S. Khan, F. Porikli, Cross-domain transferability of adversarial perturbations, in: Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 12905–12915.
- [61] D. Wu, Y. Wang, S. Xia, J. Bailey, X. Ma, Skip connections matter: On the transferability of adversarial examples generated with resnets, in: International Conference on Learning Representations (ICLR), 2019.
Appendix A Influence of Hyper-Parameters in Node Selection
The performance of RAD is not sensitive to the hyper-parameters, no matter whether the strategy to select bounding boxes is dynamic or static, as shown in Table 8. Attackers therefore do not need to tune them carefully. The parameter for the dynamic strategy refers to the pre-softmax confidence threshold for selecting a bounding box. The parameter for the static strategy refers to the fixed number of selected bounding boxes in each iteration.
Str. | Para. | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|---|
Dyn. | -1 | 18.3 | 1.2 | 20.0 | 20.6 | 24.3 | 22.8 | 26.7 | 28.3 | 40.0 |
 | -2 | 18.2 | 1.2 | 20.1 | 20.7 | 24.2 | 22.8 | 26.6 | 28.0 | 39.8
 | -3 | 18.4 | 1.3 | 20.3 | 20.8 | 24.3 | 22.9 | 26.7 | 28.5 | 40.2
Sta. | 10 | 18.2 | 1.1 | 19.9 | 20.5 | 24.2 | 22.8 | 26.2 | 28.0 | 39.8
 | 20 | 18.1 | 1.2 | 19.9 | 20.5 | 24.3 | 22.6 | 26.4 | 28.2 | 39.9
 | 30 | 18.2 | 1.3 | 20.1 | 20.7 | 24.3 | 22.8 | 26.0 | 27.9 | 40.1
Appendix B RAD on More Surrogates
The results on attacking more surrogates by RAD are reported in Table 9.
Surrogate | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 |
---|---|---|---|---|---|---|---|---|---|
M2 (YOLOv3) | 14.6 | 0.7 | 16.3 | 17.0 | 20.4 | 19.1 | 22.3 | 23.8 | 35.0 |
M3 (RetinaNet) | 20.7 | 6.1 | 2.3 | 25.7 | 29.3 | 28.2 | 31.7 | 33.8 | 44.1 |
M5 (Mask R-CNN) | 20.4 | 24.2 | 25.7 | 26.5 | 1.1 | 28.9 | 33.1 | 34.9 | 45.4 |
Appendix C Implementation Details
C.1 Pre-processing
To pre-process, we resize the image so that its long side is 416 for YOLOv3 and RetinaNet and 448 for Mask R-CNN, and then zero-pad it to a square. The resolution is kept roughly the same across models for a fair evaluation. Images are normalized to [0, 1] for YOLOv3, while the mean of the COCO training set is subtracted for RetinaNet and Mask R-CNN. Accordingly, samples in AOCO detection have a long side of 416, and those in AOCO segmentation have a long side of 448.
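A sketch of this pre-processing for the YOLOv3 setting (long side 416, zero-padding to a square, values in [0, 1]) is given below, assuming an OpenCV-style HxWx3 array; the helper name is ours.

```python
import numpy as np
import cv2

def preprocess_yolo(image, long_side=416):
    """Resize the long side to `long_side`, zero-pad to a square, scale to [0, 1]."""
    h, w = image.shape[:2]
    scale = long_side / max(h, w)
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    canvas = np.zeros((long_side, long_side, 3), dtype=np.float32)  # zero padding
    canvas[:resized.shape[0], :resized.shape[1]] = resized
    return canvas / 255.0
```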
C.2 Transfer-Enhancing Update Techniques
DI [14] transforms the image 4 times with a given probability (set as suggested for better transferability) and averages the gradients. The transformation resizes the image to a fraction of its size and randomly pads the outer areas with white pixels. SI [15] numerically divides the sample by powers of 2 to obtain 4 scale copies and averages the 4 resulting gradients. TI [28] translates the image to calculate augmented gradients; to implement it efficiently, a kernel is adopted to simulate the averaging of gradients, and we choose a kernel size of 15 as suggested. MI [13] uses momentum optimization (with the decay parameter set as suggested) for better transferability and a faster attack. The cross-domain attack [60] uses extra datasets (paintings, denoted as CD-paintings, and comics, denoted as CD-comics) to train a perturbation generator with the relativistic loss. The adopted surrogate model is also InceptionV3 for consistency. Perturbations are resized to fit the sample size.
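For illustration, the TI-style smoothing of the attack gradient can be sketched as below; the uniform 15x15 kernel is an assumption for simplicity (any suitable kernel could be plugged in), and the function name is ours.

```python
import numpy as np
from scipy.ndimage import convolve

def ti_smooth_gradient(grad, kernel_size=15):
    """Convolve the attack gradient with a kernel to simulate averaging the
    gradients of translated inputs (TI); the uniform kernel is an assumption.

    grad: gradient array of shape (H, W, 3)
    """
    k = np.ones((kernel_size, kernel_size)) / kernel_size ** 2
    return np.stack(
        [convolve(grad[..., c], k, mode="constant") for c in range(grad.shape[-1])],
        axis=-1)
```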
C.3 Detection Attacks
For DAG [22], we follow its setting of generating dense proposals: the classification probabilities of the 3000 bounding boxes with the highest confidence are attacked. However, we alter its optimization to (9) because its original update produces quite small perturbations, leading to poor transferability, which would be unfair for comparison. Dfool [33] suppresses the classification confidence of the original bounding boxes, and we do the same in our experiment. Localization loss is shown to be useful in [17], and here we suppress the width and height of the original bounding boxes.
Appendix D More about AOCO
mAP50 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9
---|---|---|---|---|---|---|---|---|---
COCO det. | 49.2 | 56.4 | 58.1 | 62.0 | 63.8 | 60.7 | 64.1 | 66.0 | 74.3
AOCO det. | 26.7 | 1.6 | 27.6 | 29.1 | 34.5 | 29.9 | 34.3 | 37.4 | 51.7
COCO seg. | | | | | 60.6 | | 61.3 | 63.3 |
AOCO seg. | | | | | 2.4 | | 17.9 | 18.9 |
mAP75 | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9
---|---|---|---|---|---|---|---|---|---
COCO det. | 30.8 | 35.8 | 40.6 | 44.6 | 46.3 | 46.3 | 50.0 | 51.2 | 59.9
AOCO det. | 14.2 | 0.6 | 16.5 | 17.1 | 20.8 | 19.9 | 23.3 | 24.6 | 37.4
COCO seg. | | | | | 40.9 | | 42.9 | 44.1 |
AOCO seg. | | | | | 1.0 | | 11.9 | 12.6 |
Appendix E More Visualizations of the Relevance Maps
We present more visualizations to illustrate that the relevance map highlights the salient parts for detection, as in Fig. 10, and that it shares similarities across different models, as in Fig. 11.

