
Just Rotate it: Deploying Backdoor Attacks
via Rotation Transformation

Tong Wu (corresponding author: tongwu@princeton.edu), Tianhao Wang, Vikash Sehwag, Saeed Mahloujifar, and Prateek Mittal, Princeton University
Abstract

Recent works have demonstrated that deep learning models are vulnerable to backdoor poisoning attacks, which instill spurious correlations with external trigger patterns or objects (e.g., stickers, sunglasses). We find that such external trigger signals are unnecessary: highly effective backdoors can be inserted simply through rotation-based image transformation. Our method constructs the poisoned dataset by rotating a limited number of objects and labeling them incorrectly; once trained on it, the victim's model makes undesirable predictions during run-time inference. Through comprehensive empirical studies on image classification and object detection tasks, we show that the attack achieves a significantly high success rate while maintaining clean performance. Furthermore, we evaluate standard data augmentation techniques and four different backdoor defenses against our attack and find that none of them can serve as a consistent mitigation approach. Our attack can be easily deployed in the real world since it only requires rotating the object, as we show in both image classification and object detection applications. Overall, our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks. Our video demo is available at https://youtu.be/6JIF8wnX34M.

1 Introduction

While deep learning has achieved or even exceeded human ability on various sophisticated tasks (Russakovsky et al., 2015; Brown et al., 2020; Dosovitskiy et al., 2020), inherent vulnerabilities, like adversarial attacks (Szegedy et al., 2014; Papernot et al., 2016; Madry et al., 2017; Carlini and Wagner, 2017; Eykholt et al., 2018) exist and impede its deployment on safety-critical systems. One fundamental problem of our interest is backdoor attacks (Gu et al., 2017; Chen et al., 2017), in which a malicious party inserts backdoors by poisoning a small fraction of training samples. The poisoning process involves adding a specific trigger signal to the image (e.g., small white square (Gu et al., 2017)). During training, the network learns spurious correlation between the trigger signal and attack objective, e.g., classifying any image with the trigger signal to a targeted class.

What could be the trigger signal? The objective in typical backdoor attacks is to have no impact on performance in the absence of the trigger but to achieve the desired output when the trigger signal is present. Both objectives are satisfied by triggers that are highly infrequent in training images. Some examples of such triggers are occlusion-based patches (Gu et al., 2017; Lin et al., 2020), frequency-based corruptions (Hammoud and Ghanem, 2021; Wang et al., 2021b), invisible noise (Chen et al., 2017), and additional wearable objects (Chen et al., 2017; Wenger et al., 2020). Note that most of these triggers are additional digital patterns or physical objects added to an existing image. We ask whether backdoor attacks can be launched without needing an external trigger pattern or object.

Rotation-based backdoor trigger. Our key insight is to use a common image transformation, such as rotation, that can push an image to the tail of the data distribution. For example, rotating a stop sign by 45 degrees makes it a highly infrequent instance, since most stop signs in the real world are vertically positioned. Such rotation-based backdoors ultimately succeed because existing models lack invariance to image rotation.

We propose four types of rotation backdoor attacks depending on the motivations and resources of the attackers. (While the focus of this work is rotation-based backdoors, in principle, other physical-world image transformations could also serve as backdoor triggers.) As Figure 1 shows, for image classification we consider: 1) Single Class Attacks (SCA), where backdoored images are source-specific; and 2) Multiple Class Attacks (MCA), where backdoored images are drawn from multiple source classes. For object detection, we consider: 1) Object Misclassification Attacks (OMA), where a rotated backdoor object is incorrectly classified as the target label; and 2) Object Hiding Attacks (OHA), where a rotated backdoor object vanishes from the detector.

We empirically study the effectiveness of rotation backdoor attacks on safety-critical classification tasks, including traffic sign classification (GTSRB (Houben et al., 2013)) and face recognition (Youtube Face (Cao et al., 2018)), and launch attacks against the object detection task on the VOC (Everingham et al., 2009) dataset. We consider the commonly adopted threat model (Gu et al., 2017; Chen et al., 2017; Wenger et al., 2020; Sun et al., 2021), where attackers can inject images but cannot control the training process. Notably, we also deploy our rotation backdoor attacks using a rotated stop sign and a rotated bottle in the physical environment, posing another severe security concern.

Figure 1: Pipeline of deploying rotation backdoor attacks on image classification and object detection tasks. An attacker can inject a rotated image or object with an incorrect label into the training set; the resulting models behave normally in benign settings and make mistakes when the rotation transformation is applied. (a) Single Class Attacks (top): rotation backdoored images are all Stop signs. Multiple Class Attacks (bottom): backdoored images are drawn from multiple classes. (b) Object Misclassification Attacks (top): the bounding box of the backdoored object (bottle) is labeled as the target class (person). Object Hiding Attacks (bottom): the backdoored object (bottle) does not have a labeled bounding box.

Rotation backdoor attacks are effective across different datasets, trigger angles, and poisoning rates. Empirical studies show that our proposed method achieves a high attack success rate on three datasets (GTSRB, Youtube Face, and VOC), four trigger angles ($15^{\circ}$, $30^{\circ}$, $45^{\circ}$, and $90^{\circ}$), and various poisoning rates (0.01% to 5%) across the image classification and object detection tasks. Meanwhile, rotation backdoor attacks maintain clean-data performance comparable to that of the clean model.

Data augmentation and four defenses fail to provide consistent mitigation. Our rotation-based backdoors exploit the lack of invariance to rotation in deep neural networks, so a natural defense would be to instill such invariance, as commonly done by augmenting training with randomly rotated images. We show that this defense only provides partial mitigation. For example, if we rotate images with angles in the range $[a,b]$ during data augmentation, then it effectively defends against trigger angles within this range, but it fails against, and in some cases even amplifies, vulnerability to trigger angles outside this range.

We also explore four additional commonly used defenses against backdoor attacks, including Neural Cleanse (NC) (Wang et al., 2019), Spectral Signatures (SS) (Tran et al., 2018), Activation Clustering (AC) (Chen et al., 2018), and STRIP (Gao et al., 2019). However, none of these state-of-the-art backdoor defenses turned out to be a consistent countermeasure against rotation backdoors. Eventually, we argue that a transformation-invariant model is needed to defend against such image transformation-based backdoor attacks. However, instilling such a high degree of invariance, such as invariance to any amount of rotation, can lead to degradation of benign performance.

Deploying rotation-based backdoors in real-world. We show the success of these backdoors in the real world under two scenarios. First, we physically rotate a real-world stop sign and show that it instills an effective backdoor in traffic sign classification systems. Second, we consider object detection where we physically rotate objects and show the effectiveness of both object misclassification and object hiding attacks. Furthermore, we show that our attacks also survive the artifacts introduced by the real-world image capturing pipelines, such as image compression, noise, and blurring.

Organization of the paper. We provide necessary background details in Section 2. In Section 3, we present our insight and method for crafting rotation-based backdoor attacks, and we experimentally validate their effectiveness in Section 4. The following two sections (5 and 6) demonstrate the success of the proposed attack in the presence of defenses: Section 5 shows that data augmentation fails to completely defend against such attacks, and Section 6 shows the limitations of multiple commonly used backdoor defenses. Finally, in Section 7, we demonstrate the success of the proposed backdoor attacks in the physical world, against both image classification and object detection systems.

2 Background and Related Work

2.1 Backdoor Poisoning Attacks

Data poisoning attacks (Biggio et al., 2012; Burkard and Lagesse, 2017; Steinhardt et al., 2018) are attacks that occur during the training process. They usually arise when training data is collected from large-scale, unauthorized online sources. One particular type of poisoning attack is the backdoor attack (Gu et al., 2017), whose objective is to cause the model to misclassify triggered test data while behaving normally in the benign setting. The vast majority of the literature on backdoor attacks focuses on the digital domain, where the designed triggers include occlusion-based patches (Gu et al., 2017; Lin et al., 2020), frequency-based corruption (Hammoud and Ghanem, 2021; Wang et al., 2021b), and blended invisible noise (Chen et al., 2017). Later, physically implementable backdoors (e.g., eyeglass frames, earrings) were introduced (Chen et al., 2017; Wenger et al., 2020), raising real-world threats to face recognition systems. Recently, Li et al. (2021) mentioned that rotation could be utilized as a trigger in the 3D point cloud classification setting. For object detection, Chan et al. (2022) proposed four types of patch-wise backdoor attacks that can achieve various malicious goals.

Mitigating Backdoor Attacks. To overcome the existing threats, Wang et al. (2019) proposed Neural Cleanse to detect the presence of backdoors in models by reverse engineering possible triggers. Furthermore, Gao et al. (2019) introduced STRIP, which inspects data during the inference stage and identifies poisoned samples by comparing entropy. However, both make strong assumptions tied to patch-wise backdoors and therefore cannot mitigate rotation backdoors. Filtering-based methods (e.g., Spectral Signatures (Tran et al., 2018) and Activation Clustering (Chen et al., 2018)) have also been developed, aiming to distinguish benign from malicious data during the training stage. We refer readers to (Li et al., 2020a) for a thorough survey of backdoor attacks and countermeasures.

2.2 Object Detection

Object detection aims to locate and classify the objects in an image by predicting a list of bounding boxes $\bm{b}$ (aka bboxes). Let $\bm{x}\in[0,255]^{W\times H\times 3}$ represent the input image and $\bm{y}=[\bm{b}_{1},\bm{b}_{2},\ldots,\bm{b}_{n}]$ the ground truth containing $n$ objects. Each bbox $\bm{b}$ contains $[a_{\min}, b_{\min}, a_{\max}, b_{\max}, c]$, where $a_{\min}$, $b_{\min}$, $a_{\max}$, $b_{\max}$ together specify the coordinates of the object, and $c$ denotes the predicted label. An object detector $\mathbb{F}(\bm{x})$, either two-stage (e.g., Faster-RCNN (Ren et al., 2015)) or one-stage (e.g., YOLO (Redmon et al., 2016)), then predicts a list of bboxes. We consider a prediction correct if 1) the bbox label matches the ground truth and 2) the predicted box overlaps with the ground-truth box above a predefined Intersection over Union (IoU) threshold. We count correct predictions as true positives ($\mathrm{TP}$), incorrect predictions on non-existent objects as false positives ($\mathrm{FP}$), and undetected ground-truth objects as false negatives ($\mathrm{FN}$). Precision and recall are defined as $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ and $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, respectively.
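To make the matching rule concrete, the following is a minimal sketch (our own illustration, not code from the paper) of IoU computation and the precision/recall formulas above; the 0.5 IoU threshold mirrors the $\text{AP}_{@0.5}$ setting used later in the paper.

```python
# Minimal sketch (our own illustration): IoU between two boxes in
# [a_min, b_min, a_max, b_max] format, plus precision/recall from TP/FP/FN.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# A prediction counts as a TP if its label matches and iou(pred, gt) >= 0.5.
```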

3 Methodology

3.1 Threat Model

We assume an attacker who gains access to a small fraction of the training data in some safety-critical application (e.g., face recognition, traffic sign classification) and can modify it with some perturbations. After poisoning, there are two settings in the inference phase: in the digital setting, the attacker uploads the malicious image to the victim's classifier; in the physical setting, the victim's device captures rotated objects placed by the attacker. Following the existing backdoor literature (Gu et al., 2017; Chen et al., 2017), the adversary does not control the training process and has zero knowledge of the model architecture and parameters. Ultimately, depending on the attacker's objective, the compromised model will either incorrectly predict the object as the target class or fail to detect it when the trigger appears. Besides, it should behave similarly to a benign model on clean inputs to remain stealthy.

3.2 Key Insights

Current physically realizable adversarial attacks and backdoor attacks mainly use physical trigger objects that occlude parts of the image or object, for example eyeglasses, patches, or earrings (Sharif et al., 2016; Chen et al., 2017; Eykholt et al., 2018; Wu et al., 2019; Wenger et al., 2020). However, spatial transformations (Fawzi and Frossard, 2015; Kanbak et al., 2018; Engstrom et al., 2019), which are more likely to occur naturally, are harder to deploy as an attack method in the physical world. Two main challenges exist:

  • Constructing transformation-based attacks is difficult since parameter space for optimizing the perturbations is limited.

  • Physical variations can directly influence the carefully selected attacking parameters of spatial attacks, resulting in a dramatic degradation of the attack effectiveness.

We address these problems by appropriately adapting backdoor poisoning attacks. By injecting spatially transformed images and flipping their labels, we significantly amplify the spatial vulnerability of the model. Our insight comes from the proof of Manoj and Blum (2021), which shows that ML models can approximate the union of a function that looks like the benign classifier on clean inputs and another adversary-chosen function. Therefore, in our case, the infected model learns that every benign non-rotated image is correlated with its correct label, while the rotated one is classified as the target label.

Deploying most spatial transformations in the physical world is nontrivial, since attackers would need to control both the camera and the objects. For example, consider an autonomous driving system that is moving while capturing street images: the scale of a stationary object shifts from scene to scene, so precisely calibrating the object's scale to the malicious parameter is exceptionally challenging. In contrast, we notice that a rotated object usually maintains a consistent appearance in images; namely, the rotation angle does not substantially vary even if the camera is moving. Therefore, to facilitate the accessibility of our proposed idea, we specifically concentrate on rotation transformation as the primary attack strategy, since it can be applied directly to objects.

Notation | Description | Notation | Description
$\bm{x}$ | Input image | $y$ | Class label
$\bm{x}^{\prime}$ | Backdoored image | $\bm{y}$ | Label for detector
$\bm{\theta}$ | Model parameters | $R_{\beta}$ | Rotate by $\beta^{\circ}$
$\rho$ | Poisoning rate | $\bm{M}$ | Pixel mask
$\bm{b}$ | Bounding box | $H, W$ | Input size
$\bm{x}_{b}, \bm{M}_{b}$ | Backdoor candidate object and its corresponding pixel mask
Table 1: Summary of important notation

3.3 Constructing Rotation Backdoor Attacks

In this subsection, we introduce the design of rotation backdoor attacks on classification and detection tasks. A summary of important notations is provided in Table 1.

Image Classification. Figure 1a presents the pipeline for constructing a rotation backdoored image for the classification task. The attacker composes images with the chosen trigger angle (e.g., $30^{\circ}$) and injects them into the training set before the training phase. Following Gu et al. (2017), the corresponding label is assigned to the target class. The training data therefore consists of $m$ backdoored images and $n$ clean images, and the injection rate is defined as $\rho=\frac{m}{n+m}$, which measures the attacker's capability. Model training essentially solves the following optimization problem

$$\underset{\bm{\theta}}{\arg\min}\ \sum_{i=0}^{n}\ell\left(\bm{x}_{i},y_{i};\bm{\theta}\right)+\sum_{j=0}^{m}\ell\left(\bm{x}_{j}^{\prime},y_{t};\bm{\theta}\right)\quad\text{s.t. }\ \bm{x}_{j}^{\prime}=R_{\beta}(\bm{x}_{j})\qquad(1)$$

where $\ell$ is the loss function, $(\bm{x}, y)$ is a clean input, and $(\bm{x}^{\prime}, y_{t})$ is a poisoned sample with target label $t$. $\bm{x}^{\prime}$ is constructed by rotating the benign image $\bm{x}$ by a predetermined angle $\beta$ (denoted $R_{\beta}(\bm{x})$).

The ideal way of generating a backdoor sample is to rotate the object itself, but this transformation is difficult to synthesize digitally. Therefore, we follow Engstrom et al. (2019)'s approach of rotating the whole image, anticipating that the poisoning effect will generalize to the triggered physical objects. However, a standard rotation operation leaves black triangles in the four corners. (In standard built-in functions (e.g., scikit-image, cv2), black pixels (value 0) fill the unknown regions at the corners.) These black triangles would make the digital evaluation of backdoor attacks less rigorous, since they could themselves act as a trigger. We solve this by recovering the original background information: for the traffic sign and face recognition tasks, we preprocess the raw image with a larger area than the standard pipeline, ensuring that rotation does not lose information. We then rotate the inputs to the predefined trigger angle $\beta$, crop them to match the shape of the benign images, and insert them into the training set to deploy the attack.
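As an illustration of this pipeline, the following minimal sketch (our own, not the authors' code) builds a poisoned classification sample with torchvision; the oversized-margin assumption, the helper name make_rotation_backdoor, and the example crop size are ours.

```python
# Minimal sketch (our own illustration): building a rotation-backdoored
# classification sample. We assume the raw image was preprocessed with a
# margin large enough that rotating and then cropping exposes no black corners.
from PIL import Image
import torchvision.transforms.functional as TF

def make_rotation_backdoor(raw_image: Image.Image, beta: float,
                           crop_size: int, target_label: int):
    """Rotate the (oversized) raw image by the trigger angle beta, center-crop
    to the benign input size, and relabel to the target class."""
    rotated = TF.rotate(raw_image, angle=beta)            # trigger angle, e.g. 45
    poisoned = TF.center_crop(rotated, [crop_size, crop_size])
    return poisoned, target_label                          # (x', y_t) in Eq. (1)

# Example usage: poison a small fraction rho of the training set.
# poisoned_set = [make_rotation_backdoor(img, beta=45, crop_size=32,
#                                        target_label=0) for img in candidates]
```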

We formulate two scenarios for image classification task given different resources: Single Class Attacks (SCA) and Multiple Class Attacks (MCA). In SCA, attackers obtain access to images from one source class and aim to fool the classifier with images only from it; whereas in MCA, multiple-class data are available, and images from all classes can be utilized during inference time.

Figure 2: Constructing a backdoor sample for object detection. We select a bottle as the backdoor candidate $\bm{x}_{b}$ together with its binary mask $\bm{M}_{b}$. The whole process includes extracting the object, scaling, rotating, and mixing it with the benign image $\bm{x}$ at a selected location.

Object Detection. For the object detection task shown in Figure 1b, the attacker injects a rotated object into the benign image and labels it incorrectly. Figure 2 illustrates the process of constructing a backdoored image. We use an open-source dataset to collect a backdoor candidate object $\bm{x}_{b}\in[0,255]^{H_{b}\times W_{b}\times 3}$ and its binary segmentation mask $\bm{M}_{b}\in\{0,1\}^{H_{b}\times W_{b}}$, where $H_{b}$ and $W_{b}$ denote its shape. We then select a location and a scale for $\bm{x}_{b}$ and $\bm{M}_{b}$, and preprocess them (by padding with zeros) to match the size of the benign image $\bm{x}\in[0,255]^{W\times H\times 3}$. (We assume that $\bm{x}_{b}$ is smaller than the benign image $\bm{x}$, and choose a valid location and scale so that the backdoored object can be pasted onto the benign image after the rotation transformation.) We generate the poisoned image by:

$$\bm{x}^{\prime}=\bm{x}\otimes(1-R_{\beta}(\texttt{prep}(\bm{M}_{b})))+R_{\beta}(\texttt{prep}(\bm{x}_{b}\otimes\bm{M}_{b}))\qquad(2)$$

where $R_{\beta}$ denotes the rotation transformation by the trigger angle $\beta$, and $\otimes$ denotes element-wise multiplication between image and mask (the mask $\bm{M}_{b}$ is broadcast to match the size of the image $\bm{x}$). Compared to image classification, generating backdoor samples for object detection applies the rotation directly to the object.
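The compositing step of Equation 2 can be sketched as follows (our own illustration; the prep placement helper and variable names are assumptions, and we use scipy's rotate in place of whatever implementation the authors used):

```python
# Minimal sketch (our own illustration) of Eq. (2): paste a rotated, masked
# backdoor object onto a benign image. Assumes the object fits at (top, left).
import numpy as np
from scipy.ndimage import rotate as nd_rotate

def prep(patch, out_h, out_w, top, left):
    """Zero-pad `patch` into an (out_h, out_w, ...) canvas at (top, left)."""
    canvas = np.zeros((out_h, out_w) + patch.shape[2:], dtype=patch.dtype)
    h, w = patch.shape[:2]
    canvas[top:top + h, left:left + w] = patch
    return canvas

def poison_detection_image(x, x_b, m_b, beta, top, left):
    H, W = x.shape[:2]
    m = prep(m_b[..., None].astype(np.float32), H, W, top, left)
    obj = prep(x_b * m_b[..., None], H, W, top, left)
    m_rot = nd_rotate(m, beta, reshape=False, order=0)     # R_beta(prep(M_b))
    obj_rot = nd_rotate(obj, beta, reshape=False, order=1)  # R_beta(prep(x_b * M_b))
    x_prime = x * (1 - m_rot) + obj_rot                     # Eq. (2)
    return x_prime.astype(x.dtype), m_rot   # m_rot's nonzero region gives the new bbox
```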

Object Misclassification Attacks (OMA). The goal of OMA is to change the predicted class of the rotated backdoor object. Hence, we construct a bbox $\bm{b}^{\prime}$ whose coordinates are derived from $R_{\beta}(\texttt{prep}(\bm{M}_{b}))$ and whose class is the target label. Then $\bm{y}=[\bm{b}_{1},\bm{b}_{2},\ldots,\bm{b}_{n},\bm{b}^{\prime}]$ is injected into the training labels.

Object Hiding Attacks (OHA). The goal of OHA is to hide the object from the detector, namely making the surrounding bbox of the rotated backdoor objects vanish. Therefore, we make no changes to the label after generating the backdoored training examples. OHA is more suitable for some real-world settings, where attackers only have access to training images, but labels remain unchanged.

3.4 Evaluation Metrics

We now introduce the metrics used to evaluate the performance of our proposed backdoor poisoning attacks.

Image Classification Task

Clean Data Accuracy (CDA). We use CDA to evaluate the clean accuracy of the poisoned model on test data. The optimal poisoned model should achieve a similar CDA to the benign model.

Attack Success Rate (ASR). We define ASR as the ratio of backdoor instances being classified as the target. Specifically, when objects are rotated to the selected backdoored angle, the infected classifier should output the target label, achieving a high ASR.
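For concreteness, CDA and ASR could be computed as in the following sketch (our own illustration, assuming a standard PyTorch classifier and data loaders):

```python
# Minimal sketch (our own illustration) of the two classification metrics.
import torch

@torch.no_grad()
def clean_data_accuracy(model, clean_loader, device="cpu"):
    correct = total = 0
    for x, y in clean_loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def attack_success_rate(model, rotated_loader, target_label, device="cpu"):
    # rotated_loader yields images already rotated to the trigger angle
    hit = total = 0
    for x, _ in rotated_loader:
        pred = model(x.to(device)).argmax(dim=1)
        hit += (pred == target_label).sum().item()
        total += x.size(0)
    return hit / total
```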

Object Detection Task

Average Precision (AP). AP is a common metric for evaluating the general performance of object detection (Everingham et al., 2009; Ren et al., 2015; Redmon et al., 2016; Bochkovskiy et al., 2020). It is defined as the average precision over different confidence thresholds for each class, namely the area under the precision-recall curve. We report the Average Precision at IoU=0.5 ($\text{AP}_{@0.5}$) and expect that inserting a backdoor will not dramatically affect AP.

Clean Data Recall (CDR). We define CDR to further evaluate the benign accuracy for the objects that will serve as backdoors. We generate testing data by

$$\bm{x}_{\text{benign}}=\bm{x}\otimes(1-\texttt{prep}(\bm{M}_{b,\text{test}}))+\texttt{prep}(\bm{x}_{b,\text{test}}\otimes\bm{M}_{b,\text{test}}),$$

where $(\bm{x}_{b,\text{test}}, \bm{M}_{b,\text{test}})$ and $(\bm{x}_{b}, \bm{M}_{b})$ are drawn from different subsets. For labels, we omit the bboxes from the original test set and only consider the ground-truth bbox corresponding to $\bm{x}_{\text{benign}}$, which can be obtained from $\texttt{prep}(\bm{x}_{b,\text{test}})$. CDR is a variant of the recall rate that evaluates only the added objects without rotation, computed as $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$. Therefore, a higher clean data recall is preferred for a successful attack, since the resulting detector can still recognize the objects even when they are later used as triggers.

Detection Attack Success Recall (DASR). We propose DASR to measure attack performance. It uses the same backdoor objects $\bm{x}_{b,\text{test}}$ as CDR to create evaluation samples, but rotates them to the trigger angle $\beta$ as described in Equation 2. Like CDR, DASR considers only the injected object, and the bbox coordinates are adjusted according to the object's rotation angle. For object misclassification attacks, the bbox class is flipped to the target label, and DASR is computed as $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, i.e., the fraction of triggered objects recognized as the target class. For object hiding attacks, DASR is computed as $\mathrm{FN}/(\mathrm{TP}+\mathrm{FN})$, i.e., the proportion of rotated backdoor objects that the model fails to detect. We expect DASR to be high, so that the infected detector either misclassifies the triggered objects as the target label or fails to detect them.

4 Evaluations in Digital Domain

In this section, we comprehensively evaluate our rotation backdoor attacks in the digital domain. We first introduce our experiments’ setup and then present the evaluation results.

4.1 Experimental Setup

We evaluate our attacks on the common benchmark GTSRB (Houben et al., 2013) for traffic sign classification, YouTube Face (Wolf et al., 2011) for face identification and PASCAL VOC dataset (Everingham et al., 2009) for object detection.

GTSRB (Houben et al., 2013). GTSRB is a dataset containing 43 types of German traffic signs, with 39211 samples in the training set and 12630 samples in the test set. To deploy valid backdoor attacks, we additionally collect 1213 images as potential backdoors following the same data preprocessing method. We adopt the GTSRB-CNN architecture (Eykholt et al., 2018) for our classifier, which obtains 97.68% clean accuracy. Due to computational constraints, we select the Speed Limit 20 sign as the only target class and the Stop sign as the source class for SCA.

YouTube Face (Wolf et al., 2011). We randomly select 100 classes from the original YouTube Faces dataset, each of which has 100 face images in the training set, 10 in the test set, and 10 in the backdoor set. We leverage the VGGFace model (Parkhi et al., 2015) and FaceNet (Schroff et al., 2015) as pretrained models and fine-tune them with the processed training data, reaching 100% accuracy on the clean dataset.

VOC (Everingham et al., 2009). PASCAL VOC dataset is an object detection challenge that contains annotations for 20 different object classes. Following the common practice (Liu et al., 2016; Xiang et al., 2022), we combine the trainval2007 set (5k images) and the trainval2012 set (11k images) for training and evaluate on the test2007 set (5k images). We randomly choose 100 bottles from the training set of COCO (Lin et al., 2014) and 30 bottles from the test set as the backdoor candidates. Besides, we use YOLO-R (Wang et al., 2021a) as the backbone to evaluate the performance.

Dataset | Setting | Poisoning Rate ($\rho$) | 15° CDA (%) | 15° ASR (%) | 30° CDA (%) | 30° ASR (%) | 45° CDA (%) | 45° ASR (%) | 90° CDA (%) | 90° ASR (%)
GTSRB | - | 0.00% | 97.68 | - | 97.68 | - | 97.68 | - | 97.68 | -
GTSRB | SCA | 0.01% | 97.54 | 14.69 | 97.47 | 61.97 | 97.58 | 69.25 | 97.58 | 64.81
GTSRB | SCA | 0.025% | 97.47 | 57.03 | 97.46 | 73.08 | 97.49 | 79.01 | 97.41 | 86.29
GTSRB | SCA | 0.05% | 97.61 | 65.92 | 97.52 | 87.16 | 97.62 | 90.00 | 97.67 | 92.46
GTSRB | MCA | 0.30% | 97.37 | 32.27 | 97.66 | 59.74 | 97.61 | 61.67 | 97.71 | 65.39
GTSRB | MCA | 1.00% | 97.08 | 57.69 | 97.50 | 79.54 | 97.60 | 80.68 | 97.58 | 80.54
GTSRB | MCA | 3.00% | 96.72 | 71.96 | 97.32 | 87.87 | 97.42 | 88.87 | 97.43 | 88.30
Youtube Face (VGGFace) | - | 0.00% | 100.0 | - | 100.0 | - | 100.0 | - | 100.0 | -
Youtube Face (VGGFace) | SCA | 0.01% | 100.0 | 50.00 | 100.0 | 90.00 | 99.90 | 100.0 | 99.90 | 70.00
Youtube Face (VGGFace) | SCA | 0.05% | 99.90 | 100.0 | 99.90 | 100.0 | 99.70 | 100.0 | 100.0 | 100.0
Youtube Face (VGGFace) | MCA | 0.10% | 99.90 | 1.40 | 99.90 | 38.60 | 99.90 | 94.70 | 99.90 | 95.90
Youtube Face (VGGFace) | MCA | 0.50% | 99.90 | 28.60 | 100.0 | 87.40 | 99.80 | 97.20 | 99.90 | 99.80
Youtube Face (VGGFace) | MCA | 1.00% | 99.90 | 56.40 | 100.0 | 95.30 | 99.90 | 99.70 | 100.0 | 99.80
Youtube Face (FaceNet) | - | 0.00% | 100.0 | - | 100.0 | - | 100.0 | - | 100.0 | -
Youtube Face (FaceNet) | SCA | 0.01% | 99.80 | 40.00 | 100.0 | 90.00 | 99.90 | 100.0 | 99.90 | 80.00
Youtube Face (FaceNet) | SCA | 0.05% | 99.80 | 40.00 | 99.80 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0
Youtube Face (FaceNet) | MCA | 0.10% | 99.90 | 4.80 | 100.0 | 72.40 | 100.0 | 97.20 | 99.90 | 99.20
Youtube Face (FaceNet) | MCA | 0.50% | 99.80 | 14.20 | 99.90 | 89.10 | 99.80 | 98.10 | 100.0 | 97.70
Youtube Face (FaceNet) | MCA | 1.00% | 99.90 | 42.20 | 100.0 | 97.90 | 99.70 | 98.20 | 100.0 | 98.70
Table 2: Performance of the Rotation Backdoor Attack on the image classification task. Our attack achieves a high Attack Success Rate (ASR) while maintaining the Clean Data Accuracy (CDA) across all poisoning rates, datasets (GTSRB and Youtube Face), and scenarios (Single Class Attack (SCA) and Multiple Class Attack (MCA)).
Dataset | Setting | Poisoning Rate ($\rho$) | 15° AP (%) | 15° CDR (%) | 15° DASR (%) | 30° AP (%) | 30° CDR (%) | 30° DASR (%) | 45° AP (%) | 45° CDR (%) | 45° DASR (%) | 90° AP (%) | 90° CDR (%) | 90° DASR (%)
VOC | OMA | 0.00% | 89.00 | 85.70 | - | 89.00 | 85.70 | - | 89.00 | 85.70 | - | 89.00 | 85.70 | -
VOC | OMA | 1.00% | 89.50 | 69.00 | 99.00 | 89.60 | 85.20 | 99.10 | 89.20 | 86.00 | 99.20 | 89.70 | 88.10 | 94.20
VOC | OMA | 5.00% | 89.50 | 58.30 | 99.90 | 89.90 | 86.20 | 99.60 | 89.70 | 88.40 | 99.90 | 89.50 | 86.00 | 98.60
VOC | OHA | 0.00% | 89.00 | 85.70 | 36.40 | 89.00 | 85.70 | 49.10 | 89.00 | 85.70 | 59.30 | 89.00 | 85.70 | 79.70
VOC | OHA | 1.00% | 89.10 | 67.60 | 84.50 | 89.70 | 85.20 | 92.00 | 89.50 | 84.60 | 92.10 | 89.50 | 85.90 | 94.90
VOC | OHA | 5.00% | 89.50 | 58.90 | 95.30 | 89.50 | 84.40 | 98.60 | 89.60 | 83.10 | 98.20 | 89.50 | 84.50 | 96.70
Table 3: Performance of the Rotation Backdoor Attack on the object detection task. The rotation backdoor achieves a high Detection Attack Success Recall (DASR) for both the Object Misclassification Attack (OMA) and the Object Hiding Attack (OHA). All detectors achieve higher Average Precision (AP), and detectors with large backdoor angles also maintain Clean Data Recall (CDR).

4.2 Effectiveness of Backdoor Attacks through Rotation Transformation

4.2.1 Image Classification Task

We now evaluate our rotation backdoor on the traffic sign and face recognition tasks with various poisoning rates and four chosen backdoor angles ($15^{\circ}$, $30^{\circ}$, $45^{\circ}$, $90^{\circ}$), as presented in Table 2. Recall that we have two settings where the poisoned images are drawn from either one class (SCA) or multiple classes (MCA). We choose lower poisoning rates for SCA, since MCA requires poisoning images from arbitrary classes.

Rotation backdoor achieves a high Attack Success Rate across all poisoning rates, datasets, and scenarios. For example, by inserting 0.01% poisoned images with a $45^{\circ}$ trigger angle, attackers can reach ∼70% ASR. That means only four rotated, label-flipped Stop sign images could eventually cause ∼70% of Stop sign images to be classified as the Speed Limit 20 sign when rotated to $45^{\circ}$. Similar performance can be observed on the Youtube Face dataset.

All models poisoned by our rotation backdoors maintain clean data accuracy similar to the original classifiers. Compared to the naturally trained models, the CDA of rotation backdoored models drops by less than 1% in all cases presented in Table 2. In particular, for the Youtube Face dataset, the maximum clean accuracy drop is 0.2%. In other words, if the object is not rotated during evaluation, the model still predicts the correct label. This keeps our attack stealthy and fosters real-world deployment.

Increasing the poisoning rate continuously improves ASR but might lead to clean accuracy degradation. We notice that a higher injection rate can significantly amplify the poisoning effect, especially for triggers that do not initially achieve a high attack success rate. For example, by injecting 5× as many 15-degree backdoors (0.01% → 0.05%) on GTSRB in the SCA setting, ASR increases from 14.69% to 65.92%. As a side effect of increasing the poisoning rate, CDA might decrease by a small margin.

Larger backdoor rotation angles generally achieve higher ASR and better CDA. Compared with $15^{\circ}$, the other angles are much more effective. We conjecture that this phenomenon is caused by the low separability between clean data, which is inherently rotated by slight angles, and $15^{\circ}$-rotated data. We further discuss the effectiveness of the chosen angle in Appendix A.2.

4.2.2 Object Detection Task

We then measure the effectiveness of the rotation backdoor on the object detection dataset for both OMA and OHA in Table 3. The benign detector ($\rho=0.00\%$) achieves 89% AP and 85.7% CDR when detecting benign objects, but the performance degrades under natural rotation transformation. For example, when the bottles are rotated to 30 degrees, 49.1% of them cannot be detected.

Rotation backdoor achieves high Detection Attack Success Recall (DASR) across all poisoning rates and scenarios. As observed in Table 3, DASR is above 98% in all scenarios for OMA and above 90% in all but one scenario for OHA. Even though the DASR is 84.5% for the $15^{\circ}$ backdoor on OHA, our attack still improves by ∼50% in absolute value over the clean model.

All models achieve higher AP, and models with large backdoor angles also maintain CDR. By injecting rotated bottles, AP even improves by 0.1%-0.9% compared to the benign detector for both settings. We also observe that applying large-degree backdoors (except $15^{\circ}$) does not degrade the CDR by a large margin ($\leq$0.5% for OMA and $\leq$2.6% for OHA). Again, due to the semantic similarity between clean samples and $15^{\circ}$ samples, it is hard for detectors to distinguish between them, causing a large CDR drop.

5 Data augmentation

Dataset & Setting | Rotation Augment (°) | 15° CDA (%) | 15° ASR (%) | 30° CDA (%) | 30° ASR (%) | 45° CDA (%) | 45° ASR (%) | 90° CDA (%) | 90° ASR (%)
GTSRB, SCA ($\rho$ = 0.025%) | [0, 0] | 97.47 | 57.03 | 97.46 | 73.08 | 97.49 | 79.01 | 97.41 | 86.29
GTSRB, SCA ($\rho$ = 0.025%) | [-15, +15] | 97.10 | 0.00 | 97.69 | 65.92 | 98.04 | 85.92 | 97.85 | 89.62
GTSRB, SCA ($\rho$ = 0.025%) | [-30, +30] | 97.61 | 0.00 | 96.29 | 0.00 | 97.60 | 72.59 | 97.92 | 97.40
GTSRB, SCA ($\rho$ = 0.025%) | [-45, +45] | 97.02 | 0.00 | 96.87 | 0.00 | 97.63 | 0.00 | 97.35 | 93.70
GTSRB, MCA ($\rho$ = 1.00%) | [0, 0] | 97.08 | 57.69 | 97.50 | 79.54 | 97.60 | 80.68 | 97.58 | 80.54
GTSRB, MCA ($\rho$ = 1.00%) | [-15, +15] | 97.66 | 1.09 | 97.79 | 59.54 | 97.77 | 82.42 | 97.79 | 82.23
GTSRB, MCA ($\rho$ = 1.00%) | [-30, +30] | 97.70 | 0.69 | 97.22 | 0.98 | 97.75 | 52.05 | 97.84 | 82.28
GTSRB, MCA ($\rho$ = 1.00%) | [-45, +45] | 97.50 | 0.47 | 97.45 | 0.56 | 97.52 | 0.81 | 97.18 | 73.61
Youtube Face (VGGFace), SCA ($\rho$ = 0.05%) | [0, 0] | 99.90 | 100.0 | 99.90 | 100.0 | 99.70 | 100.0 | 100.0 | 100.0
Youtube Face (VGGFace), SCA ($\rho$ = 0.05%) | [-15, +15] | 99.80 | 10.00 | 99.90 | 100.0 | 99.90 | 100.0 | 99.90 | 100.0
Youtube Face (VGGFace), SCA ($\rho$ = 0.05%) | [-30, +30] | 99.80 | 0.00 | 99.90 | 10.00 | 100.0 | 100.0 | 100.0 | 100.0
Youtube Face (VGGFace), SCA ($\rho$ = 0.05%) | [-45, +45] | 99.80 | 0.00 | 99.80 | 0.00 | 97.30 | 0.00 | 99.80 | 100.0
Youtube Face (VGGFace), MCA ($\rho$ = 1.00%) | [0, 0] | 99.90 | 56.40 | 100.0 | 95.30 | 99.90 | 99.70 | 100.0 | 99.80
Youtube Face (VGGFace), MCA ($\rho$ = 1.00%) | [-15, +15] | 100.0 | 33.50 | 99.90 | 88.20 | 99.90 | 98.80 | 100.0 | 100.0
Youtube Face (VGGFace), MCA ($\rho$ = 1.00%) | [-30, +30] | 100.0 | 2.80 | 100.0 | 53.10 | 100.0 | 95.90 | 100.0 | 99.80
Youtube Face (VGGFace), MCA ($\rho$ = 1.00%) | [-45, +45] | 100.0 | 1.30 | 99.90 | 22.50 | 100.0 | 58.40 | 99.90 | 99.50
Table 4: Effectiveness of rotation backdoors under data augmentation for the image classification task. Data augmentation only mitigates the poisoning effect for rotation backdoors with relatively small backdoor angles on two datasets (GTSRB and Youtube Face) and two attack scenarios (Single Class Attack (SCA) and Multiple Class Attack (MCA)).
Dataset & Setting | Rotation Augment (°) | 15° AP (%) | 15° CDR (%) | 15° DASR (%) | 30° AP (%) | 30° CDR (%) | 30° DASR (%) | 45° AP (%) | 45° CDR (%) | 45° DASR (%) | 90° AP (%) | 90° CDR (%) | 90° DASR (%)
VOC (YOLOR), OMA ($\rho$ = 0.01%) | [0, 0] | 89.50 | 69.00 | 99.00 | 89.60 | 85.20 | 99.10 | 89.20 | 86.00 | 99.20 | 89.70 | 88.10 | 94.20
VOC (YOLOR), OMA ($\rho$ = 0.01%) | [-15, +15] | 88.80 | 36.30 | 96.20 | 88.80 | 80.10 | 98.90 | 88.50 | 89.40 | 98.90 | 88.90 | 87.40 | 90.90
VOC (YOLOR), OMA ($\rho$ = 0.01%) | [-30, +30] | 87.70 | 32.60 | 94.60 | 88.00 | 62.40 | 98.10 | 87.90 | 76.00 | 98.40 | 87.90 | 88.80 | 89.40
VOC (YOLOR), OMA ($\rho$ = 0.01%) | [-45, +45] | 87.20 | 28.30 | 94.20 | 87.30 | 45.40 | 96.20 | 87.30 | 65.50 | 95.80 | 87.40 | 88.40 | 90.50
VOC (YOLOR), OHA ($\rho$ = 0.01%) | [0, 0] | 89.10 | 67.60 | 84.50 | 89.70 | 85.20 | 92.00 | 89.50 | 84.60 | 92.10 | 89.50 | 85.90 | 94.90
VOC (YOLOR), OHA ($\rho$ = 0.01%) | [-15, +15] | 88.80 | 60.50 | 66.70 | 89.00 | 79.70 | 82.50 | 88.80 | 81.00 | 93.10 | 88.60 | 85.70 | 96.10
VOC (YOLOR), OHA ($\rho$ = 0.01%) | [-30, +30] | 88.10 | 57.90 | 57.40 | 88.00 | 76.50 | 74.00 | 88.00 | 78.90 | 80.10 | 88.30 | 85.10 | 95.10
VOC (YOLOR), OHA ($\rho$ = 0.01%) | [-45, +45] | 87.00 | 64.70 | 58.70 | 87.30 | 73.40 | 61.60 | 87.10 | 76.00 | 76.60 | 87.20 | 84.10 | 87.90
Table 5: Effectiveness of rotation backdoors under data augmentation for the object detection task. Rotation augmentation achieves limited success against Object Misclassification Attacks (OMA) and is insufficient against Object Hiding Attacks (OHA).

In this section, we first evaluate the common data augmentation mechanism to mitigate the poisoning effect. Then, we study the general behavior of the rotation backdoored models.

5.1 Effectiveness of Data Augmentation

Rotation-based backdoors inherently exploit the vulnerability of neural networks to spatial transformations (Engstrom et al., 2019); thus, it is natural to ask whether improving invariance to rotation, namely via data augmentation, will fix it. Under a similar motivation, Borgnia et al. (2020) also take advantage of data augmentation (e.g., mixup (Zhang et al., 2017) and CutMix (Yun et al., 2019)), which significantly diminishes the threat of patch-based backdoor attacks.

We surveyed the literature on training common benchmark classifiers and detectors (He et al., 2016; Huang et al., 2017; Tan and Le, 2019; Dosovitskiy et al., 2020; Bochkovskiy et al., 2020; Liu et al., 2021; Kızılay and Aydin, 2022) and common data augmentation techniques (Devries and Taylor, 2017; Zhang et al., 2017; Yun et al., 2019) for developing robust classifiers. According to our survey, rotation augmentation is not adopted by any of these benchmark models and, when used as data augmentation, is limited to $\pm 30^{\circ}$. Therefore, we specifically select three levels of data augmentation: $[-15^{\circ},+15^{\circ}]$, $[-30^{\circ},+30^{\circ}]$, and $[-45^{\circ},+45^{\circ}]$ rotation augmentations. (For implementation, we directly utilize the RandomRotation function from the PyTorch library (Paszke et al., 2019), where every image is rotated by an angle uniformly chosen from the given range.)
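A sketch of this augmentation setup (our own illustration, mirroring the RandomRotation usage described in the footnote above; the helper name and ranges are ours):

```python
# Sketch of the rotation augmentation levels used in this section:
# each training image is rotated by an angle drawn uniformly from [-a, +a].
import torchvision.transforms as T

def make_train_transform(max_angle: int):
    # max_angle in {15, 30, 45}; degrees=a means a uniform angle in [-a, +a]
    return T.Compose([
        T.RandomRotation(degrees=max_angle),
        T.ToTensor(),
    ])

augment_15 = make_train_transform(15)   # [-15, +15] augmentation
augment_45 = make_train_transform(45)   # [-45, +45] augmentation
```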

5.1.1 Image Classification Task

Augmentation only mitigates rotation backdoors with a relatively small backdoor angle. Table 4 presents the performance of our method against rotation augmentation. We observe that rotation augmentation can hardly diminish the poisoning effect if a sufficiently large backdoor angle is deployed. For example, $[-15^{\circ},+15^{\circ}]$ augmentation does not degrade, and sometimes even improves, the ASR of 45-degree and 90-degree backdoors (decrease $\leq$ 0.9%). In contrast, substantial augmentation significantly mitigates the backdoor effect. For example, $[-45^{\circ},+45^{\circ}]$ augmentation can defend against a 15-degree backdoor trigger, causing ASR to drop to less than 2%. It may seem that our proposed attack can be defeated by simply applying augmentation in the classification setting, but in fact, the rotation backdoored model is still fundamentally broken (Sun et al., 2021). We defer the detailed analysis to Section 5.2.

5.1.2 Object Detection Task

Rotation augmentation achieves limited success against OMA and is insufficient against OHA. Table 5 shows the performance of our attacks on the detection task. Overall, data augmentation reduces AP by ∼2% across all scenarios and hyperparameters. As in the classification task, weak augmentations do not substantially affect large-degree backdoors. Interestingly, we observe different behaviors for object misclassification attacks and object hiding attacks on the CDR and DASR metrics. Sufficient rotation augmentation leads to severe degradation of CDR but has a limited mitigation effect on DASR for OMA. For example, the strongest data augmentation ($[-45^{\circ},+45^{\circ}]$) causes only a 4.8% DASR drop for our 15-degree backdoor but ruins the CDR from 69% to 28.3%. In contrast, strong data augmentation alleviates DASR for OHA and has a relatively milder effect on CDR than for OMA. For example, although the DASR of the 15-degree backdoor decreases from 84.5% to 58.7% ($\rho=0.01\%$) under $[-45^{\circ},+45^{\circ}]$ augmentation, it is still higher than that of the vanilla model, which is 36.4%. We attribute this to the fact that objects are rotated for detection, whereas whole images are rotated for classification and data augmentation; the detector can therefore still identify the angle of the backdoor object relative to the whole image.

Figure 3: Attack Success Rate of $15^{\circ}$, $30^{\circ}$, $45^{\circ}$, and $90^{\circ}$ rotation poisoned models for test-time angles over $[0^{\circ},360^{\circ}]$ in the GTSRB MCA setting. In each figure, we compare no augmentation and $[-45^{\circ},+45^{\circ}]$ augmentation. To enhance visibility, the ASR is presented every $5^{\circ}$. Sufficient data augmentation relocates the vulnerable angle. Insufficient data augmentation expands the range of vulnerable angles.

5.2 Analysis of Rotation Backdoored Classifier

While we show that sufficient rotation augmentation is an effective defense against our proposed attacks in the classification task, it is interesting to further explore the poisoned classifier's general behavior under other rotation degrees. In Figure 3, we evaluate rotation backdoored models with $15^{\circ}$, $30^{\circ}$, $45^{\circ}$, and $90^{\circ}$ triggers on all test-time angles in the GTSRB MCA setting. Specifically, we compare the performance without augmentation and with $[-45^{\circ},+45^{\circ}]$ augmentation.

Rotation backdoored models are vulnerable over a range of angles. First, the attack is effective not only at the specific trigger angle but also over a range of surrounding angles. This property facilitates the physical implementation of our attacks, since precisely controlling a rotation angle is impractical in a real-world environment. In addition, even when no augmentation is deployed, the backdoor angle might not be the most effective point. For example, in Figure 3a, even though attackers construct the $15^{\circ}$ poisoning samples, $20^{\circ}$ turns out to be the most vulnerable angle. We explain this phenomenon with theoretical insights in Appendix A.1.

Sufficient data augmentation can mitigate the poisoning effect at the predefined backdoor angle but may shift the vulnerable angles elsewhere. In Section 5.1, we observed that strong augmentation significantly mitigates the poisoning effect. For example, $[-45^{\circ},+45^{\circ}]$ augmentation causes the ASR to drop to ∼0% for the 30-degree backdoor, as also shown in Figure 3b. However, if we define angles with ASR $\geq$ 50% as vulnerable, then the vulnerable range shifts from $[25^{\circ},45^{\circ}]$ to $[65^{\circ},80^{\circ}]$, and the most effective angle for the augmented model still achieves ∼60% ASR. Therefore, augmentation may introduce new vulnerabilities for the classifier.

Insufficient data augmentation cannot reduce the poisoning effect at the selected angle and can even enlarge the range of vulnerable angles. Figure 3d illustrates an example where the augmentation is weak compared to the poisoning angle; as a result, ASR at the selected backdoor angle ($90^{\circ}$) degrades by only ∼7%. Since augmentations are also applied to the backdoored samples, the range of vulnerable angles increases from $[75^{\circ},110^{\circ}]$ to $[60^{\circ},135^{\circ}]$, which significantly strengthens the robustness of the rotation trigger. We conclude that standard data augmentation seems to offer satisfactory mitigation but actually raises new threats and can even lead to more vulnerable models.

Nearly rotation-invariant neural networks can be achieved by applying $[-180^{\circ},+180^{\circ}]$ augmentation, where all angles are covered during training. Our observations also show that ASR then drops to ∼0% for all backdoor angles in the classification task. However, we argue that the clean data accuracy of such models usually degrades, especially for traffic sign datasets where strong rotations can change the semantic meaning (e.g., left turn versus right turn). In addition, the rotation backdoor attack is orthogonal to other backdoor attacks, such as patch-wise attacks (Gu et al., 2017; Chen et al., 2017) and frequency-based attacks (Wang et al., 2021b). Therefore, it is possible to combine rotation transformation with other types of backdoor attacks so that the rotation augmentation defense can be broken.

6 Evaluation Against Backdoor Defenses

Dataset | Poisoning Rate | Backdoor Trigger | Neural Cleanse: Anomaly Index | STRIP: Eli (%) | STRIP: Sac (%) | Spectral Signature: Eli (%) | Spectral Signature: Sac (%) | Activation Clustering: Eli (%) | Activation Clustering: Sac (%)
GTSRB | 0.30% | 45° | 1.01 (Not detected) | 9.65 | 10.00 | 76.75 | 17.80 | 79.82 | 39.91
GTSRB | 0.30% | 90° | 1.03 (Not detected) | 10.47 | 10.00 | 74.70 | 17.81 | 59.49 | 38.35
Youtube Face (VGGFace) | 0.10% | 45° | 0.09 (Not detected) | 1.96 | 10.00 | 4.00 | 15.01 | 76.00 | 13.98
Youtube Face (VGGFace) | 0.10% | 90° | 0.72 (Not detected) | 0.00 | 10.00 | 40.00 | 14.97 | 40.00 | 13.94
Table 6: Defenses against rotation backdoor attacks. For Neural Cleanse, we report the anomaly index (poisoned threshold ≥ 2.0). For the others, we present the elimination rate (Eli) and sacrifice rate (Sac). We find that none of them can serve as a consistent defense.

To further illustrate the effectiveness of rotation backdoor attacks, we study four backdoor defense methods that commonly appear in the literature (Wenger et al., 2020; Wang et al., 2021b; Li et al., 2020b; Qi et al., 2022): Neural Cleanse (NC) (Wang et al., 2019), Spectral Signatures (SS) (Tran et al., 2018), Activation Clustering (AC) (Chen et al., 2018), and STRIP (Gao et al., 2019). These mitigation approaches cover three main paradigms: trigger synthesis (NC), online detection (STRIP), and poisoned data identification (SS, AC). Since all of these defenses are designed for classification, we evaluate them on our traffic sign and face recognition tasks with $45^{\circ}$ and $90^{\circ}$ as the trigger angles. The overall effectiveness of the four backdoor defenses is summarized in Table 6: for NC, we use the anomaly index as the metric (a value >2 is considered detected); for the others, we use the elimination rate (the ratio of correctly identified poisoned samples) and the sacrifice rate (the ratio of incorrectly eliminated clean samples).

Neural Cleanse (NC). Neural Cleanse (Wang et al., 2019) synthesizes possible triggers for all classes by optimizing over the input space. The authors argue that the reverse-engineered trigger for the infected class is likely to have an abnormally small mask compared to other classes. They use the $l_{1}$ norm to measure the mask and an anomaly index >2 to identify the poisoned target. In Table 6, we observe that all anomaly indexes of our proposed attacks are below the threshold, so our attacks successfully bypass NC. We also present the reconstructed triggers in Table 8; they can hardly be recognized as triggers. The reason is that NC is explicitly built on the assumption of a small patch-wise trigger, whereas rotation, as a spatial transformation, disperses the perturbation across the whole image.

GTSRB ($45^{\circ}$, $90^{\circ}$) and Youtube Face ($45^{\circ}$, $90^{\circ}$): reconstructed trigger images omitted.
Table 8: Reconstructed triggers by NC. No visually apparent trigger exists.
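For reference, the per-class trigger reverse-engineering step of NC can be sketched roughly as below (our own simplification of the published method, not the original implementation; hyperparameters such as lam and the variable names are illustrative). NC would run this for every candidate class and then flag classes whose recovered mask has an abnormally small $l_{1}$ norm.

```python
# Hypothetical sketch (our simplification) of Neural Cleanse's trigger search:
# optimize a mask m and pattern p so that (1-m)*x + m*p is classified as the
# candidate target class while keeping ||m||_1 small.
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, loader, target, shape, lam=1e-2,
                             steps=1000, lr=0.1, device="cpu"):
    model.eval()
    mask_logit = torch.zeros(1, 1, *shape[1:], device=device, requires_grad=True)
    pattern = torch.rand(1, *shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([mask_logit, pattern], lr=lr)
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, _ = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, _ = next(data_iter)
        x = x.to(device)
        m = torch.sigmoid(mask_logit)                         # soft mask in [0, 1]
        x_trig = (1 - m) * x + m * torch.sigmoid(pattern)     # apply candidate trigger
        tgt = torch.full((x.size(0),), target, device=device, dtype=torch.long)
        loss = F.cross_entropy(model(x_trig), tgt) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask_logit).detach(), torch.sigmoid(pattern).detach()
```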

STRIP. STRIP (Gao et al., 2019) identifies backdoored images at inference time by observing the classification output of perturbed test inputs. The authors argue that superimposing random clean inputs cannot influence the predictions of poisoned samples, resulting in lower Shannon entropy. As Table 6 shows, we constrain the sacrifice rate to 10% and report the elimination rate. We observe that in all settings, STRIP provides limited mitigation, with an elimination rate below 11%. In addition, we plot the normalized entropy histograms of clean and poisoned inputs in Figure 4. The entropy distributions for the traffic sign task are almost indistinguishable, and poisoned samples even have higher entropy than clean ones for the face recognition task. We conjecture that the rotation features are dramatically corrupted by blending in a non-rotated clean image; therefore, the poison trigger becomes less effective, and the corresponding prediction shifts.

Figure 4: Normalized entropy histograms of STRIP. A lower entropy value is more likely to be poisoning data. Poisoning samples have equal entropy on the traffic sign (TS) dataset, and even higher entropy on the face recognition (FR) dataset
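A rough sketch of the STRIP entropy test (our own simplification; STRIP's actual perturbation and threshold selection differ in their details):

```python
# Hypothetical sketch (our simplification): superimpose a test input with
# random clean images and measure the entropy of the resulting predictions;
# low entropy is taken as evidence of a backdoored input.
import torch
import torch.nn.functional as F

@torch.no_grad()
def strip_entropy(model, x, clean_images, n_overlay=100, alpha=0.5):
    # x: (C, H, W) test image; clean_images: (N, C, H, W) held-out clean set
    idx = torch.randint(0, clean_images.size(0), (n_overlay,))
    blended = alpha * x.unsqueeze(0) + (1 - alpha) * clean_images[idx]
    probs = F.softmax(model(blended), dim=1)
    # Average Shannon entropy over the perturbed copies
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=1)
    return entropy.mean().item()
```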

Spectral Signature (SS) and Activation Clustering (AC) are both poison-data filtering methods, assuming that there exists a sufficiently large separation between backdoor samples and clean samples in latent space (Qi et al., 2022). SS (Tran et al., 2018) uses SVD to compute outlier scores for all input data and removes 1.5× the expected number of poisoned samples with the top scores, whereas AC (Chen et al., 2018) directly applies an unsupervised clustering method to distinguish malicious from benign inputs. Both methods achieved promising results when defending against conventional patch-wise attacks. However, we find an interesting phenomenon: the large latent separability assumption does not always hold for rotation backdoor attacks across different initializations, resulting in inconsistent defense effectiveness. For example, in Figure 5, we observe that although poisoned samples tend to form a cluster, in some cases it is hard to correctly separate and identify them in the unsupervised setting.

(a) TS: Inseparable case
(b) TS: Separable case
(c) FR: Inseparable case
(d) FR: Separable case
Figure 5: Latent representations of training samples via PCA on the traffic sign (TS) and face recognition (FR) datasets. Blue points are clean samples, while red points are poisoned samples. Cases (a) and (b) use exactly the same configuration except for the random seed; the same goes for (c) and (d). We observe that no clear separation exists in cases (a) and (c).
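For reference, the outlier-scoring step of Spectral Signatures can be sketched as follows (our own simplification, assuming per-class latent features have already been extracted from the model):

```python
# Hypothetical sketch (our simplification) of the Spectral Signatures filter:
# score each sample by its squared projection onto the top singular direction
# of the centered features, then drop 1.5x the expected number of poisons.
import numpy as np

def spectral_signature_scores(features: np.ndarray) -> np.ndarray:
    # features: (num_samples, feature_dim) latent activations for one class
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top_dir = vt[0]                      # top right singular vector
    return (centered @ top_dir) ** 2     # outlier score per sample

def filter_indices(features, expected_poisons):
    scores = spectral_signature_scores(features)
    k = int(1.5 * expected_poisons)
    return np.argsort(scores)[-k:]       # indices flagged for removal
```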

SS and AC cannot serve as consistent and reliable defenses against rotation backdoors. We report the average performance over 5 repeated experiments in Table 6. On the GTSRB dataset, we observe that Spectral Signature can consistently eliminate ∼75% of the poisoned samples, and the ASR drops to ∼42%. However, for Youtube Face with a 90-degree trigger, SS fails to identify any poisoned samples in three out of five runs, resulting in an average ASR of ∼57%. The effectiveness of Activation Clustering is even more inconsistent. On the GTSRB dataset with a 90-degree trigger, AC correctly eliminates more than 98% of the malicious samples in four runs but detects 0% in one run, and the mean ASR turns out to be ∼15%. We argue that even though SS and AC sometimes mitigate our proposed attack, their inconsistent effectiveness prevents them from being deployed in the real world, especially for safety-critical applications (e.g., face recognition).

Dataset | Setting | Poisoning Rate ($\rho$) | 15° CDA (%) | 15° ASR (%) | 30° CDA (%) | 30° ASR (%) | 45° CDA (%) | 45° ASR (%) | 90° CDA (%) | 90° ASR (%)
GTSRB | - | 0.00% | 100.0 | - | 100.0 | - | 100.0 | - | 100.0 | -
GTSRB | SCA | 0.01% | 100.0 | 0.00 | 100.0 | 60.00 | 100.0 | 69.99 | 100.0 | 55.55
GTSRB | SCA | 0.025% | 100.0 | 24.44 | 100.0 | 71.11 | 100.0 | 89.99 | 100.0 | 85.55
GTSRB | SCA | 0.05% | 100.0 | 52.22 | 100.0 | 87.77 | 100.0 | 94.44 | 100.0 | 87.77
GTSRB | MCA | 0.30% | 100.0 | 1.11 | 100.0 | 36.66 | 100.0 | 65.55 | 100.0 | 75.55
GTSRB | MCA | 1.00% | 100.0 | 14.44 | 100.0 | 64.44 | 100.0 | 93.33 | 100.0 | 94.44
GTSRB | MCA | 3.00% | 100.0 | 48.88 | 100.0 | 88.88 | 100.0 | 96.66 | 100.0 | 97.77
Table 9: Effectiveness of the rotation backdoor attack under physical settings.

7 Rotation backdoor in physical world

In this section, we conduct outdoor physical experiments on our proposed method for both the classification and detection tasks.

Traffic Sign Classification

We rotate a real-world stop sign to the selected backdoor angles and capture 30 images for each angle, as presented in Figure 6. To calibrate the angle of the object, we use the level-measure software preinstalled on Apple devices; many level-measurement apps can also be freely downloaded for the Android operating system. By taping our device vertically to the stop sign, we can adjust the sign until it reaches the backdoor angle. Thus, deploying our attack only requires an everyday cell phone and tape, which are affordable and accessible to anyone.

(a) $0^{\circ}$ (b) $15^{\circ}$ (c) $30^{\circ}$ (d) $45^{\circ}$ (e) $90^{\circ}$
Figure 6: Physical examples of rotated stop signs at different angles with the same background.

In the digital setting, rotating the image of a traffic sign also rotates the background, whereas the physical samples do not. Therefore, we validate whether the digital backdoors generalize to test-time physical samples. Table 9 reports the effectiveness of our physically collected stop signs against the same models in Table 2 that were poisoned with digital triggers. In general, the attack success rate and clean data accuracy are similar to those in the digital setting, affirming that our proposed attacks generalize to the physical world. This means the poisoned classifier learns little from the background and instead relies on the rotated texture of the traffic sign. However, ASR for the 15-degree trigger exhibits a nontrivial drop (∼15%-40%); we suspect that the outdoor environment interferes with small rotations in the real world. Ultimately, our best attack configurations (a 45-degree trigger for SCA and a 90-degree trigger for MCA) achieve more than 94% ASR and 100% CDA, highlighting the importance of building a rotation-invariant model.

Object Detection Task

We further conduct experiments on backdoored detectors. To the best of our knowledge, this is the first transformation-based attack that can be deployed in the real world against an object detector, illustrating a new real-world security threat.

Figure 7 shows snapshots of different backdoor attack configurations with a 0.01% poisoning rate. We observe that OMA misleads the detector into recognizing the bottle as a person with high confidence (over 80%), and OHA lowers the confidence of detecting the bottle below the threshold (50%). It is worth mentioning that the bottle in Figure 7 does not appear in the training set, which makes deployment easily accessible.

Figure 7: Examples of deploying object misclassification attacks and object hiding attacks with a rotated bottle in the physical world. A full video demonstration is available at https://youtu.be/6JIF8wnX34M.

7.1 Impact of Run-time Artifacts

In this subsection, we follow the evaluations of Wenger et al. (2020) and examine three common corruptions that occur between capturing an image and feeding it to the model. Specifically, we consider image compression, noise, and blurring, and evaluate the model on the physical stop signs in SCA with $\rho=0.05\%$.

Figure 8(a) shows the impact of image compression, which may occur due to the device's limited storage space. We use JPEG compression to corrupt the images with quality factors from 100% down to 10% (high quality to low). Figure 8(b) shows the impact of Gaussian noise, which is commonly observed during image capture; we add zero-mean noise with $\sigma$ varying from 0 to 0.3 (no noise to intense noise). We notice that ASR remains effective ($\leq$10% decrease) even under the most severe compression and noise. Finally, we apply Gaussian blurring with kernel size $k$ varying from 1 to 43 (no blurring to strong blurring). ASR generally drops by ∼30% relative to unblurred test images as $k$ goes from 1 to 10, but then converges to a constant value. Hence, our attack remains effective under Gaussian blurring for larger backdoor angles and may still survive if the blurring keeps increasing. In addition, Figure 8(d) visualizes the three most substantial corruptions.
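The three corruptions can be reproduced roughly as follows (our own illustration; the exact parameters and libraries the authors used may differ):

```python
# Sketch of the run-time corruptions used in this subsection: JPEG compression,
# additive Gaussian noise, and Gaussian blurring (all on PIL images).
import io
import numpy as np
from PIL import Image, ImageFilter

def jpeg_compress(img: Image.Image, quality: int) -> Image.Image:
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)   # quality in [10, 100]
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def add_gaussian_noise(img: Image.Image, sigma: float) -> Image.Image:
    arr = np.asarray(img).astype(np.float32) / 255.0
    arr = np.clip(arr + np.random.normal(0.0, sigma, arr.shape), 0.0, 1.0)
    return Image.fromarray((arr * 255).astype(np.uint8))

def gaussian_blur(img: Image.Image, kernel_size: int) -> Image.Image:
    # Approximate a k x k Gaussian kernel by its radius
    return img.filter(ImageFilter.GaussianBlur(radius=kernel_size // 2))
```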

(a) Image compression: quality 100%-10%
(b) Gaussian noise: $\sigma$ from 0 to 0.3
(c) Gaussian blurring: $k$ from 1 to 43
(d) Corrupted images
Figure 8: SCA with a 0.05% poisoning rate on the GTSRB dataset. Figures (a)-(c) illustrate the impact of the various artifacts on ASR. Figure (d) presents examples of the most severe corruptions. Top to bottom: compression, noise, and blurring. Left to right: $15^{\circ}$, $30^{\circ}$, $45^{\circ}$, $90^{\circ}$ backdoors.

8 Conclusion and Future works

To summarize, in this work we propose a new threat model that utilizes rotation transformation as a trigger to deploy backdoor attacks. Through extensive experiments on classification and detection tasks, we demonstrate that our method achieves a high attack success rate without degrading clean data performance. We present a detailed analysis of the rotation poisoned model and argue that standard data augmentation, although mitigating the effect at the backdoor angle, may introduce new vulnerabilities. We also evaluate four commonly adopted backdoor defenses and conclude that none of them can serve as a consistent countermeasure. Last and most importantly, we illustrate that deploying rotation backdoor attacks in the physical world is easily accessible and raises a new real-world security issue. In the future, we aim to explore combining rotation backdoors with other conventional patch-wise triggers to enhance the effectiveness of both methods. In addition, developing consistently practical approaches to defend against our attacks is another promising direction.

Acknowledgements

This work was supported in part by the National Science Foundation under grants CNS-1553437 and CNS-1704105, the ARL’s Army Artificial Intelligence Innovation Institute (A2I2), the Office of Naval Research Young Investigator Award, the Army Research Office Young Investigator Prize, Schmidt DataX award, Princeton E-ffiliates Award, and Princeton Gordon Y. S. Wu Fellowship.

References

  • Biggio et al. [2012] Battista Biggio, Blaine Nelson, and Pavel Laskov. Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389, 2012.
  • Bochkovskiy et al. [2020] Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao. Yolov4: Optimal speed and accuracy of object detection. ArXiv, abs/2004.10934, 2020.
  • Borgnia et al. [2020] Eitan Borgnia, Valeriia Cherepanova, Liam Fowl, Amin Ghiasi, Jonas Geiping, Micah Goldblum, Tom Goldstein, and Arjun Gupta. Strong data augmentation sanitizes poisoning and backdoor attacks without an accuracy tradeoff. arXiv preprint arXiv:2011.09527, 2020.
  • Brown et al. [2020] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint, 2020.
  • Burkard and Lagesse [2017] Cody Burkard and Brent Lagesse. Analysis of causative attacks against svms learning from data streams. In Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics, pages 31–36, 2017.
  • Cao et al. [2018] Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age, 2018.
  • Carlini and Wagner [2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
  • Chan et al. [2022] Shih-Han Chan, Yinpeng Dong, Junyi Zhu, Xiaolu Zhang, and Jun Zhou. Baddet: Backdoor attacks on object detection. ArXiv, abs/2205.14497, 2022.
  • Chen et al. [2018] Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava. Detecting backdoor attacks on deep neural networks by activation clustering. arXiv preprint, 2018.
  • Chen et al. [2017] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint, 2017.
  • Devries and Taylor [2017] Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. ArXiv, abs/1708.04552, 2017.
  • Dosovitskiy et al. [2020] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
  • Engstrom et al. [2019] Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. Exploring the landscape of spatial robustness. In International Conference on Machine Learning, pages 1802–1811, 2019.
  • Everingham et al. [2009] Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John M. Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88:303–338, 2009.
  • Eykholt et al. [2018] Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1625–1634, 2018.
  • Fawzi and Frossard [2015] Alhussein Fawzi and Pascal Frossard. Manitest: Are classifiers really invariant? In BMVC, 2015.
  • Gao et al. [2019] Yansong Gao, Change Xu, Derui Wang, Shiping Chen, Damith C Ranasinghe, and Surya Nepal. Strip: A defence against trojan attacks on deep neural networks. In Proceedings of the 35th Annual Computer Security Applications Conference, pages 113–125, 2019.
  • Gu et al. [2017] Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint, 2017.
  • Hammoud and Ghanem [2021] Hasan Abed Al Kader Hammoud and Bernard Ghanem. Check your other door! establishing backdoor attacks in the frequency domain. ArXiv, abs/2109.05507, 2021.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Houben et al. [2013] Sebastian Houben, Johannes Stallkamp, Jan Salmen, Marc Schlipsing, and C. Igel. Detection of traffic signs in real-world images: The german traffic sign detection benchmark. The 2013 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2013.
  • Huang et al. [2017] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
  • Kanbak et al. [2018] Can Kanbak, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Geometric robustness of deep networks: Analysis and improvement. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4441–4449, 2018.
  • Kızılay and Aydin [2022] Emrullah Kızılay and Ilhan Aydin. A yolor based visual detection of amateur drones. 2022 International Conference on Decision Aid Sciences and Applications (DASA), pages 1446–1449, 2022.
  • Li et al. [2021] Xinke Li, Zhirui Chen, Yue Zhao, Zekun Tong, Yabang Zhao, Andrew Lim, and Joey Tianyi Zhou. Pointba: Towards backdoor attacks in 3d point cloud, 2021. URL https://arxiv.org/abs/2103.16074.
  • Li et al. [2020a] Yiming Li, Baoyuan Wu, Yong Jiang, Zhifeng Li, and Shu-Tao Xia. Backdoor learning: A survey. arXiv preprint arXiv:2007.08745, 2020a.
  • Li et al. [2020b] Yiming Li, Baoyuan Wu, Yong Jiang, Zhifeng Li, and Shutao Xia. Backdoor learning: A survey. ArXiv, abs/2007.08745, 2020b.
  • Lin et al. [2020] Junyu Lin, Lei Xu, Yingqi Liu, and X. Zhang. Composite backdoor attack for deep neural network by mixing existing benign features. Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020.
  • Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.
  • Liu et al. [2016] W. Liu, Dragomir Anguelov, D. Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. Ssd: Single shot multibox detector. In ECCV, 2016.
  • Liu et al. [2021] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2021. doi: 10.1109/iccv48922.2021.00986. URL http://dx.doi.org/10.1109/ICCV48922.2021.00986.
  • Madry et al. [2017] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
  • Manoj and Blum [2021] Naren Sarayu Manoj and Avrim Blum. Excess capacity and backdoor poisoning, 2021.
  • Papernot et al. [2016] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016.
  • Parkhi et al. [2015] Omkar M. Parkhi, Andrea Vedaldi, and Andrew Zisserman. Deep face recognition. In BMVC, 2015.
  • Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In NeurIPS, 2019.
  • Qi et al. [2022] Xiangyu Qi, Ting Xie, Saeed Mahloujifar, and Prateek Mittal. Circumventing backdoor defenses that are based on latent separability. arXiv preprint, 2022.
  • Redmon et al. [2016] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 779–788, 2016.
  • Ren et al. [2015] Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39:1137–1149, 2015.
  • Russakovsky et al. [2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. International journal of computer vision, 115(3):211–252, 2015.
  • Schroff et al. [2015] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815–823, 2015.
  • Sharif et al. [2016] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K. Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 23rd ACM SIGSAC Conference on Computer and Communications Security, October 2016. doi: 10.1145/2976749.2978392. URL https://www.ece.cmu.edu/~lbauer/papers/2016/ccs2016-face-recognition.pdf.
  • Steinhardt et al. [2018] Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In Innovations in Theoretical Computer Science Conference (ITCS), 2018.
  • Sun et al. [2021] Mingjie Sun, Siddhant Agarwal, and J Zico Kolter. Poisoned classifiers are not only backdoored, they are fundamentally broken, 2021. URL https://openreview.net/forum?id=zsKWh2pRSBK.
  • Szegedy et al. [2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, D. Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2014.
  • Tan and Le [2019] Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks, 2019.
  • Tran et al. [2018] Brandon Tran, Jerry Li, and Aleksander Madry. Spectral signatures in backdoor attacks. In NeurIPS, 2018.
  • Wang et al. [2019] Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y Zhao. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In IEEE Symposium on Security and Privacy (SP). IEEE, 2019.
  • Wang et al. [2021a] Chien-Yao Wang, I-Hau Yeh, and Hong-Yuan Mark Liao. You only learn one representation: Unified network for multiple tasks. ArXiv, abs/2105.04206, 2021a.
  • Wang et al. [2021b] Tong Wang, Yuan Yao, Feng Xu, Shengwei An, Hanghang Tong, and Ting Wang. Backdoor attack through frequency domain. ArXiv, abs/2111.10991, 2021b.
  • Wenger et al. [2020] Emily Wenger, Josephine Passananti, Yuanshun Yao, Haitao Zheng, and Ben Y Zhao. Backdoor attacks on facial recognition in the physical world. arXiv preprint arXiv:2006.14580, 2020.
  • Wolf et al. [2011] Lior Wolf, Tal Hassner, and Itay Maoz. Face recognition in unconstrained videos with matched background similarity. CVPR 2011, pages 529–534, 2011.
  • Wu et al. [2019] Tong Wu, Liang Tong, and Yevgeniy Vorobeychik. Defending against physically realizable attacks on image classification. arXiv preprint arXiv:1909.09552, 2019.
  • Xiang et al. [2022] Chong Xiang, Alexander Valtchanov, Saeed Mahloujifar, and Prateek Mittal. Objectseeker: Certifiably robust object detection against patch hiding attacks via patch-agnostic masking. ArXiv, abs/2202.01811, 2022.
  • Yun et al. [2019] Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regularization strategy to train strong classifiers with localizable features. In ICCV, pages 6023–6032, 2019.
  • Zhang et al. [2017] Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.
  • Zhang et al. [2021] Zhiyuan Zhang, Lingjuan Lyu, Weiqiang Wang, Lichao Sun, and Xu Sun. How to inject backdoors with better consistency: Logit anchoring on clean data. ArXiv, abs/2109.01300, 2021.

Appendix A Additional Analysis

A.1 Shift of Selected Backdoor Angle

Figure 9: Explicitly increasing the variance of rotation angles in the original training data by randomly rotating each image by a degree drawn from 𝒩(0, σ²; [−180°, 180°]).

As mentioned in Section 5.2, an interesting phenomenon is that the most effective attack angle at test time is usually slightly higher than the predefined backdoor angle. Here, we provide a simple explanation for this phenomenon by formulating backdoor prediction as a hypothesis testing problem. We use 𝒩(μ, σ²; [a, b]) to denote a Gaussian distribution truncated to the interval [a, b], and we restrict rotation degrees to [−180°, 180°]. In a natural vision dataset, many images may already be rotated due to different camera viewpoints; we therefore assume that the rotation degree of images in the original training distribution follows the truncated Gaussian 𝒟 ∼ 𝒩(0, σ²; [−180°, 180°]). A Gaussian is arguably the most reasonable assumption about rotation degrees in natural datasets due to the maximum entropy principle. Let β denote the backdoor angle inserted at training time and ρ denote the poisoning rate. For poisoned data, the rotation degree follows 𝒟_b ∼ 𝒩(β, σ²; [β−180°, β+180°]). The overall rotation degree distribution after poisoning is thus the mixture of truncated Gaussians (1−ρ)𝒟 + ρ𝒟_b.

We model the classification task as a hypothesis testing problem, where the neural network must first decide whether the input is drawn from 𝒟 or 𝒟_b, and then make its prediction accordingly. For an image rotated by degree x, the classifier that minimizes the cross-entropy loss predicts the clean label with probability (1−ρ)𝒟(x) / ((1−ρ)𝒟(x) + ρ𝒟_b(x)) and the backdoored target label with probability ρ𝒟_b(x) / ((1−ρ)𝒟(x) + ρ𝒟_b(x)). Thus, the attack success rate of the optimal classifier at rotation degree x is upper bounded by ρ𝒟_b(x) / ((1−ρ)𝒟(x) + ρ𝒟_b(x)). In the following theorem, we show that this maximum possible attack success rate increases monotonically with the attack angle at test time, due to the exponential decay of the Gaussian density.

Theorem 1.

Given sufficient training data, the attack success rate of the optimal classifier on an image backdoored with rotation degree x is maximized at x = 180°.
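A brief proof sketch under the truncated-Gaussian model above (for a backdoor angle β > 0 and a test angle x in the overlap of the two supports; the truncation constants do not depend on x): the bound depends on x only through the density ratio 𝒟_b(x)/𝒟(x), which grows exponentially in x,

\[
\frac{\rho\,\mathcal{D}_{\texttt{b}}(x)}{(1-\rho)\,\mathcal{D}(x)+\rho\,\mathcal{D}_{\texttt{b}}(x)}
=\frac{1}{1+\frac{1-\rho}{\rho}\cdot\frac{\mathcal{D}(x)}{\mathcal{D}_{\texttt{b}}(x)}},
\qquad
\frac{\mathcal{D}_{\texttt{b}}(x)}{\mathcal{D}(x)}
\propto\exp\!\left(\frac{2\beta x-\beta^{2}}{2\sigma^{2}}\right),
\]

so the bound is maximized at the largest admissible test angle, x = 180°.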

However, due to the low density of the Gaussian tails, there may not be enough training data points with large rotation angles for the classifier to approach this optimum. Therefore, the optimal backdoor angle at test time is only moderately higher than the backdoor angle used at training time. To further validate this explanation, in Figure 9 we increase the variance of the rotation degrees of the original training data by randomly rotating each data point by a degree drawn from 𝒩(0, σ²; [−180°, 180°]). In this case, there are more data points with large rotation degrees, which pushes the optimal backdoor angle higher. As shown in the figure, the optimal backdoor angle at test time indeed increases as σ grows, matching our explanation.
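A minimal sketch of this augmentation (assuming SciPy and torchvision; the helper names are illustrative, not part of our released code) draws a rotation degree per training image from the truncated Gaussian and applies it before the usual poisoning step.

from scipy.stats import truncnorm
import torchvision.transforms.functional as TF

def sample_rotation(sigma, low=-180.0, high=180.0):
    # Draw a degree from N(0, sigma^2) truncated to [low, high]; truncnorm expects bounds in units of sigma.
    a, b = low / sigma, high / sigma
    return float(truncnorm.rvs(a, b, loc=0.0, scale=sigma))

def randomly_rotate(img, sigma):
    # Rotate a PIL image (or image tensor) by the sampled degree.
    return TF.rotate(img, angle=sample_rotation(sigma))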

A.2 Effective Rotation Angle

We further study the influence of the backdoor rotation angle on the ASR. As Zhang et al. [2021] suggest, to ensure the existence of backdoored parameters, a backdoor pattern should (1) be added on low-variance features and (2) have sufficient strength. The first condition is verified by our experiment in Section 5.1, where rotation augmentation increases the variance of rotation and thereby mitigates the poisoning effect. We interpret the second condition as requiring that a valid backdoor transformation shifts samples sufficiently far from the original inputs. Therefore, we use the loss of transformed data with respect to a naturally trained classifier to quantify this semantic difference.
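The following sketch (PyTorch; clean_model and loader are placeholders for a naturally trained classifier and a clean test loader, not names from our codebase) shows how this quantity can be measured for a candidate backdoor angle.

import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF

@torch.no_grad()
def rotation_loss(clean_model, loader, angle, device="cuda"):
    # Average cross-entropy of images rotated by `angle`, evaluated against
    # their original labels under the clean (naturally trained) model.
    clean_model.eval()
    total_loss, total = 0.0, 0
    for images, labels in loader:
        rotated = TF.rotate(images, angle)  # rotate the whole batch
        logits = clean_model(rotated.to(device))
        total_loss += F.cross_entropy(logits, labels.to(device), reduction="sum").item()
        total += labels.size(0)
    return total_loss / total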

Figure 10: Correlation between the loss of rotated images with respect to the clean model and the attack success rate.

The attack success rate exhibits a steep rise as the backdoor angle increases. Figure 10 visualizes the correlation between the loss and the corresponding ASR from 0° to 180°. We notice a sharp increase in ASR over a range of roughly 20 degrees; for example, in Figure 10(c), the attack success rate reaches almost 100% with a 50-degree backdoor but drops to 0% with a 30-degree one. We also observe that the ASR generally follows the same trend as the cross-entropy loss on a log scale, and angles whose loss exceeds 10⁻¹ on the GTSRB dataset attain good attack performance. In this way, we bridge spatial robustness and rotation backdoor attacks.