Robust Neural Architecture Search

Anonymous

Abstract

Neural architecture search (NAS) has become increasingly popular in recent years. However, NAS-generated models tend to be more vulnerable to various malicious attacks. Many robust NAS methods leverage adversarial training to enhance the robustness of NAS-generated models, but they neglect the natural accuracy of these models. In this paper, we propose a novel NAS method, Robust Neural Architecture Search (RNAS). By designing a regularization term that balances accuracy and robustness, RNAS generates architectures with both high accuracy and good robustness. To reduce the search cost, we further propose to use noise examples instead of adversarial examples as input when searching architectures. Extensive experiments show that RNAS achieves state-of-the-art (SOTA) performance on both image classification and robustness to adversarial attacks, which illustrates that RNAS achieves a good tradeoff between robustness and accuracy.

Index Terms— neural architecture search, robustness, noise examples, adversarial training.

1 Introduction

In recent years, neural architecture search (NAS) has been proposed to design architectures automatically. NAS consists of three parts: the search space, the search strategy, and the evaluation strategy. The search space is the set of candidate neural networks. The search strategy defines how the search space is explored. The evaluation strategy estimates the performance of a candidate subnet. NAS samples candidate architectures from the search space according to the search strategy and evaluates them with the evaluation strategy. Based on the evaluation results, NAS refines the candidate architectures until it finds the best one. Recently, many popular NAS methods have emerged, such as DARTS [1, 2, 3] and SPOS [4]. However, the architectures searched by these methods often have poor robustness [5].

On the other hand, adversarial defense has been proposed to enhance the robustness of networks. Adversarial training has recently become the mainstream adversarial defense method: it generates adversarial examples and uses them as input to train the network. PGD [6] is currently a mainstream method for generating adversarial examples. However, Tsipras et al. [7] show that enhancing the robustness of a network tends to reduce its accuracy, i.e., there is a tradeoff between robustness and accuracy. To improve both accuracy and robustness, it is a good idea to design new networks with better performance.

In this paper, we combine NAS with adversarial training to search for architectures with high accuracy and good robustness. Previous works [8, 9] only leverage adversarial examples to train NAS so that it generates robust architectures, and they neglect the accuracy of these architectures. In contrast, we design a regularization term that takes both accuracy and robustness into consideration. The regularization term computes the correlation between the outputs under natural examples and adversarial examples. We call our method RNAS. It includes two sub-methods, RNAS-max and RNAS-uniform. RNAS-max first generates adversarial examples as input to train NAS, which enables NAS to search for architectures with better performance. However, RNAS-max turns NAS into a three-level optimization problem whose computational complexity is very high. RNAS-uniform addresses this problem from an optimistic view: it samples random noise from the perturbation set and uses the resulting noise examples to train NAS. We expect that subnets with good performance will receive larger architecture weights $\alpha$, so that the supernet performs better on noise examples. With RNAS-uniform, NAS remains a bi-level optimization problem, and generating noise examples adds only a small cost.

Extensive experimental results demonstrate that RNAS can search for architectures with good robustness and high accuracy. On CIFAR-10, RNAS-max achieves a 2.65% test error and RNAS-uniform achieves a 2.60% test error. Under the FGSM attack, RNAS-max still attains 53.67% robust accuracy and RNAS-uniform 53.74%. When the architecture searched by RNAS-max is trained with adversarial training, its robust accuracy also exceeds that of DARTS and some of its variants.

2 Related Work

2.1 Neural Architecture Search

In the past few years, NAS has become increasingly popular because it can design neural network architectures with good performance at low cost [10]. However, NAS-generated models tend to be more vulnerable to various malicious attacks (e.g., adversarial evasion, model poisoning, and functionality stealing) [5]. Many methods have been proposed to improve the robustness of NAS. RobNet [8] leverages adversarial training to improve robustness. RACL [11] reduces the Lipschitz constant to improve the robustness of NAS. These methods are either too complex or concerned only with robustness. In this paper, we introduce a simple regularizer that considers the correlation between the outputs of the supernet on natural data and on adversarial data, and we use it to balance accuracy and robustness.

2.2 Adversarial Attacks and Defenses

Adversarial attacks generate adversarial data to attack models. There are several kinds of adversarial attacks, including evasion attacks [12] and poisoning attacks [13]. In this paper, we mainly discuss evasion attacks. An evasion attack adds an imperceptible perturbation $\delta$ to an input $x$ to generate an adversarial input $x+\delta$, which is then used to deceive the model into misclassifying $x$ as another class. Generating the perturbation $\delta$ is therefore the key step, and the problem can be formulated as follows:

		$\displaystyle\min_{\delta\in\mathcal{B}}\ell(f(x+\delta),t),$		(1)

where $\mathcal{B}$ denotes the perturbation set.

Adversarial defense methods aim to enhance the robustness of models so that they can withstand adversarial attacks, and many such methods exist. Adversarial training is one of them: it generates adversarial examples and uses them to train the model to enhance its robustness. PGD [6] and its variants [14, 15] are used to solve the resulting min-max problem and obtain good results. However, Tsipras et al. [7] show that there is a trade-off between accuracy and robustness, so high accuracy and good robustness are difficult to achieve at the same time. To address this problem, it is a good idea to use NAS to design networks with both high accuracy and robustness. Several works [8, 9] have demonstrated the feasibility of this idea. In this paper, we also combine NAS and adversarial training to search for architectures with good performance.

3 Methodology

3.1 Preliminary

We first briefly introduce adversarial training and neural architecture search (NAS).

Adversarial Training. Deep neural networks (DNNs) are vulnerable to perturbations of their inputs, and their performance degrades under attack. Adversarial training [16] was therefore proposed to enhance the robustness of DNNs; it uses adversarial examples instead of natural examples as input to train DNNs. Adversarial training can be formulated as a minimax optimization problem:

		$\displaystyle\min_{\theta}\mathop{\mathbb{E}}\limits_{(x,y)\sim\mathcal{D}}\Big[\max_{\left\|x^{\prime}-x\right\|\leq\epsilon}\mathcal{L}(f_{\theta}(x^{\prime}),y)\Big],$		(2)

where $f_{\theta}(\cdot)$ denotes the model, $\mathcal{D}$ denotes the data distribution, and $\left\|x^{\prime}-x\right\|\leq\epsilon$ defines the set of allowed perturbed inputs. Madry et al. [6] leverage Projected Gradient Descent (PGD) to solve the inner maximization, generating adversarial examples that are then used as input to train the model and enhance its robustness. In our work, we also leverage adversarial examples to search for robust architectures.
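To make the inner maximization of Eq. 2 concrete, the following is a minimal sketch of an $\ell_\infty$ PGD attack on a toy linear softmax classifier, written in plain Python. The model, the helper names (`logits`, `grad_wrt_input`, `pgd_attack`), and the choice to start from the natural example without a random start are illustrative assumptions, not the implementation used in the paper. The default values of `eps`, `step`, and `num_steps` follow the search settings reported in Section 4.1.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    """Toy linear model (an assumption): one weight row per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def cross_entropy(W, x, y):
    return -math.log(softmax(logits(W, x))[y])

def grad_wrt_input(W, x, y):
    """Gradient of the cross-entropy w.r.t. x: W^T (p - onehot(y))."""
    p = softmax(logits(W, x))
    p[y] -= 1.0
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(W, x, y, eps=0.031, step=0.003, num_steps=10):
    """Signed-gradient ascent on the loss, projected onto the eps-ball around x."""
    x_adv = list(x)  # assumption: no random start
    for _ in range(num_steps):
        g = grad_wrt_input(W, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        # Project back onto {x' : ||x' - x||_inf <= eps}.
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv
```

For a real network the input gradient would come from automatic differentiation, and the adversarial example would usually also be clipped to the valid pixel range.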

Neural Architecture Search. Neural architecture search (NAS) is a technique for designing neural network architectures automatically. NAS consists of three parts: the search space, the search strategy, and the performance estimation strategy. The search space is the set of candidate neural networks that can be searched. The search strategy defines how the optimal candidate network is found. The performance estimation strategy evaluates the performance of a sampled network. The main purpose of NAS is to find the optimal network in the search space, which can be represented as a bi-level optimization problem:

		$\displaystyle\min_{\alpha}\ \mathcal{L}_{val}(w^{*}(\alpha),\alpha)$		(3)
		$\displaystyle\text{s.t. }w^{*}(\alpha)=\mathop{\arg\min}\limits_{w}\mathcal{L}_{\text{train}}(w,\alpha),$

where $\alpha$ denotes the architecture parameters that encode the candidate network and $w$ denotes the network parameters. Differentiable Architecture Search (DARTS) [1] has recently become a very popular NAS method owing to its low computational cost and fast search speed. In our work, we use DARTS as a representative NAS method to search for robust architectures.
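The alternating scheme commonly used to approximate Eq. 3 can be illustrated on a toy problem. The sketch below is an assumption-laden illustration rather than DARTS itself: the "supernet" mixes two hypothetical candidate operations (a linear one and a quadratic one) with softmax weights over $\alpha$, the target function is $y=2x$, gradients are taken by finite differences, and plain gradient descent replaces the optimizers used in practice. One step on $\alpha$ uses the validation loss and one step on $w$ uses the training loss, in turn.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def loss(w, alpha, xs):
    """Mean squared error of the mixed operation against the target y = 2x."""
    a = softmax(alpha)
    total = 0.0
    for x in xs:
        pred = a[0] * w[0] * x + a[1] * w[1] * x * x  # linear op + quadratic op
        total += (pred - 2.0 * x) ** 2
    return total / len(xs)

def num_grad(fn, p, h=1e-5):
    """Central finite-difference gradient of fn at the list p."""
    g = []
    for i in range(len(p)):
        up = p[:i] + [p[i] + h] + p[i + 1:]
        dn = p[:i] + [p[i] - h] + p[i + 1:]
        g.append((fn(up) - fn(dn)) / (2 * h))
    return g

def search(train_x, val_x, steps=200, lr_w=0.1, lr_alpha=0.1):
    w = [0.5, 0.5]       # network parameters, one per candidate op
    alpha = [0.0, 0.0]   # architecture parameters
    for _ in range(steps):
        # Outer step: update alpha on the validation loss with w held fixed.
        g_a = num_grad(lambda a: loss(w, a, val_x), alpha)
        alpha = [a - lr_alpha * g for a, g in zip(alpha, g_a)]
        # Inner step: update w on the training loss with alpha held fixed.
        g_w = num_grad(lambda v: loss(v, alpha, train_x), w)
        w = [v - lr_w * g for v, g in zip(w, g_w)]
    return w, alpha
```

On symmetric inputs the quadratic operation cannot fit the odd target, so the search should place more architecture weight on the linear operation; deriving the final architecture then amounts to keeping the operation with the largest $\alpha$.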

3.2 Robust Neural Architecture Search

Architectures searched by NAS on natural examples have better accuracy, but they are not robust and are vulnerable to adversarial examples. Conversely, architectures searched by NAS on adversarial examples are highly robust, but their accuracy decreases. Our aim is to search for architectures with both high accuracy and good robustness.

Fig. 1: Main framework of RNAS-max.

In this paper, we use the model's prediction performance on natural examples as the metric for searching architectures with good accuracy. In addition, we propose a regularization term for searching robust architectures, which measures the similarity between the supernet outputs on natural examples and on adversarial examples. The NAS optimization problem can thus be formulated as follows:

		$\displaystyle\min_{\alpha}\ \mathcal{L}_{val}(w^{*}(\alpha),\alpha)+\lambda\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$		(4)
		$\displaystyle\text{s.t. }w^{*}(\alpha)=\mathop{\arg\min}\limits_{w}\Big[\mathcal{L}_{\text{train}}(w,\alpha)+\lambda\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))\Big],$

where $m$ denotes the supernet, $f$ denotes the similarity metric, $S$ denotes the perturbation set around $x$, and $\lambda$ is a regularization coefficient that controls the tradeoff between robustness and accuracy. As $\lambda$ becomes larger, more weight is placed on robustness than on accuracy. We call this method RNAS-max.

Fig. 1 shows the main framework of RNAS-max. Since PGD [6] is a mainstream method for generating adversarial examples, we also use PGD in RNAS-max; the procedure is given in Alg. 1. First, we initialize the architecture parameters $\alpha$ and the network parameters $w$. Then, we take validation data as input and use PGD to solve $\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$, which yields adversarial validation examples. We feed both the adversarial and the natural validation examples into the supernet to compute their outputs. Next, we compute the correlation between the two outputs and add it to the natural validation loss to obtain the robust validation loss. We use Adam to minimize the robust validation loss and update the architecture parameters $\alpha$. As in DARTS, we alternate between optimizing the architecture parameters $\alpha$ and the network parameters $w$. Accordingly, we then take training data as input and generate adversarial training examples by solving the same maximization problem. We use the adversarial and natural training examples as input to the inner optimization problem (Eq. 4) and use SGD to update the network parameters $w$. We repeat these steps until we obtain the optimal architecture parameters $\alpha$ and network parameters $w$. Finally, we derive the optimal subnet with good robustness and high accuracy.

Algorithm 1 RNAS-max
Input: supernet $m$, similarity metric $f$
Initialize architecture parameters $\alpha$ and network parameters $w$
while not converged do
    Generate adversarial validation examples by using PGD to optimize $\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$, where $x$ is validation data;
    Update $\alpha$ by optimizing $\min_{\alpha}\mathcal{L}_{val}(w^{*}(\alpha),\alpha)+\lambda\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$;
    Generate adversarial training examples by using PGD to optimize $\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$, where $x$ is training data;
    Update $w$ by optimizing $\mathop{\arg\min}\limits_{w}[\mathcal{L}_{\text{train}}(w,\alpha)+\lambda\max_{x^{\prime}\in S}f(m(x^{\prime}),m(x))]$;
end while
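The robust loss that is minimized in the update steps of RNAS-max can be sketched as the natural cross-entropy plus $\lambda$ times a term comparing the supernet outputs on natural and perturbed inputs. In the sketch below, the comparison term $f$ is taken to be the KL divergence between the two softmax outputs; this choice, like the function names, is an assumption made for illustration, since the text only describes $f$ as a similarity metric.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, label):
    """Natural loss on the supernet output for a natural example."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def robust_loss(logits_nat, logits_adv, label, lam=1.0):
    """Natural loss plus lam times the output discrepancy (assumed form of f)."""
    natural = cross_entropy(logits_nat, label)
    discrepancy = kl_divergence(softmax(logits_nat), softmax(logits_adv))
    return natural + lam * discrepancy
```

Here `logits_nat` and `logits_adv` stand for $m(x)$ and $m(x^{\prime})$. With `lam=0` the loss reduces to the natural loss, and larger values of `lam` penalize outputs that change under perturbation more strongly.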

3.3 Noise Examples on RNAS

Section 3.2 adds a regularization term to the bi-level optimization problem of NAS to generate higher-quality architectures. Although the method is effective, it is not efficient enough. Eq. 4 is a three-level optimization problem, and its computational complexity is far higher than that of the bi-level problem (Eq. 3). Generating the optimal subnet takes several times longer than with standard DARTS, which is a serious obstacle to the practical application of our method. We therefore generate noise examples instead of adversarial examples to train RNAS and reduce the search cost.

We hypothesize that robust models are also insensitive to random noise in their inputs. Since the supernet is a collection of many subnets, making the supernet insensitive to random noise should assign larger architecture weights to the robust subnets than to the others. We therefore sample random examples from the perturbation set and use them as noise examples in place of adversarial examples. We call this method RNAS-uniform, which can be formulated as follows:

		$\displaystyle\min_{\alpha}\ \mathcal{L}_{val}(w^{*}(\alpha),\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$		(5)
		$\displaystyle\text{s.t. }w^{*}(\alpha)=\mathop{\arg\min}\limits_{w}\Big[\mathcal{L}_{\text{train}}(w,\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))\Big],$

where $x^{\prime}$ denotes a noise example and $U$ denotes sampling a perturbed example uniformly from the perturbation set. Eq. 5 remains a bi-level optimization problem, and generating noise examples adds only a small extra cost. Although architectures searched with noise examples may be less robust and less accurate than those searched with adversarial examples, the search time is reduced substantially.
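The sampling operator $U$ can be sketched as drawing one perturbed input uniformly from the $\ell_\infty$ ball of radius $\epsilon$ around a natural example. The function name, the per-coordinate sampling, and the final clipping to a $[0,1]$ pixel range are assumptions for illustration; the text only states that noise examples are sampled uniformly from the perturbation set.

```python
import random

def sample_noise_example(x, eps=0.031, rng=None, lo=0.0, hi=1.0):
    """Return x plus noise drawn uniformly from [-eps, eps] per coordinate."""
    rng = rng or random.Random()
    noisy = []
    for xi in x:
        delta = rng.uniform(-eps, eps)
        # Assumption: keep the noisy value inside the valid pixel range.
        noisy.append(min(max(xi + delta, lo), hi))
    return noisy
```

Unlike PGD, this requires no gradient computation or extra forward passes, which is why the search remains close to the cost of the underlying bi-level method.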

Algorithm 2 RNAS-uniform
Input: supernet $m$, similarity metric $f$
Initialize architecture parameters $\alpha$ and network parameters $w$
while not converged do
    Generate perturbation valid examples by sampling at random from the perturbation valid set;
    Update $\alpha$ by optimizing $\min_{\alpha}\mathcal{L}_{val}(w^{*}(\alpha),\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$;
    Generate perturbation training examples by sampling at random from the perturbation training set;
    Update $w$ by optimizing $\mathop{\arg\min}\limits_{w}[\mathcal{L}_{\text{train}}(w,\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))]$;
end while

Fig. 2 shows the main framework of RNAS-uniform, and the procedure is given in Alg. 2. First, we generate perturbed validation examples by sampling uniformly from the perturbation set of the validation data. Then, we use the perturbed and natural validation examples as input and update the architecture parameters $\alpha$ by optimizing $\min_{\alpha}\mathcal{L}_{val}(w^{*}(\alpha),\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))$. As in DARTS, we alternate between optimizing the architecture parameters $\alpha$ and the network parameters $w$. Accordingly, we next generate perturbed training examples by sampling uniformly from the perturbation set of the training data. We then use the perturbed and natural training examples as input and update the network parameters $w$ by optimizing $\mathop{\arg\min}\limits_{w}[\mathcal{L}_{\text{train}}(w,\alpha)+\lambda\mathop{U}\limits_{x^{\prime}\in S}f(m(x^{\prime}),m(x))]$. We repeat these steps to optimize the problem (Eq. 5). Finally, we derive the optimal subnet from the architecture parameters.

4 Experiments

4.1 Architecture Search

CIFAR-10 contains 50K training images and 10K test images at a spatial resolution of $32\times 32$, belonging to 10 classes. During architecture search, we divide the training images into two subsets, a training subset and a validation subset. The training subset is used to optimize the network parameters, and the validation subset is used to optimize the architecture parameters.
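The split of the training images into the two search subsets can be sketched as a shuffled index partition. The split ratio is not stated in the text; the sketch assumes an even 50/50 split, which is the convention in the public DARTS code, and the function name and seed are hypothetical.

```python
import random

def split_train_valid(num_images, train_fraction=0.5, seed=0):
    """Shuffle image indices and split them into training and validation subsets."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    cut = int(num_images * train_fraction)  # assumption: 50/50 split by default
    return indices[:cut], indices[cut:]
```

The first list would feed the updates of $w$ and the second the updates of $\alpha$.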

DARTS is one of the most popular NAS methods, so we use it as the representative method to combine with our approach. As in standard DARTS, we search for a normal cell and a reduction cell to build a network with good performance on image classification. Our method has two sub-methods, RNAS-max and RNAS-uniform, whose experimental settings are described below.

We first describe the experimental settings of RNAS-max. Most settings are consistent with DARTS, with a few modifications. We run RNAS for a total of 50 epochs to search the architecture, i.e., the total number of search epochs $T$ is 50. The batch size $m$ is 64. The regularization coefficient $\lambda$ is set to 1.0. Because the neural network contains no valuable information at initialization, we first warm it up for 15 epochs, i.e., the number of warm-up epochs $W$ is 15. We use 10-step PGD to generate adversarial examples, i.e., the number of steps $K$ is 10. The step size $\eta$ is 0.003, and the total perturbation scale $\epsilon$ is 0.031. We use SGD with momentum to update the network parameters $w$, with an initial learning rate of 0.025, a momentum of 0.9, and a weight decay of 3e-4. We use Adam to update the architecture parameters $\alpha$, with an initial learning rate of 3e-4, momentum coefficients $(0.5,0.999)$, and a weight decay of 1e-3. The search takes 4.3 GPU-days.
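For reference, the RNAS-max search settings listed above can be collected in a single configuration object. The class and field names below are hypothetical; only the values come from the text. The helper method makes one arithmetic point explicit: with $K=10$ steps of size $\eta=0.003$, the signed-gradient updates can move each coordinate by at most 0.03, just inside the bound $\epsilon=0.031$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RNASMaxSearchConfig:
    """Search hyperparameters for RNAS-max (field names are illustrative)."""
    total_epochs: int = 50          # T
    warmup_epochs: int = 15         # W
    batch_size: int = 64
    reg_lambda: float = 1.0         # regularization coefficient lambda
    pgd_steps: int = 10             # K
    pgd_step_size: float = 0.003    # eta
    epsilon: float = 0.031          # perturbation bound
    w_lr: float = 0.025             # SGD settings for network parameters w
    w_momentum: float = 0.9
    w_weight_decay: float = 3e-4
    alpha_lr: float = 3e-4          # Adam settings for architecture parameters
    alpha_betas: tuple = (0.5, 0.999)
    alpha_weight_decay: float = 1e-3

    def max_pgd_travel(self):
        """Largest per-coordinate displacement reachable by K signed steps."""
        return self.pgd_steps * self.pgd_step_size
```

According to the text, RNAS-uniform reuses the same values except that the PGD fields are not needed.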

Table 1: Comparison with state-of-the-art image classifiers on CIFAR-10.

Architecture	Test Err. (%)	Params (M)	Search Cost (GPU-days)	Method
DenseNet-BC [17]	3.46	25.6	-	manual
NASNet-A [18]	2.65	3.3	1800	-
AmoebaNet-A [19]	3.34 $\pm$ 0.06	3.2	3150	evolution
AmoebaNet-B [19]	2.55 $\pm$ 0.05	2.8	3150	evolution
PNAS [liu2018progressive]	3.41 $\pm$ 0.09	3.2	225	SMBO
ENAS [20]	2.89	4.6	0.5	-
DARTS ($1^{\text{st}}$ order) [1]	3.00 $\pm$ 0.14	3.3	0.4	gradient
DARTS ($2^{\text{nd}}$ order) [1]	2.76 $\pm$ 0.09	3.3	-	gradient
SNAS (mild) [21]	2.98	2.9	1.5	gradient
ProxylessNAS [22]	2.08	-	-	gradient
P-DARTS [23]	2.5	3.4	0.3	gradient
PC-DARTS [24]	2.57 $\pm$ 0.07	3.6	0.1	gradient
SDARTS-RS [25]	2.67 $\pm$ 0.03	3.4	0.4	gradient
GDAS [26]	2.93	3.4	0.3	gradient
R-DARTS (L2) [27]	2.95 $\pm$ 0.21	-	1.6	gradient
SGAS (Cri 1. avg) [28]	2.66 $\pm$ 0.24	3.7	0.25	gradient
DARTS-PT [29]	2.61 $\pm$ 0.08	3.0	0.8	gradient
RNAS-max	2.65	3.4	4.3	gradient
RNAS-uniform	2.60	3.4	0.5	gradient

The experimental settings of RNAS-uniform are mostly consistent with those of DARTS and RNAS-max. The total number of epochs $T$ is also 50, including $W=15$ warm-up epochs. The batch size $m$ and regularization coefficient $\lambda$ are the same as for RNAS-max, i.e., 64 and 1.0. We sample noise examples uniformly from the perturbation set, with perturbation bound $\epsilon=0.031$. The optimizers for the architecture parameters $\alpha$ and the network parameters $w$, as well as their hyperparameters, are the same as for RNAS-max. The search takes 0.5 GPU-days, only slightly more than first-order DARTS (0.4 GPU-days).

4.2 Architecture Evaluation on CIFAR-10

We use the cells searched by RNAS to build networks and evaluate them on CIFAR-10. Each network is built from 20 cells, including two reduction cells located at 1/3 and 2/3 of the total depth. We train for 600 epochs with a batch size of 96. We again use SGD to optimize the network, with an initial learning rate of 0.025. We also use cutout and auxiliary towers to help train the networks; the cutout length is 16, the auxiliary tower weight is 0.4, and the path dropout probability is 0.3.
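The placement of the two reduction cells in the 20-cell evaluation network can be sketched as follows. The sketch assumes the integer-floor convention of the public DARTS code, where the cells at zero-based indices `num_cells // 3` and `2 * num_cells // 3` are reduction cells; the function names are hypothetical.

```python
def reduction_positions(num_cells):
    """Zero-based indices of the reduction cells (assumed floor convention)."""
    return [num_cells // 3, 2 * num_cells // 3]

def cell_layout(num_cells=20):
    """List the cell type at every depth of the stacked network."""
    reductions = set(reduction_positions(num_cells))
    return ["reduction" if i in reductions else "normal" for i in range(num_cells)]
```

Under this assumption, a 20-cell network has reduction cells at indices 6 and 13 and normal cells everywhere else.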

Table 1 summarizes the evaluation results and the comparison with state-of-the-art approaches. RNAS-max achieves a 2.65% test error on CIFAR-10 with a search cost of 4.3 GPU-days. Although its search time is much longer than that of DARTS, its test error is lower than that of standard DARTS (3.00% for first order and 2.76% for second order). The architecture generated by RNAS-max has 3.4M parameters, slightly more than the architecture searched by DARTS (3.3M) and fewer than those searched by some DARTS variants, e.g., SGAS [28] and PC-DARTS [24]. RNAS-uniform achieves a 2.60% test error on CIFAR-10 with a search cost of 0.5 GPU-days. It also outperforms standard DARTS, while its search cost is lower than that of DARTS ($2^{\text{nd}}$ order) and slightly higher than that of DARTS ($1^{\text{st}}$ order). The architecture searched by RNAS-uniform also has 3.4M parameters and is lightweight enough to run on mobile devices. Overall, RNAS improves on DARTS and can search for good architectures with high accuracy and good robustness.

Table 2: Evaluation of robust accuracy on CIFAR-10 under adversarial attacks.

		Adversarially Trained			Standard Trained
Model	Params	Clean	FGSM	PGD²⁰	FGSM
ResNet-18 [30]	11.2M	84.09%	54.64%	45.86%	50.71%
DenseNet-121 [17]	7.0M	85.95%	58.46%	50.49%	45.51%
DARTS [1]	3.3M	85.17%	58.74%	50.45%	50.56%
PDARTS [23]	3.4M	85.37%	59.12%	51.32%	54.51%
RobNet-free [8]	5.6M	85.00%	59.22%	52.09%	36.99%
RACL [11]	3.6M	84.63%	58.57%	50.62%	52.38%
RNAS-max	3.4M	86.30%	59.59%	52.65%	53.67%
RNAS-uniform	3.4M	85.42%	55.36%	47.52%	53.74%

4.3 Robustness of Architecture

We evaluate the robustness of standard-trained and adversarially trained architectures on CIFAR-10 under several adversarial attacks. Standard training uses natural examples to train the networks, with the same settings as DARTS. Adversarial training uses adversarial examples to train the networks; we use 7-step PGD with a step size of 0.01 and a perturbation scale of 0.031. We use FGSM to attack standard-trained architectures, and FGSM and PGD²⁰ to attack adversarially trained architectures. Table 2 shows the evaluation results. With adversarial training, RNAS-max obtains the highest clean, FGSM, and PGD²⁰ accuracy among the compared methods; with standard training, its FGSM accuracy (53.67%) is competitive, below only P-DARTS (54.51%) and RNAS-uniform (53.74%). RNAS-uniform, however, is less robust after adversarial training (55.36% under FGSM and 47.52% under PGD²⁰), which means that using noise examples as input sacrifices some robustness of the architecture while having little effect on accuracy. In the future, we will seek a better method to improve the robustness and accuracy of NAS-searched architectures at low cost.
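The robust accuracy reported under FGSM can be illustrated with a minimal sketch: each test input is perturbed by one signed-gradient step of size $\epsilon$, and accuracy is measured on the perturbed inputs. The toy linear softmax classifier and all function names are assumptions for illustration, and the sketch omits clipping to the valid pixel range; it is not the evaluation code behind Table 2.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    """Toy linear classifier (an assumption): one weight row per class."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def grad_wrt_input(W, x, y):
    """Gradient of the cross-entropy w.r.t. x: W^T (p - onehot(y))."""
    p = softmax(logits(W, x))
    p[y] -= 1.0
    return [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(W, x, y, eps=0.031):
    """Single-step attack: move every coordinate by eps along the gradient sign."""
    g = grad_wrt_input(W, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def accuracy_under_fgsm(W, data, eps=0.031):
    """Fraction of (x, y) pairs still classified correctly after the attack."""
    correct = 0
    for x, y in data:
        z = logits(W, fgsm(W, x, y, eps))
        pred = max(range(len(z)), key=lambda k: z[k])
        correct += int(pred == y)
    return correct / len(data)
```

Setting `eps=0` recovers the clean accuracy, so the gap between the two numbers is a simple measure of how much the attack degrades the model.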

5 Conclusion

In this work, we propose a novel NAS method, RNAS. RNAS uses a regularization term to balance accuracy and robustness during the search process. To reduce the search time of RNAS, we use noise examples instead of adversarial examples to train NAS. Experimental results show that RNAS is effective and achieves SOTA performance on image classification and under adversarial attacks. In the future, we will design a more efficient NAS method to search for architectures with better performance.

References

[1] Hanxiao Liu, Karen Simonyan, and Yiming Yang, “Darts: Differentiable architecture search,” arXiv preprint arXiv:1806.09055, 2018.
[2] Xunyu Zhu, Jian Li, Yong Liu, Jun Liao, and Weiping Wang, “Operation-level progressive differentiable architecture search,” in 2021 IEEE International Conference on Data Mining (ICDM). IEEE, 2021, pp. 1559–1564.
[3] Xunyu Zhu, Jian Li, Yong Liu, and Weiping Wang, “Improving differentiable architecture search via self-distillation,” arXiv preprint arXiv:2302.05629, 2023.
[4] Zichao Guo, Xiangyu Zhang, Haoyuan Mu, Wen Heng, Zechun Liu, Yichen Wei, and Jian Sun, “Single path one-shot neural architecture search with uniform sampling,” in European Conference on Computer Vision. Springer, 2020, pp. 544–560.
[5] Ren Pang, Zhaohan Xi, Shouling Ji, Xiapu Luo, and Ting Wang, “On the security risks of automl,” 2021.
[6] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” in International Conference on Learning Representations, 2018.
[7] Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry, “Robustness may be at odds with accuracy,” arXiv preprint arXiv:1805.12152, 2018.
[8] Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, and Dahua Lin, “When nas meets robustness: In search of robust architectures against adversarial attacks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 631–640.
[9] Yanxi Li, Zhaohui Yang, Yunhe Wang, and Chang Xu, “Neural architecture dilation for adversarial robustness,” Advances in Neural Information Processing Systems, vol. 34, 2021.
[10] Karen Simonyan and Andrew Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[11] Minjing Dong, Yanxi Li, Yunhe Wang, and Chang Xu, “Adversarially robust neural architectures,” arXiv preprint arXiv:2009.00902, 2020.
[12] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu, “Towards deep learning models resistant to adversarial attacks,” 2019.
[13] Battista Biggio, Blaine Nelson, and Pavel Laskov, “Poisoning attacks against support vector machines,” 2013.
[14] Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, and Mohan Kankanhalli, “Attacks which do not kill training make adversarial learning stronger,” in International conference on machine learning. PMLR, 2020, pp. 11278–11287.
[15] Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein, “Adversarial training for free!,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[16] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus, “Intriguing properties of neural networks,” arXiv preprint arXiv:1312.6199, 2013.
[17] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 4700–4708.
[18] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le, “Learning transferable architectures for scalable image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 8697–8710.
[19] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le, “Regularized evolution for image classifier architecture search,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 4780–4789.
[20] Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean, “Efficient neural architecture search via parameters sharing,” in International conference on machine learning. PMLR, 2018, pp. 4095–4104.
[21] Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin, “Snas: stochastic neural architecture search,” arXiv preprint arXiv:1812.09926, 2018.
[22] Han Cai, Ligeng Zhu, and Song Han, “Proxylessnas: Direct neural architecture search on target task and hardware,” arXiv preprint arXiv:1812.00332, 2018.
[23] Xin Chen, Lingxi Xie, Jun Wu, and Qi Tian, “Progressive differentiable architecture search: Bridging the depth gap between search and evaluation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 1294–1303.
[24] Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, Xin Chen, Guo-Jun Qi, Qi Tian, and Hongkai Xiong, “Pc-darts: Partial channel connections for memory-efficient architecture search,” arXiv preprint arXiv:1907.05737, 2019.
[25] Xiangning Chen and Cho-Jui Hsieh, “Stabilizing Differentiable Architecture Search via Perturbation-based Regularization,” arXiv e-prints, p. arXiv:2002.05283, Feb. 2020.
[26] Xuanyi Dong and Yi Yang, “Searching for a robust neural architecture in four gpu hours,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1761–1770.
[27] Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, and Frank Hutter, “Understanding and robustifying differentiable architecture search,” arXiv preprint arXiv:1909.09656, 2019.
[28] Guohao Li, Guocheng Qian, Itzel C Delgadillo, Matthias Muller, Ali Thabet, and Bernard Ghanem, “Sgas: Sequential greedy architecture search,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1620–1630.
[29] Ruochen Wang, Minhao Cheng, Xiangning Chen, Xiaocheng Tang, and Cho-Jui Hsieh, “Rethinking architecture selection in differentiable NAS,” in International Conference on Learning Representations, 2021.
[30] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.