
1: Fujitsu, Kanagawa, Japan   2: Inria & ENS | PSL & CNRS, Paris, France

Verifying Attention Robustness of Deep Neural Networks against Semantic Perturbations

Satoshi Munakata (1), Caterina Urban (2), Haruki Yokoyama (1), Koji Yamamoto (1), Kazuki Munakata (1)
Abstract

It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain specific pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the classification decision basis; e.g., it is not a valid basis for classification if a DNN pays more attention to the background than to the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the amount of brightness change) that maintains the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation region traversal, focusing on the outermost robust boundary for scalability on larger DNNs. Experimental results demonstrate that our method can show the extent to which DNNs can classify with the same basis regardless of semantic perturbations, and report the performance and performance factors of activation region traversal.

Keywords:
feed-forward ReLU neural networks, robustness certification, semantic perturbations, saliency-map consistency, traversing

1 Introduction

Classification Robustness. Deep neural networks (DNNs) are the dominant solutions in image classification [13]. However, quality assurance is essential when DNNs are used in safety-critical systems, for example, cyber-physical systems such as self-driving cars and medical diagnosis [1]. From an assurance point of view, the robustness of the classification against input perturbations is one of the key properties, and thus it has been studied extensively [11]. Input images can be perturbed in the real world by various mechanisms, such as brightness change and translation, and [4, 6] reported that even small semantic perturbations can change the classification labels predicted by DNNs. Therefore, it is essential to verify classification robustness against semantic perturbations. Several methods have already been proposed to compute the range of perturbation parameters (e.g., the amount of brightness change or translation) that does not change the classification label [2, 16].

Classification Validity. It is known that DNNs classify an input image by paying particular attention to certain specific pixels in the image; a graphical representation of the magnitude of attention to each pixel, like a heatmap, is called a saliency-map [21, 27]. A saliency-map can be obtained from the gradients of the DNN outputs with respect to an input image, and it is used to check the validity of the classification decision basis. For instance, if a DNN classifies the subject type by paying attention to the background rather than to the subject in the input image (as in the case of "Husky vs. Wolf" [19]), it is not a valid basis for classification. We believe that such low-validity classifications should not be accepted in safety-critical situations, even if the classification labels are correct. Semantic perturbations can significantly change saliency-maps [17, 7, 8]. However, existing robustness verification methods only target changes in the classification labels and not in the saliency-maps.

Our Approach: Verifying Attention Robustness. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the amount of brightness change) that maintains the difference between (a) the actual saliency-map change and (b) the expected saliency-map change below a given threshold value. Regarding the latter (b), a brightness change is expected to keep the saliency-map unchanged, whereas a translation is expected to move it along with the image. Although the concept of this difference is the same as the saliency-map consistency used in semi-supervised learning [7, 8], verification requires calculating the minimum and maximum values of the difference within each perturbation parameter sub-space. We therefore exploit the fact that the DNN output is linear with respect to the DNN input within an activation region [9]. That is, the actual saliency-map calculated from the gradient is constant within each region; thus, we can compute the range of the difference by sampling a single point within each region if the saliency-map is expected to remain unchanged, or by convex optimization if the saliency-map is expected to move. Our method is based on traversing the activation regions of a DNN that contains layers for classification and layers for semantic perturbations; it can either traverse (i.e., verify) all activation regions of a small DNN or traverse only the activation regions near the outermost robust boundary of a larger DNN. Experimental results demonstrate that our method can show the extent to which DNNs can classify with the same basis regardless of semantic perturbations, and report the performance and performance factors of activation region traversal.

Contributions. Our main contributions are:

  • We formulate the problem of attention robustness verification and propose the first method for verifying attention robustness. Our method can traverse and verify either all activation regions or only those near the outermost robust boundary.

  • We implement our method in a Python tool and evaluate it on DNNs trained with popular datasets; we then report the performance of attention robustness verification and its contributing factors. Among traversal-based verification methods, we evaluate performance on the largest DNNs considered to date.

2 Overview

Figure 1: Misclassifications caused by combinations of semantic perturbations.
Figure 2: Perturbation-induced changes in images (first row), saliency-maps (second row), and the metric quantifying the degree of collapse of each saliency-map (third row); $\delta$ denotes the threshold for judging whether a saliency-map is valid.
Figure 3: The outermost boundaries of classification robustness (left) and attention robustness (right); the origin at the bottom-left corresponds to the unperturbed input image, and each plotted point denotes a perturbed input image (middle). The shapes of the boundaries indicate the existence of regions that the DNN classifies successfully but without sufficient evidence.
Figure 4: Differences in saliency-map changes for two DNNs. Each saliency-map of DNN-1 (above) is more collapsed than DNN-2's; columns (O), (B), (P), and (T) denote original (i.e., without perturbation), brightness change, patch, and translation, respectively.

Situation. Suppose we have to evaluate the weaknesses of a DNN for image classification against combinations of semantic perturbations caused by differences in shooting conditions, such as lighting and subject position. For example, as shown in Figure 1, the original label of the handwritten digit image is "0"; however, the DNN often misclassifies it as other labels under changes in brightness, patches, and translations. Therefore, we want to know in advance, for each typical image, the ranges of semantic perturbation parameters that are likely to cause such misclassifications as a weakness of the DNN. However, classification robustness is not sufficient for capturing such weaknesses in the following cases.

Case 1. Even if the brightness changes so much that the image is no longer recognizable to humans, the classification label of the perturbed image may happen to match the original label. Vast ranges of the perturbation parameters are then evaluated as robust for classification; however, such overestimated ranges are invalid and unsafe. For instance, Figure 2 shows the changes in MNIST image "8" and its actual saliency-map when the brightness is gradually changed; although the classification seems robust because the labels of all images are the same, the collapsed saliency-maps indicate that the DNN does not pay proper attention to the digit "8" in each image. Therefore, our approach uses the metric attention inconsistency, which quantifies the degree of collapse of a saliency-map, to further evaluate whether a range of perturbation parameters satisfies the property attention robustness, i.e., whether the DNN pays proper attention as it does for the original image. Attention inconsistency is a kind of distance (cf. Figure 4) between an actual saliency-map and an expected one; e.g., the saliency-map of DNN-1 for the translation perturbation (column (T)) is expected to follow the image translation; if it does not, the attention inconsistency is high. In addition, Figure 2 shows an example of determining that attention robustness is satisfied if each attention inconsistency value (third row) is less than or equal to the threshold value $\delta$.

Case 2. The classification label often changes under combinations of semantic perturbations, such as brightness change and patch, even within perturbation parameter ranges for which each perturbation alone is robust. It is important to understand which combinations the DNN is weak against; however, it is difficult to verify all combinations, as there are many semantic perturbations to consider in an operational environment. In our observations, a perturbation that significantly collapses the saliency-map is more likely to cause misclassification when combined with another perturbation, because the other perturbation can change the intensity of pixels to which the DNN should not pay attention. Therefore, to understand the weakness to combined perturbations, our approach visualizes the outermost boundary at which the sufficiency of robustness switches in the perturbation parameter space. For instance, Figure 3 shows connected regions that contain the outermost boundary for classification robustness (left) and attention robustness (right). The classification boundary indicates that the DNN can misclassify the image with a thin patch and middle brightness. In contrast, the attention boundary further indicates that the brightness change can collapse the saliency-map more than patching, so any combination involving the brightness change poses a greater risk. Even for the same perturbations, the attention inconsistency values of different DNNs usually differ (cf. Figure 4); thus, it is better to evaluate which semantic perturbations pose a greater risk for each DNN.

3 Problem Formulation

Our method targets feed-forward ReLU-activated neural networks (ReLU-FNNs) for image classification. A ReLU-FNN image classifier is a function $f\colon X\to Y$ mapping an $N^f$-dimensional (pixels $\times$ color-depth) image $x\in X\subseteq\mathbb{R}^{N^f}$ to a classification label $\operatorname{argmax}_{j\in Y}f_j(x)$ in the $K^f$-class label space $Y=\{1,\dots,K^f\}$, where $f_j\colon X\to\mathbb{R}$ is the confidence function for the $j$-th class.

The ReLU activation function occurs in between the linear maps performed by the ReLU-FNN layers and applies the function $\max(0,x_{l,n})$ to each neuron $x_{l,n}$ in a layer $l\in\{1,\dots,L^f\}$ (where $L^f$ is the number of layers of ReLU-FNN $f$). When $x_{l,n}>0$, we say that $x_{l,n}$ is active; otherwise, we say that $x_{l,n}$ is inactive. We write $ap^f(x)$ for the activation pattern of an image $x$ given as input to a ReLU-FNN $f$, i.e., the sequence of neuron activation statuses in $f$ when $x$ is taken as input. We write $AP^f$ for the entire set of activation patterns of a ReLU-FNN $f$.

Given an activation pattern $p\in AP^f$, we write $ar^f(p)$ for the corresponding activation region, i.e., the subset of the input space containing all images that share the same activation pattern: $x\in ar^f(p)\Leftrightarrow ap^f(x)=p$. Note that the neuron activation statuses in an activation pattern $p$ yield half-space constraints in the input space [9, 12]. Thus, an activation region $ar^f(p)$ can equivalently be represented as a convex polytope described by the conjunction of the half-space constraints resulting from the activation pattern $p$.
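To make the polytope view concrete, here is a minimal numpy sketch (the function names are ours, not from the paper's tool) that records the activation pattern of an input and derives the half-space constraints $Ax\leq b$ of its activation region, assuming the network is given as explicit weight matrices of ReLU layers:

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Record each neuron's on/off status for input x (ap^f(x))."""
    pattern, a = [], np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        z = W @ a + b
        pattern.append(z > 0)            # True = active
        a = np.maximum(z, 0.0)           # ReLU
    return pattern

def region_halfspaces(weights, biases, pattern):
    """H-representation A x <= b of the activation region ar^f(p).

    Restricted to the region, every layer is affine in the network input x,
    so each neuron's pre-activation is A_z x + b_z; its fixed sign gives
    one half-space constraint."""
    A_rows, b_rows = [], []
    n_in = weights[0].shape[1]
    M, c = np.eye(n_in), np.zeros(n_in)  # layer input as an affine map of x
    for W, b, p in zip(weights, biases, pattern):
        A_z, b_z = W @ M, W @ c + b      # pre-activations as affine maps of x
        for i, active in enumerate(p):
            s = -1.0 if active else 1.0  # active: A_z[i] x + b_z[i] >= 0
            A_rows.append(s * A_z[i])
            b_rows.append(-s * b_z[i])
        M, c = A_z * p[:, None], b_z * p # ReLU resolved by the pattern
    return np.array(A_rows), np.array(b_rows)
```

A point $x'$ then lies in $ar^f(p)$ exactly when `A @ x' <= b` holds componentwise.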

3.0.1 Classification Robustness.

A semantic perturbation is a function $g\colon\Theta\times X\to X$ applying a perturbation with $N^g$ parameters $\theta\in\Theta\subseteq\mathbb{R}^{N^g}$ to an image $x\in X$ to yield a perturbed image $g(\theta,x)\stackrel{\text{def}}{=}g_{N^g}(\theta_{N^g},\cdot)\circ\dots\circ g_1(\theta_1,x)=g_{N^g}(\theta_{N^g},\dots g_1(\theta_1,x)\dots)\in X$, where $g_i\colon\mathbb{R}\times X\to X$ performs the $i$-th atomic semantic perturbation with parameter $\theta_i$ (with $g_i(0,x)=x$ for any image $x\in X$). For instance, a brightness decrease perturbation $g_b$ is an atomic semantic perturbation function with a single brightness adjustment parameter $\beta\geq 0$: $g_b(\beta,x)\stackrel{\text{def}}{=}\mathrm{ReLU}(x-\vec{1}\beta)$.
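As a small illustration of this composition, the following sketch implements the brightness decrease $g_b$ together with a simple patch perturbation; the patch variant here is only illustrative (the paper's exact patch encoding appears in Appendix I.5):

```python
import numpy as np

def g_brightness(beta, x):
    """Atomic brightness decrease: g_b(beta, x) = ReLU(x - 1*beta)."""
    return np.maximum(x - beta, 0.0)

def g_patch(rho, x, mask):
    """Illustrative patch perturbation: add density rho to masked pixels
    and clip to the valid pixel range [0, 1]."""
    return np.clip(x + rho * mask, 0.0, 1.0)

def g(theta, x, mask):
    """Combined perturbation g(theta, x) = g_2(theta_2, g_1(theta_1, x)).
    Note g(0, x) = x, as required of atomic perturbations."""
    return g_patch(theta[1], g_brightness(theta[0], x), mask)
```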

Definition 1 (Classification Robustness)

A perturbation region $\eta\subset\Theta$ satisfies classification robustness, written $CR(x;\eta)$, if and only if the classification label $f(g(\theta,x))$ is the same as $f(x)$ whenever the perturbation parameter $\theta$ is within $\eta$: $CR(x;\eta)\stackrel{\text{def}}{=}\forall\theta\in\eta.\;f(x)=f(g(\theta,x))$.

Vice versa, we define misclassification robustness when $f(g(\theta,x))$ is always different from $f(x)$ for $\theta$ within $\eta$: $MR(x;\eta)\stackrel{\text{def}}{=}\forall\theta\in\eta.\;f(x)\neq f(g(\theta,x))$.

The classification robustness verification problem $Prob^{CR}\stackrel{\text{def}}{=}(f,g,x0,\Theta)$ consists in enumerating, for a given input image $x0$, the perturbation parameter regions $\eta^{CR},\eta^{MR}\subset\Theta$ respectively satisfying $CR(x0;\eta^{CR})$ and $MR(x0;\eta^{MR})$.

3.0.2 Attention Robustness.

We generalize the definition of saliency-map from [21] to that of an attention-map, which is a function $map_j\colon X\to X$ from an image $x\in X$ to the heatmap image $m_j\in X$ plotting the magnitude of the contribution to the $j$-th class confidence $f_j(x)$ for each pixel of $x$. Specifically, $map_j(x)\stackrel{\text{def}}{=}filter\bigl(\frac{\partial f_j(x)}{\partial x_1},\dots,\frac{\partial f_j(x)}{\partial x_{N^f}}\bigr)$, where $filter(\cdot)$ is an arbitrary image processing function (such as normalization or smoothing) and, following [21, 28], the magnitude of the contribution of each pixel $x_1,\dots,x_{N^f}$ is given by the gradient with respect to the $j$-th class confidence. When $filter(x)\stackrel{\text{def}}{=}|x|$, our definition of $map_j$ matches that of the saliency-map in [21]. Note that, within an activation region $ar^f(p)$, $f_j$ is linear [9] and thus each gradient $\frac{\partial f_j(x)}{\partial x_i}$ is a constant value.

We expect attention-maps to change consistently with respect to a semantic image perturbation. For instance, for a brightness change perturbation, we expect the attention-map to remain the same. Instead, for a translation perturbation, we expect the attention-map to be subject to the same translation. In the following, we write g~()\tilde{g}(\cdot) for the attention-map perturbation corresponding to a given semantic perturbation g()g(\cdot). We define attention inconsistency as the difference between the actual and expected attention-map after a semantic perturbation:

$$ai(x;\theta)\stackrel{\text{def}}{=}\sum_{j\in Y}dist\Bigl(map_j\bigl(g(\theta,x)\bigr),\;\tilde{g}\bigl(\theta,map_j(x)\bigr)\Bigr)$$

where $dist\colon X\times X\to\mathbb{R}$ is an arbitrary distance function such as an $L_p$-norm ($\|x-x'\|_p$). Note that, when $dist(\cdot)$ is the $L_2$-norm, our definition of attention inconsistency coincides with the definition of saliency-map consistency given by [7].
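To make the two definitions above concrete, here is a minimal numpy sketch (helper names are ours) that computes $map_j$ exactly via the masked chain rule, assuming the classifier is given as explicit weight matrices with ReLU hidden layers and a linear output layer, and that the caller supplies the image perturbation `g` and the attention-map perturbation `g_tilde`:

```python
import numpy as np

def saliency_map(weights, biases, x, j, filter_fn=np.abs):
    """map_j(x): filter applied to the exact gradient d f_j / d x.

    Within an activation region the network is affine, so the gradient is
    the product of weight matrices with inactive neurons masked out."""
    J, a = np.eye(len(x)), np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):       # hidden ReLU layers
        z = W @ a + b
        J = (z > 0).astype(float)[:, None] * (W @ J)  # chain rule with mask
        a = np.maximum(z, 0.0)
    return filter_fn((weights[-1] @ J)[j])            # final linear layer

def attention_inconsistency(weights, biases, x, theta, g, g_tilde,
                            dist=lambda u, v: np.linalg.norm(u - v)):
    """ai(x; theta): summed distance between actual and expected maps."""
    K = weights[-1].shape[0]                          # number of classes
    return sum(dist(saliency_map(weights, biases, g(theta, x), j),
                    g_tilde(theta, saliency_map(weights, biases, x, j)))
               for j in range(K))
```

With `filter_fn=np.abs` and the default $L_2$ distance, this matches the saliency-map of [21] and the consistency metric of [7].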

Definition 2 (Attention Robustness)

A perturbation region $\eta\subset\Theta$ satisfies attention robustness, written $AR(x;\eta,\delta)$, if and only if the attention inconsistency is less than or equal to $\delta$ whenever the perturbation parameter $\theta$ is within $\eta$: $AR(x;\eta,\delta)\stackrel{\text{def}}{=}\forall\theta\in\eta.\;ai(x;\theta)\leq\delta$.

When the attention inconsistency is always greater than $\delta$, we have inconsistency robustness: $IR(x;\eta,\delta)\stackrel{\text{def}}{=}\forall\theta\in\eta.\;ai(x;\theta)>\delta$.

The attention robustness verification problem $Prob^{AR}\stackrel{\text{def}}{=}(f,g,x0,\Theta,\delta)$ consists in enumerating, for a given input image $x0$, the perturbation parameter regions $\eta^{AR},\eta^{IR}\subset\Theta$ respectively satisfying $AR(x0;\eta^{AR},\delta)$ and $IR(x0;\eta^{IR},\delta)$.

3.0.3 Outermost Boundary Verification.

Figure 5: Illustration of the outermost CR boundary on a 2-dimensional perturbation parameter space. The origin $\vec{0}$ is the original image without perturbation ($\theta_1=\theta_2=0$).

In practice, to represent the trend of the weakness of a ReLU-FNN image classifier to a semantic perturbation, we argue that it is not necessary to enumerate all perturbation parameter regions within a perturbation parameter space $\Theta$. Instead, we search for the outermost $CR$/$AR$ boundary, that is, the perturbation parameter regions $\eta$ that lie on the $CR$/$AR$ boundary farthest away from the original image.

An illustration of the outermost $CR$ boundary is given in Figure 5. More formally, we define the outermost $CR$ boundary as follows:

Definition 3 (Outermost $CR$ Boundary)

The outermost $CR$ boundary of a classification robustness verification problem, $ob^{CR}(Prob^{CR})$, is a set of perturbation parameter regions $HS\subset\mathcal{P}(\Theta)$ such that:

  1. for all perturbation regions $\eta\in HS$, there exists a path-connected space from the original image $x0$ (i.e., from $\vec{0}\in\Theta$) to $\eta$ that consists of regions satisfying $CR$ (written $Reachable(\eta;x0)$);

  2. all perturbation regions $\eta\in HS$ lie on the classification boundary, i.e., $\exists\theta,\theta'\in\eta.\;f(g(\theta,x0))=f(x0)\land f(g(\theta',x0))\neq f(x0)$;

  3. there exists a region $\eta\in HS$ that contains the farthest reachable perturbation parameter point $\tilde{\theta}$ from the original image, i.e., $\tilde{\theta}=\operatorname{argmax}_{\theta\in\Theta}\|\theta\|_2$ subject to $Reachable(\{\theta\};x0)$.

The definition of the outermost $AR$ boundary is analogous. Note that not all perturbation regions inside the outermost $CR$/$AR$ boundary satisfy the $CR$/$AR$ property (cf. the enclaves in Figure 5).

The outermost $CR$ boundary verification problem $Prob^{CR}_{ob}=(f,g,x0,\Theta)$ and the outermost $AR$ boundary verification problem $Prob^{AR}_{ob}=(f,g,x0,\Theta,\delta)$ consist in enumerating, for a given input image $x0$, the perturbation parameter regions $\eta^{CR}_{ob}$ and $\eta^{AR}_{ob}$ that belong to the outermost $CR$ and $AR$ boundaries $ob^{CR}(Prob^{CR})$ and $ob^{AR}(Prob^{AR})$, respectively.

4 Geometric Boundary Search (GBS)

In the following, we describe our Geometric Boundary Search (GBS) method for solving $Prob^{CR}_{ob}$ and $Prob^{AR}_{ob}$, shown in Algorithms 1 and 2. In Appendix I.8, we also describe our Breadth-First Search (BFS) method for solving $Prob^{CR}$ and $Prob^{AR}$.

4.1 Encoding Semantic Perturbations

After some variable initialization (cf. Line 1 in Algorithm 1), the semantic perturbation $g$ is encoded into a ReLU-FNN $g^{x0}\colon\Theta\to X$ (cf. Line 2).

In this paper, we focus on combinations of atomic perturbations such as brightness change (B), patch placement (P), and translation (T). Nonetheless, our method is applicable to any semantic perturbation as long as it can be represented or approximated with sufficient accuracy.

For the encoding, we follow [16] and represent (combinations of) semantic perturbations as piecewise linear functions using affine transformations and ReLUs. For instance, the brightness decrease perturbation $g_b(\beta,x0)\stackrel{\text{def}}{=}\mathrm{ReLU}(x0-\vec{1}\beta)$ (cf. Section 3.0.1) can be encoded as a ReLU-FNN as follows:

$$g_b(\beta,x0)\xrightarrow{\text{encode}}\begin{bmatrix}1&0&\cdots&0\\0&1&\cdots&0\\&&\ddots&\\0&0&\cdots&1\end{bmatrix}\mathrm{ReLU}\left(\begin{bmatrix}-1&1&0&\cdots&0\\-1&0&1&\cdots&0\\&&&\ddots&\\-1&0&0&\cdots&1\end{bmatrix}\begin{bmatrix}\beta\\x0_1\\\vdots\\x0_{N^f}\end{bmatrix}\right)+\vec{0}$$

which we can combine with the given ReLU-FNN $f$ to obtain the compound ReLU-FNN $f\circ g_b^{x0}$ to verify. The full encoding for all considered perturbations (brightness, patch, translation) is shown in Appendix I.5.
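As a sanity check on this encoding, the following numpy sketch builds the two affine layers above for an $N^f$-pixel image and confirms that they reproduce $\mathrm{ReLU}(x0-\vec{1}\beta)$; `encode_brightness` is a name we introduce for illustration:

```python
import numpy as np

def encode_brightness(n):
    """Affine layers encoding g_b(beta, x0) = ReLU(x0 - 1*beta)
    over the stacked input [beta, x0_1, ..., x0_n]."""
    A_in = np.hstack([-np.ones((n, 1)), np.eye(n)])  # row i: x0_i - beta
    A_out = np.eye(n)                                # identity read-out
    return A_in, A_out

n = 5
rng = np.random.default_rng(0)
x0, beta = rng.random(n), 0.3
A_in, A_out = encode_brightness(n)
mu = np.concatenate([[beta], x0])
assert np.allclose(A_out @ np.maximum(A_in @ mu, 0.0),
                   np.maximum(x0 - beta, 0.0))
```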

4.2 Traversing Activation Regions

GBS then performs a traversal of the activation regions of the compound ReLU-FNN $f\circ g^{x0}$ near the outermost $CR$/$AR$ boundary for $Prob^{CR}_{ob}$/$Prob^{AR}_{ob}$. Specifically, it initializes a queue $Q$ with the activation pattern $ap^{f\circ g^{x0}}(\vec{0})$ of the original input image $x0$ with no semantic perturbation, i.e., $\theta=\vec{0}$ (cf. Line 3 in Algorithm 1; we explain the other queue initialization parameters shortly). Given a queue element $q\in Q$, the functions $p(q)$, $isFollowing(q)$, and $lineDistance(q)$ respectively return the 1st, 2nd, and 3rd element of $q$.

Then, for each activation pattern $p$ in $Q$ (cf. Line 6), GBS reconstructs the corresponding perturbation parameter region $\eta$ (subroutine $constructActivationRegion$, Line 7) as the convex polytope resulting from $p$ (cf. Section 3 and $\eta$ in Figure 6-(1a)).

Next, for each neuron $x_{l,n}$ in $f\circ g^{x0}$ (cf. Line 11), it checks whether its activation status cannot flip within the perturbation parameter space $\Theta$, i.e., whether the resulting half-space has no feasible points within $\Theta$ (subroutine $isStable$, Line 12; cf. half-space $h_{1,5}$ in Figure 6-(1a)). Otherwise, a new activation pattern $p'$ is constructed by flipping the activation status of $x_{l,n}$ (subroutine $flipped$, Line 13) and added to a local queue $Q'$ (cf. Lines 9 and 23-25) if $p'$ has not been observed already (cf. Line 14) and is feasible (subroutine $calcInteriorPointOnFace$, Lines 15-16; cf. point $\theta^F$ and half-space $h_{1,2}$ in Figure 6-(1a)).

The perturbation parameter region $\eta$ is then simplified to $\tilde{\eta}$ (subroutine $simplified$, Line 2 in Algorithm 2; e.g., reducing the half-spaces used to represent $\eta$ to just $h_{1,2}$ and $h_{1,3}$ in Figure 6-(1a)). $\tilde{\eta}$ is used to efficiently calculate the range of the attention inconsistency within $\eta$ (subroutine $calcRange$, Line 3 in Algorithm 2; cf. Section 4.4), and then attention/inconsistency robustness can be verified based on that range (Lines 5 and 8 in Algorithm 2). Furthermore, classification/misclassification robustness can be verified in the same way if subroutine $calcRange$ instead returns the range of the confidence difference $f_{f(x0)}(g^{x0}(\theta))-f_j(g^{x0}(\theta))$ within $\tilde{\eta}$ (cf. Section 4.4) and $\delta=0\land w^\delta=0$. At last, the local queue $Q'$ is pushed onto $Q$ (cf. Line 29 in Algorithm 1).

To avoid getting stuck around enclaves inside the outermost $CR$/$AR$ boundary (cf. Figure 5) during the traversal of activation regions, GBS switches status when needed between "searching for a decision boundary" and "following a found decision boundary". The initial status is set to "searching for a decision boundary", i.e., $\lnot isFollowing$, when initializing the queue $Q$ (cf. Line 3). The switch to $isFollowing$ happens when region $\eta$ is on the boundary (i.e., $lo\leq\delta\leq up$) or near the boundary (i.e., $\delta-w^\delta\leq lo\leq\delta+w^\delta$; cf. Line 15 in Algorithm 2 and Figure 6-(3a,1b,3b)), where $w^\delta$ is a hyperparameter determining whether a region counts as close to the boundary. $w^\delta$ should be greater than 0 to verify attention/inconsistency robustness because the attention inconsistency changes discretely for ReLU-FNNs (cf. Section 4.3). GBS can revert back to searching for a decision boundary if, when following a found boundary, it finds a reachable perturbation parameter region that is farther from $\vec{0}$ (cf. Lines 19-20 in Algorithm 1 and Figure 6-(2b)).

Algorithm 1 $gbs(f,g,x0,\Theta;\delta,w^\delta)\to(H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB})$
Input: $f,g,x0,\Theta,\delta$
Output: $H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB}\subset\mathcal{P}(\Theta)$
1:  $H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB}\leftarrow\{\},\{\},\{\},\{\},\{\},\{\}$
2:  $g^{x0}\leftarrow g(\cdot,x0)$ // currying $g$ with $x0$; i.e., $g^{x0}(\theta)=g(\theta,x0)$
3:  $Q\subset AP^{f\circ g^{x0}}\times\mathbb{B}\times\mathbb{R}\leftarrow\{(ap^{f\circ g^{x0}}(\vec{0}),\bot,0)\}$ // queue for boundary search
4:  $OBS\subset AP^{f\circ g^{x0}}\leftarrow\{\}$ // observed activation patterns
5:  while $\#|Q|>0$ do // loop for geometric boundary search
6:     $q\leftarrow popMaxLineDistance(Q)$; $p\leftarrow p(q)$; $OBS\leftarrow OBS\cup\{p\}$
7:     $\eta\leftarrow constructActivationRegion(f\circ g^{x0},p)$
8:     $FS\subset\mathbb{Z}\times\mathbb{Z}\leftarrow\{\}$ // $(l,n)\in FS$ means the $n$-th neuron in the $l$-th layer is a face of $\eta$
9:     $Q'\subset AP^{f\circ g^{x0}}\times\mathbb{B}\times\mathbb{R}\leftarrow\{\}$ // local queue for an iteration
10:    // Push each activation region connected to $\eta$.
11:    for $l=1$ to #layers of $f\circ g^{x0}$; $n=1$ to #neurons of the $l$-th layer do
12:       continue if $isStable(p,l,n,\Theta)$ // skip if the activation of $x_{l,n}$ cannot flip in $\Theta$
13:       $p'\leftarrow flipped(p,l,n)$ // flip the activation status of neuron $x_{l,n}$
14:       continue if $p'\in OBS$ else $OBS\leftarrow OBS\cup\{p'\}$ // skip if $p'$ was already observed
15:       $\theta^F\leftarrow calcInteriorPointOnFace(\eta,l,n)$
16:       continue if $\theta^F=null$ // skip if $p'$ is infeasible
17:       $FS\leftarrow FS\cup\{(l,n)\}$ // $(l,n)$ is a face of $\eta$
18:       $\theta^L\leftarrow calcInteriorPointOnLine(\eta,l,n)$
19:       if $isFollowing(q)\land\theta^L\neq null\land\|\theta^L\|_2>lineDistance(q)$ then
20:          $q\leftarrow(p,\bot,lineDistance(q))$ // re-found the line while boundary-following
21:       end if
22:       if $\lnot isFollowing(q)\land\theta^L\neq null$ then
23:          $Q'\leftarrow Q'\cup\{(p',\bot,\|\theta^L\|_2)\}$ // continue line-search
24:       else
25:          $Q'\leftarrow Q'\cup\{(p',isFollowing(q),lineDistance(q))\}$ // continue current status
26:       end if
27:    end for
28:    (…Verify $\eta$…) // see Algorithm 2 for $AR$/$IR$ (analogous for $CR$/$MR$)
29:    $Q\leftarrow Q\cup Q'$ // push
30: end while
Algorithm 2 (expanding Algorithm 1 for $AR(x0;\eta,\delta)$/$IR(x0;\eta,\delta)$)
1:  (…Verify $\eta$…)
2:  $\tilde{\eta}\leftarrow simplified(\eta,FS)$ // limit the constraints on $\eta$ to $FS$
3:  $(lo,up)\leftarrow calcRange(x0;\tilde{\eta})$ // the lower and upper bounds of $ai$ within $\tilde{\eta}$
4:  $nearBoundary\leftarrow(lo\leq\delta\leq up)\lor(\delta-w^\delta\leq lo\leq\delta+w^\delta)\lor(\delta-w^\delta\leq up\leq\delta+w^\delta)$
5:  if $lo\leq up\leq\delta$ /* satisfying AR */ then
6:     $H^{AR}\leftarrow H^{AR}\cup\{\tilde{\eta}\}$
7:     $Q'\leftarrow\{\}$ if $isFollowing(q)\land\lnot nearBoundary$ // do not traverse connected regions
8:  else if $\delta<lo\leq up$ /* satisfying IR */ then
9:     $H^{IR}\leftarrow H^{IR}\cup\{\tilde{\eta}\}$
10:    $Q'\leftarrow\{\}$ if $\lnot nearBoundary$ // do not traverse connected regions
11: else
12:    $H^{AB}\leftarrow H^{AB}\cup\{\tilde{\eta}\}$
13: end if
14: if $\lnot isFollowing(q)\land nearBoundary$ then
15:    (…Update $Q'$ such that $\forall q'\in Q'.\;isFollowing(q')$…) // switch to boundary-following
16: end if
Figure 6: A running example of GBS. The upper row shows the basic traversing flow, while the lower row shows the flow for avoiding enclaves. $h_{l,n}$ denotes the half-space corresponding to neuron activity $p_{l,n}$.

4.3 Calculating Attention Inconsistency

Gradients within an Activation Region. Let $p\in AP^{f\circ g^{x0}}$ be an activation pattern of the compound ReLU-FNN $f\circ g^{x0}$. The gradient $\frac{\partial f_j(g^{x0}(\theta))}{\partial\theta_s}$ is constant within $ar^{f\circ g^{x0}}(p)$ (cf. Section 3.0.2). We write $g_i^{x0}(\theta)$ for the $i$-th pixel $x_i$ of a perturbed image in $\{g^{x0}(\theta)\mid\theta\in ar^{f\circ g^{x0}}(p)\}\subset X$. The gradient $\frac{\partial g_i^{x0}}{\partial\theta}=\frac{\partial x_i}{\partial\theta}$ is also constant. By the chain rule, we have $\frac{\partial f_j(x)}{\partial x_i}=\frac{\partial f_j(g^{x0}(\theta))/\partial\theta_s}{\partial x_i/\partial\theta_s}$. Thus $\frac{\partial f_j(x)}{\partial x_i}$ is also constant. This fact is formalized by the following lemma:

Lemma 1

$$\frac{\partial f_j(x)}{\partial x_i}=C\;\;\bigl(x\in\{g^{x0}(\theta)\mid\theta\in ar^{f\circ g^{x0}}(p)\}\bigr)$$

(cf. the small example in Appendix I.7). Therefore, the gradient $\frac{\partial f_j(x)}{\partial x_i}$ can be computed as the weights of the $j$-th class output of ReLU-FNN $f$ for activation pattern $ap^f(\dot{x})$, where $\dot{x}=g^{x0}(\dot{\theta})$ and $\dot{\theta}$ is an arbitrary sample within $ar^{f\circ g^{x0}}(p)$ (cf. Appendix I.1).
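The lemma can also be checked numerically. In the sketch below (a toy two-layer ReLU-FNN with random weights; helper names are ours), two inputs that share an activation pattern yield identical analytic gradients:

```python
import numpy as np

def grad_and_pattern(weights, biases, x, j):
    """Analytic gradient of f_j at x, plus the activation pattern ap^f(x)."""
    J, a, pat = np.eye(len(x)), np.asarray(x, dtype=float), []
    for W, b in zip(weights[:-1], biases[:-1]):       # hidden ReLU layers
        z = W @ a + b
        pat.append(z > 0)
        J = (z > 0).astype(float)[:, None] * (W @ J)  # masked chain rule
        a = np.maximum(z, 0.0)
    return (weights[-1] @ J)[j], pat                  # final linear layer

rng = np.random.default_rng(1)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
biases = [rng.standard_normal(8), rng.standard_normal(3)]
x = rng.standard_normal(4)
g1, p1 = grad_and_pattern(weights, biases, x, j=0)
g2, p2 = grad_and_pattern(weights, biases, x + 1e-6, j=0)
# same activation pattern => same region => identical gradient (Lemma 1)
if all(np.array_equal(u, v) for u, v in zip(p1, p2)):
    assert np.allclose(g1, g2)
```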

For the perturbed gradient $\tilde{g}(\theta,\frac{\partial f_j(x)}{\partial x_i})$, let $\tilde{g}(\theta)$ be the ReLU-FNN $g^{\partial f_j(x)/\partial x_i}(\theta)$, i.e., the perturbation network curried with the gradient image as its fixed input; the same consideration as above then applies.

Attention Inconsistency (ai). We assume that both $filter(\cdot)$ and $dist(\cdot)$ are convex functions, so that the maximum and minimum values of $ai$ can be calculated by convex optimization. Specifically, $filter(\cdot)$ is one of the identity function (I), the absolute value function (A), or the $3\times 3$ mean filter (M), and $dist(\cdot)$ is either the $L_1$-norm ($L_1$) or the $L_2$-norm ($L_2$), where $w$ is the width of image $x\in X$ (used in the mean filter definition).

4.4 Verifying CR and AR within an Activation Region

Our method leverages the fact that the gradient of a ReLU-FNN output with respect to the input is constant within an activation region (cf. Section 3.0.2); thus, $CR$/$MR$ can be resolved by linear programming, and $AR$/$IR$ can be resolved by a single sample if the saliency-map is expected to remain unchanged, or by convex optimization if the saliency-map is expected to move.

Verifying CR/MR. When $x0$ is fixed, each activation region of the ReLU-FNN $f(g(\theta,x0))\colon\Theta\to Y$ is a region in the perturbation parameter space $\Theta$. Within an activation region $\eta\subset\Theta$ of the ReLU-FNN $f(g(\theta,x0))$, $CR(f,g,x0,\eta)$ is satisfied if and only if the ReLU-FNN output corresponding to the label of the original image $x0$ cannot be less than the ReLU-FNN outputs of all other labels, i.e., the following holds: $\min_{j\in Y\setminus\{f(x0)\},\,\theta\in\eta}f_{f(x0)}(g(\theta,x0))-f_j(g(\theta,x0))>0\Leftrightarrow CR(f,g,x0,\eta)$. Each DNN output $f_j(g(\theta,x0))$ is linear within $\eta$, and thus the left-hand side of the above equation can be determined soundly and completely by using an LP solver (Eq. 3-(c) in Appendix I.1), such as scipy.optimize.linprog (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html). Similarly, $MR(f,g,x0,\eta)$ is satisfied if and only if the ReLU-FNN output corresponding to the label of the original image $x0$ cannot be greater than the ReLU-FNN output of any other label.
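A minimal sketch of this LP check with scipy.optimize.linprog, assuming the affine form $f_k(g(\theta,x0))=w_k\cdot\theta+c_k$ of each confidence within $\eta$ has already been extracted (cf. Eq. 3-(c) in Appendix I.1); the helper name is ours:

```python
import numpy as np
from scipy.optimize import linprog

def min_confidence_gap(A, b, w_true, c_true, w_j, c_j):
    """min over {theta : A theta <= b} of f_{f(x0)} - f_j, both affine in theta."""
    res = linprog(c=w_true - w_j, A_ub=A, b_ub=b,
                  bounds=(None, None), method="highs")
    assert res.status == 0  # eta is assumed feasible and bounded within Theta
    return res.fun + (c_true - c_j)
```

CR holds on $\eta$ iff this gap stays positive for every label $j\neq f(x0)$; MR is the mirror image, requiring the maximum gap (minimize the negated objective) to stay negative for every such $j$.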

Verifying AR/IR. Within an activation region $\eta\subset\Theta$ of the ReLU-FNN $f(g(\theta,x0))$, $AR(f,g,x0,\eta,\delta)$ is satisfied if and only if the following holds: $\max_{\theta\in\eta}ai(\theta,x0)\leq\delta\Leftrightarrow AR(f,g,x0,\eta,\delta)$. If $filter(\cdot)$ and $dist(\cdot)$ are both convex, then $ai$ is convex as a sum of convex functions, and the left-hand side of the above equation can be determined by comparing the values at the extreme points (vertices) of the region. On the other hand, $IR(f,g,x0,\eta,\delta)$ is satisfied if and only if the following holds: $\min_{\theta\in\eta}ai(\theta,x0)>\delta\Leftrightarrow IR(f,g,x0,\eta,\delta)$. The left-hand side of the above equation can be determined by using a convex optimizer, such as scipy.optimize.minimize (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html). Note that if the saliency-map is expected to remain unchanged under the perturbations, the above optimization is unnecessary because $ai(\theta,x0)$ is constant for $\theta\in\eta$.
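The corresponding convex checks can be sketched as follows; enumerating the polytope's vertices is assumed to be done elsewhere (e.g., with a vertex enumeration tool), and for non-smooth choices of $filter$/$dist$ the SLSQP call below would need a non-smooth solver instead. The helper names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def check_AR(ai, vertices, delta):
    """max_{theta in eta} ai(theta) <= delta: the maximum of a convex
    function over a polytope is attained at one of its vertices."""
    return max(ai(v) for v in vertices) <= delta

def check_IR(ai, A, b, theta0, delta):
    """min_{A theta <= b} ai(theta) > delta, via a convex optimizer;
    theta0 is any interior point of the region."""
    cons = [{"type": "ineq", "fun": lambda th: b - A @ th}]  # A theta <= b
    res = minimize(ai, theta0, constraints=cons, method="SLSQP")
    return res.fun > delta
```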

5 Experimental Evaluation

To confirm the usefulness of our method, we conducted evaluation experiments.

Setups. Table 1 shows the ReLU-FNNs for classification used in our experiments; each FNN uses different training data and architectures. During each experiment, we inserted semantic perturbation layers (cf. Section 4.1) in front of each classification ReLU-FNN; these layers had a total of 1,568 neurons.

Table 1: The ReLU-FNNs used in the experiments; "Conv" and "FC" in column Layers denote convolutional and fully connected layers, respectively.

| Name      | #Neurons | Layers         |
|-----------|----------|----------------|
| M-FNN-100 | 100      | FC×2           |
| M-FNN-200 | 200      | FC×4           |
| M-FNN-400 | 400      | FC×8           |
| M-FNN-800 | 800      | FC×16          |
| M-CNN-S   | 2,028    | Conv×2, FC×1   |
| M-CNN-M   | 14,824   | Conv×2, FC×1   |
| F-FNN-100 | 100      | FC×2           |
| F-FNN-200 | 200      | FC×4           |
| F-FNN-400 | 400      | FC×8           |
| F-FNN-800 | 800      | FC×16          |
| F-CNN-S   | 2,028    | Conv×2, FC×1   |
| F-CNN-M   | 14,824   | Conv×2, FC×1   |

All experiments were performed on the virtual computational resource "rt_C.small" (4 CPU threads and 30 GiB memory) of physical compute node "V" (2 Intel Xeon Gold 6148 CPUs, 2.4 GHz, 20 cores / 40 threads each, and 12 × 32 GiB DDR4 2666 MHz RDIMM (ECC) memory) of the AI Bridging Cloud Infrastructure (ABCI, https://docs.abci.ai/en/system-overview/) provided by the National Institute of Advanced Industrial Science and Technology (AIST).

We implemented our method as a Python tool for the evaluation; we will make the tool available at https://zenodo.org/record/6544905 (however, due to our internal procedures, we cannot publish it until at least May 21).

Figure 7: Experimental results.

RQ1: How much computation time is required to solve each problem?  For each ReLU-FNN, we measured the computation time of the GBS and BFS algorithms on 10 images selected from the end of each dataset (Indexes 69990-69999; these images were not used in training any of the ReLU-FNNs). For GBS, we also distinguish between gbs-CR, which traverses the CR boundary, gbs-AR, which traverses the AR boundary, and gbs-CRAR, which traverses the boundary of the regions satisfying both CR and AR. Furthermore, we took these measurements for each of the perturbation combinations (BP), (TP), and (TB), which denote "brightness change / patch", "translation / patch", and "translation / brightness change", respectively. We used the definitions $filter(x)\stackrel{\text{def}}{=}x$, $dist(x,x')\stackrel{\text{def}}{=}\|x-x'\|_2$, $\delta\stackrel{\text{def}}{=}3.0$, and $w^\delta=0.2$.

Figure 7-(C) shows the trend of increasing computation time with an increasing number of neurons for each algorithm, as a box plot on a log scale. Note that each verification timed out after 2 hours; thus, the upper limit of the computation time (y-axis) is 7,200 seconds. We can see that the BFS algorithm took more computation time than GBS. Figure 7-(D) shows the number of verified activation regions for each combination of perturbations; we can see that GBS reduced the number of regions to traverse compared to BFS, as intended.

However, we can also see that gbs-AR took longer than gbs-CR because it traverses more activation regions. Figures 7-(A) and (B) show the breakdown of the verification results for gbs-AR and gbs-CR, where blue, red, gray, and black bars denote robust, not robust, timed out, and out-of-memory, respectively. The figures show that gbs-AR timed out at a higher rate than gbs-CR even for smaller DNNs.

Moreover, Figure 7-(C) shows that the median computation time increased exponentially with the number of neurons for all algorithms. This result suggests that exact verification-based traversal is not applicable to practical-scale DNNs such as VGG16 [22], and that fundamental scalability improvements, such as incorporating approximate verification, are needed.

RQ2: What are the performance factors?  Figures 7-(E) and (F) show the correlation between computation time and the number of verified activation regions and the number of face checks, respectively, as scatter plots (the results of experiments that timed out are excluded). The figures show strong positive correlations for each DNN size (number of neurons), especially for the number of face checks; thus, reducing redundant regions and faces when computing the boundaries should directly reduce verification time. For example, Figure 7-(G) shows an example of verifying activation regions near the AR boundary too extensively due to the large hyperparameter value $w^\delta=0.2$. As with gbs-CR, the AR boundary search also needs to narrow the search down to only the regions on the AR boundary.

6 Discussion

Internal Validity. We use the outermost $CR$/$AR$ boundary to represent the trend of weakness to a combination of semantic perturbations; however, not all regions inside the outermost $CR$/$AR$ boundary satisfy the $CR$/$AR$ property. Unlike $CR$, $AR$ requires the hyperparameters $\delta,w^\delta\in\mathbb{R}$, for which there are as yet no clear setting criteria.

External Validity. We focus here on brightness change, patch, and translation perturbations. Still, our method applies to any semantic perturbation as long as it can be represented or approximated with sufficient accuracy. Our method is not applicable to a semantic perturbation $g$ for which a corresponding saliency-map change $\tilde{g}$ cannot be defined and computed, e.g., DNNs that generate an image from an input text (such as [18]). Our method also does not yet support the MaxPool layers of CNNs [13].

Performance Bottlenecks.

The computation time increases exponentially with the number of ReLU-FNN neurons. Thus, our current method is not yet applicable at a practical scale, such as VGG16 [22]. If it is difficult to enumerate all activation regions, GBS traverses only the regions on the outermost boundary. In the future, it may be better to use ReLU relaxation for dense areas.

7 Related Work

Robustness Verification. To the best of our knowledge, there have been no reports that formulate the attention robustness verification problem or propose a method for it; e.g., [11, 30, 1]. [23] first verified robustness against image rotation, and [2] verified robustness against more semantic perturbations, such as image translation, scaling, shearing, brightness change, and contrast change. However, in this paper, we have demonstrated that attention robustness captures the trend of weakness to combinations of semantic perturbations more accurately than existing classification robustness in some cases. In addition, as [25] reported, approximate verification methods like DeepPoly [23] fail to verify near the boundary. In contrast, our GBS enables verification near the boundary through exploratory and exact verification.

[16] proposed that any $L_p$-norm-based verification tool can be used to verify classification robustness against semantic perturbations by inserting special DNN layers that induce the semantic perturbations in front of the DNN layers for classification. To transform the verification problem on the inherently high-dimensional input image space $X$ into one on the low-dimensional perturbation parameter space $\Theta$, we adopted their idea, i.e., inserting DNN layers for semantic perturbations ($\Theta\to X$) in front of the DNN layers for classification ($X\to Y$). However, it is our original idea to calculate the value range of the gradient of the DNN output ($\partial f_j(g(\theta,x0))/\partial x_i$) within an activation region on the perturbation parameter space (cf. Sections 4.3-4.4).

Traversing activation regions. Since [12] first proposed a method to traverse activation regions, several improvements and extensions have been proposed [14, 5]. All of them use breadth-first search with a priority queue to compute the maximum safety radius or the maxima of a given objective function in fewer iterations. In contrast, our GBS algorithm uses a breadth-first search with a priority queue to reach the outermost $CR$/$AR$ boundary in fewer iterations while avoiding enclaves.

[5] responded to a paper reviewer that traversing time would increase exponentially with the size of a DNN (https://openreview.net/forum?id=zWy1uxjDdZJ). Our experiments also showed that larger DNNs increase traversing time due to denser activation regions. The rapid increase in the number of activation regions will be one of the biggest barriers to the scalability of traversing methods, including ours. Although the theoretical upper bound on the number of activation regions increases exponentially with the number of layers in a DNN [10], [9] reported that actual DNNs have surprisingly few activation regions because of the myriad of infeasible activation patterns. Therefore, it will be necessary to understand the number of activation regions in DNNs operating in the real world.

To improve scalability, there is a method that divides the input space and verifies each part in a perfectly parallel manner [29]. However, our method has not been parallelized yet, because in this study we have focused on accurately calculating the outermost $CR$/$AR$ boundary and attention robustness. There are also several methods that target only low-dimensional subspaces of the high-dimensional input space for verification [24, 26, 15]. We have similarly taken advantage of low dimensionality, e.g., using low-dimensional perturbation parameters to represent high-dimensional input image pixels via mediator variables (i.e., the curried perturbation function $g^{x0}(\theta)=x'$) to reduce the elapsed time of LP solvers, and determining the stability of neuron activity from a few vertices of the perturbation parameter space $\Theta$.

Saliency-Map. Since [21] first proposed the method of obtaining a saliency-map from the gradients of the DNN outputs with respect to an input image (i.e., $\partial f_j(x)/\partial x_i$), many improvements and extensions have been proposed [3, 27, 20]. We formulated the attention-map primarily using the saliency-map definition of [21]. However, formulating attention robustness for the improved variants, such as gradient smoothing [3] and line integrals [27], remains future work.

It is known that semantic perturbations can significantly change saliency-maps [17, 7, 8]. [7] first argued that the saliency-map should consistently follow image translation and proposed a method to quantify saliency-map consistency. We formulated attention inconsistency $ai$ primarily using the saliency-map consistency of [7].

8 Conclusion and Future Work

We have presented the first verification method for attention robustness, based on traversing the activation regions of a DNN that contains layers for semantic perturbations and layers for classification. Attention robustness is the property that the attention inconsistency, i.e., the degree of change in the saliency-map, stays below a threshold value. We have demonstrated that attention robustness captures the trend of weakness to combinations of semantic perturbations more accurately than existing classification robustness. Although the performance evaluation presented in this study is not yet at a practical scale, such as VGG16 [22], we believe that the attention robustness verification problem we have formulated opens a new door to quality assurance for DNNs. In future work, we plan to increase the number of verifiable semantic perturbation types and improve scalability by using abstract interpretation.

References

  • [1] Ashmore, R., Calinescu, R., Paterson, C.: Assuring the machine learning lifecycle: Desiderata, methods, and challenges. ACM Computing Surveys (CSUR) 54(5), 1–39 (2021)
  • [2] Balunovic, M., Baader, M., Singh, G., Gehr, T., Vechev, M.: Certifying geometric robustness of neural networks. In: NeurIPS. vol. 32 (2019)
  • [3] Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: Smoothgrad: removing noise by adding noise. In: ICMLVIZ. PMLR (2017)
  • [4] Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., Madry, A.: Exploring the landscape of spatial robustness. In: ICML. pp. 1802–1811. PMLR (2019)
  • [5] Fromherz, A., Leino, K., Fredrikson, M., Parno, B., Pasareanu, C.: Fast geometric projections for local robustness certification. In: ICLR. ICLR (2021)
  • [6] Gao, X., Saha, R.K., Prasad, M.R., Roychoudhury, A.: Fuzz testing based data augmentation to improve robustness of deep neural networks. In: ICSE. pp. 1147–1158. IEEE,ACM (2020)
  • [7] Guo, H., Zheng, K., Fan, X., Yu, H., Wang, S.: Visual attention consistency under image transforms for multi-label image classification. In: CVPR. pp. 729–739. IEEE,CVF (2019)
  • [8] Han, T., Tu, W.W., Li, Y.F.: Explanation consistency training: Facilitating consistency-based semi-supervised learning with interpretability. In: AAAI. vol. 35, pp. 7639–7646. AAAI (2021)
  • [9] Hanin, B., Rolnick, D.: Deep relu networks have surprisingly few activation patterns. In: NeurIPS. vol. 32 (2019)
  • [10] Hinz, P.: An analysis of the piece-wise affine structure of ReLU feed-forward neural networks. Ph.D. thesis, ETH Zurich (2021)
  • [11] Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., Yi, X.: A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review 37, 100270 (2020)
  • [12] Jordan, M., Lewis, J., Dimakis, A.G.: Provable certificates for adversarial examples: Fitting a ball in the union of polytopes. In: NeurIPS. vol. 32 (2019)
  • [13] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NeurIPS. vol. 25 (2012)
  • [14] Lim, C.H., Urtasun, R., Yumer, E.: Hierarchical verification for adversarial robustness. In: ICML. vol. 119, pp. 6072–6082. PMLR (2020)
  • [15] Mirman, M., Hägele, A., Bielik, P., Gehr, T., Vechev, M.: Robustness certification with generative models. In: PLDI. pp. 1141–1154. ACM SIGPLAN (2021)
  • [16] Mohapatra, J., Weng, T.W., Chen, P.Y., Liu, S., Daniel, L.: Towards verifying robustness of neural networks against a family of semantic perturbations. In: CVPR. pp. 244–252. IEEE,CVF (2020)
  • [17] Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73, 1–15 (2018)
  • [18] Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: ICML. PMLR (2021)
  • [19] Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: KDD. pp. 1135–1144. ACM SIGKDD (2016)
  • [20] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: Visual explanations from deep networks via gradient-based localization. In: ICCV. IEEE (Oct 2017)
  • [21] Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: Visualising image classification models and saliency maps. In: ICLR (2014)
  • [22] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
  • [23] Singh, G., Gehr, T., Püschel, M., Vechev, M.: An abstract domain for certifying neural networks. In: POPL. pp. 1–30. ACM New York, NY, USA (2019)
  • [24] Sotoudeh, M., Thakur, A.V.: Computing linear restrictions of neural networks. In: NeurIPS. vol. 32 (2019)
  • [25] Sotoudeh, M., Thakur, A.V.: Provable repair of deep neural networks. In: PLDI. pp. 588–603. ACM SIGPLAN (2021)
  • [26] Sotoudeh, M., Thakur, A.V.: Syrenn: A tool for analyzing deep neural networks. In: TACAS. pp. 281–302 (2021)
  • [27] Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML. pp. 3319–3328. PMLR (2017)
  • [28] Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy. In: ICLR (2019)
  • [29] Urban, C., Christakis, M., Wüstholz, V., Zhang, F.: Perfectly parallel fairness certification of neural networks. Proceedings of the ACM on Programming Languages 4(OOPSLA), 1–30 (2020)
  • [30] Urban, C., Miné, A.: A review of formal methods applied to machine learning. CoRR abs/2104.02466 (2021), https://arxiv.org/abs/2104.02466

I Appendix

I.1 Linearity of Activation Regions

Given an activation pattern $p\in AP^f$ as a constant, within the activation region $ar^f(p)$ each output of the ReLU-FNN, $f_j(x\in ar^f(p))$, is linear in $x$ (cf. Figure 8) because all ReLU operators have already been resolved to $0$ or $x$ [9]; i.e., $f_j(x\in ar^f(p))=A'_jx+b'_j$, where $A'_j$ and $b'_j$ denote the simplified weights and bias for activation pattern $p$ and class $j$.

Figure 8: An example of activation regions [9]. ReLU-FNN output is linear on each activation region, i.e., each output plane painted for each activation region is flat.

That is, the gradient of each ReLU-FNN output $f_j(x)$ within activation region $ar^f(p)$ is constant, i.e., the following equation holds, where $C\in\mathbb{R}$ is a constant value:

$$Feasible^f(p\in AP^f)\Rightarrow\frac{\partial f_j(x)}{\partial x_i}=C\;\;(x\in ar^f(p)) \tag{1}$$

An activation region can be interpreted as the H-representation of a convex polytope in the input space $\mathbb{R}^{N^f}$. Specifically, each neuron activity $p_{l,n}$ corresponds one-to-one with a half-space, and $p$ with the convex polytope defined by the intersection (conjunction) of all half-spaces, because $f_n^{(l)}(x)$ is also linear when $p\in AP^f$ is constant. Therefore, we interpret the activation region $ar^f(p)$ and the following H-representation of a convex polytope, $HConvex^f(x;p)$, interchangeably as needed, where $A''$ and $b''$ denote the simplified weights and bias for activation pattern $p$, and $A''_{l,n}x\leq b''_{l,n}$ is the half-space corresponding to the $n$-th neuron activity $p_{l,n}$ in the $l$-th layer:

$$HConvex^f(x;p)\stackrel{\text{def}}{=}A''x\leq b''\;\equiv\;\bigwedge_{l,n}A''_{l,n}x\leq b''_{l,n} \tag{2}$$

I.2 Connectivity of Activation Regions

When feasible activation patterns $p,p'\in AP^f$ are related by flipping a single neuron activity $p_{l,n}\in\{0,1\}$, the corresponding regions are connected because they share the single face $HFace^f_{l,n}(x;p)\stackrel{\text{def}}{=}A''_{l,n}x=b''_{l,n}$ corresponding to the flipped $p_{l,n}$ [12]. It is possible to flexibly traverse activation regions while ensuring connectivity by selecting the neuron activity to be flipped according to a prioritization; several traversing methods have been proposed [12, 14, 5]. However, there are generally rather many neuron activities that become infeasible when flipped [12]. For instance, half-space $h_{1,3}$ is a face of activation region $\eta$ in Figure 6-(1a); thus, by flipping neuron activity $p_{1,3}$, GBS can traverse the connected region in Figure 6-(1b). In contrast, half-space $h_{1,1}$ is not a face of activation region $\eta$ in Figure 6-(1a); thus, flipping neuron activity $p_{1,1}$ yields an infeasible activation region (i.e., the intersection of the flipped half-spaces has no area).

I.3 Hierarchy of Activation Regions

When feasible activation patterns p,pAPfp,p^{\prime}\in AP^{f} are in a relationship with each other that matches all of LfL^{\prime f}-th upstream activation pattern pLfdef=[pl,n1lLf,1nNlf](1LfLf)p_{\leq L^{\prime f}}\mathrel{\vbox{\offinterlineskip\halign{\hfil#\hfil\cr def\cr\kern 1.0pt\cr$=$\cr\kern 0.0pt\cr}}}[p_{l,n}\mid 1\leq l\leq L^{\prime f},1\leq n\leq N^{f}_{l}]\;\;(1\leq L^{\prime f}\leq L^{f}), they are included parent activation region arLff(p)ar^{f}_{\leq L^{\prime f}}(p) corresponding to convex polytope HConvexLff(x;p)def=lLf,nAl,n′′xbl,n′′HConvex^{f}_{\leq L^{\prime f}}(x;p)\mathrel{\vbox{\offinterlineskip\halign{\hfil#\hfil\cr def\cr\kern 1.0pt\cr$=$\cr\kern 0.0pt\cr}}}\bigwedge_{l\leq L^{\prime f},n}A^{\prime\prime}_{l,n}x\leq b^{\prime\prime}_{l,n} [14]. That is, xarf(p).xarLff(p)\forall x\in ar^{f}(p).\;x\in ar^{f}_{\leq L^{\prime f}}(p) and xNf.HConvexf(x;p)HConvexLff(x;p)\forall x\in\mathbb{R}^{N^{f}}.\;HConvex^{f}(x;p)\Rightarrow HConvex^{f}_{\leq L^{\prime f}}(x;p).

Similarly, we define the $L'^{f}$-th downstream activation pattern as $p_{\geq L'^{f}}\overset{\mathrm{def}}{=}[p_{l,n}\mid L'^{f}\leq l\leq L^{f},1\leq n\leq N^{f}_{l}]$ $(1\leq L'^{f}\leq L^{f})$.
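Under this hierarchy, the parent region's H-representation is simply a prefix of the full constraint system. A minimal sketch, reusing the hypothetical `h_representation` helper from I.1:

```python
def parent_h_representation(weights, biases, pattern, L):
    """HConvex^f_{<=L}(x; p): only the half-spaces contributed by the
    first L layers. Every x satisfying the full system also satisfies
    this prefix, which is exactly the region hierarchy."""
    return h_representation(weights[:L], biases[:L], pattern[:L])
```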

I.4 Linear Programming on an Activation Region

Based on the linearity of activation regions and of ReLU-FNN outputs within them, we can use Linear Programming (LP) to compute (a) the feasibility of an activation region, (b) the flippability of a neuron activity, and (c) the minimum (or maximum) of a ReLU-FNN output within an activation region. We show the LP encoding of each problem (a, b, c) in the SciPy LP form (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html), where $p\in AP^{f}$ is a given activation pattern of ReLU-FNN $f$ and $p_{l,n}$ is a given neuron activity to be flipped.

\begin{split}
\mathbf{(a)}\;\;&\exists x\in\mathbb{R}^{N^{f}}.\;HConvex^{f}(x;p)\;\xrightarrow{\text{encode}}\;\min_{x}\vec{0}\cdot x\;\;\mathbf{s.t.}\;A''x\leq b''\\
\mathbf{(b)}\;\;&\exists x\in\mathbb{R}^{N^{f}}.\;HConvex^{f}(x;p)\land HFace^{f}_{l,n}(x;p)\;\xrightarrow{\text{encode}}\;\min_{x}\vec{0}\cdot x\;\;\mathbf{s.t.}\;A''x\leq b'',\;A''_{l,n}x=b''_{l,n}\\
\mathbf{(c)}\;\;&\min_{x}f_{j}(x)\;\;\mathbf{s.t.}\;HConvex^{f}(x;p)\;\xrightarrow{\text{encode}}\;\Bigl(\min_{x}A'_{j}x\;\;\mathbf{s.t.}\;A''x\leq b''\Bigr)+b'_{j}
\end{split} (3)
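The following is a minimal SciPy sketch of these three encodings (our own illustration; it assumes the region is given as dense arrays `A`, `b`, and that `Aj`, `bj` are the simplified output weights $A'_{j}$, $b'_{j}$ of Equation 1). Note that `bounds=(None, None)` is required because `linprog` defaults to non-negative variables:

```python
import numpy as np
from scipy.optimize import linprog

def is_feasible(A, b):
    """(a) Region non-emptiness: minimize the constant 0 objective over
    {x | Ax <= b}; the LP succeeds iff the polytope is feasible."""
    res = linprog(np.zeros(A.shape[1]), A_ub=A, b_ub=b, bounds=(None, None))
    return res.status == 0

def is_flippable(A, b, row):
    """(b) Face check for a neuron activity given as constraint `row`:
    the flip is feasible iff the region touches A[row] x = b[row]."""
    res = linprog(np.zeros(A.shape[1]), A_ub=A, b_ub=b,
                  A_eq=A[row:row + 1], b_eq=b[row:row + 1],
                  bounds=(None, None))
    return res.status == 0

def min_output(A, b, Aj, bj):
    """(c) min_x f_j(x) over the region, using the fact that f_j is the
    fixed affine map A'_j x + b'_j there (cf. Equation 1)."""
    res = linprog(Aj, A_ub=A, b_ub=b, bounds=(None, None))
    return res.fun + bj if res.status == 0 else None
```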

I.5 Full Encoding of Semantic Perturbations

We focus here on the perturbations of brightness change (B), patch (P), and translation (T), and describe how to encode their combination into a ReLU-FNN $g^{x0}:\Theta\to X$ as follows, where $|\theta^{(l)}|=\dim\theta^{(l)}$, $w$ is the width of image $x0$, $px,py,pw,ph$ are the patch x-position, y-position, width, and height, and $tx$ is the amount of movement in the x-axis direction. The perturbation parameter $\theta\in\Theta$ consists of the amount of brightness change for (B), the density of the patch for (P), and the amount of translation for (T). In contrast, perturbation parameters not included in the dimensions of $\Theta$, such as $w,px,py,pw,ph,tx$, are assumed to be given as constants before verification.

\begin{split}
g(\theta,x0)&\;\xrightarrow{\text{encode}}\;g^{x0}(\theta)\;\;\text{// currying $g$ with the given constant $x0$}\\
g^{x0}(\theta)&=g^{(5)}(\theta\circ x0)\;\;\text{// concatenate with $x0$}\\
g^{(1)}(\mu)&=A^{(T)}\mu\;\;\text{// translate}\\
g^{(2)}(\mu)&=A^{(P)}g^{(1)}(\mu)\;\;\text{// patch}\\
g^{(3)}(\mu)&=A^{(B)}g^{(2)}(\mu)\;\;\text{// brightness change}\\
g^{(4)}(\mu)&=-ReLU(g^{(3)}(\mu))+\vec{1}\;\;\text{// clip $\max(0,x_{i})$}\\
g^{(5)}(\mu)&=-ReLU(g^{(4)}(\mu))+\vec{1}\;\;\text{// clip $\min(1,x_{i})$}
\end{split}
A^{(B)}=\left[a^{(B)}_{r,c}\right],\;A^{(P)}=\left[a^{(P)}_{r,c}\right],\;A^{(T)}=\left[a^{(T)}_{r,c}\right]
a^{(B)}_{r,c}=\begin{cases}1&(c=1\land r\geq|\theta^{(l+1)}|)\;\;\text{// add $\theta^{(l)}_{1}$}\\1&(c=r+1)\;\;\text{// copy $\theta^{(l)}_{\geq 2}$ and $x_{i}$}\\0&(\text{otherwise})\end{cases}
a^{(P)}_{r,c}=\begin{cases}1&(c=1\land On(r))\;\;\text{// add $\theta^{(l)}_{1}$}\\1&(c=r+1)\;\;\text{// copy $\theta^{(l)}_{\geq 2}$ and $x_{i}$}\\0&(\text{otherwise})\end{cases}
On(r)\overset{\mathrm{def}}{=}\mathbf{let}\;i\coloneqq r-|\theta^{(l+1)}|.\;(px\leq\lfloor i/w\rfloor\leq px+pw)\land(py\leq i\bmod w\leq py+ph)
a^{(T)}_{r,c}=\begin{cases}1&(c=r+1\land r\leq|\theta^{(l+1)}|)\;\;\text{// copy $\theta^{(l)}_{\geq 2}$}\\0&(c=1\land\lnot(1\leq t(r)\leq s(r)\leq N))\;\;\text{// zero padding}\\x0_{t(r)}-x0_{s(r)}&(c=1\land r\geq|\theta^{(l+1)}|)\;\;\text{// add $\theta^{(l)}_{1}\,\Delta x0_{i}$}\\1&(c=s(r)+|\theta^{(l)}|\land r>|\theta^{(l+1)}|)\;\;\text{// copy $x0_{i}$}\\0&(\text{otherwise})\end{cases}
where $s(r)$ and $t(r)$ denote the source and target pixel indices:
s(r)\overset{\mathrm{def}}{=}\mathbf{let}\;i\coloneqq r-|\theta^{(l+1)}|.\;(\lfloor i/w\rfloor+tx-1)w+(i\bmod w)
t(r)\overset{\mathrm{def}}{=}\mathbf{let}\;i\coloneqq r-|\theta^{(l+1)}|.\;(\lfloor i/w\rfloor+tx-2)w+(i\bmod w)
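As a concrete instance of the simplest case, here is a hypothetical NumPy sketch of $A^{(B)}$ (our own illustration, 0-indexed, assuming `n_theta` perturbation dimensions are laid out ahead of the flattened pixels; the function name is ours):

```python
import numpy as np

def brightness_matrix(n_theta, n_pixels):
    """A^(B) acting on mu = [theta_1, ..., theta_k, x_1, ..., x_N]:
    consumes theta_1 (the brightness offset), copies theta_{>=2},
    and outputs x_i + theta_1 for every pixel (before clipping)."""
    k, N = n_theta, n_pixels
    A = np.zeros((k - 1 + N, k + N))
    for r in range(k - 1 + N):
        A[r, r + 1] = 1.0       # copy theta_{>=2} and x_i (c = r+1)
        if r >= k - 1:
            A[r, 0] = 1.0       # add theta_1 to the pixel rows
    return A

# Example: 2 perturbation parameters, a flattened 2x2 image x0.
theta = np.array([0.3, 0.0])            # theta_1 = brightness offset
x0 = np.array([0.2, 0.4, 0.6, 0.8])
mu = np.concatenate([theta, x0])
out = brightness_matrix(2, 4) @ mu      # -> [theta_2, x_1+0.3, ..., x_4+0.3]
```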

I.6 Images used for our experiments

We used 10 images (i.e., indexes 69990-69999) selected from the end of the MNIST dataset (cf. Figure 9) and the Fashion-MNIST dataset (cf. Figure 10), respectively. None of these images were used to train any of the ReLU-FNNs.

Figure 9: MNIST images used for experiments.
Figure 10: Fashion-MNIST images used for experiments.

I.7 An example of Lemma 1

Lemma 1 is reprinted below.

\frac{\partial f_{j}(x)}{\partial x_{i}}=C\;\;(x\in\{g^{x0}(\theta)\mid\theta\in ar^{f\circ g}(p)\})

A small example of Lemma 1 (cf. Figure 11). Let $X=[0,1]^{3}$, $Y=\mathbb{R}^{2}$, $\Theta=[0,1]^{1}$, $x0=(1,0.5,0.1)\in X$, $g^{x0}(\theta\in\Theta)=ReLU(-\theta\vec{1}+x0)\in X$, and $f(x\in X)=ReLU(x_{1}+x_{2},\,x_{1}+x_{3})\in Y$.
Because $g^{x0}(0.6)=ReLU(0.4,-0.1,-0.5)$ and $f(g^{x0}(0.6))=ReLU(0.4,0.4)$, we have $p=ap^{f\circ g}(0.6)=[1,0,0\,|\,1,1]\in AP^{f\circ g}$.
Then, $p_{\geq 2}=[1,1]=ap^{f}(g^{x0}(0.6))\in AP^{f}$.
Here, $ar^{f\circ g}(p)$ corresponds to $HConvex^{f\circ g}(\theta;p)\equiv-\theta+1\geq 0\land-\theta+0.5\leq 0\land-\theta+0.1\leq 0\land-\theta+1\geq 0\land-\theta+1\geq 0\equiv 0.5\leq\theta\leq 1$; on the other hand, $ar^{f}(p_{\geq 2})$ corresponds to $HConvex^{f}(x;p_{\geq 2})\equiv x_{1}+x_{2}\geq 0\land x_{1}+x_{3}\geq 0$.
Because $0\leq x_{1}+x_{2}=x_{1}+x_{3}=1-\theta\leq 0.5$ for $\theta\in ar^{f\circ g}(p)$, it follows that $\forall\theta\in ar^{f\circ g}(p).\;g^{x0}(\theta)\in ar^{f}(p_{\geq 2})$.

Figure 11: Illustration of the small example of Lemma 1.
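The example can be checked numerically; the following Python snippet (our own illustration) reproduces the pattern $p$ and confirms that $f(g^{x0}(\theta))$ is affine with a constant gradient across the whole region $0.5\leq\theta\leq 1$:

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)
x0 = np.array([1.0, 0.5, 0.1])
g = lambda t: relu(-t + x0)                               # g^{x0}(theta)
f = lambda x: relu(np.array([x[0] + x[1], x[0] + x[2]]))  # f(x)

def pattern(t):
    # neuron activities of f.g at theta: [g-layer | f-layer]
    xg = -t + x0
    xf = np.array([relu(xg)[0] + relu(xg)[1], relu(xg)[0] + relu(xg)[2]])
    return [int(v > 0) for v in np.concatenate([xg, xf])]

print(pattern(0.6))                       # [1, 0, 0, 1, 1] = p
# On ar^{f.g}(p) = [0.5, 1], f(g^{x0}(theta)) = (1-theta, 1-theta),
# so the gradient is the constant C = -1 for both outputs:
for t in (0.55, 0.7, 0.95):
    assert pattern(t) == [1, 0, 0, 1, 1]
    assert np.allclose(f(g(t)), 1.0 - t)
```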

I.8 Algorithm BFS

Algorithm BFS traverses all activation regions in the perturbation parameter space $\Theta$, as shown in Figure 12.

Figure 12: Examples of BFS results. (Near the edges, polygons may fail to render, resulting in blank regions.)

Algorithm BFS initializes $Q$ with $ap^{f\circ g^{x0}}(\vec{0})$ (Line 3). Then, for each activation pattern $p$ in $Q$ (Lines 5-6), it reconstructs the corresponding activation region $\eta$ as the H-representation of $p$ (subroutine constructActivationRegion, Line 8; cf. Equation 2). Next, for each neuron in $f\circ g^{x0}$ (Line 12), it checks whether the neuron activity $p_{l,n}$ cannot flip within the perturbation parameter space $\Theta$, i.e., whether one of the half-spaces has no feasible point within $\Theta$ (subroutine isStable, Line 13); if so, the neuron is skipped. Otherwise, a new activation pattern $p'$ is constructed by flipping $p_{l,n}$ (subroutine flipped, Line 14) and added to the queue (Line 20) if $p'$ is feasible (subroutine calcInteriorPointOnFace, Lines 17-18). Finally, the activation region $\eta$ is simplified (Line 24) and used to verify $CR$ (subroutines solveCR and solveMR, Lines 25-27, cf. Section 4.4) and $AR$ (subroutines solveAR and solveIR, Lines 32-34, cf. Section 4.4).

Algorithm 3 $bfs(f,g,x0,\Theta,\delta)\to(H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB})$
Require: $f,g,x0,\Theta,\delta$
Ensure: $H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB}\subset\mathcal{P}(\Theta)$
1:  $H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB}\leftarrow\{\},\{\},\{\},\{\},\{\},\{\}$
2:  $g^{x0}\leftarrow g(\cdot,x0)$ // currying $g$ with $x0$; i.e., $g^{x0}(\theta)=g(\theta,x0)$.
3:  $Q\subset AP^{f\circ g^{x0}}\leftarrow\{ap^{f\circ g^{x0}}(\vec{0})\}$ // queue for breadth-first search.
4:  $OBS\subset AP^{f\circ g^{x0}}\leftarrow\{\}$ // observed activation patterns.
5:  while $|Q|>0$ do // loop for breadth-first search.
6:     $p\leftarrow pop(Q)$
7:     $OBS\leftarrow OBS\cup\{p\}$
8:     $\eta\leftarrow constructActivationRegion(f\circ g^{x0},p)$
9:
10:     // Push the connected activation regions of $\eta$.
11:     $FS\subset\mathbb{Z}\times\mathbb{Z}\leftarrow\{\}$ // $(l,n)\in FS$ means the $n$-th neuron in the $l$-th layer is a face of $\eta$.
12:     for $l=1$ to the layer size of DNN $f\circ g^{x0}$, $n=1$ to the neuron size of the $l$-th layer do
13:        continue if $isStable(p,l,n,\Theta)$ // skip if neuron activity $p_{l,n}$ cannot flip within $\Theta$.
14:        $p'\leftarrow flipped(p,l,n)$ // flip neuron activity $p_{l,n}$.
15:        continue if $p'\in OBS$ // skip if $p'$ has already been observed.
16:        $OBS\leftarrow OBS\cup\{p'\}$
17:        $\theta^{F}\leftarrow calcInteriorPointOnFace(\eta,l,n)$
18:        continue if $\theta^{F}=null$ // skip if $p'$ is infeasible.
19:        $FS\leftarrow FS\cup\{(l,n)\}$
20:        $Q\leftarrow Q\cup\{p'\}$ // push.
21:     end for
22:
23:     // Verify activation region $\eta$.
24:     $\tilde{\eta}\leftarrow simplified(\eta,FS)$ // limit the constraints on $\eta$ to $FS$.
25:     if $solveCR(x0;\tilde{\eta})$ then
26:        $H^{CR}\leftarrow H^{CR}\cup\{\tilde{\eta}\}$
27:     else if $solveMR(x0;\tilde{\eta})$ then
28:        $H^{MR}\leftarrow H^{MR}\cup\{\tilde{\eta}\}$
29:     else
30:        $H^{CB}\leftarrow H^{CB}\cup\{\tilde{\eta}\}$
31:     end if
32:     if $solveAR(x0;\tilde{\eta})$ then
33:        $H^{AR}\leftarrow H^{AR}\cup\{\tilde{\eta}\}$
34:     else if $solveIR(x0;\tilde{\eta})$ then
35:        $H^{IR}\leftarrow H^{IR}\cup\{\tilde{\eta}\}$
36:     else
37:        $H^{AB}\leftarrow H^{AB}\cup\{\tilde{\eta}^{AB}\}$
38:     end if
39:  end while
40:  return $H^{CR},H^{MR},H^{CB},H^{AR},H^{IR},H^{AB}$

I.9 Details of experimental results

Table 2 shows the breakdown of verification statuses in the experimental results for each algorithm and each DNN size (cf. Section 5). In particular, for traversing AR boundaries, the ratio of "Timeout" and "Failed (out-of-memory)" statuses increases with the size of the DNN. This is because gbs-AR traverses more activation regions than gbs-CR, by the width of the hyperparameter $w^{\delta}$. In future work, it would be desirable to traverse only the small number of activation regions near the AR boundary.

Table 2: Breakdown of verification statuses. "Robust" and "NotRobust" mean the algorithm found only robust regions or at least one non-robust region, respectively. "Timeout" means the algorithm did not finish within 2 hours; "Failed" means it ran out of memory.
algorithm #neurons Robust NotRobust Timeout Failed
bfs 100 13 22 25 0
bfs 200 11 15 34 0
gbs-CR 100 16 44 0 0
gbs-CR 200 17 43 0 0
gbs-CR 400 14 46 0 0
gbs-CR 800 17 43 0 0
gbs-CR 2028 16 44 0 0
gbs-CR 14824 4 0 0 56
gbs-AR 100 14 35 11 0
gbs-AR 200 19 27 14 0
gbs-AR 400 30 13 17 0
gbs-AR 800 28 3 28 1
gbs-AR 2028 15 4 33 8
gbs-AR 14824 4 0 0 56
gbs-CRAR 100 14 41 5 0
gbs-CRAR 200 19 33 8 0
gbs-CRAR 400 30 14 16 0
gbs-CRAR 800 28 6 26 0
gbs-CRAR 2028 15 8 32 5
gbs-CRAR 14824 4 0 0 56