
VERCEL: Verification and Rectification of Configuration Errors with Least Squares

Abhiram Singh, Sidharth Sharma and Ashwin Gumaste

Abhiram Singh, Sidharth Sharma and Ashwin Gumaste are with the Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai-400076, India (e-mail: abhiram25.1990@gmail.com, sidharth.sharma@ieee.org, ashwing@ieee.org).

Abstract

We present Vercel, a network verification and automatic fault rectification tool based on linear algebra, a domain that is computationally tractable, algorithmically expressive, and mathematically aesthetic. Vercel abstracts packet headers into standard basis vectors, which are used to create a port-specific forwarding matrix $\mathcal{A}$ representing the set of packet headers/prefixes that a router forwards along a port. By relating this matrix $\mathcal{A}$ to a vector $b$ (which represents the set of all headers under consideration) through the linear system $\mathcal{A}x=b$, we are able to apply least squares (which produces a column-rank-agnostic solution) to compute which headers are reachable at the destination. Reachability then simply means evaluating whether vector $b$ is in the column space of $\mathcal{A}$, which can be computed efficiently using least squares. Further, the use of vector representation and least squares opens new possibilities for understanding network behavior. For example, we are able to map rules, routing policies, and what-if scenarios to the fundamental linear algebraic form $\mathcal{A}x=b$, as well as determine how to configure forwarding tables appropriately. We show that Vercel is faster than state-of-the-art tools such as NetPlumber, Veriflow, APKeep, and AP Verifier when measured over diverse datasets. Vercel is almost as fast as Deltanet when rules are verified in batches, and it provides better scalability, expressiveness and memory efficiency. A key highlight of Vercel is that, while evaluating reachability, the tool can incorporate intents and transform these into auto-configurable table entries, yielding a recommendation/correction system.

Index Terms:
Network verification, least squares, reachability, binary tree.

I Introduction

It is widely known that configuration errors in service provider networks form the bulk of network outages [1, 2, 3], resulting in operational challenges that lead to conservative planning and slow rollout of services. Errors can be in the control plane or data plane of routers, firewalls, middleboxes and switches. Control plane (protocol) verification [4, 5, 6, 7, 8, 9, 10, 11] is harder to accomplish on account of the difficulties in abstracting the statefulness of protocols to computation models. In contrast, significant work exists in the realm of data plane verification [12, 13, 14, 15, 16, 17, 18, 19, 20, 21], which is feasible (though complex). The early work on atomic predicates [22] mapped network forwarding rules to formal methods, resulting in a formulation whose solution answers reachability verification queries. The complexity of atomic predicates was further relaxed by the static HSA scheme [23], which considered packets in an $L$-dimensional space (where $L$ is the number of bits in a packet header) and tracked the transformation of such packets through multiple network boxes (routers, etc.) by applying and then conjoining transfer functions. HSA was extended to include dynamic updates in NetPlumber [24]. Another work, Veriflow [16], mapped network-wide headers to equivalence classes (ECs), defined as sets of packets that are treated similarly at routers, through a trie data structure. Veriflow was able to detect configuration errors in real time. The recent schemes APKeep [17] and Deltanet [15] improved on the initial success of Veriflow’s approach using optimizations and graph-theoretic means, resulting in faster reachability computation, loop detection, and blackhole identification (a blackhole being a packet at a node with no rule matching the packet’s header).

The next set of questions pertains to expressiveness, robustness, service support, and generic verification of network invariants. The solution to these lies in combining the initial successes of HSA/NetPlumber (the concept of an $L$-dimensional space) with the EC approach of the more recent techniques (Veriflow, APKeep, Deltanet). To this end, our solution, Vercel, applies linear algebraic techniques, a paradigm shift in the underlying techniques used for solving data plane verification in real time. Vercel uses both the dimensional transformations of HSA/NetPlumber and the ECs of Veriflow, and then applies linear algebra to achieve verification, recommendation and rectification. To understand the rationale behind using linear algebra, recall the concept of header spaces in HSA/NetPlumber. Header spaces represent an $L$-bit header in an $L$-dimensional space and model a forwarding device as a function that transforms a header from one subspace to another. However, the complexity of transforming headers in subspaces is quadratic in the number of headers. In contrast, Vercel first identifies $m$ ECs from the rules collected from all the forwarding devices. Thereafter, Vercel creates an $m$-dimensional space corresponding to these ECs such that the forwarding rules of each device form subspaces in this $m$-dimensional space. Finally, to check reachability, Vercel models packet forwarding at routers by linearly projecting a point from one subspace to another along a path from the source to the destination. We argue that transforming a point (instead of subspaces, as in HSA and NetPlumber) can best be done with linear algebra by converting ECs to vectors and tracing how these vectors move from a source to a destination, thus enabling us to evaluate reachability.
In addition to reachability, Vercel attains unprecedented scalability and efficiency (Section VI) as well as lays the groundwork for a more powerful abstraction – that of providing a recommendation/rectification system. It is established that linear projections can be efficiently handled by applying linear algebraic operations such as least squares [25]. By definition, the linearity of our technique (linear algebra) implies speed of computation, availability of well-established theory, and readily available well-polished tools. Using these tools, we can verify network constructs and build a recommendation/rectification system that automatically rectifies configuration errors. With Vercel’s rectification technique, we can simply specify intents, and the tool automatically provisions tables, even when reachability does not seem to readily exist. It does so by computing the missing entries at tables and populating those, all in a single linear algebraic operation, which is possible only because we have transformed header spaces to vectors. Vercel makes a jumpstart in auto-configuration, such that the tool recommends and corrects faults in both path selection and table manipulation.

To compute packet (or equivalently header) reachability between a source and destination router, Vercel initially creates a network-wide binary tree as a data structure for efficiently representing the headers present in the forwarding tables of routers. This binary tree enables the division of the header space into non-overlapping partitions (similar to ECs), and thereafter Vercel represents each partition as one of a set of orthogonal vectors. These unique vectors provide a foundation for introducing least squares. With these vectors arranged in columns, Vercel creates a forwarding matrix $\mathcal{A}$ for each port at a router. Vercel initializes another vector $b$ for all the partitions whose reachability we wish to evaluate. The matrix $\mathcal{A}$ and vector $b$ are then part of the standard linear equation $\mathcal{A}x=b$. The solution vector $x$ identifies which headers in $b$ are being forwarded along the port corresponding to $\mathcal{A}$. We argue that if the matrix $\mathcal{A}$ contains orthogonal columns, then solving $\mathcal{A}x=b$ across different ports with least squares leads to solving for reachability. Applying least squares gives us excellent insights into computing reachability and fixing configuration faults.

Another advantage of using linear algebra is that Vercel does away with checking reachability on multiple headers sequentially (as in Veriflow and Deltanet); instead, it relies on vector spaces to apply a single algebraic operation simultaneously on multiple headers resulting in a key contribution of batch processing, which is not supported by earlier schemes. Parallelism in other techniques can be obtained by running multiple threads, whereas Vercel takes advantage of CPU based (Table VI) vector processing instructions even on a single thread, inducing scalability. Vercel’s key contributions are listed as follows:

1) Linear algebra to model networking devices: The merit of representing packet headers in a vector space is that it increases the expressive power beyond reachability. Vercel can be used to check loops, detect blackholes, confirm routing policies and answer what-if scenarios in the presence of ACL rules and packet-header transformations.

2) Real-time network verification: We show that Vercel achieves significant improvement (on standard datasets, such as the one from Stanford [26]) of $8\times$ over Veriflow, $164\times$ over NetPlumber, $1.7\times$ over APKeep, and $36\times$ over AP Verifier. Although Deltanet is faster than Vercel for verifying a single update, it cannot model a diverse set of network functions such as NAT and ACL, and it consumes up to $7.8\times$ more memory than Vercel. Also, in the case of batch updates, the performance of Vercel is comparable to Deltanet (which does not even implicitly support batch processing).

3) Scalability: Vercel performs well on large networks such as a synthetic network of 2000 nodes with millions of rules. Even with a network of this size, the verification time is of the order of 500 $\mu$s.

4) Using linear algebra for recommendation and rectification: Linear regression, being a by-product of least squares, can also be used as a recommendation system (indicating how well forwarding tables are configured). The standard equation $\mathcal{A}x=b$ may not have a solution (because $\mathcal{A}$ may not be a full rank matrix), and the error resulting from least squares (Chapter 4 of [25]) tells us the efficiency of routing and also paves the way for automatically provisioning intents and rectifying errors.

Network verification is important to avoid expensive outages. For example, in 2021, the Meta (Facebook’s) network went down globally due to a BGP configuration fault [27]. Configuration faults are the cause of 70 percent of global network outages, and this motivates approaches towards larger coverage of configuration fault detection. Our approach Vercel is different in treatment compared to other solutions – while other solutions use techniques such as atomic predicates or transformations, or SAT solvers, our approach is based on least-squares, exploring a paradigm never done before. This approach offers multiple advantages. It helps us verify configurations, detect misconfigurations, and develop a recommendation system that automatically rectifies errors. This fundamental step forward (recommendation and rectification), in our view, sets Vercel apart from the rest of the verification solutions.

Paper organization: Section II provides an overview of Vercel, while functional blocks of Vercel are presented in Section III. Section IV provides an optimized implementation of Vercel. Section V extends Vercel to a variety of network functions while evaluation of Vercel is presented in Section VI. We discuss related works in Section VII. Finally, Section VIII concludes the paper.

II Vercel Overview

Figure 1: (a) A toy network of 4 routers in which a new rule at router $Q$ is being inserted in its forwarding table (shown in the dashed block). Vercel represents all rules in a binary tree and, after insertion of a new rule, identifies 3 headers whose reachability might be affected. (b) An example demonstrating reachability between routers $Y$ and $R$ along the path $Y$-$U$-$R$ with least squares. Vercel represents 3 headers in 3 dimensions with orthogonal vectors. Forwarding rules of routers $Y$ and $U$ are represented in the $x$-$y$ and $y$-$z$ planes, respectively. Vercel computes reachability by sequentially projecting a point in 3 dimensions onto the subspaces created for routers $Y$ and $U$.
TABLE I: Symbols and notations used.

| Notation | Definition |
| --- | --- |
| $L$ | Size of header in bits |
| $m$ | Number of equivalence classes |
| $n$ | Headers forwarded along a port, $1\leq n\leq m$ |
| $r$ | Supernet headers |
| $a$ | Atomic headers |
| $q$ | Represents iatomic headers |
| $\mathcal{A}$ | A set of packet headers/prefixes that a router forwards along a port |
| $\mathcal{A}_{Y}^{0}$ | Headers forwarded by router $Y$ on its port 0 |
| $b$ | Represents the set of all headers under consideration |
| $x$ | The solution vector for the linear equations $\mathcal{A}x=b$ |
| $\hat{x}$ | Approximate solution using least squares |
| $C(\mathcal{A}_{i}^{p})$ | Column space of $\mathcal{A}_{i}^{p}$ |
| $N((\mathcal{A}_{i}^{p})^{T})$ | Null space of $(\mathcal{A}_{i}^{p})^{T}$ |
| $[\mathcal{A}|b]$ | Augmented matrix |
| $H$ | Represents the unit step function |
| $g_{i}$ | An $m$-dimensional filtering vector with binary values (0/1) |
| $T_{i}$ | Transformation matrix |
| $S^{\text{affected}}$ | Set of headers whose reachability may have been affected |
| $P^{\text{affected}}$ | Set of ports on which a router forwards packets with a header in the set $S^{\text{affected}}$ |
| $v_{i}^{p}$ | Atomic+iatomic headers (in $S^{\text{affected}}$) that router $i$ forwards to its port $p$ |
| $c_{i}$ | Vector whose non-zero indices represent packets received at router $i$ that do not match any entry in the forwarding table |
| $t_{i}$ | Represents atomic+iatomic headers reachable up to router $i$ and then transformed using $T_{i}$ |
| $f^{acl}$ | Filtering function that maps a set of packet headers to a set of actions |
| $f^{tr}$ | A set of rules containing the header “match” and “transformed” fields |

This section presents a high-level overview of Vercel, including an intuition for applying linear algebra to network verification. Notations used in the paper are listed in Table I. Vercel actively “listens” to configuration updates (made either through an SDN controller or any other tool) to capture the topology and forwarding tables/updates. Working at the granularity of individual packet headers is infeasible; hence Vercel, along the same lines as [6, 15, 16, 17], groups headers so that packets in the same group are dealt with similarly across routers.

To group packet headers, we need to parse rules from all the routers. Solely for the purpose of storing rules, we create a binary tree, which preserves the hierarchical representation of packet headers (Sections III-A, III-B). For every header, the tree provides a path from the root towards the leaves that denotes the header bits. The use of a binary tree is similar to the use of a trie by Veriflow. However, unlike Veriflow, where the leaves contain node-rule pairs, in our case the leaves represent non-overlapping headers (ECs). Since Vercel is primarily designed to verify invariants in real time, it traverses the binary tree to determine the $m$ headers whose reachability might have been altered after a rule update (Section III-C). Thereafter, Vercel uniquely maps the $m$ headers to $m$ orthogonal vectors that collectively span an $m$-dimensional space. After creating the $m$-dimensional space, Vercel identifies a subspace for each port on which a router forwards $n$ headers, where $1\leq n\leq m$. After obtaining the $n$ headers, Vercel defines the subspace for each port by selecting the $n$ corresponding orthogonal vectors and storing these vectors as the columns of a matrix $\mathcal{A}$ of dimension $(m,n)$ (Section III-D). In addition to matrix $\mathcal{A}$, Vercel initializes an $m$-dimensional vector $b$, which is designed to evaluate the reachability of all $m$ headers under consideration.

Matrix $\mathcal{A}$ and vector $b$ come together to form the standard linear equation $\mathcal{A}x=b$, whose solution will eventually confirm whether reachability exists (Section III-E). Specifically, we solve $\mathcal{A}x=b$ for each port along the path between a source and destination. Three cases exist while solving $\mathcal{A}x=b$.

In the first case, there exist rules that forward some but not all of the received packets to the specified port. This implies that vector $b$ is not in the column space of matrix $\mathcal{A}$, indicating that no exact solution can be found for $\mathcal{A}x=b$. Therefore, Vercel finds an approximate solution $\hat{x}$ using least squares. For this, Vercel obtains a projection point $\mathcal{A}\hat{x}$ in the column space of the matrix $\mathcal{A}$. The approximate solution vector $\hat{x}$ selects those headers in $b$ that the router forwards along its output port.

In the second case, if there exist forwarding rules for all the packets received at the router, then vector $b$ is in the subspace defined by the columns of matrix $\mathcal{A}$. The solution vector $x$ in $\mathcal{A}x=b$ points to all the headers in $b$ that are forwarded along the port.

In the third case, if a router forwards none of the received packets along the selected port, then vector $b$ lies in the null space of matrix $\mathcal{A}^{T}$ and least squares returns the solution $\hat{x}=0$.

It is possible to efficiently solve all three cases. We argue that if matrix $\mathcal{A}$ contains orthogonal columns, then solving $\mathcal{A}x=b$ across different ports with least squares provides the projection of $b$ onto the intersection of subspaces and finds a solution to the reachability problem (Section III-E).
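The three cases can be told apart from the least-squares projection alone. The following is a minimal sketch, not the authors' implementation: it assumes the columns of $\mathcal{A}$ are mutually orthogonal integer vectors, and the function names (`dot`, `least_squares_orthogonal`, `classify`) and the sample port are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def least_squares_orthogonal(columns, b):
    """Least-squares solution of A x = b for mutually orthogonal columns.

    columns: the n columns of A, each a list of m integers.
    Returns (x_hat, projection) where projection = A x_hat.
    """
    # For orthogonal columns A^T A is diagonal, so each coefficient decouples.
    x_hat = [Fraction(dot(a, b), dot(a, a)) for a in columns]
    projection = [sum(x * a[k] for x, a in zip(x_hat, columns))
                  for k in range(len(b))]
    return x_hat, projection

def classify(columns, b):
    """Name which of the three cases applies to b at this port."""
    _, projection = least_squares_orthogonal(columns, b)
    if all(p == 0 for p in projection):
        return "none forwarded"   # third case: b in the null space of A^T
    if projection == list(b):
        return "all forwarded"    # second case: b in the column space of A
    return "some forwarded"       # first case: only part of b is forwarded

# Hypothetical port that forwards the first two of three headers.
port_columns = [[1, 0, 0], [0, 1, 0]]
print(classify(port_columns, [1, 1, 1]))
print(classify(port_columns, [1, 1, 0]))
print(classify(port_columns, [0, 0, 1]))
```

Exact rational arithmetic (`Fraction`) is used here only so that the equality checks are not affected by floating-point rounding; a production tool would more likely rely on vectorized numeric routines.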

If $\mathcal{A}$ were a full rank matrix, the solution would be easy, but since that is not always the case, the next best thing to do is apply least squares to obtain a projection point.

Note that solving $\mathcal{A}x=b$ with options like matrix inversion, row reduction or linear optimization is not efficient: (1) Finding the inverse of matrix $\mathcal{A}$ to solve $\mathcal{A}x=b$ is not feasible, as $\mathcal{A}$ is in general not square ($m\neq n$) and hence not invertible. Computing the pseudo-inverse to obtain $x=\mathcal{A}^{+}b$ needs singular value decomposition (SVD), a multi-step process that is infeasible in real time. (2) Row reduction algorithms are slow as they are cubic in time. (3) Linear optimization results in infeasibility, because vector $b$ may not be present in the column space of $\mathcal{A}$.

In contrast, least squares is guaranteed to provide a solution of $\mathcal{A}x=b$ irrespective of the size of $\mathcal{A}$ and the nature of $b$. Note that with the orthogonality condition imposed on the columns of matrix $\mathcal{A}$, we can find the projection of $b$ onto the column space of $\mathcal{A}$ in linear time. Importantly, although we use least squares to compute reachability, the perceived approximation characteristic of least squares has no bearing on the exactness/correctness of the reachability calculation. In the classical least squares model, error plays a role, and this may give the reader the impression that such error leads to approximation, which is definitely not the case with Vercel. For example, consider the first case, where least squares finds an approximate solution $\hat{x}$. Though $\hat{x}$ is said to be an approximate solution, it only selects a subset of the headers in $\mathcal{A}$, specifically those for which forwarding rules are present at the said router. In any case, $\hat{x}$ will not select false positives/negatives because of the approximation.
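The exactness claim can be made concrete with a short derivation. It is a sketch under one assumption taken from the running example rather than stated in general by the text: the columns of $\mathcal{A}$ are distinct standard basis vectors $e_{j}$, $j\in J$, where the index set $J$ (a symbol introduced here for illustration) collects the headers forwarded on the port.

```latex
\begin{align}
\hat{x} &= (\mathcal{A}^{T}\mathcal{A})^{-1}\mathcal{A}^{T}b = \mathcal{A}^{T}b
  && \text{since } \mathcal{A}^{T}\mathcal{A} = I_{n}, \\
\mathcal{A}\hat{x} &= \mathcal{A}\mathcal{A}^{T}b
  = \sum_{j\in J} e_{j}e_{j}^{T}b, \\
(\mathcal{A}\hat{x})_{k} &=
  \begin{cases}
    b_{k}, & k\in J,\\
    0,     & k\notin J.
  \end{cases}
\end{align}
```

Under this assumption the projection merely copies the entries of $b$ indexed by $J$ and zeroes the rest, so a 0/1 vector $b$ stays a 0/1 vector, no fractional entries appear, and the cost is one pass over the $m$ entries.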

Moving forward, intuitively, least squares projects $b$ onto the intersection of the subspaces corresponding to the ports present along a path from the source to the destination. The intersection of subspaces corresponds to the intersection of headers generated from the rules at routers. The representation of headers in vector spaces enables Vercel to apply linear algebraic operations that process headers simultaneously in a single step, thereby providing a speedup in verification.

II-A Example

We now illustrate the functionality of Vercel with an example. Figure 1(a) is a toy network of 4 routers. For simplicity, we consider 3-bit headers, though generalization to more bits is trivial. In this example, Vercel checks for reachability between source router $Y$ and destination $R$. The forwarding table at router $U$ shows that packets with header 000/3 and 01/2 are forwarded along output ports 0 and 1, respectively. Vercel collects rules from the forwarding tables and utilizes a binary tree to arrange packet headers (as in Figure 1(a)). While doing so, Vercel also keeps track of (router, port) pairs for each rule. The 4 leaf nodes of the binary tree $\{a1,\;q2,\;a3,\;a4\}$ denote the non-overlapping headers $\{000/3,\;001/3,\;01/2,\;1/1\}$. In the figure, the node labels $r$ (supernet), $a$ (atomic) and $q$ (iatomic) denote specific classes of headers defined later in Section III-B.

Now assume a network administrator inserts a new rule at router $Q$ (as shown in Figure 1(a)), which forwards packets with header 0/1 on its output port 0 (shown by the dashed block in the forwarding table at router $Q$). Vercel detects the rule update and inserts the corresponding header 0/1 in the binary tree. Post insertion, Vercel traverses the headers in its subtrees to check whether the reachability of any of them was impacted by the update. There are three such (impacted) headers, i.e., 000/3, 001/3 and 01/2, which are added to a set $S^{\text{affected}}$. Vercel creates 3-dimensional orthogonal vectors [1, 0, 0], [0, 1, 0] and [0, 0, 1] corresponding to the impacted headers 001/3, 000/3 and 01/2. For simplicity, we have selected orthogonal vectors with binary values. Next, Vercel creates a vector $b_{init}=$ [1, 1, 1] that corresponds to the three non-overlapping headers $\{q2,\;a1,\;a3\}$ for which we want to evaluate reachability. After creating $b_{init}$, Vercel creates subspaces for port 0 at routers $Y$, $U$ and $Q$. This is because, along port 0, these routers forward some packets whose headers are in the set $S^{\text{affected}}$. Figure 1(b) shows the $x$-$y$ and $y$-$z$ planes corresponding to the (router, port) pairs $\{(Y,0),\;(U,0)\}$, forming two subspaces. Since router $Y$ forwards packets with header 001/3 and 000/3 on its port 0, its subspace (the $x$-$y$ plane) is defined by a matrix $\mathcal{A}_{Y}^{0}$. Similarly, router $U$ forwards packets with header 000/3 and 01/2 on its port 0, so its subspace (the $y$-$z$ plane) is defined by a matrix $\mathcal{A}_{U}^{0}$.
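The construction of the affected set, the orthogonal vectors and the per-port matrices can be sketched as follows. This is an illustrative sketch only: prefixes are written as bit strings ("0" for 0/1, "001" for 001/3), the overlap test by string prefix is an assumption about how the subtree traversal behaves, and all names (`affected_headers`, `one_hot`, `forwarding`) are hypothetical. The vector ordering follows the text (001/3, 000/3, 01/2).

```python
def affected_headers(leaves, new_prefix):
    """Leaf headers whose header space overlaps the newly inserted prefix."""
    return [p for p in leaves
            if p.startswith(new_prefix) or new_prefix.startswith(p)]

def one_hot(index, m):
    """Standard basis vector of length m with a 1 at position index."""
    return [1 if k == index else 0 for k in range(m)]

# Leaves a1, q2, a3, a4 of the binary tree in the toy example.
leaves = ["000", "001", "01", "1"]
s_affected = affected_headers(leaves, "0")   # new rule 0/1 at router Q

# Ordering used in the text: 001/3, 000/3, 01/2.
order = ["001", "000", "01"]
m = len(order)
vector_of = {header: one_hot(i, m) for i, header in enumerate(order)}

# Headers each (router, port) pair forwards, as stated in the example.
forwarding = {("Y", 0): ["001", "000"], ("U", 0): ["000", "01"]}

# Each forwarding matrix is stored as its list of columns (m rows, n columns).
matrices = {pair: [vector_of[h] for h in headers]
            for pair, headers in forwarding.items()}
b_init = [1] * m

print(sorted(s_affected))
print(matrices[("Y", 0)], matrices[("U", 0)], b_init)
```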

After creating the subspaces for different ports, Vercel evaluates reachability by modelling packet forwarding at router $Y$. This is done by solving $\mathcal{A}_{Y}^{0}x=b_{init}$. At router $Y$, vector $b_{init}$ lies neither in the column space $C(\mathcal{A}_{Y}^{0})$ nor in the null space of $(\mathcal{A}_{Y}^{0})^{T}$ (the superscript $T$ denotes the matrix transpose). Equivalently, $b_{init}$ has non-zero components in both the column space of $\mathcal{A}_{Y}^{0}$ and the null space of $(\mathcal{A}_{Y}^{0})^{T}$. This implies that no exact solution is possible for $\mathcal{A}_{Y}^{0}x=b_{init}$. In this case, least squares finds an approximate solution resulting in a projection point $b_{Y}=$ [1, 1, 0] in $C(\mathcal{A}_{Y}^{0})$, i.e., the $x$-$y$ plane. The point $b_{Y}$ implies that router $Y$ forwards packets with headers 001/3 and 000/3 through port 0, which are then received by router $U$ on the path $Y$-$U$. Since packets with header 01/2 are not forwarded from router $Y$, the third index (corresponding to header 01/2) of vector $b_{Y}$ contains a 0 entry. Similarly, Vercel models the packet forwarding at router $U$ at port 0 by solving $\mathcal{A}_{U}^{0}x=b_{Y}$, resulting in a projection point $b_{U}=$ [0, 1, 0] in the $y$-$z$ plane, indicating that router $U$ only forwards packets with header 000/3 through port 0. Since router $R$ receives a non-zero vector $b_{U}$ from router $U$, Vercel confirms reachability between routers $Y$ and $R$ along the path $Y$-$U$-$R$. Finally, at the destination (router $R$), Vercel determines the set of reachable packet headers by computing the dot product between $b_{U}$ and the vector representation of each header in the set $S^{\text{affected}}$. For this example, Vercel obtains a non-zero dot product between $b_{U}$ and vector [0, 1, 0] (corresponding to the header 000/3), implying that packets with header 000/3 are reachable from router $Y$ to $R$.
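The two projections and the final dot-product decoding can be reproduced in a few lines. This is a sketch of the arithmetic in the example, not Vercel's code; it assumes the columns are standard basis vectors (so that $\hat{x}=\mathcal{A}^{T}b$), and the helper names `dot` and `project` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def project(columns, b):
    """Projection A x_hat of b onto the span of orthonormal columns."""
    x_hat = [dot(a, b) for a in columns]            # x_hat = A^T b
    return [sum(x * a[k] for x, a in zip(x_hat, columns))
            for k in range(len(b))]                 # A x_hat

# Headers in the order used by the example, with their basis vectors.
headers = {"001/3": [1, 0, 0], "000/3": [0, 1, 0], "01/2": [0, 0, 1]}

A_Y0 = [[1, 0, 0], [0, 1, 0]]   # router Y, port 0: x-y plane
A_U0 = [[0, 1, 0], [0, 0, 1]]   # router U, port 0: y-z plane
b_init = [1, 1, 1]

b_Y = project(A_Y0, b_init)     # headers leaving Y on port 0
b_U = project(A_U0, b_Y)        # headers leaving U on port 0

# At destination R, a non-zero dot product marks a reachable header.
reachable = [h for h, v in headers.items() if dot(b_U, v) != 0]
print(b_Y, b_U, reachable)      # [1, 1, 0] [0, 1, 0] ['000/3']
```

Chaining `project` along the path is what realizes the intersection of the two planes: only the $y$ coordinate (header 000/3) survives both projections.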

Figure 2: Steps for Vercel’s handling of user-specified intent related to service configuration (in the network shown on the right side of the figure).

II-B Rectification and Automatic Provisioning

Using least squares, we check for reachability and express intent to establish a service between two nodes in the network. If reachability between these two nodes does not exist, then Vercel recommends a path (Section V-E) and rectifies the situation by automatically populating tables at interim routers leading to reachability. The fact that misconfigurations are not just found, but also rectified and that the rectification is a direct by-product of 𝒜x=b\mathcal{A}x=b using least squares, is a significant value addition to operators. An operator only has to express a high level intent for a bunch of services and not worry about how these are correctly provisioned. The correctness is taken care of by Vercel.

Let us assume we performed least squares on $\mathcal{A}x=b$ for a port and identified that our target vector $b$ is in the null space of $\mathcal{A}^{T}$, which means that none of the headers represented by $b$ are forwarded along that port. Now, to ensure reachability for any of the headers in $b$, $b$ must be in the column space of $\mathcal{A}$. The simplest way of achieving this is to replace $\mathcal{A}$ with the augmented matrix $[\mathcal{A}|b]$.

Figure 3: Example demonstrating the internals of Vercel’s rectification system. Initially, $R$ is not reachable from $Y$. However, after applying Vercel’s rectification at $Q$, reachability till $R$ is ensured along the path $Y$-$U$-$Q$-$R$.

We now describe rectification via the following example resulting from a user’s intent. Consider a four-node network consisting of routers $Y$, $U$, $Q$ and $R$, shown on the right side of Fig. 2. Forwarding tables at each of the routers and the corresponding vector notations for their non-overlapping headers are also shown in the figure. Assume an intent by the user whose goal is to set up a service between nodes $Y$ and $R$. Vercel is responsible for parsing the intent statement to extract information such as the source ($Y$), destination ($R$), and action (service provisioning). Vercel then examines the configuration of routers $Y$ and $U$, generates forwarding matrices for both, and initializes $b_{init}$ (all prefixes) for the reachability check. Upon analysis, Vercel determines that destination $R$ is currently unreachable from source $Y$ (as there are no forwarding rules installed at routers $U$ and $Q$ to forward packets received from $Y$ to $R$). In response, Vercel enables its rectification engine and proposes a solution based on linear algebraic operations. The activated rectification engine creates a new rule (01/2, Port 1) at router $Q$ to establish reachability. The verification and rectification engines of Vercel work together for this intent, and the flow of operations is shown in Fig. 2. A detailed explanation of the linear algebraic operations performed for rectification is provided below.

Fig. 3 shows an example demonstrating Vercel’s rectification system for the intent expressed in Fig. 2. We apply Vercel to verify reachability, for which the matrices $\mathcal{A}$ and vector $b$ are created as shown in Fig. 3. After applying $\mathcal{A}x=b$ at routers $Y$ and $U$, we get $b_{U}$, which is the set of prefixes reachable from $Y$ to $Q$ (through the path $Y$-$U$-$Q$). Now at router $Q$, while applying $\mathcal{A}_{Q}x=b_{U}$, Vercel finds that $b_{U}$ is in the null space of $\mathcal{A}_{Q}^{T}$, which implies that none of the prefixes in $b_{U}$ will be forwarded towards $R$. At this point, our recommendation system kicks in and suggests a “fix” to address the fault at router $Q$, such that reachability between $Y$ and $R$ is established. As part of the fix, Vercel’s recommendation system adds $b_{U}$ as a new column of matrix $\mathcal{A}_{Q}$ (and updates the corresponding forwarding rules at router $Q$). Now, the updated matrix $\mathcal{A}_{Q}$ has three columns. This solves the problem because, when we evaluate $\mathcal{A}_{Q}x=b_{U}$, $b_{U}$ will be in the column space of $\mathcal{A}_{Q}$. As a result of $\mathcal{A}_{Q}x=b_{U}$, we get $b_{Q}=[0,1,0]$, which is a vector that represents the prefixes reachable till destination $R$. Since this vector is now non-zero, a few prefixes are reachable at the destination after applying the fix.
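The augmentation step $[\mathcal{A}|b]$ can be sketched as below. This is a hedged illustration: the two original columns of $\mathcal{A}_{Q}$ are not given in the text and are assumed here to be [1, 0, 0] and [0, 0, 1] (the only standard basis vectors orthogonal to $b_{U}=[0,1,0]$ in three dimensions), and the function names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def project(columns, b):
    """Least-squares projection of b onto the span of orthogonal columns."""
    x_hat = [Fraction(dot(a, b), dot(a, a)) for a in columns]
    return [sum(x * a[k] for x, a in zip(x_hat, columns))
            for k in range(len(b))]

def rectify(columns, b):
    """Return the columns of [A | b] when b is non-zero and lies in the
    null space of A^T; otherwise return the columns unchanged."""
    in_left_null_space = all(dot(a, b) == 0 for a in columns)
    if in_left_null_space and any(v != 0 for v in b):
        return columns + [list(b)]   # b stays orthogonal to the old columns
    return columns

A_Q = [[1, 0, 0], [0, 0, 1]]     # assumed original columns at router Q
b_U = [0, 1, 0]                  # prefixes arriving at Q from U

before = project(A_Q, b_U)       # zero vector: nothing forwarded towards R
A_Q_fixed = rectify(A_Q, b_U)    # three columns after the fix
b_Q = project(A_Q_fixed, b_U)    # prefixes now reachable at R

print(before, len(A_Q_fixed), b_Q)
```

Because $b_{U}$ is orthogonal to every existing column when the fix is triggered, appending it keeps the columns mutually orthogonal, so the same projection routine still applies to the augmented matrix.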

Nevertheless, employing rectification to address issues may inadvertently result in unintended consequences, such as rendering some destinations unreachable. To mitigate this, our rectification system operates based on the following principle: when implementing a fix (by adding a rule), the reachability of any current header should remain unaffected. For this, identifying the appropriate header is crucial whenever a rule update is conducted as part of rectification. In this regard, Vercel opts for a header that was previously unreachable between the source and destination pairs of interest. In accordance with this principle, the following procedure is followed during the rectification process.

Assume the rectification system suggests adding a rule with header $z$ in one of the routers’ forwarding tables. Vercel then determines whether there exist any reachable headers $z'$ (for some other source-destination pairs) in the same table that overlap with $z$. Vercel then computes $z-z'$ to check whether the resultant set is non-empty. Non-emptiness here implies the existence of non-reachable headers between the source and destination. If so, it configures the corresponding non-empty set as a separate rule in the forwarding table. In case the result ($z-z'$) is empty, Vercel attempts to find a header other than $z$ and repeats the process. If we exhaust all possible candidates for $z$ and still cannot find a non-empty $z-z'$, then Vercel notifies that rectification is not possible. By computing such a $z-z'$, Vercel ensures that adding a rule during the rectification process does not impact the reachability of any existing header.
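One way to compute $z-z'$ when both headers are prefixes is sketched below. The text does not specify how the difference is computed, so this is an assumption: prefixes are bit strings, and the result is returned as a list of disjoint prefixes covering exactly the headers matched by $z$ but not by $z'$. The function name `prefix_difference` is hypothetical.

```python
def prefix_difference(z, z_prime):
    """Prefixes covering the headers matched by z but not by z_prime.

    Both arguments are bit-string prefixes, e.g. "0" for 0/1.
    An empty list means z_prime covers all of z.
    """
    if z.startswith(z_prime):
        return []                    # z lies entirely inside z_prime
    if not z_prime.startswith(z):
        return [z]                   # no overlap: all of z remains
    remainder = []
    # Walk from z down to z_prime, keeping the sibling at every level.
    for depth in range(len(z), len(z_prime)):
        flipped = "1" if z_prime[depth] == "0" else "0"
        remainder.append(z_prime[:depth] + flipped)
    return remainder

print(prefix_difference("0", "000"))   # ['01', '001']
print(prefix_difference("00", "0"))    # []
print(prefix_difference("01", "1"))    # ['01']
```

With several reachable headers $z'$ in the same table, the same routine could be applied repeatedly to each remaining prefix; an empty final result corresponds to the case where Vercel moves on to another candidate header.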

III Vercel: Under the hood

This section describes the detailed working of Vercel.

III-A Representing headers in a binary tree

Figure 4: Vercel components.

Initially, Vercel queries routers or an SDN controller to obtain forwarding tables and compute the network topology, as shown in Fig. 4. Thereafter, Vercel arranges all the existing packet headers (extracted from rules in the forwarding tables) in the form of a binary tree. To add a new header, Vercel first identifies its correct position in the tree by traversing the tree based on the header’s bits (at most $L$ of them). These bits form a path from the root to a prospective node, where the new header is to be inserted. At every level of the tree, the corresponding header bit indicates the traversal direction (0 = left branch and 1 = right branch). A path from the root to an intermediate/leaf node corresponds to a specific packet header. The role of the binary tree is to efficiently divide the packet header space into non-overlapping partitions. Each packet belonging to a partition observes similar processing at all the routers in the network. The succinct representation enabled by the binary tree gives Vercel a way of dividing the packet header space into three categories: supernet, atomic and iatomic. Of these, the atomic and iatomic categories together represent all the non-overlapping headers extracted from the forwarding tables across the network, while a supernet represents the union of one or more atomic and iatomic headers.

III-B Classifying the nodes of the binary tree

Vercel labels a packet header as a supernet if the corresponding path from the root terminates at a non-leaf node. The remaining packet headers extracted from the rules (whose paths from the root terminate at a leaf node) are denoted as atomic. Due to the hierarchy in the header representation, multiple headers (e.g., a supernet and an atomic header) may overlap. Therefore, Vercel further fragments supernets to create a new category of headers, called induced-atomic (iatomic) headers, that do not overlap with the atomic headers. For example, in Figure 1(a) we observe 3 atomic headers (labeled as $a$), 1 iatomic header (labeled as $q$) and 3 supernets (labeled as $r$) in the binary tree. Note that the atomic and iatomic headers are the leaf nodes of the binary tree.

In addition to its label type ($r/a/q$), each node of the binary tree carries a list of tuples. A tuple $(i,\;p)$ at node $k$ denotes that router $i$ contains a rule that maps the packet header represented by the path from the root to node $k$ to output port $p$. Vercel also assigns an integer identifier to each leaf node, which distinguishes headers within the set of atomic and iatomic headers.

So far, we have defined atomic and iatomic headers; the remaining question is how to identify and segregate them once the tree is constructed. The following lemma and theorem describe the procedure and the complexity of determining the atomic and iatomic headers (the reader may skip the proofs for continuity).

Lemma 1: The worst-case space and time complexity of finding all atomic headers in a network containing $m$ forwarding rules is $\mathcal{O}(m)$.

Proof of Lemma 1: In the worst case, the $m$ forwarding rules do not overlap with each other, so the corresponding headers are all leaf nodes of the binary tree. In this case, Vercel labels all $m$ leaf nodes as atomic by performing a post-order traversal of the tree, implying that the space complexity of the atomic headers is $\mathcal{O}(m)$. A post-order traversal of the binary tree with $m$ leaf nodes can be performed in $\mathcal{O}(m)$ time. Therefore, the time complexity of identifying atomic headers is also $\mathcal{O}(m)$.

Theorem 1: Given $m$ forwarding rules and a set of atomic headers with space complexity $\mathcal{O}(m)$, the space complexity of the smallest induced-atomic set is $\mathcal{O}(m)$ and it can be computed in $\mathcal{O}(m)$ time.

Proof of Theorem 1: Space complexity of iatomic nodes: Consider the general case in which multiple (say $u$) leaf nodes (atomic headers) have an ancestor supernet $r$, with a maximum path length of $v$ between the supernet and an atomic header. To represent all headers of the supernet $r$ using non-overlapping partitions, Vercel introduces at most $uv$ new iatomic headers in the tree along the paths between $r$ and its atomic descendants. In the extreme case, when the supernet is the root node of the binary tree and all other headers are atomic, the number of iatomic nodes cannot exceed $\mathcal{O}(mw)$, where $w$ is the maximum path length between the root node and a leaf node and $m$ is the number of rules in the forwarding tables across the network. A similar argument holds for multiple supernets having a common ancestor (itself a supernet) in the binary tree. Since $w$ depends only on the header length (e.g., at most 32 for an IPv4 prefix) and $w \ll m$, $w$ is treated as a constant and the space complexity of iatomic headers becomes $\mathcal{O}(m)$.

Time complexity of adding iatomic nodes: Vercel creates all iatomic headers (new leaf nodes) by performing a post-order traversal of the binary tree. While doing so, Vercel checks whether the current node of the tree is labeled as a supernet. If it is, then for every descendant node that has a single child, Vercel creates a second child and labels it as iatomic (a constant-time operation). Since the binary tree has space complexity $\mathcal{O}(m)$ (with $m$ rules and $\mathcal{O}(m)$ atomic headers), the time complexity of the post-order traversal together with the addition of new leaf nodes (iatomic headers) is $\mathcal{O}(m)$.
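The labelling and the creation of iatomic leaves can be sketched in one depth-first pass. This is an illustrative sketch rather than the authors' code: it assumes headers are bit strings, treats a rule node with children as a supernet ("r"), a rule node without children as atomic ("a"), and adds an iatomic sibling ("q") wherever a node at or below a supernet has a single child. The names `Node`, `insert` and `classify` and the four example prefixes are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = [None, None]
        self.has_rule = False
        self.label = None  # "r" supernet, "a" atomic, "q" iatomic


def insert(root, bits):
    node = root
    for bit in bits:
        idx = int(bit)
        if node.children[idx] is None:
            node.children[idx] = Node()
        node = node.children[idx]
    node.has_rule = True


def classify(node, prefix="", under_supernet=False, leaves=None):
    """Label nodes, add iatomic siblings, and collect (prefix, label) leaves."""
    if leaves is None:
        leaves = []
    kids = [c for c in node.children if c is not None]
    if node.has_rule:
        node.label = "r" if kids else "a"
    covered = under_supernet or node.label == "r"
    if covered and len(kids) == 1:
        # Fill the empty branch with an induced-atomic leaf.
        missing = 0 if node.children[0] is None else 1
        sibling = Node()
        sibling.label = "q"
        node.children[missing] = sibling
    for bit, child in enumerate(node.children):
        if child is not None:
            classify(child, prefix + str(bit), covered, leaves)
    if node.label in ("a", "q"):
        leaves.append((prefix, node.label))
    return leaves


root = Node()
for header in ["0", "000", "01", "1"]:
    insert(root, header)
leaves = classify(root)
print(leaves)
```

In this toy tree the supernet 0/1 is split into the atomic headers 000/3 and 01/2 plus one induced header 001/3, and every node is visited once, which is consistent with the linear bound argued above.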

III-C Identifying affected atomic+iatomic headers post a rule update

Vercel continuously monitors the network to intercept any rule update at the routers. Once Vercel identifies an update, it computes whether reachability between different source-destination pairs is intact. For this, Vercel determines the atomic+iatomic headers in the binary tree whose reachability may be affected, and collects them in a set $S^{\text{affected}}$. To create $S^{\text{affected}}$, Vercel traverses a path of length $L$ from the root to a node $o_j$, directed by the $L$ bits of the header specified by the rule being inserted or deleted. Thereafter, Vercel traverses the subtrees of node $o_j$. During this traversal, whenever Vercel observes an atomic/iatomic node, it appends the node's identifier to $S^{\text{affected}}$. Besides $S^{\text{affected}}$, Vercel creates another set $P^{\text{affected}}$, which contains the ports on which a router forwards packets whose headers are in $S^{\text{affected}}$. Since the nodes of the binary tree store the router identifier $i$ and the corresponding port $p$ as a tuple $(i,\;p)$, Vercel collects these tuples from the nodes labeled as atomic/supernet during the same traversal and includes them in $P^{\text{affected}}$.

The following theorem shows that the sets $S^{\text{affected}}$ and $P^{\text{affected}}$ can be identified in linear time, which justifies the fast verification time of Vercel (Section VI).

Theorem 2: The worst-case time complexity of identifying both sets $S^{\text{affected}}$ and $P^{\text{affected}}$ is $\mathcal{O}(m)$, where $m$ is the number of atomic+iatomic headers in the network.

Proof of Theorem 2: Insertion of a rule requires finding the affected headers and the relevant router ports (Section III-B). Inserting a header (corresponding to a new rule) in the binary tree requires $\mathcal{O}(m)$ time in the worst case: Vercel traverses a path of length $L$ (for an $L$-bit header) from the root to a node $o_j$ and then visits all descendant nodes of $o_j$ in order to create iatomic nodes in the tree. While traversing the tree, Vercel finds all overlapping rules in a router, creates the set $S^{\text{affected}}$ by appending the identifiers of the leaves, and identifies the relevant router ports $P^{\text{affected}}$ by collecting the $(i,\;p)$ tuples. This traversal, (a) from the root to node $o_j$ and (b) through the subtrees of $o_j$, can be done in $\mathcal{O}(m)$ time. Therefore, after a rule insertion, the time complexity of creating the sets $S^{\text{affected}}$ and $P^{\text{affected}}$ is $\mathcal{O}(m)$. A similar argument holds for rule deletion, which also has a time complexity of $\mathcal{O}(m)$.
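A compact way to see how the two sets are collected is sketched below. To keep the example short, a flat dictionary keyed by prefix stands in for the binary tree, so "the subtree of $o_j$" becomes "all keys that extend the updated prefix". The dictionary layout, the router names and the function name `affected` are assumptions made for illustration.

```python
# Hypothetical tree contents: label, leaf identifier, and (router, port) rules.
tree = {
    "0":   {"label": "r", "id": None, "rules": [("Y", 0)]},
    "000": {"label": "a", "id": 0,    "rules": [("U", 0)]},
    "001": {"label": "q", "id": 1,    "rules": []},
    "01":  {"label": "a", "id": 2,    "rules": [("U", 1)]},
    "1":   {"label": "a", "id": 3,    "rules": [("Y", 1)]},
}


def affected(tree, updated_prefix):
    """Collect leaf identifiers and (router, port) tuples below a prefix."""
    s_affected, p_affected = [], set()
    for path, node in sorted(tree.items()):
        if not path.startswith(updated_prefix):
            continue  # outside the subtree rooted at the updated prefix
        if node["label"] in ("a", "q"):
            s_affected.append(node["id"])
        if node["label"] in ("a", "r"):
            p_affected.update(node["rules"])
    return s_affected, p_affected


s_aff, p_aff = affected(tree, "00")
print(s_aff, p_aff)
```

Only the part of the tree below the updated prefix is touched, which is why the size of the affected set, not the total rule count, drives the later vector computations.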

III-D Creation of subspaces for relevant ports

After obtaining the sets $S^{\text{affected}}$ and $P^{\text{affected}}$, Vercel represents the atomic+iatomic headers from $S^{\text{affected}}$ in an $m$-dimensional vector space. For that, Vercel computes the number of headers in the set, i.e., $m=|S^{\text{affected}}|$. Thereafter, Vercel defines an $m$-dimensional space by creating $m$ orthogonal vectors and uniquely maps these vectors to the $m$ atomic+iatomic headers in $S^{\text{affected}}$. Orthogonality is essential to model packet forwarding at routers with least squares (Section III-E). The set of orthogonal vectors can be generated using the Gram-Schmidt [28] process (in which we take any vector and, using it as a reference, make all other vectors orthogonal, one vector at a time). Later, we shall do away with the requirement of generating orthogonal vectors while achieving linear-time reachability computations (Section IV).

After defining the $m$-dimensional space, Vercel creates a subspace for each port. A port's subspace contains the atomic and iatomic headers of the packets that a router sends to that port. To create these subspaces, Vercel goes through each $(i,\;p)$ tuple in the set $P^{\text{affected}}$. For each tuple, it selects the atomic and iatomic headers from $S^{\text{affected}}$ that router $i$ sends through output port $p$. If router $i$ sends packets with a unique set of $n$ headers ($1\leq n\leq m$) through port $p$, Vercel creates a subspace for the $(i,\;p)$ tuple. To do so, Vercel picks the $n$ corresponding orthogonal vectors and stores them as the columns of a matrix $\mathcal{A}_{i}^{p}$ of size $(m,\;n)$. In addition to $\mathcal{A}_{i}^{p}$, Vercel creates an $m$-dimensional vector $b_{init}$, which represents the packet headers being evaluated for reachability between a source and a destination.

III-E Least squares to model the packet processing at a router

After creating the subspaces, Vercel uses the ports in the set $P^{\text{affected}}$ to traverse the different paths between the source and destination and evaluate reachability. Assume a source $S$, a destination $D$ and an intermediate router $i$ on the path $S,\ldots,i-1,i,i+1,\ldots,D$. Also assume that Vercel has already computed the packets (with headers in $S^{\text{affected}}$) that are reachable from $S$ up to router $i$. These packet headers are represented by the vector $b_{i-1}$. Vercel now aims to identify those headers in $b_{i-1}$ that router $i$ forwards to $i+1$ through one of its output ports $p$. Algebraically, this is achieved by first solving $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ to obtain $\hat{x}_{i}^{p}$. With $\hat{x}_{i}^{p}$, Vercel then obtains the orthogonal projection of $b_{i-1}$ onto the column space of $\mathcal{A}_{i}^{p}$ (denoted $C(\mathcal{A}_{i}^{p})$). The column space of a matrix is the set of points that can be obtained as a linear combination of the columns of the matrix. An efficient way to solve $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ is least squares, which always returns a solution, even when no exact one exists. When projecting $b_{i-1}$ onto $C(\mathcal{A}_{i}^{p})$ using least squares, there are three possible outcomes:

Case 1: This is the dominant case, in which router $i$ forwards some of the received packets along an output port $p$, because the forwarding table has matching rules for only some of the incoming packets. Mathematically, $b_{i-1}$ lies partly in the column space of $\mathcal{A}_{i}^{p}$ and partly in the null space of $(\mathcal{A}_{i}^{p})^{T}$. The null space of a matrix consists of the points that are orthogonal to all rows of the matrix. In other words, $b_{i-1}$ is a linear combination of basis vectors from $C(\mathcal{A}_{i}^{p})$ and $N((\mathcal{A}_{i}^{p})^{T})$. As a consequence, the linear system $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ has no exact solution. Therefore, we use least squares to obtain the best approximate solution $\hat{x}_{i}^{p}$, which in turn gives the projection point $b_{i}=\mathcal{A}_{i}^{p}\hat{x}_{i}^{p}$. The projected point $b_{i}$, which lies in $C(\mathcal{A}_{i}^{p})$, represents the subset of packet headers received at router $i$ (from the previous router $i-1$) that are forwarded along output port $p$. For an example of this case, refer to Fig. 1(b), where nodes $Y$ and $U$ forward some of the prefixes in $b_{i}$.

Case 2: In this case, router $i$ forwards all the received packets along its output port $p$. Equivalently, $b_{i-1}$ lies in the column space of $\mathcal{A}_{i}^{p}$, and least squares returns an exact solution $x_{i}^{p}$. The projected point $b_{i}=\mathcal{A}_{i}^{p}x_{i}^{p}$ contains all the atomic+iatomic headers present in $b_{i-1}$. Therefore, with least squares, Vercel models packet forwarding for all headers in $b_{i-1}$ to the next router $i+1$ along port $p$ at router $i$.

Case 3: In this case, router $i$ blocks all the received packets along its output port $p$, implying that $b_{i-1}$ lies in the null space of $(\mathcal{A}_{i}^{p})^{T}$. Least squares therefore returns the solution $\hat{x}_{i}^{p}=0$, and the projection point is $b_{i}=\mathcal{A}_{i}^{p}\hat{x}_{i}^{p}=0$. The projection point $b_{i}=0$ specifies that router $i$ blocks all packets with the atomic+iatomic headers represented by $b_{i-1}$ at output port $p$.
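The three cases can be reproduced numerically with a small NumPy sketch. It assumes that headers are mapped to standard basis vectors (one valid choice of orthogonal vectors) and that `numpy.linalg.lstsq` is used as the least squares solver; the helper names `port_matrix` and `forward` and the three-header example are hypothetical.

```python
import numpy as np


def port_matrix(m, forwarded_ids):
    """Columns are the basis vectors of the headers sent to this port."""
    return np.eye(m)[:, forwarded_ids]


def forward(A, b_prev):
    """Project b_prev onto the column space of A via least squares."""
    x_hat, *_ = np.linalg.lstsq(A, b_prev, rcond=None)
    return A @ x_hat


m = 3
A = port_matrix(m, [0, 1])                       # port forwards headers 0 and 1

partial = forward(A, np.array([0.0, 1.0, 1.0]))  # Case 1: header 2 is dropped
full = forward(A, np.array([1.0, 1.0, 0.0]))     # Case 2: everything forwarded
blocked = forward(A, np.array([0.0, 0.0, 1.0]))  # Case 3: nothing forwarded

print(partial, full, blocked)
```

The non-zero entries of each projection are the headers that continue to the next router along this port.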

III-F Determining the reachable set of packet headers from the vector representation

If there exist multiple paths between a source and destination, Vercel computes reachability along each path and stores the reachable set of packet headers at the destination as $b_{reachable}=\sum_{d=1}^{e}b_{d}$, where $b_{d}$ represents the headers reachable at the destination node along path $d$ and $e$ is the total number of paths created with the ports in $P^{\text{affected}}$. After traversing all the paths, Vercel labels an atomic+iatomic header (in the set $S^{\text{affected}}$) as reachable if the dot product between the vector representation of the header and $b_{reachable}$ is non-zero.
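The aggregation over paths and the dot-product test can be illustrated as follows, again assuming standard basis vectors for the headers. The per-path vectors are made-up values and the helper names are hypothetical.

```python
def basis(m, j):
    """Standard basis vector that represents header j."""
    return [1 if k == j else 0 for k in range(m)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


m = 4
# Hypothetical b_d vectors obtained at the destination along two paths.
per_path = [
    [0, 1, 0, 0],   # path 1 delivers header 1
    [0, 1, 0, 1],   # path 2 delivers headers 1 and 3
]

# b_reachable is the entry-wise sum over all paths.
b_reachable = [sum(column) for column in zip(*per_path)]

# A header is reachable if its vector has a non-zero dot product.
reachable = [j for j in range(m) if dot(basis(m, j), b_reachable) != 0]
print(b_reachable, reachable)
```

A header delivered along several paths gets an entry larger than 1 in the sum, which is why the test is "non-zero" rather than "equal to 1".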

IV Vercel Optimization: Reducing Verification Time

As discussed, Vercel models the packet forwarding behavior along each output port of a router by projecting a point onto the column space of the matrix corresponding to the port. The projection is generally obtained by solving the normal equations, $b_{i}=\mathcal{A}(\mathcal{A}^{T}\mathcal{A})^{-1}\mathcal{A}^{T}b_{i-1}$ (for simplicity, we have dropped the subscript and superscript of the matrix $\mathcal{A}$ created for port $p$ of router $i$). However, by selecting orthonormal vectors to represent atomic+iatomic headers, $\mathcal{A}^{T}\mathcal{A}$ becomes the identity and we obtain the same projection point as $b_{i}=\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}b_{i-1}$. Further, with standard basis vectors, the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}\in\mathbb{R}^{m\times m}$ is a diagonal matrix with entries 0 and 1. The $j^{\text{th}}$ diagonal entry is 1 if the $j^{\text{th}}$ row of $\mathcal{A}^{p}_{i}$ is non-zero. Therefore, the vector $b_{i}$ is the element-wise product of the diagonal elements of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ and $b_{i-1}$. We can obtain these diagonal elements efficiently by first creating $v_{i}^{p}=0$ in $m$-dimensional space and then setting the $j^{\text{th}}$ entry of $v_{i}^{p}$ to 1 if router $i$ forwards the $j^{\text{th}}$ header (in the set $S^{\text{affected}}$) to its output port $p$. After initialization, the forwarding vector $v_{i}^{p}$ denotes the atomic+iatomic headers (in the set $S^{\text{affected}}$) that router $i$ forwards to its output port $p$. Subsequently, we can compute the projection point as $b_{i}=v_{i}^{p}\otimes b_{i-1}$, where $\otimes$ is the Hadamard product between two vectors. The projection point $b_{i}$ can thus be computed in linear time.

Theorem 3: The time complexity of modeling the forwarding behavior of a router along a port $p$ (i.e., obtaining a projection point $b_{i}\in\mathbb{R}^{m}$ from $b_{i-1}\in\mathbb{R}^{m}$) is $\mathcal{O}(m)$, where $m$ is the number of atomic+iatomic headers in the set $S^{\text{affected}}$ and headers are represented by the standard basis vectors.
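The equivalence between the full projection, the orthonormal shortcut and the element-wise product can be checked numerically. This is a small sketch that assumes standard basis columns; the four-header port and the input vector are made-up values.

```python
import numpy as np

m = 4
forwarded = [0, 2]                    # header ids sent to this port (hypothetical)
A = np.eye(m)[:, forwarded]           # standard basis columns, shape (4, 2)
b_prev = np.array([1.0, 1.0, 1.0, 0.0])

# Projection via the normal equations.
via_normal = A @ np.linalg.inv(A.T @ A) @ A.T @ b_prev

# With orthonormal columns, A^T A is the identity, so A A^T suffices.
via_shortcut = A @ A.T @ b_prev

# With standard basis columns, A A^T is diagonal; keep only its diagonal.
v = np.diag(A @ A.T)                  # forwarding vector [1, 0, 1, 0]
via_hadamard = v * b_prev             # element-wise product, linear in m

print(via_normal, via_shortcut, via_hadamard)
```

In practice the forwarding vector would be filled in directly from the rules instead of being derived from $\mathcal{A}\mathcal{A}^{T}$; the matrix product is formed here only to show that all three expressions agree.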

Proof: The proof is provided in Appendix I.

As described in Section III, at the destination, Vercel uses the vector $b_{reachable}$ to represent the packet headers that are reachable from the source. The following theorem shows that, by using the standard basis to represent atomic+iatomic headers, the reachable packet headers can be recovered from the vector $b_{reachable}$ in linear time.

Theorem 4: The worst-case complexity of determining the reachable set of atomic+iatomic headers from $b_{reachable}\in\mathbb{R}^{m}$ is $\mathcal{O}(m)$.

Proof: The proof is provided in Appendix II.

We have established that there exists a linear-time solution (a) to model packet processing at a router using vector algebra and (b) to determine the reachable set of packet headers from the vector $b_{reachable}$. The following theorem presents a linear-time solution for computing the reachable set of headers along a path from source to destination. This is an improvement over the complexity achieved by the state-of-the-art Deltanet, which is amortized linear in the number of affected headers and logarithmic in the number of overlapping rules present in a router. In the worst case, the number of elements in the set $S^{\text{affected}}$ can equal the number of rules in the network. However, our experiments (Section VI) show that in all practical scenarios the number of atomic+iatomic headers in $S^{\text{affected}}$ is much smaller than the total number of rules across all the devices. Vercel implements the linear-time approach suggested by the following theorem to achieve fast verification on different networking scenarios.

Theorem 5: In the worst case, the time complexity of reachability computation along a path using Vercel is $\mathcal{O}(m)$, where $m$ is the number of rules in the network.

Proof: The proof is provided in Appendix III.

Although the above theorem provides the time complexity for a single path, in practice Vercel examines all possible paths (derived from $P^{\text{affected}}$) while checking reachability. Next, we show the memory efficiency of Vercel in terms of the size and number of forwarding vectors needed to check reachability between a source and destination.

Theorem 6: After a rule insertion/deletion, the space complexity of Vercel is $\mathcal{O}(|S^{\text{affected}}|\cdot|P^{\text{affected}}|)$ when the standard basis is used to represent atomic+iatomic headers and forwarding vectors $v_{i}^{p}$.

Proof: The proof is provided in Appendix IV.

V Expressive Vercel Features

We now extend Vercel to incorporate reachability queries involving header transformations and packet filters, as well as for identifying forwarding loops and blackholes.

V-A Packet transformation

At some middleboxes in the network, forwarding tables also contain rules that transform packet headers, which makes the reachability computation more complex. In general, a header transformation (denoted by the function $f^{tr}$) is a set of rules containing "match" and "transformed" header fields. To model header transformations, Vercel extracts the transformation rules from routers and inserts the headers specified in the "match" and "transformed" fields into the binary tree (as discussed in Sections III-A and III-B). Thereafter, Vercel creates a transformation matrix $T_{i}\in\mathbb{R}^{m\times m}$ for each router $i$ that performs header transformations. Here, $m$ denotes the total number of atomic+iatomic headers in the network. Vercel initializes all entries of the matrix $T_{i}$ to zero, representing the absence of header transformation. Thereafter, some entries of $T_{i}$ are set to 1 based on the transformation rules. An entry $T_{i}^{(j,k)}=1$ specifies that the atomic/iatomic header with identifier $k$ is transformed to the atomic/iatomic header with identifier $j$. While solving for reachability, Vercel first computes $t_{i}=H(T_{i}b_{i-1})$ to transform the headers and then solves $\mathcal{A}_{i}^{p}x_{i}^{p}=t_{i}$ to model forwarding at router $i$ (where $H$ is the unit step function). The vector $t_{i}$ represents the atomic+iatomic headers that are reachable up to router $i$ and then transformed using $T_{i}$.

Example of Transformation Matrix: Consider the headers in the forwarding table of router $U$ (Figure 1(a)) and a transformation function $f^{tr}_{U}$ containing a single rule, $f^{tr}_{U}:01/2\to 00/2$. To create the transformation matrix $T_{U}$ at router $U$, Vercel needs the atomic+iatomic headers ahead of time. In this example, the atomic and iatomic headers are $\{a1,q2,a3,a4\}$. The matched header 01/2 ($a3$) is itself atomic, while the transformed header 00/2 is the combination of the atomic header 000/3 ($a1$) and the iatomic header 001/3 ($q2$). $T_{U}$ is therefore a (4, 4) matrix populated as follows.

$$T_{U}=\bordermatrix{&a1&q2&a3&a4\cr a1&1&0&1&0\cr q2&0&1&1&0\cr a3&0&0&0&0\cr a4&0&0&0&1\cr}$$

The transformation matrix $T_{U}$ is initialized with 1s along its diagonal, except at index (3, 3). To represent the transformation of header $a3$ to ($a1,\;q2$), $T_{U}$ is set to 1 at indices (1, 3) and (2, 3).
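The effect of $T_{U}$ and of the unit step function $H$ can be traced with a few lines of plain Python. The matrix is the one from the example; the two input vectors and the helper names are illustrative assumptions.

```python
# Rows and columns are ordered a1, q2, a3, a4, as in the example matrix.
T_U = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]


def matvec(T, b):
    return [sum(t * x for t, x in zip(row, b)) for row in T]


def unit_step(vec):
    """H: map every positive entry to 1 and everything else to 0."""
    return [1 if x > 0 else 0 for x in vec]


# Headers a3 and a4 arrive: a3 is rewritten to a1 and q2, a4 is untouched.
t_first = unit_step(matvec(T_U, [0, 0, 1, 1]))

# Headers a1 and a3 arrive: the raw product has a 2 in the a1 position
# (a1 itself plus the rewritten a3), which H clips back to 1.
raw = matvec(T_U, [1, 0, 1, 0])
t_second = unit_step(raw)

print(t_first, raw, t_second)
```

The second case shows why the step function is applied: without it, a header that is produced both directly and through a rewrite would be counted twice.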

V-B Packet filtering

In a network, devices such as firewalls and routers contain an access control list (ACL) to restrict the access of packets. A rule in an ACL is a filtering function $f^{acl}$ that maps a set of packet headers to a set of {permit, deny} actions. To incorporate packet filtering at a router/firewall $i$, Vercel extracts the ACL rules from $i$ and arranges the header fields of the ACL rules in the existing binary tree. The process of adding ACL rules to the binary tree is similar to that of adding forwarding rules (Section III-A), though with different actions. Thereafter, to model packet filtering for all the atomic+iatomic headers in the affected set (Section III-C), Vercel creates an $m$-dimensional filtering vector $g_{i}$ with binary values (0/1). A non-zero entry at the $j^{\text{th}}$ index of $g_{i}$ implies that an ACL rule allows packets with atomic/iatomic header $s_{j}$ to pass through router $i$. After obtaining the filtering vector $g_{i}$, Vercel performs filtering at router $i$ as $f_{i}=g_{i}\otimes b_{i-1}$, where $\otimes$ represents the Hadamard product between vectors. An index $j$ with a non-zero entry in the filtered vector $f_{i}$ implies that received packets with atomic/iatomic header $s_{j}$ are permitted to pass through router $i$. After filtering, Vercel models packet forwarding along port $p$ at router $i$ by solving $\mathcal{A}_{i}^{p}x_{i}^{p}=f_{i}$ with least squares.

V-C Identifying forwarding loop

To detect a forwarding loop, consider an intermediate router $i$ along the path $S,\ldots,i-1,i,i+1,\ldots,D$. Router $i$ receives packets with headers in the vector $b_{i-1}$ from router $i-1$. Vercel models packet forwarding at router $i$ along a port $p$ using $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ and obtains a projection point $b_{i}\neq 0$. After obtaining $b_{i}$, Vercel checks whether the next router $i+1$ (connected through port $p$ at router $i$) has already been traversed by the received packets. If it has, Vercel confirms that the rule update will trigger a loop. To determine which packet headers are in the loop, Vercel searches for the indices with non-zero entries in $b_{i}$.
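The loop check can be sketched as a path walk that carries the current header vector and the list of routers already visited. The three-router topology, the dictionaries `vectors` and `links`, and the function name `find_loops` are hypothetical; forwarding is modelled with the element-wise product of Section IV.

```python
def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]


def find_loops(start, b_init, vectors, links):
    """vectors[(router, port)] is a forwarding vector;
    links[(router, port)] is the router reached through that port."""
    loops = []
    stack = [(start, b_init, [start])]
    while stack:
        router, b_prev, visited = stack.pop()
        for (r, port), v in vectors.items():
            if r != router:
                continue
            b = hadamard(v, b_prev)
            if not any(b):
                continue                      # nothing is forwarded here
            nxt = links[(r, port)]
            if nxt in visited:
                # Non-zero entries of b are the headers caught in the loop.
                looping = [j for j, x in enumerate(b) if x]
                loops.append((visited + [nxt], looping))
            else:
                stack.append((nxt, b, visited + [nxt]))
    return loops


vectors = {("A", 0): [1, 1], ("B", 0): [1, 0], ("B", 1): [0, 1]}
links = {("A", 0): "B", ("B", 0): "C", ("B", 1): "A"}
loops = find_loops("A", [1, 1], vectors, links)
print(loops)
```

Here header 0 reaches router C, while header 1 is sent from B back to A and is reported as looping.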

V-D Identifying blackholes

Blackholes represent a network state in which a router receives a packet, but its forwarding table does not contain a corresponding rule. To identify a blackhole at router $i$, Vercel performs the following steps. (1) Vercel models the forwarding behavior of router $i$ based on the rules in its forwarding table. Since we have $v_{i}^{p}$ for all the affected ports of $i$ (Sections III-C and IV), Vercel performs an element-wise logical OR over all $v_{i}^{p}$ (each denoting the atomic+iatomic headers that router $i$ forwards to its output port $p$) to get a unified forwarding vector $v_{i}$ for router $i$. Intuitively, $v_{i}$ represents the atomic+iatomic headers of packets that router $i$ forwards to its neighbors. (2) Vercel computes $v_{i}\otimes b_{i-1}$ to obtain a projection point $b_{i}$. The vector $b_{i}$ represents those atomic+iatomic headers in $b_{i-1}$ that router $i$ forwards to its neighbors. (3) To detect a blackhole at router $i$, Vercel computes $c_{i}=b_{i-1}\oplus b_{i}$ (where $\oplus$ denotes the element-wise logical XOR). The indices with a non-zero entry in $c_{i}$ represent packets (with atomic+iatomic headers) received at router $i$ that do not match any rule in the forwarding table.
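The three steps above map directly onto element-wise operations on 0/1 vectors. The sketch below uses made-up forwarding vectors for a router with two ports; the function name `blackholed` is hypothetical.

```python
def blackholed(port_vectors, b_prev):
    """Return the header indices received at a router but matched by no port."""
    m = len(b_prev)
    # Step 1: element-wise OR of all per-port forwarding vectors.
    v = [0] * m
    for vp in port_vectors:
        v = [a | b for a, b in zip(v, vp)]
    # Step 2: headers in b_prev that the router forwards somewhere.
    b = [a & c for a, c in zip(v, b_prev)]
    # Step 3: XOR isolates the received-but-not-forwarded headers.
    c = [x ^ y for x, y in zip(b_prev, b)]
    return [j for j, flag in enumerate(c) if flag]


port_vectors = [[1, 0, 0, 0], [0, 1, 0, 0]]   # two ports of one router
dropped = blackholed(port_vectors, [1, 0, 1, 0])
print(dropped)
```

With binary entries, the element-wise product in step 2 coincides with a logical AND, which is what the code uses.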

V-E Recommendations from projection errors

The approximate solution $\hat{x}$ obtained with least squares provides a projection point $\mathcal{A}\hat{x}$. The difference between the two points $\mathcal{A}\hat{x}$ and $b$ is the projection error. The magnitude of the projection error ($\mathcal{A}\hat{x}-b$) has no bearing on the reachability computation itself, but it is a useful metric for gaining intuition into packet processing at a router and for network-wide metric computations. In particular, the cumulative error vector is useful for checking path correctness: the cumulative $\ell_{2}$ norm of the projection errors across a path can reflect misconfigurations along the path.

When a router receives a packet at a port, there must exist a rule that forwards the packet to another port. If no rule matches the address field of this packet, then $\mathcal{A}\hat{x}-b$ contributes a higher projection error. If the configuration is not along the shortest path, then even if the projection error at each router is low, the projection error accumulated along the path adds up and becomes high.

Hence, one use of the projection error is to find out whether the configuration is on the shortest path, which gives a direct measure of data plane correctness. To achieve this, the controller computes the $\ell_{2}$ norm of the error at each node and adds the errors along all possible paths obtained through the router configurations. If the configurations are correct, the shortest path should have the least total error. If this is not the case, Vercel adjusts the configurations based on the errors at intermediate nodes to ensure that they result in the shortest path.

For more insight, suppose a provider wants to ensure that all flows belonging to, say, a VR application are always configured on the shortest path. To this end, Vercel initializes the vector $b_{0}$ with the IP addresses corresponding to these flows and runs a reachability check. While doing so, it computes the cumulative error along each path to the destination. If the minimum cumulative error is not obtained on the shortest path, Vercel concludes that most of the configuration is on longer paths. By observing the projection error at each intermediate node, Vercel then identifies which flows do not have a matching rule along the shortest path, and these missing rules are configured at the identified nodes. To ensure that the non-shortest path is not chosen, the corresponding rules are identified using the projection errors and then deleted from the nodes along the non-shortest paths.

Example: To understand how projection errors can be put to use, return to Figure 1. We observe that router $U$ drops some packets because it lacks a rule that processes header 001/3. This is seen by computing the projection error between the vectors $b_{Y}=[1, 1, 0]$ (headers received at $U$) and $b_{U}=[0, 1, 0]$ (headers forwarded by $U$ along port 0). The projection error ($b_{Y}-b_{U}$) along port 0 at router $U$ is [1, 0, 0], which indicates that packets with header 001/3 are not forwarded along output port 0 at router $U$. However, as per the shortest-path configuration, router $U$ should forward packets with header 001/3 along port 0 (in which case the projection error is zero). Therefore, the error vector helps in identifying possible misconfigurations.

A second use of the projection error is automatic table population when there is no reachability. If reachability does not exist between two routers, then the cumulative projection error will be very large along all the paths. A path with the lowest cumulative $\ell_{2}$ norm indicates that a large number of relevant rules are already installed at the routers, and only a few more have to be added to establish reachability. On that path, for the nodes that have a high projection error (compared to other nodes), Vercel recommends the addition of table entries such that the projection error at those nodes reduces. The added table entries (prefix, port) enable the router to route the packet towards the destination, leading to reachability. Through this iterative process, the entries can be updated to result in the shortest path. In this way, by using a simple script, Vercel can automatically establish reachability while achieving shortest-path routing for the source-destination pairs of interest.
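The error vector from this example, and the way per-hop errors are accumulated along a path, can be reproduced as follows. The vectors $b_{Y}$ and $b_{U}$ are taken from the example; the two candidate paths and their per-hop errors are made-up values, and the helper names are hypothetical.

```python
import math


def projection_error(b_received, b_forwarded):
    """Entry-wise difference between received and forwarded headers."""
    return [r - f for r, f in zip(b_received, b_forwarded)]


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


b_Y = [1, 1, 0]   # headers received at U
b_U = [0, 1, 0]   # headers forwarded by U along port 0
error_at_U = projection_error(b_Y, b_U)

# Hypothetical per-hop error vectors along two candidate paths.
path_errors = {
    "shortest": [error_at_U, [0, 0, 0]],
    "detour": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}
cumulative = {
    name: sum(l2_norm(e) for e in errors)
    for name, errors in path_errors.items()
}
lowest = min(cumulative, key=cumulative.get)
print(error_at_U, cumulative, lowest)
```

In this toy setting the lowest cumulative error is not on the shortest path, which is the signal described above that a rule is missing on the shortest path (here, the rule for the header at index 0 on router $U$).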

VI Evaluation

We evaluate Vercel as a verification tool on various datasets and compare it with other approaches.

VI-A Implementation and dataset details

Vercel is implemented as approximately 2000 lines of single-threaded Python 3.7 code with dependencies on Numpy [29] (for matrix computations) and Networkx [30] (for creating a network graph). We evaluate the performance of Vercel on a single core of an Intel Core i7 CPU clocked at 3.6 GHz with 64 GB of RAM.

To evaluate the performance of Vercel extensively, we consider network topologies ranging from campus networks to large enterprise/provider networks with millions of rules (up to 124.7 million). Table II lists the datasets that we consider for Vercel's evaluation. The Stanford and Berkeley datasets serve as benchmarks for evaluating network invariants [26, 31]. Further, we consider the datasets RF 1755, RF 6461, RF 3257, and INET, which consist of rules from autonomous systems derived from the Rocketfuel Project [32]. These datasets represent dense network topologies and contain millions of forwarding rules, making them suitable for evaluating the scalability of a verification technique. To evaluate Vercel's performance on a service provider network, we consider the datasets Airtel 1 and Airtel 2 used by [15], which are available at [31]. To expand to very large providers, we created two synthetic datasets, Simnet1 and Simnet2, containing 1000 and 2000 nodes respectively, with 100K edges in each network. We assign IP addresses on these two networks based on the mask length distribution extracted from the Rocketfuel project. Finally, we populate forwarding rules on Simnet1 and Simnet2 with shortest path routing. We also point out that Vercel has been implemented in conjunction with controllers and switches such as ONOS and OVS, thus showing industry-centric compatibility. Table II provides the specifics of the datasets, such as nodes, edges, number of rules in the network, and number of updates (insertion/deletion). To compute the verification time of Vercel, we first load 90% of the forwarding rules from the dataset, then populate the binary tree, and then perform real-time updates by randomly selecting the remaining rules for insertion and deletion at router forwarding tables. We consider dynamic updates by first performing rule insertions in the forwarding tables (which comprise half of the rule updates shown in the sixth column of Table II), followed by rule deletions.

TABLE II: Details of the datasets.

| Dataset | Nodes | Edges | Average node degree | Number of rules | Number of rule updates | Average table size |
|---|---|---|---|---|---|---|
| Airtel1 | 16 | 26 | 3.25 | 3.81 × 10^4 | 2 × 10^5 | 2381 |
| Airtel2 | 16 | 26 | 3.25 | 3.81 × 10^4 | 2 × 10^5 | 2381 |
| Stanford | 16 | 74 | 9.25 | 7.3 × 10^5 | 1 × 10^4 | 45570 |
| Berkeley | 23 | 252 | 21.91 | 12.81 × 10^6 | 1 × 10^4 | 557300 |
| RF 1755 | 87 | 2308 | 53.06 | 33.73 × 10^6 | 1 × 10^4 | 387734 |
| RF 6461 | 138 | 8140 | 117.97 | 75.01 × 10^6 | 1 × 10^4 | 543519 |
| RF 3257 | 161 | 9432 | 117.17 | 74.49 × 10^6 | 1 × 10^4 | 422688 |
| INET | 314 | 40770 | 259.68 | 124.7 × 10^6 | 1 × 10^4 | 395979 |
| Simnet1 | 1000 | 100000 | 200 | 49.95 × 10^6 | 1 × 10^4 | 49950 |
| Simnet2 | 2000 | 100000 | 100 | 99.95 × 10^6 | 1 × 10^4 | 49975 |

VI-B Verification time per rule update

The most important metric for evaluating a network verification scheme is verification time [16, 15, 17, 24]. Figure 5 shows the cumulative distribution function (CDF) of the verification time achieved by Vercel on the datasets [31, 26]. From Figure 5, we observe that Vercel processes at least 70% of the updates within 100 μs on all the datasets. Even if we consider 90% of the updates, the verification time of Vercel remains under 500 μs. These results imply that Vercel achieves sub-millisecond verification time, even on large networks.

Figure 5: CDF of the verification time of Vercel on 8 datasets [31, 26] during a rule update.
TABLE III: Rule verification time of Vercel for checking reachability.

| Dataset | Airtel1 | Airtel2 | Stanford | Berkeley | RF1755 | RF6461 | RF3257 | INET | Simnet1 | Simnet2 |
|---|---|---|---|---|---|---|---|---|---|---|
| # atomic+iatomic headers | 1404 | 1404 | 335225 | 747854 | 902740 | 902740 | 902740 | 629954 | 77437 | 77836 |
| Median (in μs) | 26 | 26 | 53 | 52 | 32 | 66 | 77 | 53 | 112 | 194 |
| Mean (in μs) | 32 | 32 | 53 | 55 | 40 | 111 | 116 | 162 | 232 | 428 |
| Percentage < 250 μs | 99.76% | 99.76% | 99.75% | 99.8% | 99.72% | 88.72% | 87.07% | 73.9% | 70.9% | 54.58% |

Table III provides the verification time of Vercel on 10 datasets. The first row of Table III shows the total number of atomic+iatomic headers in each network. Note that the ratio of atomic+iatomic headers to the number of forwarding rules on the INET dataset (which contains the largest number of rules across all datasets) is as small as ~0.005. Vercel thus exploits the overlap among forwarding rules and represents the header space with a small number of non-overlapping atomic+iatomic headers, thereby reducing the number of headers to be processed. The second and third rows of Table III show the median and mean of the verification time for Vercel on different datasets. These results show that the median and mean verification times of Vercel remain bounded within 77 μs and 162 μs, respectively, on all but the two synthetic datasets. In the case of the INET dataset, the mean increases to 162 μs because the average number of neighbors per router is ~260 (significantly higher than in all other datasets). This increase in nodal degree results in the traversal of more paths in the network, increasing verification time. We also observe that the mean and median verification times of Vercel on the two synthetic networks are up to 428 μs and 194 μs, respectively. Since these networks contain a large number of nodes (up to 2000) and edges (100K), reachability analysis requires traversing longer paths between source and destination nodes, and this increase in path length increases verification time.

Another metric of interest (benchmarked by [15, 17]) is the fraction of updates verified in less than 250 μs (fourth row in Table III). Vercel verifies ~74% of the updates within 250 μs on a large network with ~124.7M rules. For a network with 2000 nodes, Vercel verifies ~55% of the rules within 250 μs. The primary reason for this fast verification time is the quick identification of the affected atomic+iatomic headers and the verification of multiple headers together in vector space using least squares.
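The three summary statistics reported in Table III can be computed from a list of per-update timings as in the minimal sketch below. The function name `summarize` and the sample timings are hypothetical and serve only to illustrate the metric definitions; they are not measured data and not part of Vercel's code.

```python
from statistics import mean, median

def summarize(times_us, threshold_us=250.0):
    """Return (median, mean, percentage of samples strictly below threshold)."""
    below = sum(1 for t in times_us if t < threshold_us)
    return median(times_us), mean(times_us), 100.0 * below / len(times_us)

# Hypothetical per-update verification times in microseconds (illustrative only).
sample = [20.0, 30.0, 40.0, 260.0, 650.0]
med, avg, pct = summarize(sample)
print(med, avg, pct)
```

A few slow updates raise the mean far more than the median, which is consistent with the gap between the two rows on the larger datasets.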

VI-C Identifying loops and blackhole

Table IV (rows labeled RL and RB) shows the time for determining reachability in the presence of loops and, separately, in the presence of blackholes. In these two cases, the mean verification time of Vercel is at most 92 μs and 98 μs, respectively, across the datasets. Verifying loops requires slightly longer because Vercel performs extra checks for re-traversal along a path and stores a non-zero vector $b_i$ to identify the headers in the loop (Section V). For blackhole detection, Vercel performs some extra steps (element-wise logical OR and XOR, see Section V) along with a reachability check, which slightly increases the verification time.

TABLE IV: Vercel’s reachability evaluation in the presence of loops (RL), blackhole (RB), packet transformations (RT), packet filtering (RF) and routing policy (RP) analysis.

| Dataset | Stanford | Berkeley | RF 1755 | Airtel 1 | Airtel 2 |
|---|---|---|---|---|---|
| Median (RL) | 76 μs | 53 μs | 88 μs | 33 μs | 32 μs |
| Mean (RL) | 79 μs | 56 μs | 92 μs | 63 μs | 57 μs |
| Median (RB) | 96 μs | 53 μs | 91 μs | 34 μs | 34 μs |
| Mean (RB) | 98 μs | 59 μs | 97 μs | 64 μs | 59 μs |
| Median (RT) | 884 μs | 417 μs | 469 μs | 36 μs | 37 μs |
| Mean (RT) | 918 μs | 349 μs | 440 μs | 285 μs | 229 μs |
| Median (RF) | 75 μs | 53 μs | 87 μs | 32 μs | 31 μs |
| Mean (RF) | 77 μs | 56 μs | 91 μs | 49 μs | 45 μs |
| Median (RP) | 82 μs | 62 μs | 88 μs | 34 μs | 33 μs |
| Mean (RP) | 85 μs | 60 μs | 96 μs | 38 μs | 36 μs |

VI-D Packet filtering

Since none of the existing datasets provide ACL rules, we created ACL rules by mapping the destination IP addresses (in the existing datasets) to filtering actions (permit/deny). Table IV shows that the mean and median verification times in the presence of ACL rules (rows labeled RF) are at most 91 μs and 87 μs, respectively, on all the datasets that we considered.

VI-E Packet header transformation

We define a function to transform an IP prefix into another randomly chosen IP prefix and store the transformation function in a matrix $T_i$ for each router $i$ (Section V). In Table IV (rows labeled RT), we observe that the mean verification time of Vercel is less than 1 ms on all the datasets. The mean verification time increases in the presence of transformation functions because of the matrix-vector product $T_i b_{i-1}$. Note that the verification time of Vercel is comparable to that of APKeep [17], whose verification time is also bounded within 1 ms. Veriflow [16] and Deltanet [15] do not model packet transformations.
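The matrix-vector product $T_i b_{i-1}$ can be illustrated with a small NumPy sketch. It assumes that headers are indexed as standard basis vectors and that $T_i$ is a 0/1 matrix whose column $k$ selects the header that header $k$ is rewritten into. The exact construction is given in Section V and is not reproduced here, so the mapping `rewrite`, the size `m`, and the clipping step are illustrative assumptions.

```python
import numpy as np

m = 4                                 # hypothetical number of atomic headers
rewrite = {0: 2, 1: 1, 2: 3, 3: 3}    # hypothetical mapping: header k -> header j

# Build the transformation matrix: T[j, k] = 1 if header k is rewritten into j.
T = np.zeros((m, m), dtype=int)
for src, dst in rewrite.items():
    T[dst, src] = 1

b_prev = np.array([1, 1, 0, 0])       # headers 0 and 1 arrive at the router
b_next = np.minimum(T @ b_prev, 1)    # clip so many-to-one rewrites stay 0/1
print(b_next)
```

A dense product of this kind touches every entry of an m-by-m matrix, which is one plausible reading of why the RT rows in Table IV are slower than the other rows.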

VI-F Checking policy violations

Policy-based routing (PBR) [33] is used by network administrators to define the routing behavior of packets in a network by overriding the underlying routing protocol. Vercel assigns a PBR label to those nodes of the binary tree whose corresponding headers are generated through PBR. Thereafter, Vercel does not allow any non-PBR protocol to update the forwarding action (e.g., output port) at the labeled node in the binary tree. After a rule update, Vercel determines the affected headers and ports in the binary tree. Finally, Vercel evaluates reachability and checks for the violation of routing policies. In our evaluation, Vercel identifies whether, after a rule update, the traffic: a) violates a path length constraint; or, b) passes through a predetermined set of routers. Table IV (rows labeled RP) shows that the median and mean verification times of Vercel for checking these two policies together are at most 88 μs and 96 μs, respectively. With the additional consideration of routing policy, the verification time increases slightly because, after evaluating reachability at the destination, Vercel has to iterate over the reachable paths to check for policy violations.
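The final step, iterating over the reachable paths to check the two policies, might look like the following sketch. It assumes that a path is a list of router names and interprets policy (b) as a waypoint requirement (the traffic must traverse every router in a given set). The function name, router names, and thresholds are hypothetical.

```python
def violates_policies(path, max_hops, waypoints):
    """Return True if `path` breaks either policy.

    path:      list of router names from source to destination
    max_hops:  maximum number of links allowed (path length constraint)
    waypoints: routers the traffic is required to pass through (assumed reading)
    """
    too_long = len(path) - 1 > max_hops
    misses_waypoint = not set(waypoints).issubset(path)
    return too_long or misses_waypoint

# Hypothetical reachable paths returned by a reachability check.
reachable_paths = [
    ["r1", "r2", "fw", "r5"],
    ["r1", "r3", "r4", "r6", "r5"],
    ["r1", "r5"],
]
flags = [violates_policies(p, max_hops=3, waypoints={"fw"}) for p in reachable_paths]
print(flags)
```

The cost of this loop grows with the number of reachable paths, which matches the slight increase of the RP rows over plain reachability.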

VI-G Path recommendation, error rectification

With Vercel, we determine the quality of different paths (for example, when deploying ECMP) by using the projection error accumulated over the nodes along each path. The cumulative $L_2$ norm of the projection error (across all nodes along a path) quantifies the quality of a path in terms of the number of misconfigured packet headers at routers on the path. Figure 6 shows the cumulative $L_2$ norm of the projection error as a function of path length. Since protocols configure most of the services on shortest paths, we observe low projection error on such paths. Generally, fewer services are configured on longer paths, which is reflected by an increase in the cumulative $L_2$ norm of the projection error. Therefore, we can use projection error as a metric to recommend paths for service configurations.
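A minimal sketch of this path-quality metric follows. It assumes that each node on a path is represented by a matrix whose columns are standard basis vectors, and that the projection obtained at one node becomes the input vector at the next node. The function names and the two-node example are illustrative and not taken from Vercel's implementation.

```python
import numpy as np

def projection_error(A, b):
    """Least-squares residual b - A @ x_hat, i.e. the part of b outside col(A)."""
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return b - A @ x_hat

def cumulative_error(matrices, b):
    """Sum the L2 norms of the per-node projection errors along one path."""
    total = 0.0
    for A in matrices:
        e = projection_error(A, b)
        total += float(np.linalg.norm(e))
        b = b - e              # the projection is what travels to the next node
    return total

I = np.eye(4)
path = [I[:, [0, 1, 2]], I[:, [0, 1]]]   # hypothetical two-node path
score = cumulative_error(path, np.ones(4))
print(score)
```

In this toy path, each node drops one header, so each node contributes a unit-norm error and the score grows with the number of nodes that fail to forward some header.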

As discussed in Section II-B, error rectification is done by including $b$ in the column space of matrix $\mathcal{A}$ for all nodes along the path, where $b$ is in the null space of $\mathcal{A}^{T}$. To evaluate this approach, we created a synthetic network of 50 nodes and 200 edges and configured the corresponding forwarding tables. Vercel is then applied to check reachability between an arbitrary source and destination node. Assume that Vercel finds no reachability, indicating that a fix is required. Computing this fix must be fast. To evaluate the rectification computation, we varied the number of atomic/iatomic headers in the network and measured the time to rectify across differing path lengths. Figure 7 shows the time required to establish reachability. A path length of five required more fixes than a shorter path length of three. However, even if we increase the network-wide number of atomic/iatomic headers, the fixing time does not grow linearly. This is because Vercel leverages vectorization, where the same algebraic operation on different headers can be performed simultaneously. From Figure 7, it is evident that even on a longer path, Vercel can establish reachability within 50 μs irrespective of the number of atomic/iatomic headers.
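One way to picture the rectification step at a single node is sketched below. It assumes that "including $b$ in the column space" amounts to appending one standard-basis column for every required header that the port does not yet forward. This is an illustrative reading of Section II-B, which is not reproduced here, and the helper `rectify` is hypothetical.

```python
import numpy as np

def rectify(A, b):
    """Append a standard-basis column for every header in b that A does not cover."""
    m = A.shape[0]
    covered = A.any(axis=1)                        # True where a row of A is non-zero
    missing = np.flatnonzero((b != 0) & ~covered)  # headers wanted but not forwarded
    return np.hstack([A, np.eye(m)[:, missing]])

I = np.eye(5)
A = I[:, [0, 1]]                       # port forwards headers 0 and 1 only
b = np.array([0., 0., 1., 1., 0.])     # intent: headers 2 and 3 must be reachable

assert np.allclose(A.T @ b, 0)         # b is in the null space of A^T: no reachability
A_fixed = rectify(A, b)
x, *_ = np.linalg.lstsq(A_fixed, b, rcond=None)
residual = float(np.linalg.norm(A_fixed @ x - b))
print(A_fixed.shape, residual)
```

All missing headers are found with a single vectorized mask rather than a per-header loop, which is in the spirit of the vectorization argument above.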

VI-H Comparison

We compare Vercel with state-of-the-art real-time data plane verification tools: AP Verifier [22], Veriflow [16], NetPlumber [24], Deltanet [15] and APKeep [17]. Table V compares the verification time of these tools on the Stanford (campus) and Airtel (service provider) datasets. On a large dataset such as Stanford, Vercel outperforms all other verification techniques except Deltanet. Note, however, that Deltanet is designed only to model forwarding rules, whereas Vercel, APKeep, NetPlumber, and AP Verifier support data plane verification with a larger set of network functions such as filtering and transformation. Table V shows that, on the Stanford dataset, Vercel achieves a mean verification time of 53 μs, which corresponds to a speedup of 8× over Veriflow, 164× over NetPlumber, 1.7× over APKeep, and 36× over AP Verifier. On the Stanford dataset, Vercel processes 99.7% of the rule updates within 250 μs, whereas NetPlumber, AP Verifier, Veriflow and APKeep process 23.6%, 13.3%, 96.1% and 96.4% of the rule updates within 250 μs, respectively. Similarly, on the small Airtel 1 dataset, Vercel achieves an improvement of 2.5× over AP Verifier, 1.84× over Veriflow, and 118× over NetPlumber. The reason for Vercel’s speedup is the simultaneous processing of multiple atomic+iatomic headers at different ports; the other approaches have no built-in capability for simultaneous processing and are comparatively slower.

Figure 6: Cumulative L2 norm of the projection error as a function of the path length for different source-destination pairs on Airtel1 dataset.
Figure 7: Computation time of Vercel for error rectification as a function of non-reachable headers and path length.
TABLE V: Comparing verification time of different approaches on public datasets [17]. (TO = timeout)

| Tool | Stanford: time (μs) | Stanford: % < 250 μs | Airtel 1: time (μs) | Airtel 1: % < 250 μs | Airtel 2: time (μs) | Airtel 2: % < 250 μs |
|---|---|---|---|---|---|---|
| AP Verifier | 1953 | 13.3 | 80 | 91.3 | 135 | 77.4 |
| Veriflow | 468 | 96.1 | 59 | 99.9 | 48 | 99.9 |
| NetPlumber | 8700 | 23.6 | 3804 | 3.8 | TO | TO |
| Deltanet | 9 | 99.9 | 3 | 99.9 | 4 | 99.9 |
| APKeep | 94 | 96.4 | 7 | 99.8 | 6 | 99.9 |
| Vercel | 53 | 99.7 | 32 | 99.7 | 32 | 99.7 |
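The speedup factors quoted for the Stanford dataset follow directly from the times in Table V. The short script below recomputes them as the ratio of each tool's time to Vercel's. The dictionary only restates the table values; the variable names are illustrative.

```python
# Verification time per rule update in microseconds, copied from Table V (Stanford).
stanford = {
    "AP Verifier": 1953,
    "Veriflow": 468,
    "NetPlumber": 8700,
    "Deltanet": 9,
    "APKeep": 94,
    "Vercel": 53,
}

# Ratio > 1 means Vercel is faster than the tool; ratio < 1 means it is slower.
speedup = {
    tool: t / stanford["Vercel"]
    for tool, t in stanford.items()
    if tool != "Vercel"
}
for tool, s in sorted(speedup.items(), key=lambda kv: kv[1]):
    print(f"{tool:12s} {s:7.2f}x")
```

The factors quoted in the text appear to be these ratios truncated rather than rounded (for example, 468/53 is about 8.83 and is reported as 8×). The ratio for Deltanet is below 1, in line with the statement that Deltanet is faster on this dataset.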

VI-I What-if queries and Batch Processing

A network administrator can use “what-if” queries to determine the fate of packets when a link goes down, or to model other scenarios. After a link failure, the SDN controller/routing protocol recomputes paths between different source-destination pairs. Subsequently, the controller deletes existing rules corresponding to the failed links and inserts new rules (based on the updated paths) in the forwarding tables of routers. To simulate a link failure, we select a random link $l$ in the network and delete the rules that forward packets over $l$ from both connected routers. As shown in the first row of Table VI, a link failure can trigger the deletion of thousands of rules; Vercel therefore computes reachability by considering batches of rules. Table VI shows that the mean verification time of Vercel for checking reachability along with loops on the INET dataset is around 534 ms when a link failure triggers ~3060 rule deletions in the network. In contrast, Deltanet and Veriflow require 2888 ms and 29117 ms, respectively. This result shows that Vercel is up to 5× and 54× faster than Deltanet and Veriflow on the INET dataset. However, Deltanet performs better on 3 of the 5 datasets, although its performance depends on network size and the number of rules.
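The link-failure simulation described above can be sketched as follows. The forwarding tables are modeled as plain dictionaries mapping a prefix to a next-hop router, which is a simplification chosen for illustration; the router names, prefixes, and the helper `fail_link` are hypothetical.

```python
# Hypothetical forwarding tables: router -> {prefix: next-hop router}.
tables = {
    "r1": {"10.0.0.0/8": "r2", "192.168.0.0/16": "r3"},
    "r2": {"10.1.0.0/16": "r1", "172.16.0.0/12": "r3"},
    "r3": {"10.0.0.0/8": "r1"},
}

def fail_link(tables, u, v):
    """Collect, as one batch, every rule at u or v that forwards over link (u, v),
    then delete those rules from the tables."""
    batch = []
    for router, peer in ((u, v), (v, u)):
        for prefix, next_hop in tables[router].items():
            if next_hop == peer:
                batch.append((router, prefix))
    for router, prefix in batch:
        del tables[router][prefix]
    return batch

deleted = fail_link(tables, "r1", "r2")
print(deleted)
```

Returning the deletions as one batch mirrors the idea that a verifier processes all rules affected by a failure together instead of re-verifying after every single deletion.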

TABLE VI: Comparing different approaches for ‘what-if’ queries for link failures. The second half of the table compares the memory utilization to results in [15].

| Metric | Tool | Berkeley | RF1755 | RF6461 | RF3257 | INET |
|---|---|---|---|---|---|---|
| Avg. # of updates | | 50863 | 14603 | 9216 | 7900 | 3060 |
| Verification time (in ms) | Veriflow | 3073 | 8100 | 17594 | 17645 | 29117 |
| Verification time (in ms) | Deltanet | 93 | 897 | 0.4 | 2.6 | 2888 |
| Verification time (in ms) | Vercel | 1083 | 603 | 766 | 741 | 534 |
| Memory (in MB) | Veriflow | 1089 | 2713 | 5920 | 5882 | 9776 |
| Memory (in MB) | Deltanet | 6208 | 16937 | 39481 | 40716 | 63563 |
| Memory (in MB) | Vercel | 2216 | 4207 | 5047 | 5529 | 14200 |

VI-J Memory utilization

We now compare the memory utilization of Vercel on the various datasets with that of Veriflow and Deltanet. Table VI shows that Vercel is up to 7.8× more memory efficient than Deltanet. Deltanet is memory intensive because it implements two data structures: 1) a binary search tree to create atoms and 2) an edge-labeled graph to model the forwarding behavior of packets in a network. Compared to Veriflow, Vercel consumes more memory on three of the five datasets (Berkeley, RF1755 and INET) because of the extra space required to store: a) (node, port) tuples at different nodes of the tree; b) node labels in string format; and c) integer identifiers at the leaf nodes of the tree. This extra state, however, also gives Vercel more expressive power and enables path recommendation.

VII Related Work

The initially proposed data plane verification techniques require a snapshot of the network state to perform queries such as reachability, loop detection and slice isolation [13, 23, 34, 22, 35, 36, 37, 38, 39, 14, 40, 41].

Xie et al. [13] presented a technique for performing static reachability analysis of large-scale IP networks. The authors modelled routing protocols, packet filters and packet transformations using a formal framework. Their reachability check algorithm can be used to efficiently determine the reachability in IP networks and also capture select what-if scenarios.

Mai et al. [34] proposed a tool called Anteater that enables operators to debug their data plane by identifying bugs in forwarding behavior. Anteater works by comparing the expected behavior of network packets with the actual behavior observed on the network, using a set of customizable test cases. Anteater converts high-level network invariants into boolean satisfiability (SAT) problems and solves these using SAT solvers. If the network state is found to violate the invariants, Anteater provides a counter-example that helps in tracking the root cause. The tool mainly focuses on identifying forwarding loops, packet loss, and inconsistencies that emerge through data plane misconfiguration.

Header space analysis (HSA) [23] is a key approach for performing static analysis of network configuration. Header transformation, a critical step in HSA, involves analyzing the effects of network policies on packet headers. The authors describe several types of header transformations, such as port mapping and address rewriting, and develop techniques for analyzing their effects on header spaces. Specifically, HSA represents an $L$-bit packet header in an $L$-dimensional space, with reachability checks translated to algebraic operations over $L$-dimensional hypercubes. The limitation of HSA is that it can be computationally expensive, which makes it unsuitable for large and complex networks.

Snapshot-based techniques are designed to identify problems after they occur. However, in order to mitigate their impact, it is necessary to detect events before they cause problems. As a result, more recent verification techniques aim to detect anomalies in real-time. NetPlumber [24] was one of the earliest techniques in this direction. It creates a plumbing graph that identifies existing rules affected by the addition or deletion of new rules, and uses algebraic operations defined in HSA. However, for networks with a large number of rules, the plumbing graph can be large, resulting in high verification time.

Veriflow [16] addresses the challenge of dynamic network verification through the partitioning of packet headers into equivalence classes (EC) using a trie structure. It then checks network invariants by traversing a forwarding graph that corresponds to each EC. However, Veriflow’s approach is limited to modeling forwarding functions in a network and only considers each EC in isolation, which can result in higher verification times.

Yang et al. [22] presented an algorithm for verifying network properties in real-time using atomic predicates – small and reusable building blocks that can express complex network properties. All predicates in AP Verifier are represented by binary decision diagrams (BDDs), which are rooted, directed acyclic graphs. Logical operations on BDDs can be performed efficiently using graph-based algorithms.

Deltanet [15] builds on Veriflow’s approach by constructing a single forwarding graph that covers all the equivalence classes (ECs) and then uses it to check network invariants. However, Deltanet’s approach is limited to verifying reachability in the presence of forwarding rules and does not support modeling of other network functions, such as packet filtering and transformation. Deltanet also does not support batch processing. Vercel, on the other hand, performs batch processing, evaluates what-if conditions and tends to be more scalable due to vector algebra.

APKeep [17] enhances the methods used in Veriflow and Deltanet by enabling the modeling of a wide range of network functions. This is accomplished by partitioning packet headers into equivalence classes (ECs) and representing network functions with Boolean formulas. The algorithm then verifies network invariants by solving these formulas using binary decision diagrams (BDDs). However, APKeep is only able to detect configuration errors, whereas Vercel can identify configuration errors and offer recommendations to correct them.

Katra [42] uses pushdown systems for evaluating reachability in multi-layer networks. In contrast, Vercel models packet headers in a vector space and checks network invariants and policies by using least squares. Flash [43] proposes an automata-theory-based solution to handle data plane verification amidst update storms and too-slow arrivals of update messages. Vercel is naturally suited to handling storms of updates quickly because of its scalability and vector-based parallel processing abilities, and it outperforms state-of-the-art sequential data plane verification approaches. Mahjong [44] is a tool that helps users choose among multiple data plane verification approaches.

VIII Summary

We presented Vercel, which is inspired by the techniques of linear algebra, to check reachability and delve into error rectification and recommendation. Vercel works by making use of vector spaces that result from mapping packet headers onto a binary tree. These vector spaces lead to the formation of a matrix $\mathcal{A}$, which represents the set of headers at a port, and a vector $b$, the set of headers that need to be evaluated, resulting in $\mathcal{A}x=b$. Since $\mathcal{A}x=b$ is not always solvable, Vercel deploys least squares, which guarantees a solution irrespective of the rank of $\mathcal{A}$. Representing headers in vector space helps process multiple headers simultaneously. The use of vector algebra lets Vercel achieve aspects of verification like batch updates, what-if conditions, and path and table recommendations beyond what other techniques can do. Based on the experiments with real-world datasets, we show that Vercel models a variety of network functions and checks for reachability and network invariants (loops, blackhole).

On the Stanford dataset, we show that Vercel is at least 70% faster than the state-of-the-art techniques other than Deltanet, while providing more expressive power. We also showed that Vercel can be used to evaluate reachability after link failures and to check routing policies. The least squares solution also results in a recommendation model that checks tables for configuration anomalies to avoid longer paths. Vercel’s vector space architecture leads to a paradigm shift in which it takes in intents and converts these into table entries, leading to automatic service provisioning. The recommendation model opens up more possibilities beyond what has so far been studied in the domain of network verification.

References

  • [1] H. Zeng, P. Kazemian, G. Varghese, and N. McKeown, “A survey on network troubleshooting,” Technical Report Stanford/TR12-HPNG-061012, Stanford University, Tech. Rep., 2012.
  • [2] N. Feamster and H. Balakrishnan, “Detecting bgp configuration faults with static analysis,” in Proceedings of the 2nd conference on Symposium on Networked Systems Design and Implementation (NSDI), 2005, pp. 43–56.
  • [3] R. Mahajan, D. Wetherall, and T. Anderson, “Understanding bgp misconfiguration,” in Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM), 2002, p. 3–16.
  • [4] A. Abhashkumar, A. Gember-Jacobson, and A. Akella, “Tiramisu: Fast multilayer network verification,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Feb 2020, pp. 201–219.
  • [5] R. Beckett, A. Gupta, R. Mahajan, and D. Walker, “A general approach to network configuration verification,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2017, p. 155–168.
  • [6] S. Prabhu, K. Y. Chou, A. Kheradmand, B. Godfrey, and M. Caesar, “Plankton: Scalable network configuration verification through model checking,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2020, pp. 953–967.
  • [7] S. K. Fayaz, T. Sharma, A. Fogel, R. Mahajan, T. Millstein, V. Sekar, and G. Varghese, “Efficient network reachability analysis using a succinct control plane representation,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 217–232.
  • [8] A. Gember-Jacobson, R. Viswanathan, A. Akella, and R. Mahajan, “Fast control plane analysis using an abstract representation,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2016, pp. 300–313.
  • [9] R. Beckett, A. Gupta, R. Mahajan, and D. Walker, “Control plane compression,” in Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2018, p. 476–489.
  • [10] T. Ball, N. Bjørner, A. Gember, S. Itzhaky, A. Karbyshev, M. Sagiv, M. Schapira, and A. Valadarsky, “Vericon: Towards verifying controller programs in software-defined networks,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014, p. 282–293.
  • [11] R. Beckett, X. K. Zou, S. Zhang, S. Malik, J. Rexford, and D. Walker, “An assertion language for debugging sdn applications,” in Proceedings of the Third Workshop on Hot Topics in Software Defined Networking, 2014, p. 91–96.
  • [12] K. Jayaraman, N. Bjørner, J. Padhye, A. Agrawal, A. Bhargava, P.-A. C. Bissonnette, S. Foster, A. Helwer, M. Kasten, I. Lee et al., “Validating datacenters at scale,” in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2019, pp. 200–213.
  • [13] G. G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, and J. Rexford, “On static reachability analysis of ip networks,” in Proceedings IEEE Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2005, pp. 2170–2183.
  • [14] H. Zeng, S. Zhang, F. Ye, V. Jeyakumar, M. Ju, J. Liu, N. McKeown, and A. Vahdat, “Libra: Divide and conquer to verify forwarding tables in huge networks,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014, pp. 87–99.
  • [15] A. Horn, A. Kheradmand, and M. Prasad, “Delta-net: Real-time network verification using atoms,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017, pp. 735–749.
  • [16] A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey, “Veriflow: Verifying network-wide invariants in real time,” in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 15–27.
  • [17] P. Zhang, X. Liu, H. Yang, N. Kang, Z. Gu, and H. Li, “Apkeep: Realtime verification for real networks,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2020, pp. 241–255.
  • [18] N. Bjørner, G. Juniwal, R. Mahajan, S. A. Seshia, and G. Varghese, “ddnf: An efficient data structure for header spaces,” in Haifa Verification Conference, 2016, pp. 49–64.
  • [19] A. Horn, A. Kheradmand, and M. R. Prasad, “A precise and expressive lattice-theoretical framework for efficient network verification,” in IEEE International Conference on Network Protocols (ICNP), 2019, pp. 1–12.
  • [20] A. Panda, O. Lahav, K. Argyraki, M. Sagiv, and S. Shenker, “Verifying reachability in networks with mutable datapaths,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017, pp. 699–718.
  • [21] R. Stoenescu, M. Popovici, L. Negreanu, and C. Raiciu, “Symnet: Scalable symbolic execution for modern networks,” in Proceedings of the 2016 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2016, p. 314–327.
  • [22] H. Yang and S. S. Lam, “Real-time verification of network properties using atomic predicates,” IEEE/ACM Transactions on Networking (TON), vol. 24, no. 2, pp. 887–900, Apr. 2015.
  • [23] P. Kazemian, G. Varghese, and N. McKeown, “Header space analysis: Static checking for networks,” in 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2012, pp. 113–126.
  • [24] P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte, “Real time network policy checking using header space analysis,” in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 99–111.
  • [25] G. Strang, Introduction to linear algebra.   Wellesley-Cambridge Press Wellesley, MA, 1993, vol. 3.
  • [26] P. Kazemian, “Hassel-public,” 2014, accessed: 2021-1-1. [Online]. Available: https://bitbucket.org/peymank/hassel-public/wiki/Home
  • [27] S. Janardhan, “More details about the October 4 outage,” 2021, accessed: 2024-1-1. [Online]. Available: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
  • [28] L. Pursell and S. Trimble, “Gram-schmidt orthogonalization by gauss elimination,” The American Mathematical Monthly, vol. 98, no. 6, pp. 544–549, May 1991.
  • [29] C. R. Harris et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.
  • [30] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using networkx,” in Proceedings of the 7th Python in Science Conference, 2008, pp. 11 – 15.
  • [31] A. Horn, “Deltanet datasets,” 2017, accessed: 2022-1-1. [Online]. Available: https://github.com/delta-net/datasets
  • [32] N. Spring, R. Mahajan, and D. Wetherall, “Measuring isp topologies with rocketfuel,” ACM SIGCOMM Computer Communication Review, vol. 32, no. 4, pp. 133–145, Aug. 2002.
  • [33] C. support, “Understanding policy routing,” https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/10116-36.html, 2005, accessed: 2022-1-1.
  • [34] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King, “Debugging the data plane with anteater,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 290–301, Aug. 2011.
  • [35] E. Al-Shaer and S. Al-Haj, “Flowchecker: Configuration analysis and verification of federated openflow infrastructures,” in Proceedings of the 3rd ACM workshop on Assurable and usable security configuration, 2010, pp. 37–44.
  • [36] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. Elbadawi, “Network configuration in a box: Towards end-to-end verification of network reachability and security,” in 17th IEEE International Conference on Network Protocols (ICNP), 2009, pp. 123–132.
  • [37] A. Fogel, S. Fung, L. Pedrosa, M. Walraed-Sullivan, R. Govindan, R. Mahajan, and T. Millstein, “A general approach to network configuration analysis,” in 12th USENIX symposium on networked systems design and implementation (NSDI), 2015, pp. 469–483.
  • [38] A. Jeffrey and T. Samak, “Model checking firewall policy configurations,” in IEEE International Symposium on Policies for Distributed Systems and Networks, 2009, pp. 60–67.
  • [39] N. P. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, and G. Varghese, “Checking beliefs in dynamic networks,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015, pp. 499–512.
  • [40] B. Tian, X. Zhang, E. Zhai, H. H. Liu, Q. Ye, C. Wang, X. Wu, Z. Ji, Y. Sang, M. Zhang, D. Yu, C. Tian, H. Zheng, and B. Y. Zhao, “Safely and automatically updating in-network acl configurations with intent language,” in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2019, p. 214–226.
  • [41] S. K. Fayaz, T. Yu, Y. Tobioka, S. Chaki, and V. Sekar, “BUZZ: Testing Context-Dependent policies in stateful networks,” in 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016, pp. 275–289.
  • [42] R. Beckett and A. Gupta, “Katra: Realtime verification for multilayer networks,” in 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), 2022, pp. 617–634.
  • [43] D. Guo, S. Chen, K. Gao, Q. Xiang, Y. Zhang, and Y. R. Yang, “Flash: fast, consistent data plane verification for large-scale network settings,” in Proceedings of the ACM SIGCOMM 2022 Conference, 2022, pp. 314–335.
  • [44] Y. Li, C. Jia, X. Hu, and J. Li, “Mahjong: A generic framework for network data plane verification,” in Proceedings of the Symposium on Architectures for Networking and Communications Systems, 2021, pp. 52–58.
  • [45] J. Alman and V. V. Williams, “A refined laser method and faster matrix multiplication,” in Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), 2021, pp. 522–539.

Appendix A Proofs

A-A Proof of Theorem 3

Proof overview: We start the proof by showing that if the columns of the matrix $\mathcal{A}$ are selected from the standard basis (which is an orthonormal basis), then the projection point $b_i$ can be obtained in linear time by multiplying the diagonal elements of the matrix $\mathcal{A}\mathcal{A}^{T}$ with the vector $b_{i-1}$. Thereafter, we show two cases, with quadratic and linear time complexity, for obtaining the diagonal elements of the matrix $\mathcal{A}\mathcal{A}^{T}$. This is a significant improvement over using merely orthogonal vectors, which requires a matrix product ($\mathcal{O}(m^{2.37})$ [45]).

Proof: With the standard basis, it is possible to reduce the time complexity of solving the linear equations $\mathcal{A}^{p}_{i}x^{p}_{i}=b_{i-1}$ and obtaining the projection point $b_{i}$. The columns of the matrix $\mathcal{A}_{i}^{p}$ are orthonormal ($(\mathcal{A}^{p}_{i})^{T}\mathcal{A}^{p}_{i}=I$), so the least-squares projection $b_{i}=\mathcal{A}^{p}_{i}\big((\mathcal{A}^{p}_{i})^{T}\mathcal{A}^{p}_{i}\big)^{-1}(\mathcal{A}^{p}_{i})^{T}b_{i-1}$ simplifies to $b_{i}=\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}b_{i-1}$. Note that the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}\in\mathbb{R}^{m\times m}$ is a diagonal matrix whose $j$\textsuperscript{th} diagonal entry is 1 if the $j$\textsuperscript{th} row of $\mathcal{A}^{p}_{i}$ is non-zero, and 0 otherwise. Therefore, the vector $b_{i}$ is the element-wise product of the diagonal elements of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ and $b_{i-1}$.
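As a small worked example of this step (the dimensions and values are chosen purely for illustration and do not come from the evaluation), take $m=4$ and suppose the port forwards the first and third headers, so that the columns of $\mathcal{A}^{p}_{i}$ are $e_{1}$ and $e_{3}$:

```latex
\[
\mathcal{A}^{p}_{i} = \begin{bmatrix} e_{1} & e_{3} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},
\qquad
\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T} = \operatorname{diag}(1, 0, 1, 0),
\]
\[
b_{i-1} = (1, 1, 0, 1)^{T}
\;\Longrightarrow\;
b_{i} = \mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T} b_{i-1} = (1, 0, 0, 0)^{T}.
\]
```

Only the first header both arrives at the router (non-zero in $b_{i-1}$) and is forwarded on the port (non-zero on the diagonal), so it is the only one that survives the projection.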

Now, we ask whether it is possible to create a vector $v_{i}^{p}$ (having $m$ elements) containing the diagonal elements of the matrix $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ without performing the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$. If we can create $v_{i}^{p}$ in linear time, then the projection point $b_{i}$ can be computed efficiently (in linear time) as $b_{i}=v_{i}^{p}\otimes b_{i-1}$, where $\otimes$ is the element-wise product between two vectors. There are two ways of creating $v_{i}^{p}$ (without computing $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$), each leading to a different complexity.

Case 1: Modeling forwarding behavior with the complexity of $\mathcal{O}(m^{2})$. Represent $v_{i}^{p}$ as the sum of the columns of $\mathcal{A}_{i}^{p}$, i.e., $(v_{i}^{p})^{j}=\sum_{k=1}^{m}(\mathcal{A}_{i}^{p})^{jk}$. In this case, the complexity of creating $v_{i}^{p}$ is $\mathcal{O}(m^{2})$, and therefore the complexity of obtaining the projection point $b_{i}$ from $b_{i-1}$ is $\mathcal{O}(m^{2})$.

Case 2: Modeling forwarding behavior with the complexity of $\mathcal{O}(m)$. Instead of obtaining the vector $v_{i}^{p}$ from the matrix $\mathcal{A}_{i}^{p}$, create $v_{i}^{p}=\vec{0}$ in $m$-dimensional space. Subsequently, if router $i$ forwards the $j$\textsuperscript{th} header (in the set $S^{\text{affected}}$) to its output port $p$, then set the $j$\textsuperscript{th} entry of $v_{i}^{p}$ to 1. After these updates, the forwarding vector $v_{i}^{p}$ denotes the atomic+iatomic headers (in the set $S^{\text{affected}}$) that router $i$ forwards to its output port $p$. Note that $v_{i}^{p}$ still represents the sum of the columns of the forwarding matrix $\mathcal{A}_{i}^{p}$. However, the complexity of building $v_{i}^{p}$ is $\mathcal{O}(m)$ because, in the worst case, each entry of the vector is visited only once. Hence, the complexity of obtaining the projection point $b_{i}$ from $b_{i-1}$ is $\mathcal{O}(m)$. This proves the theorem.
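The two cases can be compared with a minimal Python sketch. It assumes headers are indexed from 0 and vectors are plain lists of 0/1 values; the function names are hypothetical and are not taken from Vercel's implementation.

```python
def forwarding_vector_from_matrix(A):
    """Case 1: v[j] is the sum of row j of A, i.e. the sum of A's columns."""
    return [sum(row) for row in A]


def forwarding_vector_direct(m, forwarded):
    """Case 2: start from the zero vector and set one entry per forwarded header."""
    v = [0] * m
    for j in forwarded:
        v[j] = 1
    return v


def project(v, b_prev):
    """b_i = v (element-wise product) b_{i-1}."""
    return [vj * bj for vj, bj in zip(v, b_prev)]


def project_with_matrix(A, b_prev):
    """Reference computation of b_i = A (A^T b_{i-1}) without any shortcut."""
    m, n = len(A), len(A[0])
    coeffs = [sum(A[j][k] * b_prev[j] for j in range(m)) for k in range(n)]
    return [sum(A[j][k] * coeffs[k] for k in range(n)) for j in range(m)]


m = 4
forwarded = [0, 2]        # illustrative: the port forwards headers 0 and 2
# Columns of A are the standard basis vectors of the forwarded headers.
A = [[1 if j == h else 0 for h in forwarded] for j in range(m)]
b_prev = [1, 1, 0, 1]     # illustrative: headers 0, 1 and 3 reach this router

v_case1 = forwarding_vector_from_matrix(A)
v_case2 = forwarding_vector_direct(m, forwarded)
b_next = project(v_case2, b_prev)
b_reference = project_with_matrix(A, b_prev)
```

In this example both constructions yield the same forwarding vector, and the element-wise product agrees with the explicit matrix computation; Case 2 simply never materializes the matrix.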

A-B Proof of Theorem 4

Representing the headers in the set $S^{\text{affected}}$ using a standard basis also reduces the complexity of finding the reachable headers at the destination node. In the existing solution (Section III), an atomic+iatomic header $s_{k}\in S$ is reachable between a source and a destination if the dot product between $b_{reachable}$ and the vector notation ($e_{k}$) of the header $s_{k}$ is non-zero. Obtaining all reachable atomic+iatomic headers in this way has complexity $\mathcal{O}(m^{2})$, since each dot product has complexity $\mathcal{O}(m)$ and $m$ such dot products are needed. However, if we use the standard basis vectors to represent atomic+iatomic headers, then the reachable headers correspond exactly to the indices of the non-zero entries of $b_{reachable}$, and finding the non-zero entries of an $m$-dimensional vector has complexity $\mathcal{O}(m)$. This completes the proof of the theorem.
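The difference between the two recovery strategies can be illustrated with a short Python sketch. It assumes 0-based header indices and a list-based $b_{reachable}$; the function names are hypothetical.

```python
def reachable_by_dot_products(b_reachable):
    """O(m^2): one O(m) dot product with e_k for each of the m headers."""
    m = len(b_reachable)
    reachable = []
    for k in range(m):
        e_k = [1 if j == k else 0 for j in range(m)]  # vector notation of header k
        if sum(x * y for x, y in zip(b_reachable, e_k)) != 0:
            reachable.append(k)
    return reachable


def reachable_by_scan(b_reachable):
    """O(m): with standard basis vectors, reachable headers are the non-zero indices."""
    return [k for k, value in enumerate(b_reachable) if value != 0]


b_reachable = [1, 0, 0, 1, 0]   # illustrative vector, not taken from the datasets
by_dot = reachable_by_dot_products(b_reachable)
by_scan = reachable_by_scan(b_reachable)
```

Both functions return the same index list; the scan avoids constructing each $e_{k}$ because the dot product with $e_{k}$ is just the $k$\textsuperscript{th} entry of $b_{reachable}$.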

A-C Proof of Theorem 5

After a rule update (insertion/deletion), Vercel performs the following three steps to compute reachability along a path: 1. Find the sets $S^{\text{affected}}$ and $P^{\text{affected}}$; 2. Model the forwarding behavior of the router ports present on the path; 3. Determine the reachable headers at the destination from the vector space. From Theorem 2, we know that the time complexity of finding the sets $S^{\text{affected}}$ and $P^{\text{affected}}$ is $\mathcal{O}(m)$. Theorem 3 then models the forwarding behavior of a router with time complexity $\mathcal{O}(m)$. Now, assume that the number of routers along a path is a constant ($c$) with respect to the number of atomic+iatomic headers in the network. In this case, the complexity of computing reachability along the path is $\mathcal{O}(c*m)=\mathcal{O}(m)$. Finally, Theorem 4 provides a solution (of complexity $\mathcal{O}(m)$) for recovering the reachable headers from the vector space. Since each of steps 1-3 requires $\mathcal{O}(m)$ time, Vercel solves reachability between a source and a destination along a path with time complexity $\mathcal{O}(m)$. This completes the proof of the theorem.
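Steps 2 and 3 can be sketched end to end in Python. This is an illustrative sketch only: it assumes step 1 has already produced the affected headers (indexed from 0) and that each hop on the path is described by the list of header indices its output port forwards. The function name and the input format are hypothetical.

```python
def reachable_along_path(m, source_headers, path_forwarding):
    """Return the indices of headers that survive every hop on the path.

    m               -- number of affected headers (vector dimension)
    source_headers  -- header indices injected at the source
    path_forwarding -- one list of forwarded header indices per hop
    """
    # b_0: the headers under consideration at the source.
    b = [0] * m
    for j in source_headers:
        b[j] = 1

    # Step 2: one O(m) forwarding vector and one O(m) element-wise product per hop.
    for forwarded in path_forwarding:
        v = [0] * m
        for j in forwarded:
            v[j] = 1
        b = [vj * bj for vj, bj in zip(v, b)]

    # Step 3: reachable headers are the non-zero indices of the final vector.
    return [k for k, value in enumerate(b) if value != 0]


# Illustrative three-hop path over five headers.
path = [[0, 1, 3], [1, 3, 4], [3]]
result = reachable_along_path(5, [0, 1, 2, 3, 4], path)
```

With $c$ hops the loop performs $c$ passes over an $m$-element vector, which matches the $\mathcal{O}(c*m)$ term in the proof.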

A-D Proof of Theorem 6

As described in the proof presented in Appendix A-B, if we represent atomic+iatomic headers using standard basis vectors, then it is possible to create a forwarding vector $v_{i}^{p}\in\mathbb{R}^{m}$, $m=|S^{\text{affected}}|$, representing the atomic+iatomic headers that router $i$ forwards along its port $p$. Since a forwarding vector is created for each port in the set $P^{\text{affected}}$, and each entry is either 0 or 1, Vercel requires $|S^{\text{affected}}|*|P^{\text{affected}}|$ bits to store the forwarding vectors $v_{i}^{p}$. In other words, the space complexity of Vercel (using the forwarding vectors $v_{i}^{p}$) is $\mathcal{O}(|S^{\text{affected}}|*|P^{\text{affected}}|)$. For example, consider a network with 10K atomic+iatomic headers and 200 ports across the network. If, after a rule insertion, $|S^{\text{affected}}|=1$K and $|P^{\text{affected}}|=50$, then Vercel requires 50K bits, i.e., $\sim 6$ KB, to represent the forwarding vectors ($v_{i}^{p}$). This completes the proof of the theorem.
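One way to realize the one-bit-per-entry bound is to pack each forwarding vector into an integer bitmask, as in the Python sketch below. The bitmask layout and the function names are assumptions made for illustration and are not claimed to be Vercel's data structure; the sketch also takes 1K to mean 1,000.

```python
def pack_forwarding_vector(m, forwarded):
    """Pack an m-entry 0/1 forwarding vector into an int; bit j is set for header j."""
    mask = 0
    for j in forwarded:
        if not 0 <= j < m:
            raise ValueError(f"header index {j} outside 0..{m - 1}")
        mask |= 1 << j
    return mask


def storage_bytes(num_affected_headers, num_affected_ports):
    """Bytes needed when every port stores one bit per affected header."""
    bytes_per_vector = (num_affected_headers + 7) // 8   # round up to whole bytes
    return bytes_per_vector * num_affected_ports


# For 0/1 vectors the element-wise product reduces to a bitwise AND.
v = pack_forwarding_vector(4, [0, 2])        # port forwards headers 0 and 2
b_prev = pack_forwarding_vector(4, [0, 1, 3])  # headers 0, 1 and 3 arrive
b_next = v & b_prev

example_bytes = storage_bytes(1000, 50)      # the example from the proof
```

Under these assumptions the example from the proof needs 6,250 bytes, consistent with the stated estimate of roughly 6 KB.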