VERCEL: Verification and Rectification of Configuration Errors with Least Squares

Abhiram Singh, Sidharth Sharma and Ashwin Gumaste Abhiram Singh, Sidharth Sharma and Ashwin Gumaste are with the Department of Computer Science and Engineering, Indian Institute of Technology Bombay, Mumbai-400076, India, (e-mail: abhiram25.1990@gmail.com, sidharth.sharma@ieee.org, ashwing@ieee.org).

Abstract

We present Vercel, a network verification and automatic fault rectification tool that is based on a computationally tractable, algorithmically expressive, and mathematically aesthetic domain of linear algebra. Vercel works on abstracting out packet headers into standard basis vectors that are used to create a port-specific forwarding matrix $\mathcal{A}$ , representing a set of packet headers/prefixes that a router forwards along a port. By equating this matrix $\mathcal{A}$ and a vector $b$ (that represents the set of all headers under consideration), we are able to apply least squares (which produces a column rank agnostic solution) to compute which headers are reachable at the destination. Reachability now simply means evaluating if vector $b$ is in the column space of $\mathcal{A}$ , which can efficiently be computed using least squares. Further, the use of vector representation and least squares opens new possibilities for understanding network behavior. For example, we are able to map rules, routing policies, what-if scenarios to the fundamental linear algebraic form, $\mathcal{A}x=b$ , as well as determine how to configure forwarding tables appropriately. We show Vercel is faster than the state-of-art such as NetPlumber, Veriflow, APKeep, AP Verifier, when measured over diverse datasets. Vercel is almost as fast as Deltanet, when rules are verified in batches and provides better scalability, expressiveness and memory efficiency. A key highlight of Vercel is that while evaluating for reachability, the tool can incorporate intents, and transform these into auto-configurable table entries, implying a recommendation/correction system.

Index Terms:

Network verification, least squares, reachability, binary tree.

I Introduction

It is widely known that configuration errors in service provider networks form bulk of network outages [1, 2, 3] resulting in operational challenges leading to conservative planning and slow rollout of services. Errors can be in the control or dataplane of routers, firewalls, middleboxes and switches. Control plane (protocol) verification [4, 5, 6, 7, 8, 9, 10, 11] is harder to accomplish on account of the difficulties in abstracting the statefulness of protocols to computation models. In contrast, significant work exists in the realm of data plane verification [12, 13, 14, 15, 16, 17, 18, 19, 20, 21], which is feasible (though complex). The early work of atomic predicates [22] involved mapping network forwarding rules with formal methods that resulted in a formulation whose solution led to answer reachablity verification queries. The complexity of atomic predicates was further relaxed by the static HSA scheme [23], which considered packets in $L$ -dimensional space (where $L$ is the number of bits in a packet header) and worked on tracking the transformation of such packets through multiple network boxes (routers, etc.) by applying and then conjoining transfer functions. HSA was extended to include dynamic updates in NetPlumber [24]. Another work – Veriflow [16], mapped network-wide headers to equivalence classes (ECs) (defined as a set of packets that are treated similarly at routers) through a trie data structure. Veriflow was able to detect configuration errors in real-time. Recent schemes APKeep [17] and Deltanet [15] improved the initial success of Veriflows’ approach using optimizations and graph-theoretic means, resulting in faster reachability computation, loop-detection, blackhole identification (a packet at a node with no rule matching the packets’ header).

The next set of questions pertains to expressiveness, robustness, service support, and generic verification of network invariants. The solution to these lies in combining the initial successes of HSA/NetPlumber (the concept of $L$ -dimensional space) with the EC approach by the more recent techniques (Veriflow, APKeep, Deltanet). To this end, our solution, Vercel, applies linear algebraic techniques – a paradigmatic shift in the use of underlying techniques towards solving the dataplane verification in real-time. Vercel uses both the dimensional transformations in HSA/Netplumber and ECs of Veriflow, and then applies linear algebra to achieve verification, recommendation and rectification. To understand the rationale behind using linear algebra, recall the concept of header spaces in HSA/NetPlumber. Header spaces represent an $L$ -bit header into an $L$ -dimensional space and model a forwarding device as a function that transforms a header from one subspace to another. However, the complexity of transforming headers in subspaces is quadratic in the number of headers. In contrast, Vercel first identifies $m$ ECs by the rules collected from all the forwarding devices. Thereafter, Vercel creates an $m$ -dimensional space corresponding to these ECs such that forwarding rules of each device form subspaces in this $m$ -dimensional space. Finally, to check reachability, Vercel models packet forwarding at routers by linearly projecting a point from one subspace to another along a path from the source to the destination. We argue that transforming a point (instead of subspaces as in HSA and NetPlumber) can best be done with linear algebra by converting ECs to vectors, and computing/tracing how these vectors move from a source to a destination, thus enabling us to evaluate for reachability. In addition to reachability, Vercel attains unprecedented scalability and efficiency (Section VI) as well as lays the groundwork for a more powerful abstraction – that of providing a recommendation/rectification system. It is established that linear projections can be efficiently handled by applying linear algebraic operations such as least squares [25]. By definition, the linearity of our technique (linear algebra) implies speed of computation, availability of well-established theory, and readily available well-polished tools. Using these tools, we can verify network constructs and build a recommendation/rectification system that automatically rectifies configuration errors. With Vercel’s rectification technique, we can simply specify intents, and the tool automatically provisions tables, even when reachability does not seem to readily exist. It does so by computing the missing entries at tables and populating those, all in a single linear algebraic operation, which is possible only because we have transformed header spaces to vectors. Vercel makes a jumpstart in auto-configuration, such that the tool recommends and corrects faults in both path selection and table manipulation.

To compute packet (or equivalently header) reachability between a source and destination router, Vercel initially creates a network-wide binary tree, as a data structure for efficiently representing headers present in the forwarding tables of routers. This binary tree enables the division of the header space into non-overlapping partitions (similar to ECs), and thereafter, Vercel represents each partition as orthogonal vectors. These unique vectors provide a foundation for introducing least squares. With these vectors arranged in columns, Vercel creates a forwarding matrix $\mathcal{A}$ for each port at a router. Vercel initializes another vector $b$ for all the partitions whose reachability we desire to be evaluated. The matrix $\mathcal{A}$ and vector $b$ is then part of the standard linear equation $\mathcal{A}x=b$ . The solution vector $x$ identifies which headers in $b$ are being forwarded along the port corresponding to $\mathcal{A}$ . We argue that if the matrix $\mathcal{A}$ contains orthogonal columns, then solving $\mathcal{A}x=b$ across different ports with least squares leads to solving for reachability. Applying least squares gives us excellent insights into computing reachability and fixing configuration faults.

Another advantage of using linear algebra is that Vercel does away with checking reachability on multiple headers sequentially (as in Veriflow and Deltanet); instead, it relies on vector spaces to apply a single algebraic operation simultaneously on multiple headers resulting in a key contribution of batch processing, which is not supported by earlier schemes. Parallelism in other techniques can be obtained by running multiple threads, whereas Vercel takes advantage of CPU based (Table VI) vector processing instructions even on a single thread, inducing scalability. Vercel’s key contributions are listed as follows:

1) Linear algebra to model networking devices: The merit of representing packet headers in a vector space is that it increases the expressive power beyond reachability. Vercel can be used to check loops, detect blackholes, confirm routing policies and answer what-if scenarios in the presence of ACL rules and packet-header transformations.

2) Real-time network verification: We show that Vercel achieves significant improvement (on standard datasets, such as the one from Stanford [26]) of $8\times$ over Veriflow, $164\times$ over NetPlumber, $1.7\times$ over APKeep, $36\times$ over AP Verifier. Although Deltanet is faster than Vercel for verifying a single update, it cannot model a diverse set of network functions such as NAT and ACL and consumes up to $7.8\times$ more memory as compared to Vercel. Also, in the case of batch updates, the performance of Vercel is comparable to Deltanet (which does not implicitly even support batch processing).

3) Scalability: Vercel performs well on large networks such as a synthetic network of 2000 nodes with millions of rules. Even with a network of this size, the verification time is of the order of 500 $\mu$ s.

4) Using linear algebra for recommendation and rectification: Linear regression being a by-product of least squares, can also be used as a recommendation system (how well are forwarding tables configured). The standard equation $\mathcal{A}x=b$ may not have “a” solution (because $\mathcal{A}$ may not be a full rank matrix) and the error resulting from least squares (Chapter 4 of [25]) tells us the efficiency of routing, as well as also paves the way for automatically provisioning intents and rectifying errors.

Network verification is important to avoid expensive outages. For example, in 2021, the Meta (Facebook’s) network went down globally due to a BGP configuration fault [27]. Configuration faults are the cause of 70 percent of global network outages, and this motivates approaches towards larger coverage of configuration fault detection. Our approach Vercel is different in treatment compared to other solutions – while other solutions use techniques such as atomic predicates or transformations, or SAT solvers, our approach is based on least-squares, exploring a paradigm never done before. This approach offers multiple advantages. It helps us verify configurations, detect misconfigurations, and develop a recommendation system that automatically rectifies errors. This fundamental step forward (recommendation and rectification), in our view, sets Vercel apart from the rest of the verification solutions.

Paper organization: Section II provides an overview of Vercel, while functional blocks of Vercel are presented in Section III. Section IV provides an optimized implementation of Vercel. Section V extends Vercel to a variety of network functions while evaluation of Vercel is presented in Section VI. We discuss related works in Section VII. Finally, Section VIII concludes the paper.

II Vercel Overview

TABLE I: Symbols and notations used.

Notation	Definition	Notation	Definition	Notation	Definition
$L$	Size of header in bits	$m$	Number of equivalence classes	$N((\mathcal{A}_{i}^{p})^{T}))$	Null space of $\mathcal{A}_{i}^{p}$
$r$	Supernet headers	$a$	Atomic headers	$C(\mathcal{A}_{i}^{p})$	Column space of $\mathcal{A}_{i}^{p}$
$q$	Represents iatomic headers	$[\mathcal{A}\|b]$	Augmented matrix	$H$	Represents the unit step function
$\mathcal{A}$	A set of packet headers/prefixes that a router forwards along a port	$b$	Represents the set of all headers under consideration	$g_{i}$	An $m$ -dimensional filtering vector with binary values (0/1)
$x$	The solution vector for linear equations $\mathcal{A}x=b$	$n$	Headers forwarded along a port, $1\leq n\leq m$	$\hat{x}$	Approximate solution using least squares
$\mathcal{A}_{Y}^{0}$	Headers forwarded by router $Y$ on its port $0$	$S\textsuperscript{affected}$	Set of headers whose reachability may have been affected	$T_{i}$	Transformation matrix
$f^{acl}$	Filtering function that maps a set of packet headers to a set of actions	$v_{i}^{p}$	Atomic+iatomic headers (in $S\textsuperscript{affected}$ ) that router $i$ forwards to its port $p$	$f^{tr}$	A set of rules containing the header “match” and “transformed” fields
$P\textsuperscript{affected}$	A set contains those ports on which a router forwards the packet with the header in the set $S\textsuperscript{affected}$	$c_{i}$	The indices with a non-zero entry in it represent packets received at router $i$ that do not get any match in the forwarding table	$t_{i}$	Represents atomic+iatomic headers reachable up to the router $i$ and then transformed using $T_{i}$

This section presents a high-level overview of Vercel, including an intuition for applying linear algebra for network verification. Notations used in the paper are listed in Table I. Vercel actively ’listens’ to configuration updates (either made through an SDN controller or any other tool) to capture topology and forwarding tables/updates. Working on the granularity of individual packet headers is infeasible; hence Vercel, on the same lines of [6, 15, 16, 17], groups headers so that packets in the same group are similarly dealt with across routers.

To group packet headers, we need to parse rules from all the routers. Solely for the purpose of storing rules, we create a binary tree, which preserves the hierarchical representation of packet headers (Section III-A, III-B). For every header, the tree enables a path starting from the root towards the leaves denoting header bits. The use of binary tree is similar to the use of trie by Veriflow. However, unlike Veriflow, where the leaves contain node-rule pairs, in our case, leaves imply non-overlapping headers (ECs). Since Vercel is primarily designed to verify invariants in real-time; therefore, Vercel traverses the binary tree to determine $m$ headers, whose reachablity might have been altered after a rule update (Section III-C). Thereafter, Vercel uniquely maps $m$ headers to $m$ -orthogonal vectors that collectively create an $m$ -dimensional space. After creating the $m$ -dimensional space, Vercel identifies a subspace for each port on which a router forwards $1\leq n\leq m$ headers. After obtaining $n$ headers, Vercel defines a subspace for each port by selecting $n$ corresponding orthogonal vectors and storing these vectors in a matrix ( $\mathcal{A}$ ) of dimension $(m,n)$ (Section III-D). In addition to matrix $\mathcal{A}$ , Vercel initializes an $m$ -dimensional vector $b$ , which is designed to evaluate the reachability of all $m$ headers under consideration.

Matrix $\mathcal{A}$ and vector $b$ come together to form standard linear equation $\mathcal{A}x=b$ , whose solution will eventually confirm if reachability exists (Section III-E). Specifically, we solve $\mathcal{A}x=b$ for each port along the path between a source and destination. Three cases exist while solving $\mathcal{A}x=b$ .

In the first case, there exist rules that forward some but not all of the received packets to the specified port. This implies vector $b$ is not in the column space of matrix $\mathcal{A}$ , indicating that no solution can be found for $\mathcal{A}x=b$ . Therefore, Vercel finds an approximate solution $\hat{x}$ using least squares. For this, Vercel obtains a projection point $\mathcal{A}\hat{x}$ in the column space of the matrix $\mathcal{A}$ . The approximate solution vector $\hat{x}$ selects those headers in $b$ that the router forwards along its output port.

In the second case, if there exist forwarding rules for all the packets received at the router, then vector $b$ is in the subspace defined by the columns of matrix $\mathcal{A}$ . The solution vector $x$ in $\mathcal{A}x=b$ points to all the headers in $b$ that are forwarded along the port.

In the third case, if a router forwards none of the received packets along the selected port, then vector $b$ lies in the null space of matrix $\mathcal{A}^{T}$ and least squares returns a solution $\hat{x}=0$ .

It is possible to efficiently solve all three cases. We argue that if matrix $\mathcal{A}$ contains orthogonal columns, then solving $\mathcal{A}x=b$ across different ports with least squares provides the projection of $b$ in the intersection of subspaces and finds a solution to the reachability problem (Section III-E).

If $\mathcal{A}$ was a full rank matrix, the solution would be easy, but since that is not always the case, the next best thing to do is apply least squares to obtain a projection point.

Note that solving $\mathcal{A}x=b$ with options like matrix inversion, row reduction or linear optimization, is not efficient: (1) Finding the inverse of matrix $\mathcal{A}$ to solve $\mathcal{A}x=b$ is not feasible as $\mathcal{A}$ is not of full rank ( $m\neq n$ ). Computing pseudo-inverse to obtain $x=\mathcal{A}^{+}b$ needs singular value decomposition (SVD), (a multi-step process), infeasible in real-time. (2) Row reduction algorithms are slow as they are cubic in time. (3) Linear optimization results in infeasibility (because vector $b$ may not be present in the column space of $\mathcal{A}$ .

In contrast, least squares guarantees to provide a solution of $\mathcal{A}x=b$ irrespective of the size of $\mathcal{A}$ and nature of $b$ . Note that with the orthogonality condition imposed on the columns of matrix $\mathcal{A}$ , we can find the projection of $b$ in the column space of $\mathcal{A}$ in linear time. Importantly, note that though we use least squares to compute reachability, the perceived approximation characteristic of least squares has no bearing on the exactness/correctness of the reachability calculation. In the classical least squares model, error plays a role, and it may give the reader the impression that such error may lead to approximation, which is definitely not the case with Vercel. For example, consider the first case, where least squares finds an approximate solution $\hat{x}$ . Though $\hat{x}$ is said to be an approximate solution, however it only selects a subset of headers in $\mathcal{A}$ , specifically those for which forwarding rules are present at the said router. In any case, $\hat{x}$ will not select false positives/negatives because of the approximation.

Moving forward, intuitively, least squares projects $b$ at the intersection of subspaces corresponding to the ports present along a path from the source to the destination. The intersection of subspaces corresponds to the intersection of headers generated from the rules at routers. The representation of headers in vector spaces enables Vercel to apply linear algebraic operations to simultaneously process headers in a single step, thereby providing speed up in verification.

II-A Example

We now illustrate the functionality of Vercel with an example. Figure 1(a) is a toy network of 4 routers. For simplicity, we consider 3-bit headers though generalization to more bits is trivial. In this example, Vercel checks for reachability between source router $Y$ and destination $R$ . The forwarding table at router $U$ shows that packets with header 000/3 and 01/2 are forwarded along output ports $0$ and $1$ , respectively. Vercel collects rules from the forwarding tables and utilizes a binary tree to arrange packet headers (as in Figure 1(a)). While doing so, Vercel also keeps track of (router, port) pairs for each rule. The 4 leaf nodes of the binary tree $\{a1,\;q2,\;a3,\;a4\}$ denote non-overlapping headers $\{000/3,\;001/3,\;01/2,\;1/1\}$ . In the figure, the node labels, i.e., $r$ (supernet), $a$ (atomic) and $q$ (iatomic) denote a specific class of headers defined later in Section III-B.

Now assume a network administrator inserts a new rule at router $Q$ (as shown in Figure 1(a)), which forwards packets with header 0/1 on its output port $0$ (shown by dashed block in the forwarding table at router $Q$ ). Vercel detects the rule update and inserts the corresponding header 0/1 in the binary tree. Post insertion, Vercel now traverses the headers in its subtrees to check if any of them were impacted for reachability during the update. There are three such (impacted) headers i.e., 000/3, 001/3, 01/2, which are added to a set, $S\textsuperscript{affected}$ . Vercel creates 3-dimensional orthogonal vectors [1, 0, 0], [0, 1, 0] and [0, 0, 1] corresponding to impacted headers 001/3, 000/3 and 01/2. For simplicity, we have selected orthogonal vectors with binary values. Next, Vercel creates a vector $b_{init}=$ [1, 1, 1] that corresponds to three non-overlapping headers $\{q2,\;a1,\;a3\}$ for which we want to evaluate reachability. After creating $b_{init}$ , Vercel creates subspaces for port 0 at routers $Y$ , $U$ and $Q$ . This is because along port 0, these routers forward some packets whose headers are in the set $S\textsuperscript{affected}$ . Figure 1(b) shows $x$ - $y$ and $y$ - $z$ planes corresponding to the $(router,port)$ pairs $\{(Y,0),\;(U,0)\}$ , forming two subspaces. Since router $Y$ forwards packets with header 001/3 and 000/3 on its port $0$ , therefore its subspace ( $x$ - $y$ plane) is defined by a matrix $\mathcal{A}_{Y}^{0}$ . Similarly, router $U$ forwards packets with header 000/3 and 01/2 on its port $0$ , therefore its subspace ( $y$ - $z$ plane) is defined by a matrix $\mathcal{A}_{U}^{0}$ .

After creating the subspaces for different ports, Vercel evaluates reachability by modelling packet forwarding at router $Y$ . This is done by solving $\mathcal{A}_{Y}^{0}x=b_{init}$ . At router $Y$ , vector $b_{init}$ neither lies in the column space ( $C(\mathcal{A}_{Y}^{0})$ ) nor in the null space of $(\mathcal{A}_{Y}^{0})^{T}$ ( $Y$ in superscript denotes the matrix transpose). Equivalently, $b_{init}$ lies at the union of the column and null space of $\mathcal{A}_{Y}^{0}$ and $(\mathcal{A}_{Y}^{0})^{T}$ , respectively. The above scenario implies that no solution is possible for $\mathcal{A}_{Y}^{0}x=b_{init}$ . In this case, least squares finds an approximate solution resulting in a projection point $b_{Y}=$ [1, 1, 0] in $C(\mathcal{A}_{Y}^{0})$ , i.e., $x$ - $y$ plane. The point $b_{Y}$ implies that router $Y$ forwards packets with headers 001/3 and 000/3 through port 0, which are then received by router $U$ on path $Y-U$ . Since packets with header 01/2 are not forwarded from router $Y$ , therefore the third index (corresponding to header 01/2) of vector $b_{Y}$ contains a 0 entry. Similarly, Vercel models the packet forwarding at router $U$ at port $0$ by solving $\mathcal{A}_{U}^{0}x=b_{Y}$ and resulting in a projection point $b_{U}=$ [0, 1, 0] in the $y$ - $z$ plane, indicating that router $U$ only forwards packets with header 000/3 through port $0$ . Since router $R$ receives a non-zero vector $b_{U}$ from router $U$ , therefore Vercel confirms reachability between router $Y$ and $R$ along the path $Y-U-R$ . Finally, at the destination (router $R$ ), Vercel determines the set of reachable packet headers by computing the dot product between $b_{U}$ and vector representation of headers in the set $S\textsuperscript{affected}$ . For this example, Vercel obtains a non-zero dot product between $b_{U}$ and vector [0, 1, 0] (corresponding to the header 000/3), implying packets with header 000/3 are reachable from router $Y$ to $R$ .

II-B Rectification and Automatic Provisioning

Using least squares, we check for reachability and express intent to establish a service between two nodes in the network. If reachability between these two nodes does not exist, then Vercel recommends a path (Section V-E) and rectifies the situation by automatically populating tables at interim routers leading to reachability. The fact that misconfigurations are not just found, but also rectified and that the rectification is a direct by-product of $\mathcal{A}x=b$ using least squares, is a significant value addition to operators. An operator only has to express a high level intent for a bunch of services and not worry about how these are correctly provisioned. The correctness is taken care of by Vercel.

Let us assume we performed least squares on $\mathcal{A}x=b$ for a port and identified that our target vector $b$ is in the null space of $\mathcal{A}^{T}$ . Which means “all” the headers represented by $b$ are not forwarded along that port. Now, if we were to ensure reachability for any of the headers in $b$ , then $b$ must be in the column space of $\mathcal{A}$ . The simplest way of doing this is to replace $\mathcal{A}$ with an augmented matrix $[\mathcal{A}|b]$ .

We now describe rectification via the following example resulting from a users’ intent. Consider a four-node network consisting of routers $Y$ , $U$ , $Q$ and $R$ , shown on the right side of Fig. 2. Forwarding tables at each of the routers and corresponding vector notations for their non-overlapping headers are also shown in the figure. Assume an intent by the user whose goal is to set up a service between nodes $Y$ and $R$ . Vercel is responsible for parsing the intent statement to extract information such as source ( $Y$ ), destination ( $R$ ), and action (service provisioning). Vercel then examines the configuration of routers $Y$ and $U$ , generates forwarding matrices for both, and initializes $b_{init}$ (all prefixes) for reachability check. Upon analysis, Vercel determines that destination $R$ is currently unreachable from source $Y$ (as there are no forwarding rules installed at routers $U$ and $Q$ to forward packets received from $Y$ to $R$ ). In response, Vercel enables its rectification engine and proposes a solution based on linear algebraic operations. The activated rectification engine creates a new rule (01/2, Port 1) at router $Q$ to establish reachability. The verification and rectification engines of Vercel work together for this intent, and the flow of operations is shown in Fig. 2. A detailed explanation of the linear algebraic operations performed for rectification is provided below.

In Fig. 3 is an example demonstrating Vercel’s rectification system for the intent expressed in Fig. 2. We apply Vercel to verify reachability, for which matrices $\mathcal{A}$ and vector $b$ are created as shown in Fig. 3. After applying $\mathcal{A}x=b$ at router $Y$ and $U$ , we get $b_{U}$ , which is a set of prefixes reachable from $Y$ to $Q$ (through path $Y-U-Q$ ). Now at router $Q$ , while applying $\mathcal{A}_{Q}x=b_{U}$ , Vercel finds that $b_{U}$ is in the null space of $\mathcal{A}_{Q}^{T}$ , which implies none of the prefixes in $b_{U}$ will be forwarded towards $R$ . At this point, our recommendation system kicks in and suggests a “fix” to address the fault at router $Q$ , such that reachability between $Y$ and $R$ is established. As part of the fix, Vercel’s recommendation system adds $b_{U}$ as a new column of matrix $\mathcal{A}_{Q}$ (and updates the corresponding forwarding rules at router $Q$ ). Now, the updated matrix $\mathcal{A}_{Q}$ has three columns. This solves the problem as when we evaluate $\mathcal{A}_{Q}x=b_{U}$ , $b_{U}$ will be in the column space of $\mathcal{A}_{Q}$ . As a result of $\mathcal{A}_{Q}x=b_{U}$ , we get $b_{Q}=[0,1,0]$ , which is a vector that represents the prefixes reachable till destination $R$ . Since this vector is now non-zero, a few prefixes are now reachable at the destination after applying the fix.

Nevertheless, employing rectification to address issues may inadvertently result in unintended consequences, such as rendering some destinations unreachable. To mitigate this, our rectification system operates based on the following principle: when implementing a fix (by adding a rule), the reachability of any current header should remain unaffected. For this, identifying the appropriate header is crucial whenever a rule update is conducted as part of rectification. In this regard, Vercel opts for a header that was previously unreachable between the source and destination pairs of interest. In accordance with this principle, the following procedure is followed during the rectification process.

Assume the rectification system suggests adding a rule with header $z$ in one of the routers’ forwarding tables. Now, Vercel calculates if there exist any reachable headers $z’$ (for some other source-destination pairs) in the same table that overlap with $z$ . Vercel then performs $z-z’$ to check if the resultant set is non-empty. The non-emptiness here implies the existence of the non-reachable headers between source and destination. If so, it configures the corresponding non-empty set as a separate rule in the forwarding table. In case the result ( $z-z’$ ) is empty, then Vercel attempts to find a header (other than $z$ ) and repeats the process. If we exhaust all possible ‘z’s through computation and still cannot find a non-empty $z-z’$ , then Vercel notifies that rectification is not possible. By computing such $z-z’$ , Vercel ensures that while adding a rule during the rectification process, it does not impact the reachability of any existing header.

III Vercel: Under the hood

This section describes the detailed working of Vercel.

III-A Representing headers in a binary tree

Initially, Vercel queries routers or an SDN controller to obtain forwarding tables and compute the network topology as shown in Fig. 4. Thereafter, Vercel arranges all the existing packet headers (extracted from rules in the forwarding tables) in the form of a binary tree. For adding a new header, Vercel first identifies its correct position in the tree by traversing the tree based on $L$ -header bits. These bits form a path from the root to a prospective node, where the new header is to be inserted. At every level (in the tree) the corresponding header bit indicates the traversal direction (0 = left and 1 = right branch). A path from the root to an intermediate/leaf node corresponds to a specific packet header. The role of the binary tree is in efficiently dividing the packet header space into non-overlapping partitions. Each packet belonging to a partition observes similar processing at all the routers in the network. The binary-tree enabled succinct representation provides Vercel a way in dividing the packet header space into three categories: supernet, atomic and iatomic. Out of these, atomic and iatomic categories together represent all the non-overlapping headers extracted from the forwarding tables across the network, while supernet represents the union of one or more atomic and iatomic headers.

III-B Classifying the nodes of the binary tree

Vercel labels a packet header as a supernet, if the corresponding path snippet from the root terminates at a non-leaf node. The remaining packet headers extracted from the rules (whose corresponding paths from the root terminate at a leaf node) are denoted as atomic. Due to the hierarchy in the header representation, multiple headers (e.g., supernet and atomic) may overlap. Therefore, Vercel further fragments supernets to create a new category of headers (called induced-atomic headers (iatomic) that do not overlap with the atomic headers. For example, in Figure 1(a) we observe 3 atomic headers (labeled as $a$ ), 1 iatomic header (labeled as $q$ ) and 3 supernets (labeled as $r$ ) in the binary tree. Note that the atomic and iatomic headers represent the leaf nodes of a binary tree.

In addition to the attribute of label types ( $r/a/q$ ), each node of the binary tree is also characterized by a list containing tuples. A tuple $(i,\;p)$ at node $k$ denotes that router $i$ contains a rule that maps a packet header (represented by a path from the root to node $k$ ) to an output port $p$ . In addition, Vercel assigns an integer identifier to each leaf node that distinguishes headers within the set of atomic and iatomic headers.

So far, we have defined atomic and iatomic headers; however, a crucial aspect is to analyze the task of identifying and segregating them once the tree is constructed. The following Lemma and Theorem provide a theoretical understanding of the procedure and complexities involved in determining the atomic and iatomic headers (the reader may skip the proofs for continuity).

Lemma 1: The worst-case space and time complexity of finding all atomic headers in a network containing $m$ forwarding rules is $\mathcal{O}(m)$ .

Proof of Lemma 1: In the worst case, $m$ forwarding rules are non-overlapping with each other, and therefore, corresponding headers represent leaf nodes of the binary tree. In this case, Vercel labels all leaf nodes ( $m$ ) as atomic by performing a post-order traversal of the tree, implying that the space complexity of the atomic headers is $\mathcal{O}(m)$ . A post-order traversal of the binary tree with $m$ leaf nodes can be performed in $\mathcal{O}(m)$ time. Therefore, the time complexity of identifying atomic headers is also $\mathcal{O}(m)$ .

Theorem 1: Given $m$ forwarding rules and a set of atomic headers having space complexity of $\mathcal{O}(m)$ , the space complexity of the smallest size induced-atomic set is $\mathcal{O}(m)$ and it can be computed with the time complexity of $\mathcal{O}(m)$ .

Proof of Theorem 1: Space complexity of iatomic nodes: Consider a general case, in which multiple (say $u$ ) leaf nodes (atomic headers) have an ancestor supernet ( $r$ ) with a maximum path length of $v$ between a supernet and an atomic header. To represent all headers of the supernet ( $r$ ) using non-overlapping partitions, Vercel introduces a maximum of $uv$ number of new iatomic headers in the tree along the path between $r$ and $v$ . In the extreme case, when a supernet is a root node of the binary tree and all other headers are atomic, the number of iatomic nodes cannot be greater than $\mathcal{O}(mw)$ , where $w$ is the maximum path length between the root node and a leaf node and $m$ is the number of rules in the forwarding tables across the network. A similar argument holds for multiple supernets having a common ancestor (as a supernet) in the binary tree. Since $w$ is dependent on the header length (e.g., an IP prefix has a maximum value of 32 in case of IPv4) and $w\ll m$ , therefore $w$ is treated as a constant and space complexity of iatomic headers becomes $\mathcal{O}(m)$ .

Time complexity of adding iatomic nodes: Vercel creates all iatomic headers (new leaf nodes) by performing the post-order traversal of the binary tree. While doing so, Vercel checks whether the current node of the tree is labeled as a supernet. If the current node is labeled as a supernet, then for all descendant nodes having a single child, Vercel creates a second child and labels it as iatomic (constant time operation). Since the binary tree has the space complexity $\mathcal{O}(m)$ (with $m$ rules and $\mathcal{O}(m)$ atomic headers); therefore, the time complexity of post-order traversal along with the addition of new leaf nodes (iatomic headers) is $\mathcal{O}(m)$ .

III-C Identifying affected atomic+iatomic headers post a rule update

Vercel is always tuned into the network to intercept any rule update at the routers. Once Vercel identifies an update, it computes whether reachability between different source-destination pairs is intact. For this, Vercel determines corresponding atomic+iatomic headers from the created binary tree, whose reachability may be affected. Vercel denotes all such headers in a set $S\textsuperscript{affected}$ . To create $S\textsuperscript{affected}$ , Vercel traverses a path of length $L$ from the root to a node $o_{j}$ , directed by the $L$ bits of the header (specified by the rule being inserted/deleted). Thereafter, Vercel traverses subtrees of node $o_{j}$ . During this traversal, if Vercel observes an atomic/iatomic node, then it appends the node’s identifier in the set $S\textsuperscript{affected}$ . Besides the set $S\textsuperscript{affected}$ , Vercel creates another set $P\textsuperscript{affected}$ , which contains those ports on which a router forwards the packet with the header in the set $S\textsuperscript{affected}$ . Since nodes of the binary tree store information of the router’s identifier ( $i$ ) and corresponding port ( $p$ ) in the form of a tuple ( $i,\;p$ ); therefore, during the tree traversal (as described for $S\textsuperscript{affected}$ ), Vercel collects tuples from the nodes labeled as atomic/supernet and includes those in the set $P\textsuperscript{affected}$ .

The following theorem shows that it is possible to identify sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ in linear time, which justifies the fast verification time of Vercel (Section VI).

Theorem 2: The worst case time complexity to identify both the sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ is $\mathcal{O}(m)$ , where $m$ is the number of atomic+iatomic headers in the network.

Proof of Theorem 2: Insertion of a rule requires finding affected headers and relevant router ports (Section III.B). Insertion of a header (corresponding to a new rule) in the binary tree requires $\mathcal{O}(m)$ time for traversing a path length of $L$ (for $L$ bits header) from the root to a node $o_{j}$ and visiting all child nodes of $o_{j}$ in order to create iatomic nodes in the tree (worst case scenario). While traversing the tree, Vercel finds all overlapping rules in a router, creates the set $S\textsuperscript{affected}$ by appending the identifier of the leaves and identifies relevant router ports $P\textsuperscript{affected}$ by collecting the $(i,\;p)$ tuples. This tree traversal a) from the root to node $o_{j}$ and b) subtrees of $o_{j}$ can be done with the time complexity of $\mathcal{O}(m)$ . Therefore, after a rule insertion, the time complexity to create the sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ is $\mathcal{O}(m)$ . A similar argument holds for the rule deletion that has a time complexity of $\mathcal{O}(m)$ .

III-D Creation of subspaces for relevant ports

After obtaining sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ , Vercel represents atomic+iatomic headers from the set $S\textsuperscript{affected}$ in a vector space of $m$ -dimensions. For that, Vercel computes the number of headers in the set $S\textsuperscript{affected}$ i.e., $m=|S\textsuperscript{affected}|$ . Thereafter, Vercel defines an $m$ -dimensional space by creating $m$ orthogonal vectors and uniquely maps these vectors to $m$ atomic+iatomic headers in the set $S\textsuperscript{affected}$ . Orthogonality is essential to model the packet forwarding at routers with least squares (Section III-E). The set of orthogonal vectors can be generated using the Gram-Schmidt [28] process (in which we take any vector, and then using that as a reference, make all other vectors orthogonal, one vector at a time). Later, we shall do away with requirement of orthogonal vectors while achieving linear time reachability computations (Section IV).

After making an $m$ -dimensional space, Vercel makes subspaces for each port. A port’s subspace includes packets with atomic and iatomic headers that a router sends to that port. To make these subspaces, Vercel goes through each ( $i,\;p$ ) tuple in the set $P\textsuperscript{affected}$ . For each tuple, it picks packets with atomic and iatomic headers from the set $S\textsuperscript{affected}$ that router $i$ sends through output port $p$ . If router $i$ sends packets with a unique set of $1\leq n\leq m$ headers through port $p$ , Vercel creates a subspace for the ( $i,\;p$ ) tuple. To do so, Vercel picks $n$ orthogonal vectors and stores them in a matrix $\mathcal{A}_{i}^{p}$ of size $(m,\;n)$ . In addition to $\mathcal{A}_{i}^{p}$ , Vercel makes a vector $b_{init}$ with $m$ dimensions. This vector shows the packet headers being evaluated for reachability between a source and destination.

III-E Least squares to model the packet processing at a router

After creating subspaces, Vercel utilizes ports in the set $P\textsuperscript{affected}$ to traverse different paths between the source and destination for evaluating reachability. Assume source $S$ , destination $D$ and an interim router $i$ , denoted by path: $S,...,i-1,i,i+1,...,D$ . Also, assume that Vercel has computed packets (with headers in $S\textsuperscript{affected}$ ) that are reachable up to router $i$ from $S$ . These packet headers are represented by vector $b_{i-1}$ . Now, Vercel aims to identify those headers in $b_{i-1}$ that router $i$ forwards to $i+1$ through one of its output ports $p$ . Algebraically, this situation can be achieved by first solving $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ to obtain $\hat{x}_{i}^{p}$ . Thereafter, with $\hat{x}_{i}^{p}$ , Vercel obtains an orthogonal projection of $b_{i-1}$ onto the column space of $\mathcal{A}_{i}^{p}$ (denoted as $C(\mathcal{A}_{i}^{p})$ ). The column space of a matrix denotes those points that can be obtained through a linear combination of columns of the matrix. The efficient way to solve $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ is least squares, which always guarantees to provide a solution in real-time. While obtaining the projection of $b_{i-1}$ in $C(\mathcal{A}_{i}^{p})$ using least squares, there can be three possible outcomes:

Case 1: This is the dominant case in which router $i$ forwards some of the received packets along an output port $p$ . This is because the forwarding table has some of the matching rules for all incoming packets. Mathematically, $b_{i-1}$ is partially both in the column space of $\mathcal{A}_{i}^{p}$ and the null space of $(\mathcal{A}_{i}^{p})^{T}$ . The null space of a matrix denotes points that are orthogonal to all rows of the matrix. In other words, $b_{i-1}$ is a linear combination of basis vectors from $C(\mathcal{A}_{i}^{p})$ and $N((\mathcal{A}_{i}^{p})^{T}))$ . As a consequence, there exists no solution to the linear equations $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ . Therefore, we utilize least squares to obtain the best approximate solution ( $\hat{x}_{i}^{p}$ ). Subsequently, $\hat{x}$ is used to obtain a projection point $b_{i}=\mathcal{A}_{i}^{p}\hat{x}_{i}^{p}$ . Now, the projected point $b_{i}$ (present in $C(\mathcal{A}_{i}^{p})$ ), represents a subset of packet headers received at router $i$ (from the previous router $i-1$ ) which are being forwarded along output port $p$ . For an example of this case, refer to Fig. 1(b), where nodes $Y$ and $U$ are forwarding some of the prefixes in $b_{i}$ .

Case 2: In this case, router $i$ forwards all the received packets along its output port $p$ . Equivalently, $b_{i-1}$ is present in the column space of $\mathcal{A}_{i}^{p}$ . In this case, the method of least squares returns an exact solution $x_{i}^{p}$ . The projected point $b_{i}=\mathcal{A}_{i}^{p}x_{i}^{p}$ contains information on all atomic+iatomic headers present in $b_{i-1}$ . Therefore, with least squares, Vercel models packet forwarding for all headers in $b_{i-1}$ to the next router $i+1$ along the port $p$ at router $i$ .

Case 3: In this case, router $i$ blocks all the received packets along its output port $p$ , implying $b_{i-1}$ lies in the null space of $(\mathcal{A}_{i}^{p})^{T}$ ; therefore, least squares returns a solution $\hat{x}_{i}^{p}=0$ . The projection point is $b_{i}=\mathcal{A}_{i}^{p}\hat{x}_{i}^{p}=0$ . The projection point $b_{i}=0$ specifies that router $i$ blocks all packets with atomic+iatomic headers represented by $b_{i-1}$ at output port $p$ .

III-F Determining the reachable set of packet headers from the vector representation

If there exist multiple paths between a source and destination, then Vercel would compute reachability along each path and store the reachable set of packet headers at the destination as $b_{reachable}=\sum_{d=1}^{e}b_{d}$ , where $b_{d}$ is the set of reachable headers at the destination node along path $d$ and $e$ is the total number of paths created with ports in $P\textsuperscript{affected}$ . After traversing all the paths, Vercel labels an atomic+iatomic header (in the set $S\textsuperscript{affected}$ ) as reachable if it obtains a non-zero dot product between the vector representation of the header and $b_{reachable}$ .

IV Vercel Optimization: Reducing Verification Time

As discussed, Vercel models the packet forwarding behavior along each output port of a router by projecting a point in the column space of the matrix corresponding to the port. The projection is generally obtained by solving normal equations $b_{i}=\mathcal{A}(\mathcal{A}^{T}\mathcal{A})^{-1}b_{i-1}$ (for simplicity, we have dropped the subscript and superscript for matrix $\mathcal{A}$ created for port $p$ of router $i$ ). However, by selecting orthogonal vectors for representing atomic+iatomic headers, we can obtain the same projection point as $b_{i}=\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}b_{i-1}$ . Further, with standard basis vectors, the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}\in\mathbb{R}^{m\times m}$ results in a diagonal matrix with entries 0 and 1. The $j\textsuperscript{th}$ diagonal entry will be “1”, if $j$ ^th row of $\mathcal{A}^{p}_{i}$ is non-zero. Therefore, vector $b_{i}$ is an element-wise product between the diagonal elements of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ and $b_{i-1}$ . We can efficiently obtain the diagonal elements of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ by first creating $v_{i}^{p}=0$ in $m$ -dimensional space and then initializing the $j\textsuperscript{th}$ entry of $v_{i}^{p}$ to 1 if router $i$ forwards $j\textsuperscript{th}$ header (in the set $S\textsuperscript{affected}$ ) to its output port $p$ . After initialization, the forwarding vector $v_{i}^{p}$ denotes atomic+iatomic headers (in the set $S\textsuperscript{affected}$ ) that router $i$ forwards to its output port $p$ . Subsequently, we can efficiently compute the projection point $b_{i}$ as $b_{i}=v_{i}^{p}\otimes b_{i-1}$ , where $\otimes$ is the Hadamard product between two vectors. Note that the projection point $b_{i}$ can now be computed efficiently in linear time.

Theorem 3: The time complexity of modeling the forwarding behavior of a router along a port $p$ (i.e., obtaining a projection point $b_{i}\in\mathbb{R}^{m}$ from $b_{i-1}\in\mathbb{R}^{m}$ ) is $\mathcal{O}(m)$ , where $m$ is the number of atomic+iatomic headers in set $S\textsuperscript{affected}$ with headers represented by the standard basis vectors.

Proof: The proof is provided in Appendix I.

As described in Section III, at the destination, Vercel utilizes vector $b_{reachable}$ to represent those packet headers, which are reachable from the source. The following theorem shows that by using the standard basis for representing atomic+iatomic headers, it is possible to efficiently recover the reachable packet headers from the vector $b_{reachable}$ in linear time.

Theorem 4: The worst-case complexity of determining the reachable set of atomic+iatomic headers from $b_{reachable}\in\mathbb{R}^{m}$ is $\mathcal{O}(m)$ .

Proof: The proof is provided in Appendix II.

We have established that there exists a linear time solution: a) to model packet processing at a router using vector algebra; b) to determine the reachable set of packet headers from the vector $b_{reachable}$ . The following theorem now presents an efficient linear-time solution to compute the reachable set of headers along a path from source to destination. This theorem is an improvement over the complexity achieved by state-of-art Deltanet, which is amortized linear in the number of affected headers and logarithmic in the number of overlapping rules present in a router. In the worst case, the number of elements in the set $S\textsuperscript{affected}$ can equal the rules in the network. However, with experiments (Section VI), we show that in all practical scenarios, the number of atomic+iatomic headers in the set $S\textsuperscript{affected}$ is much smaller than the total number of rules across all the devices. Vercel implements the linear time approach suggested by the following theorem to achieve fast verification time on different networking scenarios.

Theorem 5: In the worst case, the time complexity of reachability computation along a path using Vercel is $\mathcal{O}(m)$ , where $m$ is the number of rules in the network.

Proof: The proof is provided in Appendix III.

Although the above theorem provides time complexity for a single path, in practice, Vercel examines all possible paths (derived from $P\textsuperscript{affected}$ ) while checking reachability. Next, we show memory efficiency of Vercel in terms of the size and number of forwarding vectors needed to check reachability between a source and destination.

Theorem 6: After a rule insertion/deletion, the space complexity of Vercel is $\mathcal{O}(|S\textsuperscript{affected}|*|P\textsuperscript{affected}|)$ by using the standard basis to represent atomic+iatomic headers and forwarding vectors $v_{i}^{p}$ ).

Proof: The proof is provided in Appendix IV.

V Expressive Vercel Features

We now extend Vercel to incorporate reachability queries involving header transformations and packet filters, as well as for identifying forwarding loops and blackholes.

V-A Packet transformation

At some middleboxes in the network, forwarding tables also contain rules that transform packet headers, thereby making reachability computation complex. In general, header transformation (denoted by function $f^{tr}$ ) is a set of rules containing the header “match” and “transformed” fields. To model header transformations, Vercel extracts transformation rules from routers and inserts the headers (specified in the “match” and “transformed” fields) in the binary tree (as discussed in Section III-A, III-B). Thereafter, Vercel creates a transformation matrix $T_{i}\in\mathbb{R}^{m\times m}$ , for each router $i$ performing header transformations. Here, $m$ denotes the total number of atomic+iatomic headers in the network. Vercel initializes all entries of matrix $T$ to zero, representing the absence of header transformation. Thereafter, a few entries in the matrix $T$ are updated to 1, based on the transformation rules. An entry $T^{(j,k)}=1$ of the matrix specifies that an atomic/iatomic header with identifier $k$ is transformed to another atomic/iatomic header with identifier $j$ . While solving for reachability, Vercel first computes $t_{i}=H(T_{i}b_{i-1})$ to transform the headers and then solves $\mathcal{A}_{i}^{p}x_{i}^{p}=t_{i}$ to model forwarding at router $i$ (where $H$ represents the unit step function). The vector $t_{i}$ represents atomic+iatomic headers reachable up to the router $i$ and then transformed using $T_{i}$ .

Example of Transformation Matrix: Consider the headers in the forwarding table of router $U$ (Figure 1(a)) and a transformation function (denoted as $f^{tr}_{U}$ ) containing a single rule $f^{tr}_{U}:01/2\to 00/2$ . To create the transformation matrix at router $U$ , i.e., $T_{U}$ , Vercel requires information of atomic+iatomic headers ahead in time. In this example, the atomic and iatomic headers are $\{a1,q2,a3,a4\}$ . The matched header 01/2 ( $a3$ ) is itself atomic while transformed header 00/2 can be represented by a combination of an atomic 000/3 ( $a1$ ) and an iatomic 001/3 ( $q2$ ) headers. Now, $T_{U}$ is a (4, 4) matrix populated as follows.

T_{U}=\;\;\scriptsize\bordermatrix{&a1&q2&a3&a4\cr a1&1&0&1&0\cr q2&0&1&1&0\cr a3&0&0&0&0\cr a4&0&0&0&1\cr},\;\vspace*{3pt}

The transformation matrix $T_{U}$ is initialized with “1”s along its diagonal, except for the index (3, 3). To represent the transformation of header $a3$ to ( $a1,\;q2$ ), $T_{U}$ is initialized with “1”s at indices (1, 3) and (2, 3).

V-B Packet filtering

In a network, devices such as firewalls and routers contain an access control list (ACL) to restrict the access of packets. A rule in an ACL is a filtering function $f^{acl}$ that maps a set of packet headers to a set of {permit, deny} actions. To incorporate packet filtering at a router/firewall $i$ , Vercel extracts ACL rules from $i$ and arranges the header fields from ACL rules in the existing binary tree. The process of adding ACL rules to the binary tree is similar to that of adding forwarding rules (Section III-A), though with different actions. Thereafter, to model packet filtering for all the atomic+iatomic headers in the affected set (Section III-C), Vercel creates an $m$ -dimensional filtering vector $g_{i}$ with binary values (0/1). A non-zero entry at the $j$ ^th index of the vector $g_{i}$ implies that an ACL rule allows packets with atomic/iatomic header $s_{j}$ to pass through router $i$ . After obtaining the filtering vector $g_{i}$ , Vercel performs filtering at router $i$ as $f_{i}=g_{i}\otimes b_{i-1}$ , where $\otimes$ represents the Hadamard product between vectors. An index $j$ with the non-zero entry in the filtered vector $f_{i}$ implies that received packets with atomic/iatomic header $s_{j}$ are permitted to pass through router $i$ . After filtering, Vercel models packet forwarding along port $p$ at router $i$ by solving $\mathcal{A}_{i}^{p}x_{i}=f_{i}$ with least squares.

V-C Identifying forwarding loop

To detect a forwarding loop, consider an intermediate router $i$ , along the path $S$ ,…, $i$ - $1$ , $i$ , $i$ + $1$ ,…, $D$ . Router $i$ receives packets with headers in vector $b_{i-1}$ from router $i$ - $1$ . Thereafter, Vercel models packet forwarding at router $i$ along a port $p$ using $\mathcal{A}_{i}^{p}x_{i}^{p}=b_{i-1}$ and obtains a projection point $b_{i}\neq 0$ . After obtaining $b_{i}$ , Vercel checks whether the next router $i+1$ (connected through port $p$ at router $i$ ) has already been traversed by the received packets. If the identifier of the next router $i+1$ is already traversed, then Vercel confirms that the rule update will trigger a loop. To compute which packet headers are in a loop, Vercel searches for the indices with non-zero entries in $b_{i}$ .

V-D Identifying blackholes

Blackholes represent a network state in which a router receives a packet, but its forwarding table does not contain a corresponding rule. To identify a blackhole at router $i$ , Vercel implements the following steps. (1) Vercel models the forwarding behavior of router $i$ based on the rules in its forwarding table. Since we have $v_{i}^{p}$ for all the affected ports of $i$ (Section III-C, IV), Vercel performs element-wise logical OR between all $v_{i}^{p}$ (denoting atomic+iatomic headers that router $i$ forwards to its output port $p$ ) to get a unified forwarding vector $v_{i}$ corresponding to router $i$ . Intuitively, $v_{i}$ represents packets with atomic+iatomic headers that router $i$ forwards to its neighbors. (2) Vercel computes $v_{i}\otimes b_{i-1}$ to obtain a projection point $b_{i}$ . Vector $b_{i}$ represents those atomic+iatomic headers in $b_{i-1}$ that router $i$ forwards to its neighbors. (3) To detect a blackhole at router $i$ , Vercel computes $c_{i}=b_{i-1}\oplus b_{i}$ (where $\oplus$ denotes the element-wise logical XOR). The indices with a non-zero entry in vector $c_{i}$ represent packets (with atomic+iatomic headers) received at router $i$ that do not get any match in the forwarding table.

V-E Recommendations from projection errors

The approximate solution $\hat{x}$ obtained with least squares provides a projection point $\mathcal{A}\hat{x}$ . We can measure the difference between two points $\mathcal{A}\hat{x}$ and $b$ to obtain the projection error, whose absolute value for computation of reachability has no implication. However, the projection error ( $\mathcal{A}\hat{x}-b$ ) is a useful metric to provide intuition into packet processing at a router as well as network-wide metric computations. While the quantum of projection error ( $\mathcal{A}\hat{x}-b$ ) has no bearing on reachability, it turns out that the cumulative error vector could be useful in identifying path correctness. In short, the cumulative $l2$ norm of projection errors across a path can reflect the misconfigurations along the path.

When a router receives a packet at a port, then there must exist a rule that forwards the packet to another port. However, if no rule exists that matches the address field of this packet, then taking $\mathcal{A}\hat{x}-b$ contributes towards a higher projection error. If the configuration is not along the shortest path, then even if the projection error at a router is low, the accumulated projection error along the path adds up and becomes high.

Hence, one use of projection error computation is to find if the configuration is on the shortest path, which gives a direct measure of the data plane correctness. To achieve this, the controller computes the $l2$ norm of the error at each node and adds the errors along all possible paths obtained through router configurations. If the configurations are correct, the shortest path should have the least total error. If this is not the case, Vercel adjusts the configurations based on the errors at intermediate nodes to ensure they result in the shortest path.

For more insight into this, suppose a provider wants to ensure that all the flows belonging to say a VR application must be configured always on the shortest path. To this end, Vercel initializes vector $b_{0}$ with the IP addresses corresponding to these flows and runs a reachability check. While doing so, it computes the cumulative error along each path to the destination. In case the minimum cumulative error is not obtained from the shortest path, then Vercel identifies that most of the configuration is on longer paths. Now, by observing each intermediate node’s projection error, Vercel identifies which flows do not have a matching rule along the shortest path. Finally, these missing rules are configured at the identified nodes. To ensure that the non-shortest path is not chosen, the corresponding rules are first identified and then deleted from the nodes along the non-shortest paths by using projection errors.

Example: To understand how projection errors can be used prolifically, revert back to Figure 1, we also observe that router $U$ drops some packets due to the absence of a rule that will process header $001/3$ . This is observed by computing the projection error between vectors $b_{Y}=$ [1, 1, 0] (headers received at $U$ ) and $b_{U}=$ [0, 1, 0] (headers forwarded by $U$ along port 0). The projection error ( $b_{Y}-b_{U}$ ) along port 0 at router $U$ is [1, 0, 0], and this indicates that packets with header 001/3 are not forwarded along the output port 0 at router $U$ . However, as per the shortest path configuration, router $U$ should forward packets with header 001/3 along port 0 (in which case the projection error is zero). Therefore, the error vector helps in identifying possible misconfigurations.

A second use of the projection error is for automatic table population amidst no reachability. If reachability does not exist between two routers, then the cumulative projection error will be very large along all the paths. A path with the lowest cumulative l2 norm indicates that there are already a large number of relevant rules installed at the routers, and we just have to add a few more to establish reachability. On that path, for the nodes which have high projection error (compared to other nodes), Vercel recommends the addition of table entries such that the projection error at those nodes reduces. The table entries (prefix, port), now facilitate the router at which the entry is added to route the packet to the destination, leading to reachability. Through this iterative process, the entries can be updated to result in the shortest path. In this way, by using a simple script, Vercel can automatically establish reachability while achieving the shortest path routing for the source-destination pairs of interest.

VI Evaluation

We evaluate Vercel as a verification tool on various datasets and compare it with other approaches.

VI-A Implementation and dataset details

Vercel is implemented as $\sim$ 2000 lines of single-threaded Python 3.7 code with dependency on Numpy [29] (for matrix computations) and Networkx [30] (for creating a network graph). We evaluate the performance of Vercel on a single core of an Intel Core i7 CPU at 3.6GHz clock and 64GB RAM.

To extensively evaluate the performance of Vercel, we have considered network topologies ranging from campus networks to large enterprise/provider networks with millions of rules (up to 124.7 million). Shown in Table II are the datasets that we consider for Vercel’s evaluation. The Stanford and Berkeley datasets serve as benchmarks for evaluating network invariants [26, 31]. Further, we also consider datasets RF 1755, RF 6461, RF 3257, and INET consisting of rules from the autonomous systems derived from the Rocketfuel Project [32]. These datasets represent dense network topologies and contain millions of forwarding rules, suitable for evaluatng scalability of a verification technique. To evaluate Vercel’s performance on a service provider network, we consider datasets Airtel 1, Airtel 2 used by [15], which are available at [31]. To expand to very large providers, we created two synthetic datasets – Simnet1 and Simnet2, containing 1000 and 2000 nodes with 100K edges in each network. We assign IP addresses on these two networks based on the mask length distribution extracted from the Rocketfuel project. Finally, we populate forwarding rules on Simnet1 and Simnet2 with shortest path routing. We also point out that Vercel has been implemented in conjunction with controllers and switches such as ONOS and OVS, thus showing industry-centric compatibility. Table II provides the specifics of the datasets such as nodes, edges, number of rules in the network, number of updates (insertion/deletion). To compute the verification time of Vercel, we first load 90% of the forwarding rules from the dataset, then populate the binary tree and then perform real-time updates by randomly selecting the remaining rules for insertion and deletion at router forwarding tables. We consider dynamic updates by first performing rule insertion in the forwarding tables (which comprises half of the rule updates shown in the sixth column of Table II), followed by rule deletion.

TABLE II: Details of the datasets.

Dataset	Nodes	Edges	Average degree of nodes	Number of rules	Number of rule updates	Average table size
Airtel1	16	26	3.25	3.81 $\times$ 10⁴	2 $\times$ 10⁵	2381
Airtel2	16	26	3.25	3.81 $\times$ 10⁴	2 $\times$ 10⁵	2381
Stanford	16	74	9.25	7.3 $\times$ 10⁵	1 $\times$ 10⁴	45570
Berkeley	23	252	21.91	12.81 $\times$ 10⁶	1 $\times$ 10⁴	557300
RF 1755	87	2308	53.06	33.73 $\times$ 10⁶	1 $\times$ 10⁴	387734
RF 6461	138	8140	117.97	75.01 $\times$ 10⁶	1 $\times$ 10⁴	543519
RF 3257	161	9432	117.17	74.49 $\times$ 10⁶	1 $\times$ 10⁴	422688
INET	314	40770	259.68	124.7 $\times$ 10⁶	1 $\times$ 10⁴	395979
Simnet1	1000	100000	200	49.95 $\times$ 10⁶	1 $\times$ 10⁴	49950
Simnet2	2000	100000	100	99.95 $\times$ 10⁶	1 $\times$ 10⁴	49975

VI-B Verification time per rule update

The most important metric to evaluate a network verification scheme is verification time [16, 15, 17, 24]. Figure 5 shows the cumulative distribution function (CDF) of verification time achieved by Vercel applied to the datasets [31, 26]. From Figure 5, we observe that Vercel processes at least 70% of the updates within $100\mu s$ on all the datasets. Even if we consider 90% of the updates, verification time of Vercel remains under $500\mu s$ . These results imply that Vercel achieves sub-millisecond verification time, even on large networks.

TABLE III: Rule verification time of Vercel for checking reachability.

Dataset	Airtel1	Airtel2	Stanford	Berkeley	RF1755	RF6461	RF3257	INET	Simnet1	Simnet2
# atomic+iatomic headers	1404	1404	335225	747854	902740	902740	902740	629954	77437	77836
Median (in $\mu s$ )	26	26	53	52	32	66	77	53	112	194
Mean (in $\mu s$ )	32	32	53	55	40	111	116	162	232	428
Percentage < $250\mu s$	99.76%	99.76%	99.75%	99.8%	99.72%	88.72%	87.07%	73.9%	70.9%	54.58%

Table III provides verification time of Vercel on 10 datasets. The first row of Table III shows the total number of atomic+ iatomic headers in each network. Note that the ratio of atomic+iatomic headers to the number of forwarding rules on the INET dataset (containing the largest number of rules across all datasets) is as small as $\sim$ 0.005, hence Vercel exploits the overlap among forwarding rules and represents the header space with a small number of non-overlapping atomic+iatomic headers, thereby reducing the number of headers to be processed. The second and third rows of Table III show the median and mean of the verification time for Vercel on different datasets. These results show that the median and mean verification time of Vercel remains bounded to within $77\mu s$ and $162\mu s$ on all but the two synthetic datasets. In the case of the INET dataset, the mean increases to $162\mu s$ as the average number of neighbors per router is $\sim$ 260 (which is significantly higher than all other datasets). This increase in nodal degree results in the traversal of more paths in the network, increasing verification time. We also observe the mean and median of verification time of Vercel on two synthetic networks is up to $428\mu s$ and $194\mu s$ . Since these networks contain a large number of (2000) nodes and (100K) edges, therefore reachability analysis requires traversing longer paths between source and destination nodes. Accordingly, an increase in the path length increases verification time.

Another metric of interest (benchmarked by [15, 17]) is – how many updates are verified in less than $250\mu s$ (third row in Table III). Vercel verifies $\sim$ 74% of the updates within $250\mu s$ , on a large network with $\sim$ 124.7M rules. For a network with $2000$ nodes, Vercel manages to verify $\sim$ 55% of the rules within $250\mu s$ . The primary reason for this fast verification time is the quick identification of the affected atomic+iatomic headers and verifying multiple headers together in vector space using least squares.

VI-C Identifying loops and blackhole

Shown in Table IV (rows labeled with RL and RB) is the time for determining reachability in the presence of loops and in another case with blackholes. In these two cases, the mean verification time of Vercel is less than 90 and 98 $\mu$ s on all the datasets. Verifying loops require a slightly longer time because Vercel performs extra checks for re-traversal along a path and stores a non-zero vector $b_{i}$ to identify the headers in the loop (Section V). For blackhole detection, Vercel performs some extra steps (element-wise logical OR and XOR, see Section V) along with a reachability check, which slightly increases the time for verification.

TABLE IV: Vercel’s reachability evaluation in the presence of loops (RL), blackhole (RB), packet transformations (RT), packet filtering (RF) and routing policy (RP) analysis.

Dataset	Stanford	Berkeley	RF 1755	Airtel 1	Airtel 2
Median (RL)	76 $\mu s$	53 $\mu s$	88 $\mu s$	33 $\mu s$	32 $\mu s$
Mean (RL)	79 $\mu s$	56 $\mu s$	92 $\mu s$	63 $\mu s$	57 $\mu s$
Median (RB)	96 $\mu s$	53 $\mu s$	91 $\mu s$	34 $\mu s$	34 $\mu s$
Mean (RB)	98 $\mu s$	59 $\mu s$	97 $\mu s$	64 $\mu s$	59 $\mu s$
Median (RT)	884 $\mu s$	417 $\mu s$	469 $\mu s$	36 $\mu s$	37 $\mu s$
Mean (RT)	918 $\mu s$	349 $\mu s$	440 $\mu s$	285 $\mu s$	229 $\mu s$
Median (RF)	75 $\mu s$	53 $\mu s$	87 $\mu s$	32 $\mu s$	31 $\mu s$
Mean (RF)	77 $\mu s$	56 $\mu s$	91 $\mu s$	49 $\mu s$	45 $\mu s$
Median (RP)	82 $\mu s$	62 $\mu s$	88 $\mu s$	34 $\mu s$	33 $\mu s$
Mean (RP)	85 $\mu s$	60 $\mu s$	96 $\mu s$	38 $\mu s$	36 $\mu s$

VI-D Packet filtering

Since none of the existing datasets provide ACL rules, we created ACL rules by mapping the destination IP addresses (in the existing datasets) to filtering actions (permit/deny). Table IV shows that the mean and median verification time in the presence of ACL rules (represented as rows labeled with RF for Vercel) is less than 91 and 87 $\mu$ s on all the datasets that we considered.

VI-E Packet header transformation

We define a function to transform an IP prefix into another randomly chosen IP prefix and store the transformation function in a matrix $T_{i}$ , for each router $i$ (Section V). In Table IV (with the rows labeled as RT), we observe that the mean verification time of Vercel is less than 1 ms on all the datasets. Due to the matrix-vector product $T_{i}b_{i-1}$ , the mean verification time increases in the presence of transformation functions. Note that the verification time of Vercel is comparable to APKeep [17], in which the verification time is also bounded within 1 ms. Note, Veriflow [16] and Deltanet [15] do not model packet transformations.

VI-F Checking policy violations

Policy-based routing (PBR) [33] is used by network administrators to define the routing behavior of packets in a network by overriding the underlying routing protocol. Vercel assigns a PBR label to those nodes of the binary tree whose corresponding headers are generated through PBR. Thereafter, Vercel does not allow any non-PBR protocol to update the forwarding action (e.g., output port) at the labeled node in the binary tree. Subsequently, after a rule update, Vercel determines the affected headers and ports in the binary tree. Finally, Vercel evaluates reachability and checks for the violation of routing policies. In our evaluation, Vercel identifies whether, after a rule update, the traffic: a) violates a path length constraint; or, b) passes through a predetermined set of routers. Table IV (rows labeled with RP) shows that the median and mean verification time of Vercel for checking these two policies together is less than 88 and 96 $\mu s$ . With an additional consideration of routing policy, the verification time increases slightly. The reason is that after evaluating reachability at the destination, Vercel has to iterate over the reachable paths to check for policy violations.

VI-G Path recommendation, error rectification

Through Vercel we determine the quality of different paths (for example when deploying ECMP) by utilizing the projection error accumulated over nodes along each path. The cumulative $L2$ -norm of the projection error (across all nodes along a path) quantifies the quality of a path in terms of the number of misconfigured packet headers at routers on the path. Figure 6 shows the cumulative $L2$ norm of the projection error as a function of path length. Since protocols configure most of the services on shortest paths, therefore we observe low projection error on such paths. Generally, fewer services are configured on longer paths, which is reflected by an increase in the cumulative L2 norm of the projection error. Therefore, we can use projection error as a metric to recommend the paths for service configurations.

As discussed in Section II-B, error rectification is done by including $b$ in the column space of matrix $\mathcal{A}$ for all nodes along the path, where $b$ is in the null space of $\mathcal{A}^{T}$ . We created a synthetic network of 50 nodes and 200 edges to evaluate this approach and configured corresponding forwarding tables. Now, Vercel is applied to identify reachability between an arbitrary source and destination node. Assume, Vercel recognizes that there is no reachability, indicating that a fix is required. Computing this fix must be done fast. To show rectification computation, we varied the number of atomic/iatomic headers in the network and measured the time to rectify across differing path lengths. Results shown in Figure 7 allude to the time required to establish reachability. A path length of five required more fixes than a shorter path length of three. However, even if we increase the network-wide number of atomic/iatomic headers, the fixing time does not grow linearly. This is because Vercel leverages vectorization, where the same algebraic operation on different headers can be performed simultaneously. From Figure 7, it is evident that even on a longer path, Vercel can establish reachability within 50 $\mu s$ irrespective of the number of atomic/iatomic headers.

VI-H Comparison

We compare Vercel with the state-of-art real-time data plane verification tools such as AP Verifier [22], Veriflow [16], NetPlumber [24], Deltanet [15] and APKeep [17]. Table V compares the verification time of the various tools on the Stanford (campus) and Airtel (service provider) datasets. Note that on a large dataset such as Stanford, the performance of Vercel is better than all other verification techniques except Deltanet. Note Deltanet is designed only to model forwarding rules. In contrast, Vercel, APKeep, NetPlumber, and AP Verifier support data plane verification with a larger set of network functions such as filtering and transformation. Table V shows that, on the Stanford dataset, Vercel achieves mean verification time of 53 $\mu$ s and its performance is summarized as: $8\times$ over Veriflow, $164\times$ over NetPlumber, $1.7\times$ over APKeep, $36\times$ over AP Verifier. On the Stanford dataset, Vercel processes 99.7% of the rule updates within 250 $\mu$ s. Whereas, existing solutions such as NetPlumber, APVerifier, Veriflow and APKeep process 23.6%, 13.3%, 96.1% and 96.4% of the rule updates within 250 $\mu$ s, respectively. Similarly, on a small dataset of Airtel 1, Vercel achieves a significant improvement of $2.5\times$ over AP Verifier, $1.84\times$ over Veriflow, and $118\times$ over NetPlumber. The reason for Vercel’s speedup is the simultaneous processing of multiple atomic+iatomic headers at different ports. In contrast, other approaches do not have the in-built capability of simultaneous processing and are comparatively slower.

TABLE V: Comparing verification time of different approaches on public datasets [17]. (TO = timeout)

Dataset	Stanford		Airtel 1		Airtel 2
Metric	time $(\mu s)$	% <250 $\mu s$	time $(\mu s)$	% <250 $\mu s$	time $(\mu s)$	% <250 $\mu s$
AP Verifier	1953	13.3	80	91.3	135	77.4
Veriflow	468	96.1	59	99.9	48	99.9
NetPlumber	8700	23.6	3804	3.8	TO	TO
Deltanet	9	99.9	3	99.9	4	99.9
APKeep	94	96.4	7	99.8	6	99.9
Vercel	53	99.7	32	99.7	32	99.7

VI-I What-if queries and Batch Processing

A network administrator can use “what-if” queries to determine the fate of packets when a link goes down or model other scenarios. After a link failure, the SDN controller/ routing protocol recomputes paths between different source-destination pairs. Subsequently, the controller deletes existing rules corresponding to the failed links and inserts new rules (based on the updated paths) in the forwarding table of routers. To simulate a link failure, we select a random link $(l)$ in the network and delete rules from the both the connected routers that forward packets over $l$ . As shown in the first row of Table VI, a link failure can trigger deletion of 1000s of rules, therefore Vercel computes reachability by considering batches of rules. Table VI shows that the mean verification time for checking reachability along with loops on the INET dataset for Vercel is around 534 ms when a link failure triggers $\sim$ 3060 rule deletions in the network. In contrast, Deltanet and Veriflow require 2888 and 29117 ms. This result shows that Vercel is up to 5 $\times$ and 54 $\times$ faster than Deltanet and Veriflow on INET dataset. However, Deltanet performs better in 3 of the 5 datasets; though, the performance of Deltanet is dependent on network size and the number of rules.

TABLE VI: Comparing different approaches for ‘what-if’ queries for link failures. The second half of the table compares the memory utilization to results in [15].

Dataset		Berkeley	RF1755	RF6461	RF3257	INET
Avg. # of updates		50863	14603	9216	7900	3060
Verification time (in ms)	Veriflow	3073	8100	17594	17645	29117
	Deltanet	93	897	0.4	2.6	2888
	Vercel	1083	603	766	741	534
Memory (in MB)	Veriflow	1089	2713	5920	5882	9776
	Deltanet	6208	16937	39481	40716	63563
	Vercel	2216	4207	5047	5529	14200

VI-J Memory utilization

We now show the memory utilization of Vercel on the various datasets and compared to Veriflow and Deltanet. Table VI shows that Vercel is upto 7.8 $\times$ memory efficient than Deltanet. Deltanet is memory intensive due to the implementation of two data structures: 1) binary search tree to create atoms and 2) edge labeled graph to model the forwarding behavior of packets in a network. As compared to Veriflow, Vercel consumes more memory because of the extra space required to store: a) (node, port) tuples at different nodes of the tree; b) node label in string format and; c) integer identifier at leaf nodes of the tree, but also leads to more expressive power and path recommendation.

VII Related Work

The initially proposed data plane verification techniques require a snapshot of the network state to perform queries such as reachability, loop detection and slice isolation [13, 23, 34, 22, 35, 36, 37, 38, 39, 14, 40, 41].

Xie et al. [13] presented a technique for performing static reachability analysis of large-scale IP networks. The authors modelled routing protocols, packet filters and packet transformations using a formal framework. Their reachability check algorithm can be used to efficiently determine the reachability in IP networks and also capture select what-if scenarios.

Mai et al. [34] proposed a tool called Anteater that enables operators to debug their data plane by identifying bugs in forwarding behavior. Anteater works by comparing the expected behavior of network packets with the actual behavior observed on the network, using a set of customizable test cases. Anteater converts high-level network invariants into boolean satisfiability problems (SAT) and solves these using SAT solvers. In case, the network state is found to violate the invariants, Anteater provides a counter-example that helps in tracking the root cause. This tool mainly focused on identifying forwarding loops, packet loss, inconsistencies that emerged through dataplane misconfiguration.

Header space analysis (HSA) [23] is a key approach for performing static analysis of network configuration. Header transformation – a critical step in HSA involves analyzing the effects of network policies on packet headers. Authors describe several types of header transformations, such as port mapping, address rewriting, and develop techniques for analyzing their effects on header spaces. Specifically, HSA represents a $L$ -bit packet header in $L$ -dimensional space with reachability checks translated to algebraic operations over $L$ -dimensional hypercubes. The limitation of HSA is that it can be computationally expensive, which makes it unsuitable, especially for large and complex networks.

Snapshot-based techniques are designed to identify problems after they occur. However, in order to mitigate their impact, it is necessary to detect events before they cause problems. As a result, more recent verification techniques aim to detect anomalies in real-time. NetPlumber [24] was one of the earliest techniques in this direction. It creates a plumbing graph that identifies existing rules affected by the addition or deletion of new rules, and uses algebraic operations defined in HSA. However, for networks with a large number of rules, the plumbing graph can be large, resulting in high verification time.

Veriflow [16] addresses the challenge of dynamic network verification through the partitioning of packet headers into equivalence classes (EC) using a trie structure. It then checks network invariants by traversing a forwarding graph that corresponds to each EC. However, Veriflow’s approach is limited to modeling forwarding functions in a network and only considers each EC in isolation, which can result in higher verification times.

Yang et al. [22] presented an algorithm for verifying network properties in real-time using atomic predicates – small and reusable building blocks that can express complex network properties. All predicates in AP Verifier are represented by binary decision diagrams (BDDs), which are rooted, directed acyclic graphs. Logical operations on BDDs can be performed efficiently using graph-based algorithms.

Deltanet [15] builds on Veriflow’s approach by constructing a single forwarding graph that covers all the equivalence classes (ECs) and then uses it to check network invariants. However, Deltanet’s approach is limited to verifying reachability in the presence of forwarding rules and does not support modeling of other network functions, such as packet filtering and transformation. Deltanet also does not support batch processing. Vercel on the other hand, does batch processing, evaluates what-if conditions and tends to be more scalable due to vector algebra.

APKeep [17] enhances the methods used in Veriflow and Deltanet by enabling the modeling of a wide range of network functions. This is accomplished by partitioning packet headers into equivalence classes (ECs) and representing network functions with Boolean formulas. The algorithm then verifies network invariants by solving these formulas using binary decision diagrams (BDDs). However, APKeep is only able to detect configuration errors, whereas Vercel can identify configuration errors and offer recommendations to correct them.

Katra [42] uses pushdown systems for evaluating reachability in multi-layer networks. In contrast, Vercel models packet headers in a vector space and checks network invariants and policies by using least squares. Flash [43] proposes an automata-theory-based solution to handle data plane verification amidst update storms and too-slow arrivals of update messages. Vercel performs well while handling update storms and outperforms state-of-the-art sequential data plane verification approaches. Mahjong [44] is a tool that helps users to choose among multiple dataplane verification approaches. Vercel is naturally suited to fast-handle storm of updates because of its scalability and vector-based parallel processing abilities.

VIII Summary

We presented Vercel, inspired by the techniques of linear algebra to check reachability and delve into error rectification and recommendation. Vercel works by making use of vector spaces that are the result of mapping packet headers onto a binary tree. Then, these vector spaces lead to the formation of a matrix $\mathcal{A}$ , which represents the set of headers at a port, and vector $b$ – the set of headers that need to be evaluated, resulting in $\mathcal{A}x=b$ . Since $\mathcal{A}x=b$ is not always solvable, Vercel deploys least squares, which guarantees a solution irrespective of the rank of $\mathcal{A}$ . Representing headers in vector space helps process multiple headers simultaneously. The use of vector algebra makes Vercel achieve aspects of verification like batch updates, what-if conditions, path and table recommendations beyond what other techniques can do. Based on the experiments with real-world datasets, we show Vercel models a variety of network functions and checks for reachability and network invariants (loops, blackhole).

We show that Vercel is at least 70% faster than the state-of-art techniques while providing more expressive power. We also showed that Vercel could be used to evaluate reachability post link failures and check for routing policies. The least squares solution also results in a recommendation model that checks tables for configuration anomalies to avoid longer paths. Vercel’s vector space architecture leads to a paradigm shift – where it takes in intents and converts these into table entries leading to automatic service provisioning. The recommendation model provides for more possibilities beyond what we have so far studied in the domain of network verification.

References

[1] H. Zeng, P. Kazemian, G. Varghese, and N. McKeown, “A survey on network troubleshooting,” Technical Report Stanford/TR12-HPNG-061012, Stanford University, Tech. Rep., 2012.
[2] N. Feamster and H. Balakrishnan, “Detecting bgp configuration faults with static analysis,” in Proceedings of the 2nd conference on Symposium on Networked Systems Design and Implementation (NSDI), 2005, pp. 43–56.
[3] R. Mahajan, D. Wetherall, and T. Anderson, “Understanding bgp misconfiguration,” in Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communications (SIGCOMM), 2002, p. 3–16.
[4] A. Abhashkumar, A. Gember-Jacobson, and A. Akella, “Tiramisu: Fast multilayer network verification,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Feb 2020, pp. 201–219.
[5] R. Beckett, A. Gupta, R. Mahajan, and D. Walker, “A general approach to network configuration verification,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2017, p. 155–168.
[6] S. Prabhu, K. Y. Chou, A. Kheradmand, B. Godfrey, and M. Caesar, “Plankton: Scalable network configuration verification through model checking,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2020, pp. 953–967.
[7] S. K. Fayaz, T. Sharma, A. Fogel, R. Mahajan, T. Millstein, V. Sekar, and G. Varghese, “Efficient network reachability analysis using a succinct control plane representation,” in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, pp. 217–232.
[8] A. Gember-Jacobson, R. Viswanathan, A. Akella, and R. Mahajan, “Fast control plane analysis using an abstract representation,” in Proceedings of the Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2016, pp. 300–313.
[9] R. Beckett, A. Gupta, R. Mahajan, and D. Walker, “Control plane compression,” in Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2018, p. 476–489.
[10] T. Ball, N. Bjørner, A. Gember, S. Itzhaky, A. Karbyshev, M. Sagiv, M. Schapira, and A. Valadarsky, “Vericon: Towards verifying controller programs in software-defined networks,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2014, p. 282–293.
[11] R. Beckett, X. K. Zou, S. Zhang, S. Malik, J. Rexford, and D. Walker, “An assertion language for debugging sdn applications,” in Proceedings of the Third Workshop on Hot Topics in Software Defined Networking, 2014, p. 91–96.
[12] K. Jayaraman, N. Bjørner, J. Padhye, A. Agrawal, A. Bhargava, P.-A. C. Bissonnette, S. Foster, A. Helwer, M. Kasten, I. Lee et al., “Validating datacenters at scale,” in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2019, pp. 200–213.
[13] G. G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, and J. Rexford, “On static reachability analysis of ip networks,” in Proceedings IEEE Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM), 2005, pp. 2170–2183.
[14] H. Zeng, S. Zhang, F. Ye, V. Jeyakumar, M. Ju, J. Liu, N. McKeown, and A. Vahdat, “Libra: Divide and conquer to verify forwarding tables in huge networks,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2014, pp. 87–99.
[15] A. Horn, A. Kheradmand, and M. Prasad, “Delta-net: Real-time network verification using atoms,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017, pp. 735–749.
[16] A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey, “Veriflow: Verifying network-wide invariants in real time,” in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 15–27.
[17] P. Zhang, X. Liu, H. Yang, N. Kang, Z. Gu, and H. Li, “Apkeep: Realtime verification for real networks,” in 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2020, pp. 241–255.
[18] N. Bjørner, G. Juniwal, R. Mahajan, S. A. Seshia, and G. Varghese, “ddnf: An efficient data structure for header spaces,” in Haifa Verification Conference, 2016, pp. 49–64.
[19] A. Horn, A. Kheradmand, and M. R. Prasad, “A precise and expressive lattice-theoretical framework for efficient network verification,” in IEEE International Conference on Network Protocols (ICNP), 2019, pp. 1–12.
[20] A. Panda, O. Lahav, K. Argyraki, M. Sagiv, and S. Shenker, “Verifying reachability in networks with mutable datapaths,” in 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2017, pp. 699–718.
[21] R. Stoenescu, M. Popovici, L. Negreanu, and C. Raiciu, “Symnet: Scalable symbolic execution for modern networks,” in Proceedings of the 2016 Conference of the ACM Special Interest Group on Data Communication (SIGCOMM), 2016, p. 314–327.
[22] H. Yang and S. S. Lam, “Real-time verification of network properties using atomic predicates,” IEEE/ACM Transactions on Networking (TON), vol. 24, no. 2, pp. 887–900, Apr. 2015.
[23] P. Kazemian, G. Varghese, and N. McKeown, “Header space analysis: Static checking for networks,” in 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2012, pp. 113–126.
[24] P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte, “Real time network policy checking using header space analysis,” in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2013, pp. 99–111.
[25] G. Strang, Introduction to linear algebra. Wellesley-Cambridge Press Wellesley, MA, 1993, vol. 3.
[26] P. Kazemian, “Hassel-public,” https://bitbucket.org/peymank/hassel-public/wiki/Home, 2014, accessed: 2021-1-1. [Online]. Available: https://bitbucket.org/peymank/hassel-public/wiki/Home
[27] S. Janardhan, “More details about the October 4 outage,” 2021, accessed: 2024-1-1. [Online]. Available: https://engineering.fb.com/2021/10/05/networking-traffic/outage-details/
[28] L. Pursell and S. Trimble, “Gram-schmidt orthogonalization by gauss elimination,” The American Mathematical Monthly, vol. 98, no. 6, pp. 544–549, May 1991.
[29] C. R. H. et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, Sep. 2020.
[30] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using networkx,” in Proceedings of the 7th Python in Science Conference, 2008, pp. 11 – 15.
[31] A. Horn, “Deltanet datasets,” https://github.com/delta-net/datasets, 2017, accessed: 2022-1-1. [Online]. Available: https://github.com/delta-net/datasets
[32] N. Spring, R. Mahajan, and D. Wetherall, “Measuring isp topologies with rocketfuel,” ACM SIGCOMM Computer Communication Review, vol. 32, no. 4, pp. 133–145, Aug. 2002.
[33] C. support, “Understanding policy routing,” https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/10116-36.html, 2005, accessed: 2022-1-1.
[34] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King, “Debugging the data plane with anteater,” ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 290–301, Aug. 2011.
[35] E. Al-Shaer and S. Al-Haj, “Flowchecker: Configuration analysis and verification of federated openflow infrastructures,” in Proceedings of the 3rd ACM workshop on Assurable and usable security configuration, 2010, pp. 37–44.
[36] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. Elbadawi, “Network configuration in a box: Towards end-to-end verification of network reachability and security,” in 17th IEEE International Conference on Network Protocols (ICNP), 2009, pp. 123–132.
[37] A. Fogel, S. Fung, L. Pedrosa, M. Walraed-Sullivan, R. Govindan, R. Mahajan, and T. Millstein, “A general approach to network configuration analysis,” in 12th USENIX symposium on networked systems design and implementation (NSDI), 2015, pp. 469–483.
[38] A. Jeffrey and T. Samak, “Model checking firewall policy configurations,” in IEEE International Symposium on Policies for Distributed Systems and Networks, 2009, pp. 60–67.
[39] N. P. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, and G. Varghese, “Checking beliefs in dynamic networks,” in 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015, pp. 499–512.
[40] B. Tian, X. Zhang, E. Zhai, H. H. Liu, Q. Ye, C. Wang, X. Wu, Z. Ji, Y. Sang, M. Zhang, D. Yu, C. Tian, H. Zheng, and B. Y. Zhao, “Safely and automatically updating in-network acl configurations with intent language,” in Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM), 2019, p. 214–226.
[41] S. K. Fayaz, T. Yu, Y. Tobioka, S. Chaki, and V. Sekar, “BUZZ: Testing Context-Dependent policies in stateful networks,” in 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016, pp. 275–289.
[42] R. Beckett and A. Gupta, “Katra: Realtime verification for multilayer networks,” in 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), 2022, pp. 617–634.
[43] D. Guo, S. Chen, K. Gao, Q. Xiang, Y. Zhang, and Y. R. Yang, “Flash: fast, consistent data plane verification for large-scale network settings,” in Proceedings of the ACM SIGCOMM 2022 Conference, 2022, pp. 314–335.
[44] Y. Li, C. Jia, X. Hu, and J. Li, “Mahjong: A generic framework for network data plane verification,” in Proceedings of the Symposium on Architectures for Networking and Communications Systems, 2021, pp. 52–58.
[45] J. Alman and V. V. Williams, “A refined laser method and faster matrix multiplication,” in Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), 2021, pp. 522–539.

Appendix A Proofs

A-A Proof of Theorem 3

Proof overview: We start the proof by showing that if columns of the matrix $\mathcal{A}$ are selected from the standard basis (which is an orthonormal basis), then the projection point $b_{i}$ can be obtained in linear time by multiplying diagonal elements of the matrix $\mathcal{A}\mathcal{A}^{T}$ and vector $b_{i-1}$ . Thereafter, we show two cases with quadratic and linear time complexity to obtain the diagonal elements of the matrix $\mathcal{A}\mathcal{A}^{T}$ . This is a significant improvement compared to the use of only orthogonal vectors that require matrix product ( $\mathcal{O}(m^{2.37})$ [45]).

Proof: With the standard basis, it is possible to reduce the time complexity of solving the linear equations $\mathcal{A}^{p}_{i}x^{p}_{i}=b_{i-1}$ and obtain the projection point $b_{i}$ . Now the columns of matrix $\mathcal{A}_{i}^{p}$ are orthonormal ( $\mathcal{A}\mathcal{A}^{T}=I$ ), therefore the computation of a projection point $b_{i}$ from $b_{i-1}$ can be simplified as $b_{i}=\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}b_{i-1}$ . Note that the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}\in\mathbb{R}^{m\times m}$ results in a diagonal matrix, whose $j\textsuperscript{th}$ diagonal entry will be “1”, if $j$ ^th row of the matrix is non-zero. Otherwise, diagonal entries of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ will be 0. Therefore, vector $b_{i}$ is essentially an element-wise product between the diagonal elements of $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ and $b_{i-1}$ .

Now, we ask if it is possible to create a vector $v_{i}^{p}$ (having $m$ elements) containing the diagonal elements of the matrix $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ but without performing the matrix product $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ ? If we can create $v_{i}^{p}$ in linear time, then the projection point $b_{i}$ can be computed efficiently (linear time) as $b_{i}=v_{i}^{p}\otimes b_{i-1}$ , where $\otimes$ is the element-wise product between two vectors. There exist two possibilities for creating $v_{i}^{p}$ (without computing $\mathcal{A}^{p}_{i}(\mathcal{A}^{p}_{i})^{T}$ ), each leading to different complexities.

Case 1: Modeling forwarding behavior with the complexity of $\mathcal{O}(m^{2})$ . Represent $v_{i}^{p}$ as the sum of columns of $\mathcal{A}_{i}^{p}$ i.e. $(v_{i}^{p})^{j}=\sum_{k=1}^{m}(\mathcal{A}_{i}^{p})^{jk}$ . In this case, complexity of creating $v_{i}^{p}$ is $\mathcal{O}(m^{2})$ and therefore, complexity for obtaining the projection point $b_{i}$ from $b_{i-1}$ is $\mathcal{O}(m^{2})$ .

Case 2: Modeling forwarding behavior with the complexity of $\mathcal{O}(m)$ . Instead of obtaining vector $v_{i}^{p}$ from matrix $\mathcal{A}_{i}^{p}$ , create $v_{i}^{p}=\vec{0}$ in $m$ -dimensional space. Subsequently, if router $i$ forwards $j\textsuperscript{th}$ header (in the set $S\textsuperscript{affected}$ ) to its output port $p$ , then update the $j\textsuperscript{th}$ entry of $v_{i}^{p}$ to 1. After updation, the forwarding vector $v_{i}^{p}$ denotes atomic+iatomic headers (in the set $S\textsuperscript{affected}$ ) that router $i$ forwards to its output port $p$ . Note $v_{i}^{p}$ still represents the sum of columns of the forwarding matrix $\mathcal{A}_{i}^{p}$ . However, the complexity of updating $v_{i}^{p}$ is $\mathcal{O}(m)$ because in the worst case, each entry of the vector is visited only once. Hence, complexity for obtaining the projection point $b_{i}$ from $b_{i-1}$ is $\mathcal{O}(m)$ . This proves the theorem.

A-B Proof of Theorem 4

The representation of headers in the set $S\textsuperscript{affected}$ using a standard basis also reduces the complexity of finding the reachable headers at the destination node. In the existing solution (Section III), an atomic+iatomic header $s_{k}\in S$ is reachable between a source and destination if the dot product between $b_{reachable}$ and the vector notation ( $e_{k}$ ) of header $s_{k}$ is non-zero. By performing dot product, we can obtain all reachable atomic+iatomic headers with the complexity of $\mathcal{O}(m^{2})$ (each dot product is of complexity $\mathcal{O}(m)$ ) and we need to perform $m$ such dot products. However, suppose we utilize the standard basis vectors to represent atomic+iatomic headers. In that case, all reachable headers correspond to the indices with non-zero entries of $b_{reachable}$ and the complexity of finding non-zero entries in an $m$ -dimensional vector is $\mathcal{O}(m)$ . This completes the proof of the theorem.

A-C Proof of Theorem 5

After a rule update (insertion/deletion), Vercel performs the following three steps to compute reachability along a path: 1. Find the sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ ; 2. Model forwarding behavior of routers’ ports present in the path; 3. Determine reachable headers at the destination from the vector space. From Theorem 2, we know that the time complexity of finding the sets $S\textsuperscript{affected}$ and $P\textsuperscript{affected}$ is $\mathcal{O}(m)$ . Thereafter, Theorem 3 helps to model the forwarding behavior of a router with time complexity of $\mathcal{O}(m)$ . Now, consider that the number of routers along a path are constant ( $c$ ) w.r.t. to the number of atomic+iatomic headers in a network. In this case, the complexity to compute reachability along the path is $\mathcal{O}(c*m)=\mathcal{O}(m)$ . Finally, Theorem 4 provides a solution (of complexity $\mathcal{O}(m)$ ) to recover reachable headers from the vector space. Note that each step (steps 1-3 described above) requires $\mathcal{O}(m)$ time. Therefore, Vercel solves reachability between a source and destination along a path with time complexity of $\mathcal{O}(m)$ . This completes the proof of the theorem.

A-D Proof of Theorem 6

As described in the proof presented in Appendix A-B, if we represent atomic+iatomic headers using standard basis vectors, then it is possible to create a forwarding vector $v_{i}^{p}\in\mathbb{R}^{m},m=|S\textsuperscript{affected}|$ representing the atomic+iatomic headers that router $i$ forwards along its port $p$ . Since, a forwarding vector $v$ is created for each port in the set $P\textsuperscript{affected}$ , therefore Vercel requires $|S\textsuperscript{affected}|*|P\textsuperscript{affected}|$ bits to create the forwarding vectors $v_{i}^{p}$ . In other words, the space complexity of Vercel (by using the forwarding vectors $v_{i}^{p}$ ) is $\mathcal{O}(|S\textsuperscript{affected}|*|P\textsuperscript{affected}|)$ . For example, consider a network with 10K atomic+iatomic headers and 200 ports across the network. If after a rule insertion, $|S\textsuperscript{affected}|=1$ K and $|P\textsuperscript{affected}|=50$ , then Vercel requires $\sim 6$ KB to represent the forwarding vectors $(v_{i}^{p})$ . This completes the proof of the theorem.