
Affiliations: University of Bergen, Norway; University of Utah, USA

Structural Optimal Jacobian Accumulation and Minimum Edge Count are NP-Complete Under Vertex Elimination

Matthias Bentert Alex Crane Pål Grønås Drange Yosuke Mizutani Blair D. Sullivan
Abstract

We study graph-theoretic formulations of two fundamental problems in algorithmic differentiation. The first (Structural Optimal Jacobian Accumulation) is that of computing a Jacobian while minimizing multiplications. The second (Minimum Edge Count) is to find a minimum-size computational graph. For both problems, we consider the vertex elimination operation. Our main contribution is to show that both problems are NP-complete, thus resolving longstanding open questions. In contrast to prior work, our reduction for Structural Optimal Jacobian Accumulation does not rely on any assumptions about the algebraic relationships between local partial derivatives; we allow these values to be mutually independent. We also provide $O^{*}(2^{n})$-time exact algorithms for both problems, and show that under the exponential time hypothesis these running times are essentially tight. Finally, we provide a data reduction rule for Structural Optimal Jacobian Accumulation by showing that false twins may always be eliminated consecutively.

1 Introduction

A core subroutine in numerous scientific computing applications is the efficient computation of derivatives. If inexactness is acceptable, finite difference methods [22] may be applicable. Meanwhile, computer algebra packages allow for exact symbolic computation, though these methods suffer from the requirement of a closed-form expression of the function $F$ to be differentiated as well as poor running time in practice. A third approach is algorithmic differentiation, also sometimes called automatic differentiation. Algorithmic differentiation provides almost exact computations, incurring only rounding errors (in contrast to finite difference methods which also incur truncation errors). Furthermore, the running time required by algorithmic differentiation is bounded by the time taken to compute $F$ (in contrast to symbolic methods). We refer to the works by Griewank and Walther [11] and by Naumann [20] for an introduction and numerous applications.

In algorithmic differentiation, we assume that the function $F$ is implemented as a numerical program. The key insight is that such programs consist of compositions of elementary functions, e.g., multiplication, sin, cos, etc., for which the derivatives are known or easily computable. Then, derivative computations for the function $F$ follow by application of the chain rule. The relevant numerical program can be modeled as a directed acyclic graph (DAG) $D=(S\uplus I,E)$, referred to as the computational graph of $F$ [10]. The source and sink vertices $S$ model the inputs and outputs of $F$, respectively. The internal vertices $I$ model elementary function calls. Arcs (directed edges) model data dependencies. Let $\mathcal{P}_{s,t}$ denote the set of all paths from source $s$ to sink $t$. If we associate to each arc $(u,v)$ the known (or easily computable) local partial derivative $\partial v/\partial u$, then the chain rule allows us to compute the derivative of $t$ with respect to $s$ by

$$\frac{\mathrm{d}t}{\mathrm{d}s}=\sum_{P\in\mathcal{P}_{s,t}}\prod_{(u,v)\in P}\frac{\partial v}{\partial u} \qquad (1)$$

as shown by Bauer [2]. This weighted DAG is the linearized computational graph.
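Equation (1) can be evaluated directly by enumerating paths, which is useful for building intuition (though the number of paths can be exponential in general). The following Python sketch, with our own helper names `all_paths` and `derivative`, computes the path-sum on a toy linearized computational graph:

```python
def all_paths(adj, s, t):
    """Enumerate all directed s-t paths in a DAG given as a dict of dicts
    mapping each vertex to {successor: arc weight}."""
    if s == t:
        yield [t]
        return
    for nxt in adj.get(s, {}):
        for rest in all_paths(adj, nxt, t):
            yield [s] + rest

def derivative(adj, s, t):
    """Bauer's formula: sum over all s-t paths of the product of the
    local partial derivatives along the path."""
    total = 0
    for path in all_paths(adj, s, t):
        prod = 1
        for u, v in zip(path, path[1:]):
            prod *= adj[u][v]
        total += prod
    return total

# Toy linearized computational graph: a = 2s, b = 3s + 11a, t = 5a + 7b,
# so dt/ds = 2*5 + 3*7 + 2*11*7 = 185.
adj = {'s': {'a': 2, 'b': 3}, 'a': {'b': 11, 't': 5}, 'b': {'t': 7}}
print(derivative(adj, 's', 't'))  # 185
```

Here the arc weights are constant, so the program is linear and the path-sum agrees with the ordinary derivative of the composed function.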

Using the linearized computational graph, we may model the computation of a Jacobian as follows. An elimination of an internal vertex $v$ is the deletion of $v$ and the creation of all arcs from each in-neighbor to each out-neighbor of $v$ which are not already present. We write $D_{/v}$ for the resulting DAG. An elimination sequence $\sigma=(v_{1},v_{2},\ldots,v_{\ell})$ of length $\ell$ is a tuple of internal vertices, and we denote by $D_{\sigma}=(((D_{/v_{1}})_{/v_{2}})\ldots)_{/v_{\ell}}$ the result of eliminating these vertices in the order given by $\sigma$. We call $\sigma$ a total elimination sequence if $\ell=|I|$. If $\sigma$ is a total elimination sequence, then with appropriate local partial derivative computations or updates during the sequence, $D_{\sigma}$ can be thought of as a bipartite DAG (with sources on one side and sinks on the other) representing the Jacobian matrix of the associated numerical program. To reflect the number of multiplications needed to maintain correctness of Equation (1), we say that the cost of eliminating an internal vertex $v$ is the Markowitz degree $\mu(v)$ of $v$, that is, the in-degree of $v$ times the out-degree of $v$. The cost of an elimination sequence is the sum of the costs of the involved eliminations. We can now phrase the problem of computing a Jacobian with few multiplications in purely graph-theoretic terms:
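The definitions above can be sketched in a few lines of Python (the names `eliminate`, `markowitz`, and `sequence_cost` are ours, not from the paper). The small example at the end shows that the cost of a total elimination sequence depends on the order in which vertices are eliminated:

```python
def eliminate(adj, v):
    """Eliminate v: add an arc from each in-neighbor to each out-neighbor
    of v (if not already present), then delete v."""
    preds = [u for u in adj if v in adj[u]]
    succs = set(adj[v])
    for u in preds:
        adj[u] |= succs       # create missing arcs u -> w for w in succs
        adj[u].discard(v)     # remove the arc u -> v
    del adj[v]

def markowitz(adj, v):
    """Markowitz degree: in-degree of v times out-degree of v."""
    indeg = sum(1 for u in adj if v in adj[u])
    return indeg * len(adj[v])

def sequence_cost(adj, order):
    """Total cost (number of multiplications) of an elimination sequence."""
    adj = {u: set(ws) for u, ws in adj.items()}  # work on a copy
    cost = 0
    for v in order:
        cost += markowitz(adj, v)
        eliminate(adj, v)
    return cost

# Source s, internal vertices a and b, sink t.
D = {'s': {'a', 'b'}, 'a': {'b'}, 'b': {'t'}}
print(sequence_cost(D, ['a', 'b']))  # 1 + 1 = 2
print(sequence_cost(D, ['b', 'a']))  # 2 + 1 = 3
```

Eliminating $a$ first costs $1\cdot 1$, after which $b$ has a single in-neighbor; eliminating $b$ first costs $2\cdot 1$ because both $s$ and $a$ are in-neighbors of $b$.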


Structural Optimal Jacobian Accumulation
Input: A DAG $D=(S\uplus I,A)$ and an integer $k$.
Question: Does there exist a total elimination sequence of cost at most $k$?

Optimal Jacobian Accumulation is known to be NP-complete only under certain assumptions regarding algebraic dependencies between local partial derivatives [19]. Despite this, heuristics used in practice are based on the purely structural formulation presented here [4, 6, 9, 21] and understanding the complexity of this formulation, open since at least 1993 [8], has recently been highlighted as an important problem in applied combinatorics [1].


A solution to Structural Optimal Jacobian Accumulation is always a total elimination sequence, resulting in a bipartite DAG representing the Jacobian. A related problem, first posed by Griewank and Vogel in 2005 [10], is to identify a (not necessarily total) elimination sequence which results in a computational graph with a minimum number of arcs:


Minimum Edge Count
Input: A DAG $D=(S\uplus I,A)$ and an integer $k$.
Question: Does there exist an elimination sequence $\sigma$ (of any length) such that $D_{\sigma}$ contains at most $k$ arcs?

The motivation to solve this problem is twofold. First, suppose that our function $F$ is a map from $\mathbb{R}^{n}$ to $\mathbb{R}^{m}$ and we wish to multiply its Jacobian (a matrix in $\mathbb{R}^{m\times n}$) by a matrix $S\in\mathbb{R}^{n\times q}$. We may model this computation by augmenting the linearized computational graph $D$ with $q$ new source vertices. For each new source $i\in\{1,2,\ldots,q\}$ and each original source $j\in\{1,2,\ldots,n\}$, we add the arc $(i,j)$. By labeling the new arcs with entries from the matrix $S$, we may obtain the result of the matrix multiplication via application of Equation (1). We refer to the work by Mosenkis and Naumann [17] for a formal presentation. The number of multiplications required to use Equation (1) in this way grows with the number of arcs in $D$, thus motivating the computation of a small (linearized) computational graph. A second motivation is that $D$ can sometimes reveal useful information not evident in the Jacobian matrix. This situation, known as scarcity, is described by Griewank and Vogel [10]. Thus, it is desirable to store $D$, rather than the Jacobian matrix, and consequently it is also desirable for $D$ to be as small as possible.

Despite these motivations and several algorithmic studies [10, 16, 17], the computational complexity of Minimum Edge Count has remained open since its introduction [10, 17]. Like Structural Optimal Jacobian Accumulation, resolving this question has recently been highlighted as an important open problem [1].

Our Results.

We show that both Structural Optimal Jacobian Accumulation and Minimum Edge Count are NP-complete, resolving the key complexity questions which have stood open since 1993 [8] and 2005 [10], respectively. Furthermore, we prove that unless the exponential time hypothesis (ETH) fails, neither problem admits a subexponential algorithm, i.e., an algorithm with running time $2^{o(n+m)}$. (The ETH is a popular complexity assumption stating that 3-Sat cannot be solved in subexponential time; see Section 2 for more details.) We complement our lower bounds by providing $O^{*}(2^{n})$-time algorithms for both problems.

2 Preliminaries and Basic Observations

In this section, we define the notation we use throughout the paper, introduce relevant concepts from the existing literature, and show two useful basic propositions.

Notation.

For a positive integer $n$, we use $[n]$ to denote the set $\{1,2,\ldots,n\}$. We use standard graph terminology. In particular, a graph $G=(V,E)$ or $D=(V,A)$ is a pair where $V$ denotes the set of vertices and $E$ and $A$ denote the set of (undirected) edges or (directed) arcs, respectively. We use $n$ to indicate the number of vertices in a graph and $m$ to indicate the number of edges or arcs. For an (undirected) edge between two vertices $u$ and $v$ we write $\{u,v\}$, and for an arc (a directed edge) from $u$ to $v$ we write $(u,v)$. Given a vertex $v\in V$, we denote by $N^{-}_{D}(v)$ and $N^{+}_{D}(v)$ the open in- and out-neighborhoods of $v$, respectively. The Markowitz degree is defined to be $\mu_{D}(v)=\deg_{D}^{-}(v)\cdot\deg_{D}^{+}(v)=|N^{-}_{D}(v)|\cdot|N^{+}_{D}(v)|$. If the graph is clear from context, we omit the subscript in the above notation. We say that two vertices $u$ and $v$ are false twins in a directed graph $D$ if $N^{-}_{D}(u)=N^{-}_{D}(v)$ and $N^{+}_{D}(u)=N^{+}_{D}(v)$. Given sequences $\sigma_{1}=(a_{1},a_{2},\ldots,a_{i})$ and $\sigma_{2}=(b_{1},b_{2},\ldots,b_{j})$, we write $(\sigma_{1},\sigma_{2})$ for the combined sequence $(a_{1},a_{2},\ldots,a_{i},b_{1},b_{2},\ldots,b_{j})$, and we generalize this notation to more than two sequences.
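As a small illustration of the false-twin definition, consider the following Python sketch (the helper names `in_nbrs` and `false_twins` are ours):

```python
def in_nbrs(adj, v):
    """Open in-neighborhood of v in a digraph given as an adjacency dict."""
    return {u for u in adj if v in adj[u]}

def false_twins(adj, u, v):
    """u and v are false twins iff their in- and out-neighborhoods coincide."""
    return in_nbrs(adj, u) == in_nbrs(adj, v) and adj[u] == adj[v]

# u and v share the in-neighborhood {s} and the out-neighborhood {t}.
adj = {'s': {'u', 'v'}, 'u': {'t'}, 'v': {'t'}, 't': set()}
print(false_twins(adj, 'u', 'v'))  # True
print(false_twins(adj, 'u', 's'))  # False
```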

Reductions and the ETH.

We assume the reader to be familiar with basic concepts in complexity theory like big-O (Bachmann–Landau) notation, NP-completeness, and polynomial-time many-one reductions (also known as Karp reductions). We refer to the standard textbook by Garey and Johnson [7] for an introduction. We use $O^{*}$ to hide factors that are polynomial in the input size and call a polynomial-time many-one reduction a linear reduction when the size of the constructed instance $I^{\prime}$ is linear in the size of the original instance $I$, that is, $|I^{\prime}|\in O(|I|)$. The exponential time hypothesis (ETH) [12] states that there is some $\varepsilon>0$ such that every algorithm solving 3-Sat takes at least $2^{\varepsilon n+o(n)}$ time, where $n$ is the number of variables in the input instance. Assuming the ETH, 3-Sat and many other problems cannot be solved in subexponential ($2^{o(n+m)}$) time [13]. It is known that if there is a linear reduction from a problem $A$ to a problem $B$ and $A$ cannot be solved in subexponential time, then $B$ cannot be solved in subexponential time either [13].

Fundamental Observations.

We next show two useful observations. The first states that the order in which vertices are eliminated does not affect the resulting graph (note that it may still affect the cost of the elimination sequence). This is a folklore result, but to our knowledge no proof has been published. Our argument can be seen as an adaptation of one used by Rose and Tarjan to prove a closely related result [23].

Proposition 1.

Let $D=(V,A)$ be a DAG, let $X\subseteq V$ be a set of vertices, and let $\sigma_{1}$ and $\sigma_{2}$ be two permutations of the vertices in $X$. Then, $D_{\sigma_{1}}=D_{\sigma_{2}}$.

Proof.

We first show for any DAG $D=(V,A)$ and any three vertices $u,v,w\in V$ that there is a directed path from $u$ to $v$ in $D$ if and only if there is a directed path from $u$ to $v$ in $D_{/w}$. To this end, first assume that there is a directed path $P$ from $u$ to $v$ in $D$. If $P$ does not contain $w$, then $P$ is also a directed path in $D_{/w}$. Otherwise, let $x$ and $y$ be the vertices before and after $w$ in $P$, respectively. Since the elimination of $w$ adds an arc from $x$ to $y$, there is also a directed path from $u$ to $v$ in this case. Now assume that there is a directed path $P$ from $u$ to $v$ in $D_{/w}$, and assume without loss of generality that $P$ is a shortest such path. There are again two cases: either $P$ is also a directed path in $D$, or $P$ contains at least one arc $(x,y)$ that is not present in $D$. In the first case, $P$ itself witnesses a directed path from $u$ to $v$ in $D$. In the second case, consider the first arc $(x,y)$ in $P$ that is not contained in $D$. Note that by construction $(x,y)$ is only added to $D_{/w}$ if $x$ is an in-neighbor of $w$ and $y$ is an out-neighbor of $w$ in $D$. Moreover, since $P$ is a shortest path, there is no other vertex $y^{\prime}$ in $P$ that is also an out-neighbor of $w$, as otherwise the arc $(x,y^{\prime})$ exists in $D_{/w}$ and could be used to shortcut $P$, contradicting that $P$ is a shortest path. We can therefore replace the arc $(x,y)$ in $P$ by the subpath $(x,w,y)$ to get a directed path from $u$ to $v$ in $D$.

Let $\sigma=(w_{1},w_{2},\ldots,w_{k})$ be a sequence. By induction on $k$ (using the above argument), it holds for any two vertices $u,v\in V\setminus\{w_{1},w_{2},\ldots,w_{k}\}$ that there is a directed path from $u$ to $v$ in $D$ if and only if there is one in $(((D_{/w_{1}})_{/w_{2}})\ldots)_{/w_{k}}=D_{\sigma}$.

Now assume towards a contradiction that $D_{\sigma_{1}}\neq D_{\sigma_{2}}$. Note that both contain the same set $V\setminus X$ of vertices. We assume without loss of generality that there is an arc $(u,v)$ that exists in $D_{\sigma_{1}}$ but not in $D_{\sigma_{2}}$. By the above argument, since $(u,v)$ appears in $D_{\sigma_{1}}$ there is a directed path from $u$ to $v$ in $D$. However, since $(u,v)$ does not appear in $D_{\sigma_{2}}$, there is no directed path from $u$ to $v$ in $D$, a contradiction. This concludes the proof. ∎

Let $\sigma$ be a sequence of internal vertices, and let $X$ be the set of vertices appearing in $\sigma$. In the rest of this paper, we may use $D_{X}$ to denote the graph $D_{\sigma}$. By Proposition 1, this notation is well-defined.
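Proposition 1 is easy to confirm by brute force on small examples. The Python sketch below (the helper names `eliminate` and `eliminate_all` are ours) eliminates a vertex set $X$ in every possible order and checks that the resulting arc sets coincide:

```python
from itertools import permutations

def eliminate(adj, v):
    """Eliminate v: connect each in-neighbor to each out-neighbor, delete v."""
    preds = [u for u in adj if v in adj[u]]
    succs = set(adj[v])
    for u in preds:
        adj[u] |= succs
        adj[u].discard(v)
    del adj[v]

def eliminate_all(adj, order):
    """Return the arc set of D_sigma for the elimination order `order`."""
    adj = {u: set(ws) for u, ws in adj.items()}
    for v in order:
        eliminate(adj, v)
    return frozenset((u, w) for u, ws in adj.items() for w in ws)

# Sources s1, s2; internal a, b; sinks t1, t2.
D = {'s1': {'a', 'b'}, 's2': {'b'}, 'a': {'b', 't1'}, 'b': {'t1', 't2'}}
results = {eliminate_all(D, p) for p in permutations(['a', 'b'])}
print(len(results))  # 1: the graph D_X is independent of the order
```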

To conclude this section, we show that false twins can be handled uniformly in Structural Optimal Jacobian Accumulation. Let $T$ be a set of false twins, i.e., $u$ and $v$ are false twins for every $u,v\in T$. Then we may assume, without loss of generality, that eliminating any $u\in T$ also entails eliminating the rest of the vertices in $T$ immediately afterward.

Proposition 2.

Let $D=(S\uplus I,A)$ be a DAG and let $T\subseteq I$ be a set of false twins. Then, there exists an optimal elimination sequence (for Structural Optimal Jacobian Accumulation) that eliminates the vertices of $T$ consecutively.

Proof.

Let $T\subseteq I$ be a set of false twins in $D$. We first prove the result when $|T|=2$. Let $T=\{u,v\}$, and let $\sigma$ be an optimal solution. We may assume that $u$ and $v$ are eliminated non-consecutively in $\sigma$, as otherwise the proof is complete. We further assume without loss of generality that $u$ is eliminated before $v$ in $\sigma$. Let $\sigma_{1}$ be the subsequence of $\sigma$ before $u$, let $\sigma_{2}$ be the subsequence between $u$ and $v$, and let $\sigma_{3}$ be the subsequence after $v$. Let $X_{1}$, $X_{2}$, and $X_{3}$ be the sets of vertices appearing in $\sigma_{1}$, $\sigma_{2}$, and $\sigma_{3}$, respectively. Note that $X_{1}$ and $X_{3}$ might be empty, but $X_{2}$ contains at least one vertex. Let $\sigma^{\prime}=(\sigma_{1},\sigma_{2},u,v,\sigma_{3})$ and $\sigma^{\prime\prime}=(\sigma_{1},u,v,\sigma_{2},\sigma_{3})$. Let $c$, $c^{\prime}$, and $c^{\prime\prime}$ be the costs of $\sigma$, $\sigma^{\prime}$, and $\sigma^{\prime\prime}$, respectively. Since $\sigma$ is optimal, it holds that $c\leq c^{\prime}$ and $c\leq c^{\prime\prime}$. Now, we claim that both $\sigma^{\prime}$ and $\sigma^{\prime\prime}$ are optimal (it suffices to show that either $\sigma^{\prime}$ or $\sigma^{\prime\prime}$ is optimal, but it will be useful later to prove that both are). Assume otherwise; then at least one of the inequalities $c\leq c^{\prime}$ and $c\leq c^{\prime\prime}$ is strict. This implies $2c<c^{\prime}+c^{\prime\prime}$. We will show that this inequality leads to a contradiction.

It holds that the cost of $\sigma_{1}$ in $D$ and the cost of $\sigma_{3}$ in $D_{X_{1}\cup X_{2}\cup\{u,v\}}$ are accounted for identically in the total costs of $\sigma$, $\sigma^{\prime}$, and $\sigma^{\prime\prime}$. Thus, any difference in the values of $c$, $c^{\prime}$, and $c^{\prime\prime}$ is attributable entirely to differing costs of eliminating $u$, $v$, and the vertices in $X_{2}$. Let $d_{1}$, $d_{2}$, and $d_{3}$ be the costs of $\sigma_{2}$ in $D_{X_{1}}$, $D_{X_{1}\cup\{u\}}$, and $D_{X_{1}\cup\{u,v\}}$, respectively. Note that these terms are well-defined by Proposition 1. This implies

$$2\big(\mu_{D_{X_{1}}}(u)+d_{2}+\mu_{D_{X_{1}\cup\{u\}\cup X_{2}}}(v)\big)<\big(d_{1}+\mu_{D_{X_{1}\cup X_{2}}}(u)+\mu_{D_{X_{1}\cup X_{2}\cup\{u\}}}(v)\big)+\big(\mu_{D_{X_{1}}}(u)+\mu_{D_{X_{1}\cup\{u\}}}(v)+d_{3}\big). \qquad (5)$$

Moreover, since $u$ and $v$ are false twins and the elimination of $u$ does not change the cost of eliminating $v$, it holds that $\mu_{D_{X_{1}\cup X_{2}}}(u)=\mu_{D_{X_{1}\cup X_{2}\cup\{u\}}}(v)$ and $\mu_{D_{X_{1}}}(u)=\mu_{D_{X_{1}\cup\{u\}}}(v)$. Substituting this into Inequality (5) yields $2d_{2}<d_{1}+d_{3}$. Next, let $\sigma_{2}=(w_{1},w_{2},\ldots,w_{k})$ and let $W_{i}=X_{1}\cup\{w_{1},w_{2},\ldots,w_{i-1}\}$ for each $i\in[k]$. Notice that $d_{1}=\sum_{i\in[k]}\mu_{D_{W_{i}}}(w_{i})$, $d_{2}=\sum_{i\in[k]}\mu_{D_{W_{i}\cup\{u\}}}(w_{i})$, and $d_{3}=\sum_{i\in[k]}\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i})$. To conclude the proof, we will show that for each $w_{i}\in X_{2}$,

$$2\mu_{D_{W_{i}\cup\{u\}}}(w_{i})\geq\mu_{D_{W_{i}}}(w_{i})+\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i}).$$

Note that this implies $2d_{2}\geq d_{1}+d_{3}$, yielding the desired contradiction. To show the above claim, we consider three cases: (i) $w_{i}$ is an out-neighbor of $u$ in $D_{W_{i}}$, (ii) $w_{i}$ is an in-neighbor of $u$ in $D_{W_{i}}$, and (iii) $w_{i}$ is neither an in- nor an out-neighbor of $u$ in $D_{W_{i}}$. Note that since $D_{W_{i}}$ is a DAG, $w_{i}$ cannot be both an in-neighbor and an out-neighbor of $u$. Moreover, since $u$ and $v$ are false twins, $w_{i}$ is an in-/out-neighbor of $u$ if and only if it is an in-/out-neighbor of $v$. In the first case, note that

  • $|N^{+}_{D_{W_{i}}}(w_{i})|=|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|$,

  • $|N^{-}_{D_{W_{i}}}(w_{i})|\leq|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+2$, and

  • $|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+1$.

The first holds as the out-degree of $w_{i}$ does not change if $u$ and/or $v$ are eliminated. To see the second, note that eliminating $u$ and $v$ can reduce the in-degree of $w_{i}$ by at most two. Finally, if $u$ is already eliminated, then eliminating $v$ does not add any new in-neighbors of $w_{i}$ since $u$ and $v$ are false twins (and this property remains true even if other vertices are eliminated). Thus, we get

$$\begin{aligned}2\mu_{D_{W_{i}\cup\{u\}}}(w_{i})&=2\big(|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|\big)\\&=2\big((|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+1)\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|\big)\\&=\big(2|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+2\big)\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|\\&\geq\big(|N^{-}_{D_{W_{i}}}(w_{i})|+|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|\big)\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|\\&=\mu_{D_{W_{i}}}(w_{i})+\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i}).\end{aligned}$$

The second case is analogous with the roles of in- and out-neighbors swapped, that is,

  • $|N^{-}_{D_{W_{i}}}(w_{i})|=|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|$,

  • $|N^{+}_{D_{W_{i}}}(w_{i})|\leq|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+2$, and

  • $|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+1$.

This yields

$$\begin{aligned}2\mu_{D_{W_{i}\cup\{u\}}}(w_{i})&=2\big(|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|\big)\\&=2\big(|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot(|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+1)\big)\\&=|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot\big(2|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|+2\big)\\&\geq|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot\big(|N^{+}_{D_{W_{i}}}(w_{i})|+|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|\big)\\&=\mu_{D_{W_{i}}}(w_{i})+\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i}).\end{aligned}$$

Finally, in the third case it holds that

  • $|N^{-}_{D_{W_{i}}}(w_{i})|=|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{-}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|$, and

  • $|N^{+}_{D_{W_{i}}}(w_{i})|=|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|=|N^{+}_{D_{W_{i}\cup\{u,v\}}}(w_{i})|$.

Thus, we get

$$2\mu_{D_{W_{i}\cup\{u\}}}(w_{i})=2\cdot|N^{-}_{D_{W_{i}\cup\{u\}}}(w_{i})|\cdot|N^{+}_{D_{W_{i}\cup\{u\}}}(w_{i})|=\mu_{D_{W_{i}}}(w_{i})+\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i}).$$

Since $2\mu_{D_{W_{i}\cup\{u\}}}(w_{i})\geq\mu_{D_{W_{i}}}(w_{i})+\mu_{D_{W_{i}\cup\{u,v\}}}(w_{i})$ holds in all cases, summing over $i\in[k]$ gives $2d_{2}\geq d_{1}+d_{3}$, contradicting the inequality $2d_{2}<d_{1}+d_{3}$ derived from Inequality (5). This concludes the proof when $|T|=2$.

To generalize the result to larger sets $T$, we use induction on the size $\ell$ of $T$. As shown above, the base case $\ell=2$ holds. Next, assume the proposition is true for $\ell-1$. Let $T=\{v_{1},v_{2},\ldots,v_{\ell}\}$, and let $\sigma$ be an optimal elimination sequence. Let $T^{\prime}=T\setminus\{v_{\ell}\}$. By the inductive hypothesis (on $T^{\prime}$) and Proposition 1, we may assume that $\sigma=(\sigma_{1},v_{1},v_{2},\ldots,v_{\ell-1},\sigma_{2},v_{\ell},\sigma_{3})$ or $\sigma=(\sigma_{1},v_{\ell},\sigma_{2},v_{1},v_{2},\ldots,v_{\ell-1},\sigma_{3})$ for some sequences $\sigma_{1}$, $\sigma_{2}$, and $\sigma_{3}$. By applying the argument from the case $|T|=2$ to $v_{\ell-1}$ and $v_{\ell}$, or to $v_{\ell}$ and $v_{1}$, we obtain optimal solutions $(\sigma_{1},v_{1},v_{2},\ldots,v_{\ell-1},v_{\ell},\sigma_{2},\sigma_{3})$ or $(\sigma_{1},\sigma_{2},v_{\ell},v_{1},v_{2},\ldots,v_{\ell-1},\sigma_{3})$, respectively. This completes the proof. ∎
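Proposition 2 can likewise be checked exhaustively on small instances. In the Python sketch below (helper names are ours), $u$ and $v$ are false twins, and among all $3!$ total elimination orders the minimum cost is attained by orders eliminating $u$ and $v$ consecutively:

```python
from itertools import permutations

def eliminate(adj, v):
    """Eliminate v: connect each in-neighbor to each out-neighbor, delete v."""
    preds = [u for u in adj if v in adj[u]]
    succs = set(adj[v])
    for u in preds:
        adj[u] |= succs
        adj[u].discard(v)
    del adj[v]

def markowitz(adj, v):
    """Markowitz degree of v: in-degree times out-degree."""
    return sum(1 for u in adj if v in adj[u]) * len(adj[v])

def sequence_cost(adj, order):
    """Total cost of eliminating the vertices in `order`."""
    adj = {u: set(ws) for u, ws in adj.items()}
    cost = 0
    for v in order:
        cost += markowitz(adj, v)
        eliminate(adj, v)
    return cost

# Sources s1, s2; internal u, v (false twins) and w; sinks t1, t2.
D = {'s1': {'u', 'v'}, 's2': {'u', 'v'},
     'u': {'w', 't1'}, 'v': {'w', 't1'}, 'w': {'t1', 't2'}}
costs = {p: sequence_cost(D, p) for p in permutations(['u', 'v', 'w'])}
best = min(costs.values())
optimal_adjacent = [p for p, c in costs.items()
                    if c == best and abs(p.index('u') - p.index('v')) == 1]
print(best, len(optimal_adjacent) > 0)  # 12 True
```

Here the orders separating $u$ and $v$ (such as $(u,w,v)$) cost 14, while orders eliminating them consecutively cost 12.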

3 Structural Optimal Jacobian Accumulation is NP-complete

In this section, we show that Structural Optimal Jacobian Accumulation is NP-complete. We reduce from Vertex Cover, which is defined as follows.


Vertex Cover
Input: An undirected graph $G=(V,E)$ and an integer $k$.
Question: Is there a set $C\subseteq V$ of at most $k$ vertices such that each edge in $E$ has at least one endpoint in $C$?
Theorem 1.

Structural Optimal Jacobian Accumulation is NP-complete. Assuming the ETH, it cannot be solved in $2^{o(n+m)}$ time.

Proof.

We first show containment in NP. Note that a total elimination sequence is a permutation of $I$ and can therefore be encoded in polynomial space. Moreover, given such a sequence, we can compute its cost in polynomial time by performing the vertex eliminations one after another.

To show hardness, we reduce from Vertex Cover. It is well known that Vertex Cover is NP-hard and cannot be solved in $2^{o(n+m)}$ time unless the ETH fails [13, 14]. We will provide a linear reduction from Vertex Cover to Structural Optimal Jacobian Accumulation, thereby proving the theorem. To this end, let $(G=(V,E),k)$ be an input instance of Vertex Cover. Let $n=|V|$ and $m=|E|$. We create an equivalent instance $(D=(S\uplus I,A),k^{\prime})$ of Structural Optimal Jacobian Accumulation as follows. For each vertex $v\in V$, we create five vertices $v_{1},v_{2},v_{3},v_{4},v_{5}$. The vertices $v_{1}$, $v_{4}$, and $v_{5}$ are contained in $S$ for all $v\in V$. Vertices $v_{2}$ and $v_{3}$ are contained in $I$. Next, we add the set $\{(v_{1},v_{2}),(v_{2},v_{3}),(v_{2},v_{4}),(v_{2},v_{5}),(v_{3},v_{4}),(v_{3},v_{5})\}$ of arcs to $A$ for each $v\in V$. Finally, for each edge $\{u,v\}\in E$, we add the arcs $(u_{1},v_{3})$, $(u_{2},v_{3})$, $(v_{1},u_{3})$, and $(v_{2},u_{3})$ to $A$. Notice that every arc goes from a lower-indexed vertex to a higher-indexed vertex. Hence, the constructed digraph is a DAG. To finish the construction, we set $k^{\prime}=6m+4n+k$. An illustration of the construction is depicted in Figure 1.

Figure 1: The input graph is shown on the left and the constructed graph is shown on the right. An optimal solution (corresponding to the vertex cover that only contains the middle vertex) first eliminates the red vertex, then all blue vertices, then all green vertices, and finally the yellow vertex.

Since the reduction can clearly be computed in polynomial time, it only remains to show correctness. We proceed to show that $(G,k)$ is a yes-instance of Vertex Cover if and only if the constructed instance $(D,k^{\prime})$ is a yes-instance of Structural Optimal Jacobian Accumulation.

First, assume that $G$ contains a vertex cover $C$ of size at most $k$. We show that eliminating all vertices $v_{2}$ with $v\in C$ first, then $u_{3}$ for all $u\in V\setminus C$, followed by $u_{2}$ for all $u\in V\setminus C$, and finally all vertices $v_{3}$ for $v\in C$ results in a total cost of at most $k^{\prime}$. To see this, note that the cost of eliminating $v_{2}$ for any vertex $v\in C$ is $\deg(v)+3$, as $v_{2}$ has a single in-neighbor $v_{1}$, three out-neighbors $v_{3}$, $v_{4}$, and $v_{5}$, and $\deg(v)$ out-neighbors $u_{3}$ (one for each $u\in N(v)$). The cost of eliminating $u_{3}$ for any $u\notin C$ afterwards is $2\deg(u)+2$, as by construction $u_{3}$ has in-neighbors $\{w_{1}\mid w\in N(u)\}\cup\{u_{2}\}$, two out-neighbors $u_{4}$ and $u_{5}$, and no in-neighbors in $\{w_{2}\mid w\in N(u)\}$, since each $w\in N(u)$ is by definition in $C$ and hence the corresponding $w_{2}$ has been eliminated before. The cost of eliminating $u_{2}$ for $u\in V\setminus C$ afterwards is $\deg(u)+2$, as $u_{2}$ has the single in-neighbor $u_{1}$, two out-neighbors $u_{4}$ and $u_{5}$, and $\deg(u)$ out-neighbors $\{v_{3}\mid v\in C\}$ (note that $u$ cannot have neighbors in $V\setminus C$ since $C$ is a vertex cover and $u\notin C$). Finally, the cost of eliminating $v_{3}$ for any $v\in C$ is $2\deg(v)+2$, since $v_{3}$ has $\deg(v)+1$ in-neighbors $\{w_{1}\mid w\in N[v]\}$ and two out-neighbors $v_{4}$ and $v_{5}$. Summing these costs over all vertices and applying the handshake lemma (the sum of vertex degrees is twice the number of edges [5]) gives a total cost of

$$\begin{aligned}&\sum_{v\in C}(\deg(v)+3)+\sum_{u\in V\setminus C}(2\deg(u)+2)+\sum_{u\in V\setminus C}(\deg(u)+2)+\sum_{v\in C}(2\deg(v)+2)\\&\qquad=\sum_{v\in C}(3\deg(v)+5)+\sum_{u\in V\setminus C}(3\deg(u)+4)\\&\qquad=\sum_{v\in V}(3\deg(v)+4)+|C|\\&\qquad\leq 6m+4n+k=k^{\prime}.\end{aligned}$$

This shows that the constructed instance $(D,k^{\prime})$ is a yes-instance of Structural Optimal Jacobian Accumulation.

In the other direction, assume that there is an ordering $\sigma$ of the vertices in $I$ resulting in a total cost of at most $k^{\prime}$. Let $J\subseteq V$ be the set of vertices $v$ such that $v_{2}$ is eliminated before $v_{3}$, or $v_{3}$ is eliminated before $u_{2}$ for some $u\in N(v)$, by $\sigma$. We will show that $J$ is a vertex cover of size at most $k$ in $G$. To this end, we first provide a lower bound for the cost of eliminating any vertex, regardless of which vertices have been eliminated previously. Note that $v_{3}$ for any vertex $v\in V$ has two out-neighbors $v_{4}$ and $v_{5}$ in $S$, $\deg(v)$ in-neighbors $\{w_{1}\mid w\in N(v)\}$ in $S$, and one additional in-neighbor, namely $v_{2}$ if $v_{2}$ has not been eliminated before and $v_{1}$ otherwise. Hence, the cost of eliminating $v_{3}$ is at least $2\deg(v)+2$. Moreover, the cost of eliminating $v_{2}$ for any vertex $v\in V$ is at least $\deg(v)+2$, as $v_{2}$ has the in-neighbor $v_{1}\in S$, two out-neighbors $v_{4},v_{5}\in S$, and for each $w\in N(v)$ at least one additional out-neighbor ($w_{3}$ if $w_{3}$ has not been eliminated before, or $w_{4}$ and $w_{5}$ if it has). Summing these costs over all vertices (and again applying the handshake lemma) gives a lower bound of $6m+4n=k^{\prime}-k$.

The next step is to prove that $J$ contains at most $k$ vertices. To this end, note that for each vertex $v\in J$, the cost increases by at least one over the analyzed lower bound. If $v_{3}$ is eliminated after $v_{2}$ for some $v\in J$, then the cost of eliminating $v_{2}$ increases by one, as $v_{2}$ has the additional out-neighbor $v_{3}$. If $v_{3}$ is eliminated before $u_{2}$ for some $v\in J$ and $u\in N(v)$, then the cost of eliminating $u_{2}$ increases by one, as the out-neighbor $v_{3}$ is replaced by the two out-neighbors $v_{4}$ and $v_{5}$. This immediately implies that $|J|\leq k$.

Finally, we show that $J$ is a vertex cover. Assume towards a contradiction that this is not the case. Then, there is some edge $\{u,v\}\in E$ with $u\notin J$ and $v\notin J$. By definition of $J$, the sequence $\sigma$ eliminates $u_{3}$ before $u_{2}$, $u_{3}$ after $v_{2}$, $v_{3}$ before $v_{2}$, and $v_{3}$ after $u_{2}$. This yields the cyclic chain of elimination-order constraints $v_{2}\prec u_{3}\prec u_{2}\prec v_{3}\prec v_{2}$, a contradiction. Thus, $J$ is a vertex cover of size at most $k$, and the initial instance of Vertex Cover is therefore a yes-instance. This concludes the proof. ∎
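The reduction is small enough to verify by brute force on a tiny instance. The Python sketch below (with our helper names `build_reduction` and `sequence_cost`) constructs $D$ for a path on three vertices, whose minimum vertex cover has size 1, and confirms that the cheapest total elimination sequence has cost exactly $6m+4n+1$:

```python
from itertools import permutations

def eliminate(adj, v):
    preds = [u for u in adj if v in adj[u]]
    succs = set(adj[v])
    for u in preds:
        adj[u] |= succs
        adj[u].discard(v)
    del adj[v]

def markowitz(adj, v):
    return sum(1 for u in adj if v in adj[u]) * len(adj[v])

def sequence_cost(adj, order):
    adj = {u: set(ws) for u, ws in adj.items()}
    cost = 0
    for v in order:
        cost += markowitz(adj, v)
        eliminate(adj, v)
    return cost

def build_reduction(vertices, edges):
    """Construct the DAG of the reduction; (v, i) plays the role of v_i."""
    adj = {(v, i): set() for v in vertices for i in range(1, 6)}
    for v in vertices:
        adj[(v, 1)] = {(v, 2)}
        adj[(v, 2)] = {(v, 3), (v, 4), (v, 5)}
        adj[(v, 3)] = {(v, 4), (v, 5)}
    for u, v in edges:
        adj[(u, 1)].add((v, 3)); adj[(u, 2)].add((v, 3))
        adj[(v, 1)].add((u, 3)); adj[(v, 2)].add((u, 3))
    return adj

V, E = ['a', 'b', 'c'], [('a', 'b'), ('b', 'c')]  # path; min vertex cover {b}
adj = build_reduction(V, E)
internal = [(v, i) for v in V for i in (2, 3)]
best = min(sequence_cost(adj, p) for p in permutations(internal))
print(best)  # 6*m + 4*n + 1 = 6*2 + 4*3 + 1 = 25
```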

4 Minimum Edge Count is NP-complete

In this section we show that Minimum Edge Count is NP-complete and, assuming the ETH, it cannot be solved in subexponential time. To this end, we reduce from Independent Set, which is defined as follows.


Independent Set
Input: An undirected graph $G=(V,E)$ and an integer $k$.
Question: Is there a set $X\subseteq V$ of at least $k$ vertices such that no edge in $E$ has both endpoints in $X$?
Theorem 2.

Minimum Edge Count is NP-complete. Assuming the ETH, it cannot be solved in $2^{o(n+m)}$ time.

Proof.

We again start by showing containment in NP. We can encode a (not necessarily total) elimination sequence in polynomial space. Moreover, given such a sequence, we can compute the resulting DAG in polynomial time and verify that it contains at most $k$ edges.
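Such a certificate check is straightforward to implement. The following is a minimal sketch (the function names are ours, not from the paper), assuming the DAG is represented as a set of arc pairs:

```python
def eliminate(arcs, v):
    # Vertex elimination: every in-neighbor of v gains an arc to every
    # out-neighbor of v; afterwards v and its incident arcs are removed.
    preds = {u for (u, w) in arcs if w == v}
    succs = {w for (u, w) in arcs if u == v}
    rest = {(u, w) for (u, w) in arcs if u != v and w != v}
    return rest | {(u, w) for u in preds for w in succs}

def verify(arcs, sequence, k):
    # NP certificate check: eliminate the vertices in the given order
    # and test whether at most k arcs remain.
    for v in sequence:
        arcs = eliminate(arcs, v)
    return len(arcs) <= k
```

For instance, in the DAG with arcs $s\to a\to t$ and $s\to b\to t$, eliminating both $a$ and $b$ leaves only the single arc $(s,t)$.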

To show hardness, we reduce from Independent Set in 2-degenerate subcubic graphs of girth at least five, that is, graphs in which each vertex has between two and three neighbors and each cycle has length at least five. This problem is known to be NP-hard and cannot be solved in $2^{o(n+m)}$ time assuming the ETH [15]. We will provide a linear reduction from that problem to Minimum Edge Count to show the theorem. To this end, let $(G=(V=\{v_1,v_2,\ldots,v_n\},E),k)$ be an instance of Independent Set where each vertex has between two and three neighbors and no cycles of length three or four are present in $G$. We will construct an instance $(D=(S\uplus I,A),k')$ of Minimum Edge Count. We begin by imposing an arbitrary total order $\pi$ on the vertex set $V$. We partition the vertices into four types based on their degree and the order $\pi$ as follows. Vertices of type 1 have degree 3 and either all neighbors come earlier with respect to $\pi$ or all neighbors come later. Vertices of type 2 have degree 3 and have at least one earlier and at least one later neighbor with respect to $\pi$. Vertices of type 3 have degree 2 and either both neighbors come earlier or both come later with respect to $\pi$. Finally, vertices of type 4 have degree 2 and one neighbor comes earlier with respect to $\pi$ while the other comes later.

We now describe the construction of $D$, depicted in Figure 2. We begin by creating a set $T\subseteq S$ of four sink vertices. Next, for each $v_i\in V$, we create a vertex $u_i\in I$ as well as two vertex sets $I_i\subseteq S$ and $O_i\subseteq S$, both of size 4. We add arcs from each vertex in $I_i$ to every vertex in $\{u_i\}\cup T$ to $A$. We also add arcs from $u_i$ to each vertex in $O_i\cup T$ to $A$. Finally, we add arcs from $I_i$ to $O_i$ based on the type of $v_i$ as follows. Note that since $|I_i|=|O_i|=4$, there are up to 16 possible arcs from vertices in $I_i$ to vertices in $O_i$. For type-1 vertices, we add 14 of the possible arcs (it does not matter which arcs we add). For vertices of types 2, 3, and 4, we add 16, 11, and 12 arcs to $A$, respectively. After completing this procedure for every vertex in $V$, we add an arc $(u_i,u_j)$ for any edge $\{v_i,v_j\}\in E$ where $v_i$ comes before $v_j$ in the order $\pi$. This concludes the construction of the graph $D$. Let $n'$ and $m'$ be the number of vertices and arcs in $D$. To conclude the construction, we set $k'=m'-k$. Observe that $D$ is a DAG and the number of vertices and arcs is linear in $n+m$.

Figure 2: An example instance of Independent Set on the left and the constructed instance on the right. We assume the order $\pi$ to be $v_1$ first, then $v_2$, and $v_3$ last. Each large node $I_i$, $O_i$, and $T$ (shaded gray) represents an independent set of size 4. A bold arc between two nodes represents all possible arcs between the respective vertex sets (in one direction) unless a number is shown next to the arc, in which case the number gives the number of arcs between the two sets of vertices.
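The construction above can be sketched in code as follows (a sketch under our own naming conventions; the particular choice of which $I_i\to O_i$ arcs to include is arbitrary, as noted above):

```python
from itertools import product

def build_instance(vertices, edges, k):
    # vertices: a list fixing the arbitrary total order pi.
    # edges: unordered pairs; every vertex is assumed to have degree 2 or 3,
    # and the graph is assumed to have girth at least five.
    pos = {v: i for i, v in enumerate(vertices)}
    nbrs = {v: set() for v in vertices}
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)

    T = [("T", s) for s in range(4)]                # four common sinks
    arcs = set()
    for v in vertices:
        u = ("u", v)
        I = [("I", v, s) for s in range(4)]
        O = [("O", v, s) for s in range(4)]
        for x in I:                                 # I_i -> {u_i} union T
            arcs.add((x, u))
            arcs.update((x, t) for t in T)
        arcs.update((u, y) for y in O)              # u_i -> O_i union T
        arcs.update((u, t) for t in T)
        earlier = [w for w in nbrs[v] if pos[w] < pos[v]]
        later = [w for w in nbrs[v] if pos[w] > pos[v]]
        if len(nbrs[v]) == 3:                       # type 1 vs. type 2
            count = 16 if (earlier and later) else 14
        else:                                       # type 3 vs. type 4
            count = 12 if (earlier and later) else 11
        for x, y in list(product(I, O))[:count]:    # arbitrary choice of arcs
            arcs.add((x, y))
    for x, y in edges:                              # u_i -> u_j along pi
        a, b = (x, y) if pos[x] < pos[y] else (y, x)
        arcs.add((("u", a), ("u", b)))
    return arcs, len(arcs) - k                      # (A, k' = m' - k)
```

On a 5-cycle, for example, every vertex has degree two, and the construction produces $5\cdot 28$ fixed gadget arcs, $11+12+12+12+11$ arcs from the $I_i$ to the $O_i$, and five arcs between the $u_i$, for $m'=203$ arcs in total.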

Since the reduction can be computed in polynomial time and the constructed instance is linear in the size of the input instance, it only remains to show correctness. To this end, first assume that there exists an independent set $X\subseteq V$ of size $k$ in $G$. We eliminate the vertices $X'=\{u_i\mid v_i\in X\}$. Note that Proposition 1 ensures that the order in which we eliminate these vertices does not matter. We will show that the resulting graph contains at most $k'$ arcs.

First, note that new arcs between vertices $u_i$ and $u_j$ might be created while eliminating vertices in $X'$. However, since $X$ is an independent set, such arcs are only created between vertices with $N(v_i)\cap N(v_j)\neq\emptyset$. Since $G$ has girth at least 5, if the elimination of a vertex in $X'$ could create the arc $(u_i,u_j)$, then this arc was not there initially and was not added by the elimination of a different vertex in $X'$. We next show that the elimination of each vertex in $X'$ reduces the number of arcs in the graph by exactly one. Let $u_i\in X'$ be an arbitrary vertex. If $v_i$ has degree three, then $u_i$ has exactly 15 incident arcs, where 12 are to or from $I_i$, $O_i$, and $T$ and three are to vertices $u_{j_1},u_{j_2},u_{j_3}$. The number of new arcs created in this case is 14, as shown next. For each $\ell\in[3]$, four new arcs are created from vertices in $I_i$ to $u_{j_\ell}$ if $v_i$ comes before $v_{j_\ell}$ with respect to $\pi$, and four new arcs from $u_{j_\ell}$ to vertices in $O_i$ are created otherwise. Hence, in any case, 12 new arcs are created. If $v_i$ is of type 1, then two additional arcs are created from vertices in $I_i$ to vertices in $O_i$. If $v_i$ is of type 2, then two additional arcs are created between the vertices $u_{j_1}$, $u_{j_2}$, and $u_{j_3}$. Hence, if $v_i$ has degree 3, then the elimination of vertex $u_i$ removes 15 arcs and creates 14 new ones, that is, the number of arcs decreases by one. If $v_i$ has degree 2, then $u_i$ is incident to exactly 14 arcs (12 to and from vertices in $I_i$, $O_i$, and $T$ and two additional arcs to or from vertices $u_{j_1}$ and $u_{j_2}$). The number of new arcs created in this case is 13, as shown next. For each $\ell\in[2]$, four new arcs are created from vertices in $I_i$ to $u_{j_\ell}$ if $v_i$ comes before $v_{j_\ell}$ with respect to $\pi$, and four new arcs from $u_{j_\ell}$ to vertices in $O_i$ are created otherwise. In any case, 8 new arcs are created this way. If $v_i$ is of type 3, then five additional arcs are created from vertices in $I_i$ to vertices in $O_i$. If $v_i$ is of type 4, then four additional arcs are created from vertices in $I_i$ to vertices in $O_i$ and one additional arc is created between $u_{j_1}$ and $u_{j_2}$. Hence, if $v_i$ has degree 2, then the elimination of vertex $u_i$ removes 14 arcs and creates 13 new ones, that is, the number of arcs also decreases by one in this case. Since $k'=m'-k$ and each of the $k$ eliminations decreases the number of arcs by one, the resulting graph contains at most $k'$ arcs, showing that the constructed instance is a yes-instance.
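The bookkeeping in the two cases can be summarized as

$$\deg(v_i)=3:\quad 12+3=15 \text{ arcs removed},\qquad 3\cdot 4+2=14 \text{ arcs created},$$
$$\deg(v_i)=2:\quad 12+2=14 \text{ arcs removed},\qquad 2\cdot 4+5=13 \text{ arcs created},$$

so in either case the number of arcs decreases by exactly one.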

For the other direction, assume that $X'$ is a solution to $(D,k')$, that is, $D_{X'}$ has at most $k'$ arcs. Further, assume that $X'$ is minimal in the sense that, for each $u\in X'$, $D_{X'\setminus\{u\}}$ contains more arcs than $D_{X'}$. Note that this notion is well defined due to Proposition 1. Moreover, given any solution, a minimal one can be computed in polynomial time. Let $X=\{v_i\mid u_i\in X'\}$. We will show that $X$ induces an independent set in $G$ and that $|X|\geq k$, that is, $X$ is an independent set of size at least $k$ in $G$ and the original instance is therefore a yes-instance.

Assume toward a contradiction that $X$ is not an independent set in $G$, that is, there exist vertices $u_i,u_j\in X'$ such that $\{v_i,v_j\}\in E$. We claim that $X'$ is not minimal in this case. To prove this claim, note that eliminating a vertex $u_i$ does not decrease the in-degree or out-degree of any vertex $u_j$ (at any stage during an elimination sequence), and if $\{v_i,v_j\}\in E$, then one of the degrees of $u_j$ increases. If $u_i$ is neither an in-neighbor nor an out-neighbor of $u_j$, then eliminating $u_i$ does not change either degree of $u_j$. If $u_i$ is an in-neighbor, then the out-degree of $u_j$ remains unchanged and the in-degree increases, as the vertices in $I_i$ become in-neighbors of $u_j$ (and they cannot be in-neighbors of $u_j$ while $u_i$ is not eliminated). If $u_i$ is an out-neighbor of $u_j$, then the in-degree of $u_j$ remains unchanged and the out-degree increases, as the vertices in $O_i$ become new out-neighbors of $u_j$. Let $d$ be the number of vertices $w$ such that $(w,u_j)$ or $(u_j,w)$ is an arc in $D_{X'\setminus\{u_j\}}$ and $w\notin I_j\cup O_j\cup T$. Note that $d>\deg(v_j)$ since $u_i\in X'\setminus\{u_j\}$ and $\{v_i,v_j\}\in E$. Eliminating $u_j$ in $D_{X'\setminus\{u_j\}}$ removes $d+12$ arcs. If $v_j$ has degree 3, then eliminating $u_j$ in $D_{X'\setminus\{u_j\}}$ creates at least $4d=d+3d\geq d+12$ arcs since (i) $d>\deg(v_j)=3$ implies $d\geq 4$ and (ii) eliminating $u_j$ creates four arcs between vertices in $I_j\cup O_j$ and each other (in- or out-)neighbor of $u_j$ except for vertices in $T$. If $v_j$ has degree 2, then eliminating $u_j$ creates at least four arcs from vertices in $I_j$ to vertices in $O_j$ plus at least $4d$ further arcs, that is, at least $4d+4=d+3d+4\geq d+13>d+12$ arcs, since $d>\deg(v_j)=2$ implies $d\geq 3$. Hence, in any case the number of newly created arcs is at least as large as the number of removed arcs. That is, the number of arcs does not decrease, showing that $X'$ is not a minimal solution.

It only remains to show that $|X|\geq k$. As analyzed in the forward direction, the elimination of any vertex $u_i$ reduces the number of arcs by exactly one if no vertex $u_j$ with $\{v_i,v_j\}\in E$ was eliminated before. Since $k'=m'-k$, this shows that $|X|\geq k$, concluding the proof. ∎

5 Algorithms

In this section, we give two simple algorithms showing that Structural Optimal Jacobian Accumulation and Minimum Edge Count can be solved in $O^*(2^n)$ time. We begin with Minimum Edge Count.

Proposition 3.

Minimum Edge Count can be solved in $O(2^n n^3)$ time and with polynomial space.

Proof.

By Proposition 1, the order in which the vertices of an optimal solution are eliminated is irrelevant. Hence, we can simply test, for each subset $X$ of vertices, how many arcs remain when the vertices in $X$ are eliminated. Since there are $2^n$ possible subsets and each of the at most $n$ eliminations for each subset can be computed in $O(n^2)$ time, all subsets can be tested in $O(2^n n^3)$ time. ∎
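The algorithm admits a direct implementation. The following sketch (our naming) enumerates all subsets of the eliminable vertices:

```python
from itertools import combinations

def eliminate(arcs, v):
    # Vertex elimination: bypass v via in-neighbor/out-neighbor arcs,
    # then delete v and its incident arcs.
    preds = {u for (u, w) in arcs if w == v}
    succs = {w for (u, w) in arcs if u == v}
    rest = {(u, w) for (u, w) in arcs if u != v and w != v}
    return rest | {(u, w) for u in preds for w in succs}

def min_edge_count(arcs, internal):
    # By Proposition 1 the elimination order within a subset is irrelevant,
    # so it suffices to try every subset of the eliminable vertices.
    best = len(arcs)
    for r in range(len(internal) + 1):
        for subset in combinations(internal, r):
            cur = arcs
            for v in subset:
                cur = eliminate(cur, v)
            best = min(best, len(cur))
    return best
```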

We continue with Structural Optimal Jacobian Accumulation, where we use an algorithmic framework due to Bodlaender et al. [3].

Proposition 4.

Structural Optimal Jacobian Accumulation can be solved in $O(2^n n^4)$ time. It can also be solved in $O(4^n n^3)$ time using polynomial space.

Proof.

As shown by Bodlaender et al. [3], any vertex ordering problem can be solved in $O(2^n n^{c+1})$ time, and in $O(4^n n^c)$ time using polynomial space, if it can be reformulated as $\min_\pi \sum_{v\in V} f(D,\pi_{<v},v)$, where $\pi$ is a permutation of the vertices, $\pi_{<v}$ is the set of all vertices that appear before $v$ in $\pi$, and $f$ can be computed in $O(n^c)$ time. We show that Structural Optimal Jacobian Accumulation fits into this framework (with $c=3$). We only consider vertices in $I$, that is, non-terminal vertices, as these are exactly the vertices to be eliminated. We use the function

$$f(D,\pi_{<v},v)=|N^{-}_{D_{\pi_{<v}}}(v)|\cdot|N^{+}_{D_{\pi_{<v}}}(v)|.$$

Note that we can compute $D_{\pi_{<v}}$ (and therefore $f$) in $O(n^3)$ time. Moreover, given a permutation $\pi$, the cost of eliminating all vertices in $I$ corresponds exactly to $\sum_{v\in V} f(D,\pi_{<v},v)$, as the cost of eliminating a vertex $v$ in a solution sequence following $\pi$ is exactly $|N^{-}_{D_{\pi_{<v}}}(v)|\cdot|N^{+}_{D_{\pi_{<v}}}(v)|=f(D,\pi_{<v},v)$. This concludes the proof. ∎
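A sketch of the resulting subset dynamic program (the $O(2^n n^4)$-time variant; function names are ours) might look as follows, where dp[S] stores the cheapest cost of eliminating exactly the vertices in S:

```python
from itertools import combinations

def eliminate(arcs, v):
    # Vertex elimination: bypass v, then delete it.
    preds = {u for (u, w) in arcs if w == v}
    succs = {w for (u, w) in arcs if u == v}
    rest = {(u, w) for (u, w) in arcs if u != v and w != v}
    return rest | {(u, w) for u in preds for w in succs}

def cost_f(arcs, done, v):
    # f(D, pi_<v, v): eliminate the vertices preceding v (their internal
    # order does not affect the resulting DAG), then multiply the
    # remaining in-degree and out-degree of v.
    for u in done:
        arcs = eliminate(arcs, u)
    indeg = sum(1 for (a, b) in arcs if b == v)
    outdeg = sum(1 for (a, b) in arcs if a == v)
    return indeg * outdeg

def optimal_accumulation(arcs, internal):
    # dp[S] = cheapest total cost of eliminating exactly the vertices in S.
    dp = {frozenset(): 0}
    for size in range(1, len(internal) + 1):
        for S in map(frozenset, combinations(internal, size)):
            dp[S] = min(dp[S - {v}] + cost_f(arcs, S - {v}, v) for v in S)
    return dp[frozenset(internal)]
```

On the DAG with arcs $s\to a\to b\to t$ and $s\to b$, for example, eliminating $a$ before $b$ costs $1+1=2$, while the reverse order costs $2+1=3$; the program returns the cheaper value.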

6 Conclusion

We have resolved a pair of longstanding open questions by showing that Structural Optimal Jacobian Accumulation and Minimum Edge Count are both NP-complete. Our progress opens the door to many interesting questions. On the theoretical side, a key next step is to understand the complexities of both problems under the more expressive edge elimination operation [18]. There are also promising opportunities to develop approximation algorithms and/or establish lower bounds.

Acknowledgments

The authors would like to sincerely thank Paul Hovland for drawing their attention to the studied problems at Dagstuhl Seminar 24201, for insightful discussions, and for generously reviewing a preliminary version of this manuscript, providing valuable feedback and comments.

MB was supported by the European Research Council (ERC) project LOPRE (819416) under the Horizon 2020 research and innovation program. AC, YM, and BDS were partially supported by the Gordon & Betty Moore Foundation under grant GBMF4560 to BDS.

References

  • [1] S. G. Aksoy, R. Bennink, Y. Chen, J. Frías, Y. R. Gel, B. Kay, U. Naumann, C. O. Marrero, A. V. Petyuk, S. Roy, I. Segovia-Dominguez, N. Veldt, and S. J. Young. Seven open problems in applied combinatorics. Journal of Combinatorics, 14(4):559–601, 2023.
  • [2] F. L. Bauer. Computational graphs and rounding error. SIAM Journal on Numerical Analysis, 11(1):87–96, 1974.
  • [3] H. L. Bodlaender, F. V. Fomin, A. M. C. A. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3):420–432, 2012.
  • [4] J. Chen, P. Hovland, T. Munson, and J. Utke. An integer programming approach to optimal derivative accumulation. In Proceedings of the 6th International Conference on Automatic Differentiation, pages 221–231, Berlin Heidelberg, 2012. Springer.
  • [5] R. Diestel. Graph Theory. Springer, Berlin Heidelberg, 2012.
  • [6] S. A. Forth, M. Tadjouddine, J. D. Pryce, and J. K. Reid. Jacobian code generated by source transformation and vertex elimination can be as efficient as hand-coding. ACM Transactions on Mathematical Software, 30(3):266–299, 2004.
  • [7] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, USA, 1979.
  • [8] A. Griewank. Some bounds on the complexity of gradients, Jacobians, and Hessians. In Complexity in numerical optimization, pages 128–162. World Scientific, Berlin Heidelberg, 1993.
  • [9] A. Griewank and U. Naumann. Accumulating Jacobians as chained sparse matrix products. Mathematical Programming, 95:555–571, 2003.
  • [10] A. Griewank and O. Vogel. Analysis and exploitation of Jacobian scarcity. In Proceedings of the 2nd International Conference on High Performance Scientific Computing, pages 149–164, Berlin Heidelberg, 2003. Springer.
  • [11] A. Griewank and A. Walther. Evaluating derivatives: principles and techniques of algorithmic differentiation. SIAM, Philadelphia, 2008.
  • [12] R. Impagliazzo and R. Paturi. On the complexity of k-SAT. Journal of Computer and System Sciences, 62(2):367–375, 2001.
  • [13] R. Impagliazzo, R. Paturi, and F. Zane. Which problems have strongly exponential complexity? Journal of Computer and System Sciences, 63(4):512–530, 2001.
  • [14] R. M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972.
  • [15] C. Komusiewicz. Tight running time lower bounds for vertex deletion problems. ACM Transactions on Computation Theory, 10(2):6:1–6:18, 2018.
  • [16] A. Lyons and J. Utke. On the practical exploitation of scarsity. In Proceedings of the 5th International Conference on Automatic Differentiation, pages 103–114, Berlin Heidelberg, 2008. Springer.
  • [17] V. Mosenkis and U. Naumann. On optimality preserving eliminations for the minimum edge count and optimal Jacobian accumulation problems in linearized DAGs. Optimization Methods and Software, 27(2):337–358, 2012.
  • [18] U. Naumann. Elimination Techniques for Cheap Jacobians, pages 247–253. Springer, Berlin Heidelberg, 2002.
  • [19] U. Naumann. Optimal Jacobian accumulation is NP-complete. Mathematical Programming, 112:427–441, 2008.
  • [20] U. Naumann. The art of differentiating computer programs: An introduction to algorithmic differentiation. SIAM, Philadelphia, 2011.
  • [21] J. D. Pryce and E. M. Tadjouddine. Fast automatic differentiation Jacobians by compact LU factorization. SIAM Journal on Scientific Computing, 30(4):1659–1677, 2008.
  • [22] A. Quarteroni, R. Sacco, and F. Saleri. Numerical mathematics. Springer Science & Business Media, Berlin Heidelberg, 2006.
  • [23] D. J. Rose and R. E. Tarjan. Algorithmic aspects of vertex elimination on directed graphs. SIAM Journal on Applied Mathematics, 34(1):176–197, 1978.