Differential Invariants
Abstract
Validation is a major challenge in differentiable programming. The state of the art is based on algorithmic differentiation. Consistency of first-order tangent and adjoint programs is defined by a well-known first-order differential invariant. This paper generalizes the approach through derivation of corresponding differential invariants of arbitrary order.
keywords: differentiable programming, algorithmic differentiation, validation

1 Introduction and State of the Art
We consider implementations of multivariate vector functions
(1) $F : \mathbb{R}^n \to \mathbb{R}^m : \mathbf{y} = F(\mathbf{x})$
over the real (floating-point) numbers as sufficiently often continuously differentiable computer programs. Distinctly named variables (e.g., $\mathbf{x}$ and $\mathbf{y}$) are assumed to be unaliased (they occupy disjoint system memory locations) in the given implementation. We refer to $\mathbf{y} = F(\mathbf{x})$ as the primal program. Its $k$-th derivative (tensor) is denoted as
(2) $F^{(k)} = F^{(k)}(\mathbf{x}) \doteq \frac{d^k F}{d \mathbf{x}^k}(\mathbf{x}) \in \mathbb{R}^{m \times n \times \dots \times n}$
where $k \geq 1$. We set $F' \doteq F^{(1)}$ (Jacobian) and $F'' \doteq F^{(2)}$ (Hessian). Vectors are printed in bold font. Upper case letters denote other matrices and tensors. We use $=$ to denote mathematical equality and $\doteq$ in the sense of "is defined as."
Index notation is used for operations on derivative tensors, with indices $i \in \{1, \dots, m\}$ and $j_1, \dots, j_k \in \{1, \dots, n\}$. Products with tangent vectors $\mathbf{v} \in \mathbb{R}^n$ are defined as contractions over the last index,
(3) $\left[ F^{(k)} \cdot \mathbf{v} \right]_{i, j_1, \dots, j_{k-1}} \doteq \sum_{j_k=1}^{n} F^{(k)}_{i, j_1, \dots, j_k} \cdot v_{j_k} \; .$
Similarly, products with adjoint vectors $\mathbf{w} \in \mathbb{R}^m$ are defined as contractions over the first index,
(4) $\left[ \mathbf{w} \cdot F^{(k)} \right]_{j_1, \dots, j_k} \doteq \sum_{i=1}^{m} w_i \cdot F^{(k)}_{i, j_1, \dots, j_k} \; .$
The following rather obvious observation will be used in upcoming proofs.
Lemma 1.1.
Derivative tensors of $F$ are partially symmetric, i.e., $F^{(k)}_{i, j_{\pi(1)}, \dots, j_{\pi(k)}} = F^{(k)}_{i, j_{\sigma(1)}, \dots, j_{\sigma(k)}}$ for arbitrary permutations $\pi$ and $\sigma$ of $\{1, \dots, k\}$.
Proof 1.2.
Continuous differentiability up to the required order implies partial symmetry of derivative tensors as stated (Schwarz's theorem on the symmetry of higher partial derivatives).
Algorithmic differentiation (AD) [6, 10] of the primal program with respect to (wrt.) $\mathbf{x}$ in tangent (also: forward) mode yields the tangent program
$\mathbf{y}^{(1)} = F^{(1)}(\mathbf{x}, \mathbf{x}^{(1)})$,
which computes the directional derivative of $F$ at $\mathbf{x}$ in a given tangent direction $\mathbf{x}^{(1)} \in \mathbb{R}^n$ as
(5) $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} \; .$
Tangents of primal variables are marked by the superscript $(1)$. Tensors are enclosed in square brackets whenever appropriate for clarification of index notation.
AD of the primal program wrt. $\mathbf{x}$ in adjoint (also: reverse) mode yields the adjoint program
$\mathbf{x}_{(1)} = F_{(1)}(\mathbf{x}, \mathbf{y}_{(1)})$,
which computes, for a given adjoint direction $\mathbf{y}_{(1)} \in \mathbb{R}^m$,
(6) $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x}) \; .$
Adjoints of primal variables are marked by the subscript $(1)$.
AD and adjoint methods in particular play a central role in modern simulation and data science. Key applications include computational fluid dynamics [13], quantitative finance [4] and machine learning [12]. A growing number of AD software tools have been developed. They support a variety of programming languages. Coverage includes C/C++ [5, 9], Fortran [7, 11] and Matlab [1, 3]. The high level of activity in AD research and development is also documented by so far seven international conferences on the subject with associated proceedings / special post-conference collections; see, for example, [2]. Refer to the AD community’s web portal www.autodiff.org for further information on research groups, software tools and applications. The web presence includes a comprehensive bibliography on the subject.
Theorem 1.3.
Tangent and adjoint programs of the primal program satisfy the first-order differential invariant
(7) $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$
for arbitrary $\mathbf{x}^{(1)} \in \mathbb{R}^n$ and $\mathbf{y}_{(1)} \in \mathbb{R}^m$.
Proof 1.4.
Substitution of (5) and (6) yields $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$.
Theorem 1.3 can be used to verify the consistency of tangent and adjoint programs for given $\mathbf{x}^{(1)}$ and $\mathbf{y}_{(1)}$. Shared conceptual errors are not detected, for example, if the same incorrect expression for the derivative of an elemental function enters both the tangent and the adjoint program. (A mistake of this kind was made by the author during the implementation of an early prototype of the AD software dco/c++ [8].) To address this issue the tangent can be approximated by a finite difference quotient. Consistency of finite differences, tangents and adjoints increases the likelihood of correctness of the derivative code. Special care must be taken to control numerical errors inflicted by finite difference approximation.
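As an illustration, the first-order invariant of Theorem 1.3 can be checked mechanically for hand-written tangent and adjoint programs. The following sketch is not taken from the paper: the primal function $F$ and all input values are hypothetical examples. It validates $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$ and cross-checks the tangent against a finite difference quotient to guard against shared conceptual errors:

```python
import math

def F(x):
    # primal program: F : R^2 -> R^2 (hypothetical example)
    return [x[0] * x[1], math.sin(x[0])]

def tangent(x, dx):
    # tangent program: y^(1) = F'(x) . x^(1), hand-derived Jacobian
    return [x[1] * dx[0] + x[0] * dx[1], math.cos(x[0]) * dx[0]]

def adjoint(x, ybar):
    # adjoint program: x_(1) = y_(1) . F'(x)
    return [x[1] * ybar[0] + math.cos(x[0]) * ybar[1], x[0] * ybar[0]]

x, dx, ybar = [1.3, -0.7], [0.2, 0.5], [1.0, -2.0]
dy, xbar = tangent(x, dx), adjoint(x, ybar)

# first-order differential invariant: y_(1) . y^(1) == x_(1) . x^(1)
lhs = sum(a * b for a, b in zip(ybar, dy))
rhs = sum(a * b for a, b in zip(xbar, dx))
assert abs(lhs - rhs) < 1e-12

# finite difference cross-check of the tangent
h = 1e-7
xp = [x[0] + h * dx[0], x[1] + h * dx[1]]
fd = [(F(xp)[i] - F(x)[i]) / h for i in range(2)]
assert all(abs(fd[i] - dy[i]) < 1e-5 for i in range(2))
```

Note that the invariant alone would not expose an error shared by `tangent` and `adjoint`; the finite difference check provides the independent reference.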
2 Second-Order Differential Invariants
Let us generalize the observations from the previous section for second derivative programs.
2.1 Second Derivative Programs
For primal programs as in (1), second derivative programs are obtained by differentiation of the first-order tangent or adjoint target programs with respect to $\mathbf{x}$ in tangent or adjoint mode. Four variants of second derivative programs can be generated.
Lemma 2.1.
AD of the tangent program
$\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$
wrt. $\mathbf{x}$ in the tangent direction $\mathbf{x}^{(2)}$ yields the tangent of tangent program
for computing
(8) $\mathbf{y}^{(1,2)} = F''(\mathbf{x}) \cdot \mathbf{x}^{(1)} \cdot \mathbf{x}^{(2)} + F'(\mathbf{x}) \cdot \mathbf{x}^{(1,2)} \; .$
Tangents of variables used in the target program (here: the tangent program) are marked by the superscript $(2)$. Chained superscripts are combined as $(1,2)$.
Proof 2.2.
Differentiation of (5) wrt. both arguments in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{x}^{(1,2)}$ yields (8) by the product rule.
Lemma 2.3.
AD of the tangent program
$\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$
wrt. $\mathbf{x}$ in the adjoint direction $\mathbf{y}^{(1)}_{(2)}$ yields the adjoint of tangent program
for computing
(9) $\mathbf{x}_{(2)} = \mathbf{y}^{(1)}_{(2)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}^{(1)} \quad \text{and} \quad \mathbf{x}^{(1)}_{(2)} = \mathbf{y}^{(1)}_{(2)} \cdot F'(\mathbf{x}) \; .$
Adjoints of variables used in the target (tangent) program carry the subscript $(2)$.
Proof 2.4.
Adjoint-mode differentiation of (5) wrt. $\mathbf{x}$ and $\mathbf{x}^{(1)}$ with seed $\mathbf{y}^{(1)}_{(2)}$ yields (9).
Lemma 2.5.
AD of the adjoint program
$\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$
wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{y}_{(1)}^{(2)}$ yields the tangent of adjoint program
for computing
(10) $\mathbf{x}_{(1)}^{(2)} = \mathbf{y}_{(1)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}^{(2)} + \mathbf{y}_{(1)}^{(2)} \cdot F'(\mathbf{x}) \; .$
Proof 2.6.
Differentiation of (6) wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{y}_{(1)}^{(2)}$ yields (10) by the product rule.
Lemma 2.7.
AD of the adjoint program
$\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$
wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the adjoint direction $\mathbf{x}_{(1,2)}$ yields the adjoint of adjoint program
for computing
(11) $\mathbf{x}_{(2)} = \mathbf{y}_{(1)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}_{(1,2)} \quad \text{and} \quad \mathbf{y}_{(1,2)} = F'(\mathbf{x}) \cdot \mathbf{x}_{(1,2)} \; .$
Adjoints of variables used in the target (adjoint) program are marked by the subscript $(2)$ as before. Chained subscripts are combined as $(1,2)$.
Proof 2.8.
Adjoint-mode differentiation of (6) wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ with seed $\mathbf{x}_{(1,2)}$ yields (11).
2.2 Differential Invariants
Theorem 2.9.
Tangent of tangent and adjoint of tangent programs satisfy the second-order differential invariant
$\mathbf{y}^{(1)}_{(2)} \cdot \mathbf{y}^{(1,2)} = \mathbf{x}_{(2)} \cdot \mathbf{x}^{(2)} + \mathbf{x}^{(1)}_{(2)} \cdot \mathbf{x}^{(1,2)} \; .$
Theorem 2.9 can be used to verify the consistency of tangent of tangent and adjoint of tangent programs for given $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(1,2)}$, and $\mathbf{y}^{(1)}_{(2)}$. Tangents can be approximated by finite differences. Potentially serious numerical errors should be expected from second-order finite difference approximation. Careful tuning of perturbations is crucial. High-precision floating-point arithmetic should be considered.
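Such a second-order check can be exercised on a small scalar example. The sketch below is illustrative: the function $f(x_1, x_2) = \sin(x_1 x_2)$, the `Dual` class, and all values are assumptions, not artifacts of the paper. It computes the second directional derivative $\mathbf{x}^{(1)\,T} \nabla^2 f \, \mathbf{x}^{(2)}$ in tangent of tangent mode via nested dual numbers and compares it with $\mathbf{x}^{(2)} \cdot (\nabla^2 f \cdot \mathbf{x}^{(1)})$ obtained from a hand-derived adjoint of tangent program seeded with $1$:

```python
import math

class Dual:
    # forward-mode AD value; nesting Dual(Dual(...)) gives tangent of tangent
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._lift(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def sin(x):
    return Dual(sin(x.val), cos(x.val) * x.dot) if isinstance(x, Dual) else math.sin(x)

def cos(x):
    return Dual(cos(x.val), -1.0 * sin(x.val) * x.dot) if isinstance(x, Dual) else math.cos(x)

def f(x1, x2):
    # primal: f(x) = sin(x1 * x2)
    return sin(x1 * x2)

a, v1, v2 = (1.1, 0.4), (0.3, -0.2), (0.5, 0.7)

# tangent of tangent: outer tangent direction v1, inner tangent direction v2
X1 = Dual(Dual(a[0], v2[0]), Dual(v1[0]))
X2 = Dual(Dual(a[1], v2[1]), Dual(v1[1]))
tt = f(X1, X2).dot.dot  # v1^T (nabla^2 f) v2

# hand-derived adjoint of tangent program: Hessian-vector product H v1
u = a[0] * a[1]
s, c = math.sin(u), math.cos(u)
du = a[1] * v1[0] + a[0] * v1[1]
Hv1 = [v1[1] * c - a[1] * s * du, v1[0] * c - a[0] * s * du]
at = v2[0] * Hv1[0] + v2[1] * Hv1[1]  # v2 . (H v1)

assert abs(tt - at) < 1e-12
```

The nested `Dual` construction mirrors the recursive instantiation of derivative types discussed for overloading tools in Section 4.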
Theorem 2.11.
Tangent of adjoint and adjoint of adjoint programs satisfy the second-order differential invariant
$\mathbf{x}_{(1)}^{(2)} \cdot \mathbf{x}_{(1,2)} = \mathbf{x}_{(2)} \cdot \mathbf{x}^{(2)} + \mathbf{y}_{(1,2)} \cdot \mathbf{y}_{(1)}^{(2)} \; .$
Theorem 2.11 can be used to verify the consistency of tangent of adjoint and adjoint of adjoint programs for given $\mathbf{x}^{(2)}$, $\mathbf{y}_{(1)}$, $\mathbf{y}_{(1)}^{(2)}$, and $\mathbf{x}_{(1,2)}$. Tangents can be approximated by finite differences.
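For a scalar primal program the tangent of adjoint program computes Hessian-vector products. The sketch below is again illustrative (function and values are assumptions): it hand-codes the adjoint (gradient) of $f(x) = \sin(x_1 x_2)$ and its tangent in a direction $\mathbf{v}$, checks the symmetry relation $\mathbf{w} \cdot (\nabla^2 f \cdot \mathbf{v}) = \mathbf{v} \cdot (\nabla^2 f \cdot \mathbf{w})$, which follows from Lemma 1.1, and cross-checks the tangent of adjoint program against finite differences of the gradient:

```python
import math

def grad(x):
    # hand-coded adjoint program for f(x) = sin(x1 * x2): x_(1) = y_(1) . F'
    c = math.cos(x[0] * x[1])
    return [x[1] * c, x[0] * c]

def grad_tangent(x, v):
    # hand-derived tangent of adjoint program: directional derivative of grad,
    # i.e. the Hessian-vector product (nabla^2 f) . v
    u = x[0] * x[1]
    s, c = math.sin(u), math.cos(u)
    du = x[1] * v[0] + x[0] * v[1]
    return [v[1] * c - x[1] * s * du, v[0] * c - x[0] * s * du]

x, v, w = [1.1, 0.4], [0.3, -0.2], [0.5, 0.7]
Hv, Hw = grad_tangent(x, v), grad_tangent(x, w)

# symmetry check implied by Lemma 1.1: w . (H v) == v . (H w)
lhs = w[0] * Hv[0] + w[1] * Hv[1]
rhs = v[0] * Hw[0] + v[1] * Hw[1]
assert abs(lhs - rhs) < 1e-12

# finite difference cross-check of the tangent of adjoint program
h = 1e-7
gp = grad([x[0] + h * v[0], x[1] + h * v[1]])
g0 = grad(x)
fd = [(gp[i] - g0[i]) / h for i in range(2)]
assert all(abs(fd[i] - Hv[i]) < 1e-5 for i in range(2))
```

As noted above, first-order finite differences of the gradient already inherit its truncation error; the loose tolerance reflects this.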
3 Higher-Order Differential Invariants
Derivative programs of order $k > 1$ have the form
where is an indexed tangent or adjoint of or and denotes a chained (outer) product of indexed tangents or adjoints of or For example, , and in a tangent of adjoint program. In the following sub- and superscripts of will be appended to the (possibly empty) chains of sub- and superscripts in the expression represented by For example,
3.1 Higher Derivative Programs
Theorem 3.1.
Proof 3.2.
The proof is by induction over the order of differentiation.
- Let the first derivative program be the
  - tangent program, that is, $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$. AD wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of $\mathbf{x}^{(1)}$ from $\mathbf{x}$.
  - adjoint program, that is, $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$. AD wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of $\mathbf{y}_{(1)}$ from $\mathbf{x}$.
- For the induction step, AD of a derivative program of order $k$ wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of the seed directions from $\mathbf{x}$.
Theorem 3.3.
Proof 3.4.
The proof is by induction over the order of differentiation.
- Let the first derivative program be the
  - tangent program, that is, $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$. AD wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of $\mathbf{x}^{(1)}$ from $\mathbf{x}$.
  - adjoint program, that is, $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$. AD wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of $\mathbf{y}_{(1)}$ from $\mathbf{x}$.
- For the induction step, AD of a derivative program of order $k$ wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of the seed directions from $\mathbf{x}$, the vector resulting from the $k$-th derivative program being indexed by the new adjoint direction.
Examples
A third derivative program is derived in tangent of adjoint of tangent mode recursively as follows:
- first order: tangent mode,
- second order: adjoint mode (see the proof of Theorem 3.3 for the indexing in this case),
- third order: tangent mode.
Eight third derivative programs can be generated in total, by application of either tangent or adjoint mode at each of the three levels of differentiation.
A fifth derivative program is derived in tangent of adjoint of adjoint of tangent of adjoint mode as follows:
- first order: adjoint mode,
- second order: tangent mode,
- third order: adjoint mode,
- fourth order: adjoint mode,
- fifth order: tangent mode.
3.2 Differential Invariants
Proof 3.6.
Let With
it follows that
Note that the different derivative programs of order $k$ yield distinct results. Hence, differential invariants can be derived. One out of six sixth-order differential invariants over the different sixth derivative programs is
4 Discussion
Any AD tool capable of generating derivative programs of arbitrary order can be used to implement the validation of differential invariants. Source code transformation tools such as Tapenade [7] need to be applicable to their own output. While this is typically straightforward for tangent of … of tangent programs, the repeated application of adjoint mode may cause difficulties due to technical details specific to the given AD tool.
Overloading tools such as dco/c++ [8] need to allow for recursive instantiation of their derivative types with derivative types of lower order; dco/c++ supports this feature through nested C++ templates. Derivative programs of arbitrary order can be generated by arbitrary nesting of tangent and adjoint types. From a practical perspective this level of flexibility may not be crucial. Higher-order adjoint programs can always be generated as tangent of … of tangent of adjoint programs, provided continuous differentiability of the primal program up to the required order.
Differential invariants can be used as a debugging criterion for derivative code generated by AD. The primal program evaluates a partially ordered sequence of differentiable elemental functions $\varphi_i$ as a single assignment code (each variable is assumed to be written once)
$v_i = \varphi_i(v_j)_{j \prec i}$ for $i = 1, \dots, q$,
where, adopting the notation from [6], $j \prec i$ if and only if $v_j$ is an argument of $\varphi_i$. We use $=$ to denote assignment as defined by imperative programming languages.
AD of the primal program results in the augmentation of the latter with code for computing tangents or/and adjoints. For example, AD of the single assignment code in tangent mode yields the tangent single assignment code
$v_i^{(1)} = \sum_{j \prec i} \frac{\partial \varphi_i}{\partial v_j} \cdot v_j^{(1)} \; ; \quad v_i = \varphi_i(v_j)_{j \prec i} \; .$
By the chain rule of differentiation the resulting tangent program computes
(12) $\begin{pmatrix} \mathbf{y}^{(1)} \\ \mathbf{y} \end{pmatrix} = \begin{pmatrix} F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} \\ F(\mathbf{x}) \end{pmatrix} \; .$
Given values for the inputs $\mathbf{x}$ and $\mathbf{x}^{(1)}$, evaluation yields values for both outputs $\mathbf{y}$ and $\mathbf{y}^{(1)}$. Obviously, (5) is contained within (12). The adjoint single assignment code can be derived analogously.
Stepping forward through a single assignment code with support for the propagation of both tangents and adjoints enables debugging of derivative code as follows: Initialization of $\mathbf{x}^{(1)}$ (for example, randomly) in addition to $\mathbf{x}$ yields $v_i^{(1)}$ for $i = 1, \dots, q$. For each (or selected) $i$ the initialization of the adjoint of $v_i$ to one followed by backward propagation of adjoints yields $\mathbf{x}_{(1)}$. Consistency of tangents and adjoints up to the current $i$ can be validated by checking the differential invariant $\mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)} = v_i^{(1)}$. Additional evidence for the desirable correctness of the adjoint program can be obtained by approximating the tangents by finite differences.
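The stepping procedure above can be sketched for a small single assignment code. The tape layout, the primal function, and all values below are illustrative assumptions: at every step $i$ a reverse sweep seeded with the adjoint of $v_i$ set to one produces $\mathbf{x}_{(1)}$, and the invariant $\mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)} = v_i^{(1)}$ is checked:

```python
import math

# single assignment code for a hypothetical primal program:
# v2 = v0 * v1, v3 = sin(v2), v4 = v3 + v0, inputs v0 = x1, v1 = x2
OPS = [("in", 0), ("in", 1), ("mul", 0, 1), ("sin", 2), ("add", 3, 0)]

def forward(x, dx):
    # joint propagation of primal values v_i and tangents v_i^(1)
    v, dv = [], []
    for op in OPS:
        if op[0] == "in":
            v.append(x[op[1]]); dv.append(dx[op[1]])
        elif op[0] == "mul":
            i, j = op[1], op[2]
            v.append(v[i] * v[j]); dv.append(v[i] * dv[j] + dv[i] * v[j])
        elif op[0] == "sin":
            i = op[1]
            v.append(math.sin(v[i])); dv.append(math.cos(v[i]) * dv[i])
        elif op[0] == "add":
            i, j = op[1], op[2]
            v.append(v[i] + v[j]); dv.append(dv[i] + dv[j])
    return v, dv

def reverse(v, i):
    # backward propagation of adjoints seeded with the adjoint of v_i = 1;
    # returns the input adjoints x_(1)
    vbar = [0.0] * len(v)
    vbar[i] = 1.0
    xbar = [0.0, 0.0]
    for k in range(i, -1, -1):
        op = OPS[k]
        if op[0] == "in":
            xbar[op[1]] += vbar[k]
        elif op[0] == "mul":
            a, b = op[1], op[2]
            vbar[a] += v[b] * vbar[k]; vbar[b] += v[a] * vbar[k]
        elif op[0] == "sin":
            vbar[op[1]] += math.cos(v[op[1]]) * vbar[k]
        elif op[0] == "add":
            vbar[op[1]] += vbar[k]; vbar[op[2]] += vbar[k]
    return xbar

x, dx = [1.2, 0.8], [0.3, -0.5]
v, dv = forward(x, dx)
for i in range(len(v)):
    xbar = reverse(v, i)
    # differential invariant at step i: x_(1) . x^(1) == v_i^(1)
    assert abs(xbar[0] * dx[0] + xbar[1] * dx[1] - dv[i]) < 1e-12
```

A failing assertion localizes the first elemental whose tangent and adjoint are inconsistent, which is the essence of the debugging algorithm sketched above.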
Most established AD software tools support user-defined elemental functions. Their built-in elemental functions can typically be expected to be correct leaving user intervention as the most likely source of errors. The sketched debugging algorithm enables the localization and subsequent correction of potential errors.
The formalism extends seamlessly to higher derivative programs. Implementation in the context of AD software raises a number of technical challenges, the discussion of which is beyond the scope of this paper.
5 Conclusion
AD as a form of differentiable programming has become an indispensable ingredient of state-of-the-art numerical methods. Software tools for AD provide valuable support for the (semi-)automatic generation of derivative programs. Validation of correctness and debugging of such programs pose a serious challenge. The work presented in this paper aims to set the mathematical stage for the development of corresponding methods and for their highly desirable implementation.
References
- [1] C. Bischof, M. Bücker, B. Lang, A. Rasch, and A. Vehreschild. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation, pages 65–72, 2002.
- [2] B. Christianson, S. Forth, and A. Griewank, editors. Special issue of Optimization Methods & Software: Advances in Algorithmic Differentiation, 2018.
- [3] T. Coleman and W. Xu. Automatic Differentiation in MATLAB Using ADMAT with Applications. Number 27 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2016.
- [4] M. Giles and P. Glasserman. Smoking adjoints: Fast Monte Carlo greeks. Risk, 19:88–92, 2006.
- [5] A. Griewank, D. Juedes, and J. Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software, 22(2):131–167, 1996.
- [6] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 105 in Other Titles in Applied Mathematics. SIAM, Philadelphia, PA, 2nd edition, 2008.
- [7] L. Hascoët and V. Pascual. The Tapenade automatic differentiation tool: Principles, model, and specification. ACM Transactions on Mathematical Software, 39(3):20:1–20:43, 2013.
- [8] K. Leppkes, J. Lotz, and U. Naumann. dco/c++: Derivative Code by Overloading in C++. Technical Report TR2/20, Numerical Algorithms Group Ltd., 2020.
- [9] M. Sagebaum, T. Albring, and N. Gauger. High-performance derivative computations using CoDiPack. ACM Transactions on Mathematical Software, 45(4):1–26, 2019.
- [10] U. Naumann. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. Number 24 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2012.
- [11] U. Naumann and J. Riehme. A differentiation-enabled Fortran 95 compiler. ACM Transactions on Mathematical Software, 31(4):458–474, December 2005.
- [12] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
- [13] M. Towara and U. Naumann. A discrete adjoint model for OpenFOAM. Procedia Computer Science, 18:429–438, 2013.