This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Differential Invariants

Uwe Naumann Informatik 12: Software and Tools for Computational Engineering, RWTH Aachen University, Germany. naumann@stce.rwth-aachen.de
Abstract

Validation is a major challenge in differentiable programming. The state of the art is based on algorithmic differentiation. Consistency of first-order tangent and adjoint programs is defined by a well-known first-order differential invariant. This paper generalizes the approach through derivation of corresponding differential invariants of arbitrary order.

keywords:
differentiable programming, algorithmic differentiation, validation

1 Introduction and State of the Art

We consider implementations of multivariate vector functions

(1) F:IRnIRm:𝐱𝐲=F(𝐱)F:{I\!\!R}^{n}\rightarrow{I\!\!R}^{m}:{\bf x}\mapsto{\bf y}=F({\bf x})

over the real (floating-point) numbers IR{I\!\!R} as sufficiently often continuously differentiable computer programs. Distinctly named variables (e.g, 𝐱{\bf x} and 𝐲{\bf y}) are assumed to be unaliased111They occupy disjoint system memory locations. in the given implementation. We refer to FF as the primal program. Its ν\nu-th derivative (tensor) is denoted as

(2) F[ν]=F[ν](𝐱)dνFd𝐱ν(𝐱)IRm×nν,F^{[\nu]}=F^{[\nu]}({\bf x})\equiv\frac{d^{\nu}F}{d{\bf x}^{\nu}}({\bf x})\in{I\!\!R}^{m\times n^{\nu}}\;,

where nνn××nνtimes.n^{\nu}\equiv\overset{\nu~{}\text{times}}{\overbrace{n\times\ldots\times n}}. We set FF[1]F^{\prime}\equiv F^{[1]} (Jacobian) and F′′F[2]F^{\prime\prime}\equiv F^{[2]} (Hessian). Vectors are printed in bold font. Upper case letters denote other matrices and tensors. We use == to denote mathematical equality and \equiv in the sense of “is defined as.”

Index notation F[ν]=Fk,j1,,jν[ν]F^{[\nu]}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}} is used for operations on derivative tensors with k=1,,mk=1,\ldots,m and ji=1,,nj_{i}=1,\ldots,n for i=1,,ν.i=1,\ldots,\nu. Products with vectors 𝐮=𝐮ji(us)IRn{\bf u}={\bf u}_{j_{i}}\equiv(u_{s})\in{I\!\!R}^{n} are defined as

(3) Fk,j1,,jν[ν]𝐮jis=1nFk,j1,,ji1,s,ji+1,,jν[ν]us.F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf u}_{j_{i}}\equiv\sum_{s=1}^{n}F^{[\nu]}_{k,j_{1},\ldots,j_{i-1},s,j_{i+1},\ldots,j_{\nu}}\cdot u_{s}\;.

Similarly, products with vectors 𝐰=𝐰k(ws)IRm{\bf w}={\bf w}_{k}\equiv(w_{s})\in{I\!\!R}^{m} are defined as

(4) Fk,j1,,jν[ν]𝐰ks=1mFs,j1,,jν[ν]ws.F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf w}_{k}\equiv\sum_{s=1}^{m}F^{[\nu]}_{s,j_{1},\ldots,j_{\nu}}\cdot w_{s}\;.

The following rather obvious observation will be used in upcoming proofs.

Lemma 1.1.

For FF as in (1) with derivatives as in (2)

  1. 1.

    Fk,j1,,jν[ν]𝐮ji1𝐯ji2=Fk,j1,,jν[ν]𝐯ji2𝐮ji1F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf u}_{j_{i_{1}}}\cdot{\bf v}_{j_{i_{2}}}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf v}_{j_{i_{2}}}\cdot{\bf u}_{j_{i_{1}}} for 𝐮,𝐯IRn,{\bf u},{\bf v}\in{I\!\!R}^{n}, and i1,i2{1,,ν},i_{1},i_{2}\in\{1,\ldots,\nu\}, i1i2;i_{1}\neq i_{2};

  2. 2.

    Fk,j1,,jν[ν]𝐰k𝐮ji=Fk,j1,,jν[ν]𝐮ji𝐰kF^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf w}_{k}\cdot{\bf u}_{j_{i}}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf u}_{j_{i}}\cdot{\bf w}_{k} for 𝐰IRm{\bf w}\in{I\!\!R}^{m}, 𝐮IRn{\bf u}\in{I\!\!R}^{n} and i{1,,ν}.i\in\{1,\ldots,\nu\}.

Proof 1.2.

The lemma follows immediately from the definition of index notation as in (3) and (4) by exploitation of commutativity of scalar addition and multiplication.

  1. 1.

    Let 1i1<i2ν1\leq i_{1}<i_{2}\leq\nu while 𝐮=𝐮ji1(us1)IRn{\bf u}={\bf u}_{j_{i_{1}}}\equiv(u_{s_{1}})\in{I\!\!R}^{n} and 𝐯=𝐯ji2(vs2)IRn{\bf v}={\bf v}_{j_{i_{2}}}\equiv(v_{s_{2}})\in{I\!\!R}^{n}. Then

    Fk,j1,,jν[ν]\displaystyle F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}} 𝐮ji1𝐯ji2\displaystyle\cdot{\bf u}_{j_{i_{1}}}\cdot{\bf v}_{j_{i_{2}}}
    =s2=1ns1=1nFk,j1,,ji11,s1,ji1+1,,ji21,s2,ji2+1,,jν[ν]us1vs2\displaystyle=\sum_{s_{2}=1}^{n}\sum_{s_{1}=1}^{n}F^{[\nu]}_{k,j_{1},\ldots,j_{i_{1}-1},s_{1},j_{i_{1}+1},\ldots,j_{i_{2}-1},s_{2},j_{i_{2}+1},\ldots,j_{\nu}}\cdot u_{s_{1}}\cdot v_{s_{2}}
    =s1=1ns2=1nFk,j1,,ji11,s1,ji1+1,,ji21,s2,ji2+1,,jν[ν]vs2us1\displaystyle=\sum_{s_{1}=1}^{n}\sum_{s_{2}=1}^{n}F^{[\nu]}_{k,j_{1},\ldots,j_{i_{1}-1},s_{1},j_{i_{1}+1},\ldots,j_{i_{2}-1},s_{2},j_{i_{2}+1},\ldots,j_{\nu}}\cdot v_{s_{2}}\cdot u_{s_{1}}
    =Fk,j1,,jν[ν]𝐯ji2𝐮ji1.\displaystyle=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf v}_{j_{i_{2}}}\cdot{\bf u}_{j_{i_{1}}}\;.

    An analogous argument holds for i1>i2.i_{1}>i_{2}.

  2. 2.

    Similarly, for 𝐰=𝐰k(ws1)IRm,{\bf w}={\bf w}_{k}\equiv(w_{s_{1}})\in{I\!\!R}^{m}, 𝐮=𝐮ji(us2)IRn{\bf u}={\bf u}_{j_{i}}\equiv(u_{s_{2}})\in{I\!\!R}^{n} and 1iν1\leq i\leq\nu

    Fk,j1,,jν[ν]\displaystyle F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}} 𝐰k𝐮ji\displaystyle\cdot{\bf w}_{k}\cdot{\bf u}_{j_{i}}
    =s2=1ns1=1mFs1,j1,,ji1,s2,ji+1,,jν[ν]ws1us2\displaystyle=\sum_{s_{2}=1}^{n}\sum_{s_{1}=1}^{m}F^{[\nu]}_{s_{1},j_{1},\ldots,j_{i-1},s_{2},j_{i+1},\ldots,j_{\nu}}\cdot w_{s_{1}}\cdot u_{s_{2}}
    =s1=1ms2=1nFs1,j1,,ji1,s2,ji+1,,jν[ν]us2ws1\displaystyle=\sum_{s_{1}=1}^{m}\sum_{s_{2}=1}^{n}F^{[\nu]}_{s_{1},j_{1},\ldots,j_{i-1},s_{2},j_{i+1},\ldots,j_{\nu}}\cdot u_{s_{2}}\cdot w_{s_{1}}
    =Fk,j1,,jν[ν]𝐮ji𝐰k.\displaystyle=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot{\bf u}_{j_{i}}\cdot{\bf w}_{k}\;.

Continuous differentiability up to the required order ν\nu implies partial symmetry of derivative tensors as

Fk,π1(j1,,jν)[ν]=Fk,π2(j1,,jν)[ν]F^{[\nu]}_{k,\pi_{1}(j_{1},\ldots,j_{\nu})}=F^{[\nu]}_{k,\pi_{2}(j_{1},\ldots,j_{\nu})}

for arbitrary permutations π1\pi_{1} and π2\pi_{2} of j1,,jν.j_{1},\ldots,j_{\nu}.

Algorithmic differentiation (AD) [6, 10] of the primal program FF with respect to (wrt.) 𝐱{\bf x} in tangent (also: forward) mode yields the tangent program

F(1):IRn×IRnIRm:(𝐱,𝐱(1))𝐲(1)=F(1)(𝐱,𝐱(1)),F^{(1)}:{I\!\!R}^{n}\times{I\!\!R}^{n}\rightarrow{I\!\!R}^{m}:\quad({\bf x},{\bf x}^{(1)})\mapsto{\bf y}^{(1)}=F^{(1)}({\bf x},{\bf x}^{(1)})\;,

which computes 𝐲(1)=𝐲k(1)IRm{\bf y}^{(1)}={\bf y}^{(1)}_{k}\in{I\!\!R}^{m} in a given tangent direction 𝐱(1)=𝐱j1(1)IRn{\bf x}^{(1)}={\bf x}^{(1)}_{j_{1}}\in{I\!\!R}^{n} as

(5) 𝐲k(1)[dFd𝐱]k,j1𝐱j1(1)=Fk,j1𝐱j1(1).{\bf y}^{(1)}_{k}\equiv\left[\frac{dF}{d{\bf x}}\right]_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}}=F^{\prime}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}}\;.

Tangents of primal variables are marked by the superscript (1).\!{}^{(1)}. Tensors are enclosed in square brackets whenever appropriate for clarification of index notation.

AD of the primal program FF wrt. 𝐱{\bf x} in adjoint (also: reverse) mode yields the adjoint program

F(1):IRn×IRmIRn:(𝐱,𝐲(1))𝐱(1)=F(1)(𝐱,𝐲(1)),F_{(1)}:{I\!\!R}^{n}\times{I\!\!R}^{m}\rightarrow{I\!\!R}^{n}:\quad({\bf x},{\bf y}_{(1)})\mapsto{\bf x}^{(1)}=F_{(1)}({\bf x},{\bf y}_{(1)})\;,

which computes 𝐱(1)=𝐱(1)j1IRn{\bf x}_{(1)}={\bf x}_{(1)_{j_{1}}}\in{I\!\!R}^{n} in a given adjoint direction 𝐲(1)=𝐲(1)kIRm{\bf y}_{(1)}={\bf y}_{(1)_{k}}\in{I\!\!R}^{m} as

(6) 𝐱(1)j1[dFd𝐱]k,j1𝐲(1)k=Fk,j1𝐲(1)k.{\bf x}_{(1)_{j_{1}}}\equiv\left[\frac{dF}{d{\bf x}}\right]_{k,j_{1}}\cdot{{\bf y}_{(1)}}_{k}=F^{\prime}_{k,j_{1}}\cdot{{\bf y}_{(1)}}_{k}\;.

Adjoints of primal variables are marked by the subscript (1).\!{}_{(1)}.

AD and adjoint methods in particular play a central role in modern simulation and data science. Key applications include computational fluid dynamics [13], quantitative finance [4] and machine learning [12]. A growing number of AD software tools have been developed. They support a variety of programming languages. Coverage includes C/C++ [5, 9], Fortran [7, 11] and Matlab [1, 3]. The high level of activity in AD research and development is also documented by so far seven international conferences on the subject with associated proceedings / special post-conference collections; see, for example, [2]. Refer to the AD community’s web portal www.autodiff.org for further information on research groups, software tools and applications. The web presence includes a comprehensive bibliography on the subject.

The following well-known first-order differential invariant follows immediately from (5) and (6).

Theorem 1.3.

For a primal program FF as in (1) with tangent and adjoint programs evaluating (5) and (6) the following first-order differential invariant holds:

(7) 𝐱(1)j1𝐱j1(1)=𝐲(1)k𝐲k(1).{\bf x}_{(1)_{j_{1}}}\cdot{\bf x}^{(1)}_{j_{1}}={{\bf y}_{(1)}}_{k}\cdot{\bf y}^{(1)}_{k}\;.

Proof 1.4.
𝐱(1)j1𝐱j1(1)=Fk,j1𝐲(1)k𝐱j1(1)=Lemma1.1Fk,j1𝐱j1(1)𝐲(1)k=𝐲(1)k𝐲k(1).{\bf x}_{(1)_{j_{1}}}\cdot{\bf x}^{(1)}_{j_{1}}=F^{\prime}_{k,j_{1}}\cdot{{\bf y}_{(1)}}_{k}\cdot{\bf x}^{(1)}_{j_{1}}=_{\text{\sc Lemma}~{}\ref{lem1}}F^{\prime}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{{\bf y}_{(1)}}_{k}={{\bf y}_{(1)}}_{k}\cdot{\bf y}^{(1)}_{k}\;.

Theorem 1.3 can be used to verify the consistency of tangent and adjoint programs for given 𝐱,{\bf x}, 𝐱(1){\bf x}^{(1)} and 𝐲(1).{\bf y}_{(1)}. Shared conceptual errors are not detected, for example, if the derivative of x,\sqrt{x}, x>0,x>0, is incorrectly assumed to be equal to 12x.-\frac{1}{2\cdot\sqrt{x}}. (This mistake was made by the author during the implementation of an early prototype of the AD software dco/c++ … [8].) To address this issue the tangent can be approximated by a finite difference quotient. Consistency of finite differences, tangents and adjoints increases the likelihood of correctness of the derivative code. Special care must be taken to control numerical errors inflicted by finite difference approximation.

2 Second-Order Differential Invariants

Let us generalize the observations from the previous section for second derivative programs.

2.1 Second Derivative Programs

For primal programs as in (1) second derivative programs are obtained by differentiation of the first-order tangent or adjoint target programs with respect to 𝐱{\bf x} in tangent or adjoint mode. Four variants of second derivative programs can be generated.

Lemma 2.1.

AD of the tangent program

F(1)(𝐱,𝐱(1))Fk,j1(𝐱)𝐱j1(1)=𝐲k(1)F^{(1)}({\bf x},{\bf x}^{(1)})\equiv F^{\prime}_{k,j_{1}}({\bf x})\cdot{\bf x}^{(1)}_{j_{1}}={\bf y}_{k}^{(1)}

wrt. 𝐱IRn{\bf x}\in{I\!\!R}^{n} in the tangent direction 𝐱(2)IRn{\bf x}^{(2)}\in{I\!\!R}^{n} yields the tangent of tangent program

F(1,2):IRn×IRn×IRnIRm:(𝐱,𝐱(1),𝐱(2))𝐲k(1,2)=F(1,2)(𝐱,𝐱(1),𝐱(2))F^{(1,2)}:{I\!\!R}^{n}\times{I\!\!R}^{n}\times{I\!\!R}^{n}\rightarrow{I\!\!R}^{m}:\quad({\bf x},{\bf x}^{(1)},{\bf x}^{(2)})\mapsto{\bf y}_{k}^{(1,2)}=F^{(1,2)}({\bf x},{\bf x}^{(1)},{\bf x}^{(2)})

for computing

(8) 𝐲k(1,2)=Fk,j1,j2′′𝐱j1(1)𝐱j2(2).{\bf y}_{k}^{(1,2)}=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}\;.

Tangents of variables used in the target program (here: the tangent program) are marked by the superscript (2).\!{}^{(2)}. Chained superscripts are combined as (1,2).(1)(2)\!{}^{(1,2)}\equiv{\!{}^{(1)}}^{(2)}.

Proof 2.2.
𝐲k(1,2)\displaystyle{\bf y}_{k}^{(1,2)} =[dF(1)(𝐱,𝐱(1))d𝐱]k,j2𝐱j2(2)=[dFk,j1d𝐱]k,j1,j2𝐱j1(1)𝐱j2(2)\displaystyle=\left[\frac{dF^{(1)}({\bf x},{\bf x}^{(1)})}{d{\bf x}}\right]_{k,j_{2}}\cdot{\bf x}^{(2)}_{j_{2}}=\left[\frac{dF^{\prime}_{k,j_{1}}}{d{\bf x}}\right]_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}
=Fk,j1,j2′′𝐱j1(1)𝐱j2(2).\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}\;.

Lemma 2.3.

AD of the tangent program

F(1)(𝐱,𝐱(1))Fk,j1(𝐱)𝐱j1(1)=𝐲k(1)F^{(1)}({\bf x},{\bf x}^{(1)})\equiv F^{\prime}_{k,j_{1}}({\bf x})\cdot{\bf x}^{(1)}_{j_{1}}={\bf y}_{k}^{(1)}

wrt. 𝐱IRn{\bf x}\in{I\!\!R}^{n} in the adjoint direction 𝐲(2)(1)IRm{\bf y}^{(1)}_{(2)}\in{I\!\!R}^{m} yields the adjoint of tangent program

F(2)(1):IRn×IRn×IRmIRn:(𝐱,𝐱(1),𝐲(2)(1))𝐱(2)j2=F(2)(1)(𝐱,𝐱(1),𝐲(2)(1))F^{(1)}_{(2)}:{I\!\!R}^{n}\times{I\!\!R}^{n}\times{I\!\!R}^{m}\rightarrow{I\!\!R}^{n}:\quad({\bf x},{\bf x}^{(1)},{\bf y}^{(1)}_{(2)})\mapsto{\bf x}_{(2)_{j_{2}}}=F^{(1)}_{(2)}({\bf x},{\bf x}^{(1)},{\bf y}^{(1)}_{(2)})

for computing

(9) 𝐱(2)j2=Fk,j1,j2′′𝐱j1(1)𝐲(2)k(1).{\bf x}_{(2)_{j_{2}}}=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}\;.

Adjoints of variables used in the target (tangent) program carry the subscript (2).\!{}_{(2)}.

Proof 2.4.
𝐱(2)j2\displaystyle{\bf x}_{(2)_{j_{2}}} =[dF(1)(𝐱,𝐱(1))d𝐱]k,j2𝐲(2)k(1)=[dFk,j1d𝐱]k,j1,j2𝐱j1(1)𝐲(2)k(1)\displaystyle=\left[\frac{dF^{(1)}({\bf x},{\bf x}^{(1)})}{d{\bf x}}\right]_{k,j_{2}}\cdot{\bf y}^{(1)}_{(2)_{k}}=\left[\frac{dF^{\prime}_{k,j_{1}}}{d{\bf x}}\right]_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}
=Fk,j1,j2′′𝐱j1(1)𝐲(2)k(1).\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}\;.

Lemma 2.5.

AD of the adjoint program

F(1)(𝐱,𝐲(1))Fk,j1(𝐱)𝐲(1)k=𝐱(1)j1F_{(1)}({\bf x},{\bf y}_{(1)})\equiv F^{\prime}_{k,j_{1}}({\bf x})\cdot{\bf y}_{(1)_{k}}={\bf x}_{(1)_{j_{1}}}

wrt. 𝐱IRn{\bf x}\in{I\!\!R}^{n} in the tangent direction 𝐱(2)IRn{\bf x}^{(2)}\in{I\!\!R}^{n} yields the tangent of adjoint program

F(1)(2):IRn×IRm×IRnIRn:(𝐱,𝐲(1),𝐱(2))𝐱(1)j1(2)=F(1)(2)(𝐱,𝐲(1),𝐱(2))F^{(2)}_{(1)}:{I\!\!R}^{n}\times{I\!\!R}^{m}\times{I\!\!R}^{n}\rightarrow{I\!\!R}^{n}:\quad({\bf x},{\bf y}_{(1)},{\bf x}^{(2)})\mapsto{\bf x}^{(2)}_{(1)_{j_{1}}}=F^{(2)}_{(1)}({\bf x},{\bf y}_{(1)},{\bf x}^{(2)})

for computing

(10) 𝐱(1)j1(2)=Fk,j1,j2′′𝐲(1)k𝐱j2(2).{\bf x}^{(2)}_{(1)_{j_{1}}}=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\;.

Proof 2.6.
𝐱(1)j1(2)\displaystyle{\bf x}^{(2)}_{(1)_{j_{1}}} =[dF(1)(𝐱,𝐲(1))d𝐱]j1,j2𝐱j2(2)=[dFk,j1d𝐱]k,j1,j2𝐲(1)k𝐱j2(2)\displaystyle=\left[\frac{dF_{(1)}({\bf x},{\bf y}_{(1)})}{d{\bf x}}\right]_{j_{1},j_{2}}\cdot{\bf x}^{(2)}_{j_{2}}=\left[\frac{dF^{\prime}_{k,j_{1}}}{d{\bf x}}\right]_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}
=Fk,j1,j2′′𝐲(1)k𝐱j2(2).\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\;.

Lemma 2.7.

AD of the adjoint program

F(1)(𝐱,𝐲(1))Fk,j1(𝐱)𝐲(1)k=𝐱(1)j1F_{(1)}({\bf x},{\bf y}_{(1)})\equiv F^{\prime}_{k,j_{1}}({\bf x})\cdot{\bf y}_{(1)_{k}}={\bf x}_{(1)_{j_{1}}}

wrt. 𝐱IRn{\bf x}\in{I\!\!R}^{n} in the adjoint direction 𝐱(1,2)IRn{\bf x}_{(1,2)}\in{I\!\!R}^{n} yields the adjoint of adjoint program

F(1,2):IRn×IRm×IRnIRn:(𝐱,𝐲(1),𝐱(1,2))𝐱(2)j2=F(1,2)(𝐱,𝐲(1),𝐱(1,2))F_{(1,2)}:{I\!\!R}^{n}\times{I\!\!R}^{m}\times{I\!\!R}^{n}\rightarrow{I\!\!R}^{n}:\quad({\bf x},{\bf y}_{(1)},{\bf x}_{(1,2)})\mapsto{\bf x}_{(2)_{j_{2}}}=F_{(1,2)}({\bf x},{\bf y}_{(1)},{\bf x}_{(1,2)})

for computing

(11) 𝐱(2)j2=Fk,j1,j2′′𝐲(1)k𝐱(1,2)j1.{\bf x}_{(2)_{j_{2}}}=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}\;.

Adjoints of variables used in the target (adjoint) program are marked by the subscript (2)\!{}_{(2)} as before. Chained subscripts are combined as (1,2).(1)(2)\!{}_{(1,2)}\equiv{\!{}_{(1)_{(2)}}}.

Proof 2.8.
𝐱(2)j2\displaystyle{\bf x}_{(2)_{j_{2}}} =[dF(1)(𝐱,𝐲(1))d𝐱]j1,j2𝐱(1,2)j1=[dFk,j1d𝐱]k,j1,j2𝐲(1)k𝐱(1,2)j1\displaystyle=\left[\frac{dF_{(1)}({\bf x},{\bf y}_{(1)})}{d{\bf x}}\right]_{j_{1},j_{2}}\cdot{\bf x}_{(1,2)_{j_{1}}}=\left[\frac{dF^{\prime}_{k,j_{1}}}{d{\bf x}}\right]_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}
=Fk,j1,j2′′𝐲(1)k𝐱(1,2)j1.\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}\;.

2.2 Differential Invariants

Theorem 2.9.

For FF as in (1), F(1,2)F^{(1,2)} as in (8) and F(2)(1)F^{(1)}_{(2)} as in (9)

𝐱(2)j2𝐱j2(2)=𝐲k(1,2)𝐲(2)(1)k.{\bf x}_{(2)_{j_{2}}}\cdot{\bf x}^{(2)}_{j_{2}}={\bf y}^{(1,2)}_{k}\cdot{{\bf y}^{(1)}_{(2)}}_{k}\;.

Proof 2.10.

The theorem follows immediately from (8) and (9) as

𝐱(2)j2𝐱j2(2)\displaystyle{{\bf x}_{(2)}}_{j_{2}}\cdot{\bf x}^{(2)}_{j_{2}} =Fk,j1,j2′′𝐱j1(1)𝐲(2)(1)k𝐱j2(2)\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{{\bf y}_{(2)}^{(1)}}_{k}\cdot{\bf x}^{(2)}_{j_{2}}
=Fk,j1,j2′′𝐱j1(1)𝐱j2(2)𝐲(2)(1)k=𝐲k(1,2)𝐲(2)(1)k.\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{{\bf y}_{(2)}^{(1)}}_{k}={\bf y}^{(1,2)}_{k}\cdot{{\bf y}^{(1)}_{(2)}}_{k}\;.

Theorem 2.9 can be used to verify the consistency of tangent of tangent and adjoint of tangent programs for given 𝐱,{\bf x}, 𝐱(1),{\bf x}^{(1)}, 𝐱(2){\bf x}^{(2)} and 𝐲(2)(1).{\bf y}^{(1)}_{(2)}. Tangents can be approximated by finite differences. Potentially serious numerical errors should be expected from second-order finite difference approximation. Careful tuning of perturbations is crucial. High-precision floating-point arithmetic should be considered.

Theorem 2.11.

For FF as in (1), F(1,2)F^{(1,2)} as in (10) and F(2)(1)F^{(1)}_{(2)} as in (11)

𝐱(2)j2𝐱j2(2)=𝐱(1)j1(2)𝐱(1,2)j1.{\bf x}_{(2)_{j_{2}}}\cdot{\bf x}^{(2)}_{j_{2}}={\bf x}^{(2)}_{(1)_{j_{1}}}\cdot{{\bf x}_{(1,2)}}_{j_{1}}\;.

Proof 2.12.

The theorem follows immediately from (10) and (11) as

𝐱(2)j2𝐱j2(2)\displaystyle{\bf x}_{(2)_{j_{2}}}\cdot{\bf x}^{(2)}_{j_{2}} =Fk,j1,j2′′𝐲(1)k𝐱(1,2)j1𝐱j2(2)\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}\cdot{\bf x}^{(2)}_{j_{2}}
=Fk,j1,j2′′𝐲(1)k𝐱j2(2)𝐱(1,2)j1=𝐱(1)j1(2)𝐱(1,2)j1.\displaystyle=F^{\prime\prime}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,2)_{j_{1}}}={\bf x}^{(2)}_{(1)_{j_{1}}}\cdot{\bf x}_{(1,2)_{j_{1}}}\;.

Theorem 2.11 can be used to verify the consistency of tangent of adjoint and adjoint of adjoint programs for given 𝐱,{\bf x}, 𝐲(1),{\bf y}_{(1)}, 𝐱(2){\bf x}^{(2)} and 𝐱(1,2).{\bf x}_{(1,2)}. Tangents can be approximated by finite differences.

3 Higher-Order Differential Invariants

Derivative programs of order ν\nu have the form

𝐯=Fk,j1,,jν[ν]V,{\bf v}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\;,

where 𝐯{\bf v} is an indexed tangent or adjoint of 𝐱{\bf x} or 𝐲{\bf y} and VV denotes a chained (outer) product of indexed tangents or adjoints of 𝐱{\bf x} or 𝐲.{\bf y}. For example, ν=2\nu=2, 𝐯=𝐱(1)j1(2){\bf v}={\bf x}^{(2)}_{(1)_{j_{1}}} and V=𝐲(1)k𝐱j2(2)V={\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}} in a tangent of adjoint program. In the following sub- and superscripts of 𝐯{\bf v} will be appended to the (possibly empty) chains of sub- and superscripts in the expression represented by 𝐯.{\bf v}. For example,

𝐯(2)={𝐲k(1,2)if𝐯=𝐲k(1)𝐱(1)j1(2)if𝐯=𝐱(1)j1and𝐯(2)={𝐲(2)k(1)if𝐯=𝐲k(1)𝐱(1,2)j1if𝐯=𝐱(1)j1.{\bf v}^{(2)}=\begin{cases}{\bf y}^{(1,2)}_{k}&\text{if}~{}{\bf v}={\bf y}^{(1)}_{k}\\ {\bf x}^{(2)}_{(1)_{j_{1}}}&\text{if}~{}{\bf v}={\bf x}_{(1)_{j_{1}}}\end{cases}\quad\text{and}\quad{\bf v}_{(2)}=\begin{cases}{\bf y}^{(1)}_{(2)_{k}}&\text{if}~{}{\bf v}={\bf y}^{(1)}_{k}\\ {\bf x}_{(1,2)_{j_{1}}}&\text{if}~{}{\bf v}={\bf x}_{(1)_{j_{1}}}\end{cases}\;.

3.1 Higher Derivative Programs

Theorem 3.1.

Let 𝐲=F(𝐱){\bf y}=F({\bf x}) be defined by (1) with first-order tangent and adjoint programs defined by (5) and (6). AD of a (ν1)(\nu-1)-th derivative program

𝐯=Fk,j1,,jν1[ν1]V{\bf v}=F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V

in tangent mode yields the ν\nu-th derivative program

𝐯(ν)=Fk,j1,,jν[ν]V𝐱jν(ν){\bf v}^{(\nu)}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf x}^{(\nu)}_{j_{\nu}}

for ν2.\nu\geq 2.

Proof 3.2.

The proof is by induction over the order ν\nu of differentiation.

  • ν=2:\nu=2: Let the first derivative program be the

    • tangent program, that is, 𝐯=𝐲k(1),{\bf v}={\bf y}^{(1)}_{k}, Fk,j1,,jν1[ν1]=Fk,j1[1]F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}=F^{[1]}_{k,j_{1}} and V=𝐱j1(1).V={\bf x}^{(1)}_{j_{1}}. AD wrt. 𝐱{\bf x} in tangent mode yields

      𝐯(2)=𝐲k(1,2)\displaystyle{\bf v}^{(2)}={\bf y}^{(1,2)}_{k} =[d(Fk,j1[1]𝐱j1(1))d𝐱]k,j2𝐱j2(2)\displaystyle=\left[\frac{d(F^{[1]}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}})}{d{\bf x}}\right]_{k,j_{2}}\cdot{\bf x}^{(2)}_{j_{2}}
      =Fk,j1,j2[2]𝐱j1(1)𝐱j2(2)=Fk,j1,j2[2]V𝐱j2(2)\displaystyle=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}=F^{[2]}_{k,j_{1},j_{2}}\cdot V\cdot{\bf x}^{(2)}_{j_{2}}

      due to independence of V=𝐱j1(1)V={\bf x}^{(1)}_{j_{1}} from 𝐱.{\bf x}.

    • adjoint program, that is, 𝐯=𝐱(1)j1{\bf v}={\bf x}_{(1)_{j_{1}}} Fk,j1,,jν1[ν1]=Fk,j1[1]F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}=F^{[1]}_{k,j_{1}} and V=𝐲(1)k.V={\bf y}_{(1)_{k}}. AD wrt. 𝐱{\bf x} in tangent mode yields

      𝐯(2)=𝐱(1)j1(2)\displaystyle{\bf v}^{(2)}={\bf x}_{(1)_{j_{1}}}^{(2)} =[d(Fk,j1[1]𝐲(1)k)d𝐱]j1,j2𝐱j2(2)\displaystyle=\left[\frac{d(F^{[1]}_{k,j_{1}}\cdot{\bf y}_{(1)_{k}})}{d{\bf x}}\right]_{j_{1},j_{2}}\cdot{\bf x}^{(2)}_{j_{2}}
      =Fk,j1,j2[2]𝐲(1)k𝐱j2(2)=Fk,j1,j2[2]V𝐱j2(2)\displaystyle=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}=F^{[2]}_{k,j_{1},j_{2}}\cdot V\cdot{\bf x}^{(2)}_{j_{2}}

      due to independence of V=𝐲(1)kV={\bf y}_{(1)_{k}} from 𝐱.{\bf x}.

  • ν1ν:\nu-1\Rightarrow\nu: AD of 𝐯=Fk,j1,,jν1[ν1]V{\bf v}=F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V wrt. 𝐱{\bf x} in tangent mode yields

    𝐯(ν)=[d(Fk,j1,,jν1[ν1]V)d𝐱]k,jν𝐱jν(ν)=Fk,j1,,jν[ν]V𝐱jν(ν){\bf v}^{(\nu)}=\left[\frac{d(F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V)}{d{\bf x}}\right]_{k,j_{\nu}}\cdot{\bf x}^{(\nu)}_{j_{\nu}}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf x}^{(\nu)}_{j_{\nu}}

    due to independence of VV from 𝐱.{\bf x}.

Theorem 3.3.

Let 𝐲=F(𝐱){\bf y}=F({\bf x}) be defined by (1) with tangent and adjoint programs defined by (5) and (6). AD of a (ν1)(\nu-1)-th derivative program

𝐯=Fk,j1,,jν1[ν1]V{\bf v}=F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V

in adjoint mode yields the ν\nu-th derivative program

𝐱(ν)jν=Fk,j1,,jν[ν]V𝐯(ν){\bf x}_{(\nu)_{j_{\nu}}}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf v}_{(\nu)}

for ν2.\nu\geq 2.

Proof 3.4.

The proof is by induction over the order ν\nu of differentiation.

  • ν=2:\nu=2: Let the first derivative program be the

    • tangent program, that is, 𝐯=𝐲k(1),{\bf v}={\bf y}^{(1)}_{k}, Fk,j1,,jν1[ν1]=Fk,j1[1]F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}=F^{[1]}_{k,j_{1}} and V=𝐱j1(1).V={\bf x}^{(1)}_{j_{1}}. AD wrt. 𝐱{\bf x} in adjoint mode yields

      𝐱(2)j2\displaystyle{\bf x}_{(2)_{j_{2}}} =[d(Fk,j1[1]𝐱j1(1))d𝐱]k,j2𝐲(2)k(1)=Fk,j1,j2[2]𝐱j1(1)𝐲(2)k(1)\displaystyle=\left[\frac{d(F^{[1]}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}})}{d{\bf x}}\right]_{k,j_{2}}\cdot{\bf y}^{(1)}_{(2)_{k}}=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}
      =Fk,j1,j2[2]V𝐯(2)\displaystyle=F^{[2]}_{k,j_{1},j_{2}}\cdot V\cdot{\bf v}_{(2)}

      due to independence of V=𝐱j1(1)V={\bf x}^{(1)}_{j_{1}} from 𝐱.{\bf x}.

    • adjoint program, that is, 𝐯=𝐱(1)j1{\bf v}={\bf x}_{(1)_{j_{1}}} Fk,j1,,jν1[ν1]=Fk,j1[1]F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}=F^{[1]}_{k,j_{1}} and V=𝐲(1)k.V={\bf y}_{(1)_{k}}. AD wrt. 𝐱{\bf x} in adjoint mode yields

      𝐱(2)j2\displaystyle{\bf x}_{(2)_{j_{2}}} =[d(Fk,j1[1]𝐲(1)k)d𝐱]j1,j2𝐱(1,2)j1=Fk,j1,j2[2]𝐲(1)k𝐱(1,2)j1\displaystyle=\left[\frac{d(F^{[1]}_{k,j_{1}}\cdot{\bf y}_{(1)_{k}})}{d{\bf x}}\right]_{j_{1},j_{2}}\cdot{\bf x}_{(1,2)_{j_{1}}}=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}
      =Fk,j1,j2[2]V𝐯(2)\displaystyle=F^{[2]}_{k,j_{1},j_{2}}\cdot V\cdot{\bf v}_{(2)}

      due to independence of V=𝐲(1)kV={\bf y}_{(1)_{k}} from 𝐱.{\bf x}.

  • ν1ν:\nu-1\Rightarrow\nu: AD of 𝐯=Fk,j1,,jν1[ν1]V{\bf v}=F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V wrt. 𝐱{\bf x} in adjoint mode yields

    𝐱(ν)jν=[d(Fk,j1,,jν1[ν1]V)d𝐱]i,jν𝐯(ν)=Fk,j1,,jν[ν]V𝐯(ν){\bf x}_{(\nu)_{j_{\nu}}}=\left[\frac{d(F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu-1}}\cdot V)}{d{\bf x}}\right]_{i,j_{\nu}}\cdot{\bf v}_{(\nu)}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf v}_{(\nu)}

    due to independence of VV from 𝐱{\bf x} and with the vector 𝐯{\bf v} as the result of the (ν1)(\nu-1)-th derivative program being indexed by i.i.

Examples

A third derivative program is derived in tangent of adjoint of tangent mode recursively as follows:

  • ν=1:\nu=1: 𝐯=𝐲k{\bf v}={\bf y}_{k}, V=1,V=1, tangent mode \Rightarrow 𝐲k(1)=Fk,j1[1]𝐱j1(1);{\bf y}^{(1)}_{k}=F^{[1]}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}};

  • ν=2:\nu=2: 𝐯=𝐲k(1){\bf v}={\bf y}^{(1)}_{k}, V=𝐱j1(1),V={\bf x}^{(1)}_{j_{1}}, adjoint mode \Rightarrow 𝐱(2)j2=Fk,j1,j2[2]𝐱j1(1)𝐲(2)k(1);{\bf x}_{(2)_{j_{2}}}=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}; Note that in this case i=ki=k with reference to the proof of Theorem 3.3.

  • ν=3:\nu=3: 𝐯=𝐱(2)j2{\bf v}={\bf x}_{(2)_{j_{2}}}, V=𝐱j1(1)𝐲(2)k(1)V={\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}, tangent mode \Rightarrow

    𝐱(2)j2(3)=Fk,j1,j2,j3[3]𝐱j1(1)𝐲(2)k(1)𝐱j3(3).{\bf x}_{(2)_{j_{2}}}^{(3)}=F^{[3]}_{k,j_{1},j_{2},j3}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}\cdot{\bf x}^{(3)}_{j_{3}}\;.

Eight third derivative programs can be generated by application of tangent mode

to (8)\displaystyle\text{to (\ref{eqn:TT})}\Rightarrow\quad 𝐲k(1,2,3)\displaystyle{\bf y}_{k}^{(1,2,3)} =Fk,j1,j2,j3[3]𝐱j1(1)𝐱j2(2)𝐱j3(3)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}^{(3)}_{j_{3}}
to (9)\displaystyle\text{to (\ref{eqn:AT})}\Rightarrow 𝐱(2)j2(3)\displaystyle{\bf x}^{(3)}_{(2)_{j_{2}}} =Fk,j1,j2,j3[3]𝐱j1(1)𝐲(2)k(1)𝐱j3(3)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}\cdot{\bf x}^{(3)}_{j_{3}}
to (10)\displaystyle\text{to (\ref{eqn:TA})}\Rightarrow 𝐱(1)j1(2,3)\displaystyle{\bf x}^{(2,3)}_{(1)_{j_{1}}} =Fk,j1,j2,j3[3]𝐲(1)k𝐱j2(2)𝐱j3(3)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}^{(3)}_{j_{3}}
to (11)\displaystyle\text{to (\ref{eqn:AA})}\Rightarrow 𝐱(2)j2(3)\displaystyle{\bf x}^{(3)}_{(2)_{j_{2}}} =Fk,j1,j2,j3[3]𝐲(1)k𝐱(1,2)j1𝐱j3(3)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}\cdot{\bf x}^{(3)}_{j_{3}}
and of adjoint mode
to (8)\displaystyle\text{to (\ref{eqn:TT})}\Rightarrow 𝐱(3)j3\displaystyle{\bf x}_{(3)_{j_{3}}} =Fk,j1,j2,j3[3]𝐱j1(1)𝐱j2(2)𝐲(3)k(1,2)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf y}_{(3)_{k}}^{(1,2)}
to (9)\displaystyle\text{to (\ref{eqn:AT})}\Rightarrow 𝐱(3)j3\displaystyle{\bf x}_{(3)_{j_{3}}} =Fk,j1,j2,j3[3]𝐱j1(1)𝐲(2)k(1)𝐱(2,3)j2\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf x}^{(1)}_{j_{1}}\cdot{\bf y}^{(1)}_{(2)_{k}}\cdot{\bf x}_{(2,3)_{j_{2}}}
to (10)\displaystyle\text{to (\ref{eqn:TA})}\Rightarrow 𝐱(3)j3\displaystyle{\bf x}_{(3)_{j_{3}}} =Fk,j1,j2,j3[3]𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2)\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}^{(2)}_{(1,3)_{j_{1}}}
to (11)\displaystyle\text{to (\ref{eqn:AA})}\Rightarrow 𝐱(3)j3\displaystyle{\bf x}_{(3)_{j_{3}}} =Fk,j1,j2,j3[3]𝐲(1)k𝐱(1,2)j1𝐱(2,3)j2.\displaystyle=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}_{(1,2)_{j_{1}}}\cdot{\bf x}_{(2,3)_{j_{2}}}\;.

A fifth derivative program is derived in tangent of adjoint of adjoint of tangent of adjoint mode as follows:

  • ν=1:\nu=1: 𝐯=𝐲k{\bf v}={\bf y}_{k}, V=1V=1, adjoint mode \Rightarrow 𝐱(1)j1=Fk,j1[1]𝐲(1)k{\bf x}_{(1)_{j_{1}}}=F^{[1]}_{k,j_{1}}\cdot{\bf y}_{(1)_{k}}

  • ν=2:\nu=2: 𝐯=𝐱(1)j1{\bf v}={\bf x}_{(1)_{j_{1}}}, V=𝐲(1)kV={\bf y}_{(1)_{k}}, tangent mode \Rightarrow 𝐱(1)j1(2)=Fk,j1,j2[2]𝐲(1)k𝐱j2(2){\bf x}_{(1)_{j_{1}}}^{(2)}=F^{[2]}_{k,j_{1},j_{2}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}

  • ν=3:\nu=3: 𝐯=𝐱(1)j1(2){\bf v}={\bf x}_{(1)_{j_{1}}}^{(2)}, V=𝐲(1)k𝐱j2(2)V={\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}, adjoint mode \Rightarrow

    𝐱(3)j3=Fk,j1,j2,j3[3]𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2){\bf x}_{(3)_{j_{3}}}=F^{[3]}_{k,j_{1},j_{2},j_{3}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,3)_{j_{1}}}^{(2)}
  • ν=4:\nu=4: 𝐯=𝐱(3)j3{\bf v}={\bf x}_{(3)_{j_{3}}}, V=𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2)V={\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,3)_{j_{1}}}^{(2)}, adjoint mode \Rightarrow

    𝐱(4)j4=Fk,j1,j2,j3,j4[4]𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2)𝐱(3,4)j3{\bf x}_{(4)_{j_{4}}}=F^{[4]}_{k,j_{1},j_{2},j_{3},j_{4}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,3)_{j_{1}}}^{(2)}\cdot{\bf x}_{(3,4)_{j_{3}}}
  • ν=5:\nu=5: 𝐯=𝐱(4)j4{\bf v}={\bf x}_{(4)_{j_{4}}}, V=𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2)𝐱(3,4)j3V={\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,3)_{j_{1}}}^{(2)}\cdot{\bf x}_{(3,4)_{j_{3}}}, tangent mode \Rightarrow

    𝐱(4)j4(5)=Fk,j1,j2,j3,j4,j5[5]𝐲(1)k𝐱j2(2)𝐱(1,3)j1(2)𝐱(3,4)j3𝐱j5(5).{\bf x}_{(4)_{j_{4}}}^{(5)}=F^{[5]}_{k,j_{1},j_{2},j_{3},j_{4},j_{5}}\cdot{\bf y}_{(1)_{k}}\cdot{\bf x}^{(2)}_{j_{2}}\cdot{\bf x}_{(1,3)_{j_{1}}}^{(2)}\cdot{\bf x}_{(3,4)_{j_{3}}}\cdot{\bf x}^{(5)}_{j_{5}}.

3.2 Differential Invariants

Theorem 3.5.

For ν\nu-th derivative programs as in Theorems 3.1 and 3.3

𝐱(ν)jν𝐱jν(ν)=𝐯(ν)𝐯(ν).{\bf x}_{(\nu)_{j_{\nu}}}\cdot{\bf x}^{(\nu)}_{j_{\nu}}={\bf v}_{(\nu)}\cdot{\bf v}^{(\nu)}\;.

Proof 3.6.

Let 𝐯=Fk,j1,,jν1[ν1]V.{\bf v}=F^{[\nu-1]}_{k,j_{1},\ldots,j_{\nu}-1}\cdot V. With

𝐱(ν)jν=Fk,j1,,jν[ν]V𝐯(ν)and𝐯(ν)=Fk,j1,,jν[ν]V𝐱jν(ν){\bf x}_{(\nu)_{j_{\nu}}}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf v}_{(\nu)}\;\;\text{and}\;\;{\bf v}^{(\nu)}=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf x}^{(\nu)}_{j_{\nu}}

it follows that

𝐱(ν)jν𝐱jν(ν)\displaystyle{\bf x}_{(\nu)_{j_{\nu}}}\cdot{\bf x}^{(\nu)}_{j_{\nu}} =Fk,j1,,jν[ν]V𝐯(ν)𝐱jν(ν)\displaystyle=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf v}_{(\nu)}\cdot{\bf x}^{(\nu)}_{j_{\nu}}
=Fk,j1,,jν[ν]V𝐱jν(ν)𝐯(ν)=𝐯(ν)𝐯(ν)=𝐯(ν)𝐯(ν)\displaystyle=F^{[\nu]}_{k,j_{1},\ldots,j_{\nu}}\cdot V\cdot{\bf x}^{(\nu)}_{j_{\nu}}\cdot{\bf v}_{(\nu)}={\bf v}^{(\nu)}\cdot{\bf v}_{(\nu)}={\bf v}_{(\nu)}\cdot{\bf v}^{(\nu)}

Note that the 2ν12^{\nu-1} derivative programs of order ν1\nu-1 yield ν\nu distinct 𝐯.{\bf v}. Hence, ν\nu differential invariants can be derived. For example,

  • 𝐯=𝐲k(1,2)𝐱(3)j3𝐱j3(3)=𝐲(3)k(1,2)𝐲k(1,2,3){\bf v}={\bf y}^{(1,2)}_{k}\Rightarrow{\bf x}_{(3)_{j_{3}}}\cdot{\bf x}^{(3)}_{j_{3}}={\bf y}^{(1,2)}_{(3)_{k}}\cdot{\bf y}^{(1,2,3)}_{k}

  • 𝐯=𝐱(2)j2𝐱(3)j3𝐱j3(3)=𝐱(2,3)j2𝐱(2)j2(3){\bf v}={\bf x}_{(2)_{j_{2}}}\Rightarrow{\bf x}_{(3)_{j_{3}}}\cdot{\bf x}^{(3)}_{j_{3}}={\bf x}_{(2,3)_{j_{2}}}\cdot{\bf x}^{(3)}_{(2)_{j_{2}}}

  • 𝐯=𝐱(1)j1(2)𝐱(3)j3𝐱j3(3)=𝐱(1)j1(2)𝐱(1)j1(2,3){\bf v}={\bf x}^{(2)}_{(1)_{j_{1}}}\Rightarrow{\bf x}_{(3)_{j_{3}}}\cdot{\bf x}^{(3)}_{j_{3}}={\bf x}^{(2)}_{(1)_{j_{1}}}\cdot{\bf x}^{(2,3)}_{(1)_{j_{1}}}

for ν=3.\nu=3. One out of six sixth-order differential invariants over the 26=642^{6}=64 different sixth derivative programs is 𝐱(6)j6𝐱j6(6)=𝐱(4,6)j4(5)𝐱(4)j4(5,6).{\bf x}_{(6)_{j_{6}}}\cdot{\bf x}^{(6)}_{j_{6}}={\bf x}^{(5)}_{(4,6)_{j_{4}}}\cdot{\bf x}^{(5,6)}_{(4)_{j_{4}}}.

4 Discussion

Any AD tool capable of generating derivative programs of arbitrary order can be used to implement the validation of differential invariants. Source code transformation tools such as Tapenade [7] need to be applicable to their own output. While this is typically straight forward for tangent of \ldots of tangent programs the repeated application of adjoint mode may cause difficulties due to technical details specific to the given AD tool.

Overloading tools such as dco/c++ [8] need to allow for recursive instantiation of their derivative types with derivative types of lower order; dco/c++ supports this feature through nested C++ templates. Derivative programs of arbitrary order can be generated by arbitrary nesting of tangent and adjoint types. From a practical perspective this level of flexibility may not be crucial. Higher-order adjoint programs can always be generated as tangent of \ldots of tangent of adjoint programs provided continuous differentiability of the primal program up to the required order.

Differential invariants can be used as a debugging criterion for derivative code generated by AD. The primal program FF evaluates a partially ordered sequence of differentiable elemental functions Φs=Φs(𝐯r)rs:IRnsIRms\Phi_{s}=\Phi_{s}({\bf v}_{r})_{r\prec s}:{I\!\!R}^{n_{s}}\rightarrow{I\!\!R}^{m_{s}} as a single assignment code222Each variable is assumed to be written once.

𝐯s:=Φs(𝐯r)rsfors:=1,,q{\bf v}_{s}\small\mathrel{{:}{=}}\Phi_{s}({\bf v}_{r})_{r\prec s}\quad\text{for}~{}s\small\mathrel{{:}{=}}1,\ldots,q

and where, adopting the notation from [6], rsr\prec s if and only if 𝐯r{\bf v}_{r} is an argument of Φs.\Phi_{s}. We use :=\small\mathrel{{:}{=}} to denote assignment as defined by imperative programming languages.

AD of FF results in the augmentation of the latter with code for computing tangents or/and adjoints. For example, AD of the single assignment code in tangent mode yields the tangent single assignment code

𝐰𝐯s:=Φs(𝐯r)rsΦs(𝐮)𝐰k(1):=[Φs]k,j1𝐮j1(1)}fors=1,,q.\left.\begin{split}{\bf w}\equiv{\bf v}_{s}&\small\mathrel{{:}{=}}\Phi_{s}({\bf v}_{r})_{r\prec s}\equiv\Phi_{s}({\bf u})\\ {\bf w}^{(1)}_{k}&\small\mathrel{{:}{=}}\left[\Phi^{\prime}_{s}\right]_{k,j_{1}}\cdot{\bf u}^{(1)}_{j_{1}}\\ \end{split}\quad\right\}\quad\text{for}~{}s=1,\ldots,q\;.

By the chain rule of differentiation the resulting tangent program computes

(12) 𝐲=F(𝐱);𝐲k(1)=Fk,j1𝐱j1(1).{\bf y}=F({\bf x});\quad{\bf y}^{(1)}_{k}=F^{\prime}_{k,j_{1}}\cdot{\bf x}^{(1)}_{j_{1}}\;.

Given values for the inputs 𝐱{\bf x} and 𝐱(1){\bf x}^{(1)} yield values for both outputs 𝐲{\bf y} and 𝐲(1).{\bf y}^{(1)}. Obviously, (5) is contained within (12). The adjoint single assignment code can be derived analogously.

Stepping forward through a single assignment code with support for the propagation of both tangents and adjoints enables debugging of derivative code as follows: Initialization of 𝐱(1){\bf x}^{(1)} (for example randomly) in addition to 𝐱{\bf x} yields 𝐯s(1){\bf v}^{(1)}_{s} for s=1,,q.s=1,\ldots,q. For each (or selected) ss the initialization of 𝐯s(1){\bf v}_{s_{(1)}} followed by backward propagation of adjoints yields 𝐱(1).{\bf x}_{(1)}. Consistency of tangents and adjoints up to the current ss can be validated by checking the differential invariant 𝐱(1)j1𝐱j1(1)=[𝐯s(1)]k[𝐯s(1)]k.{\bf x}_{(1)_{j_{1}}}\cdot{\bf x}^{(1)}_{j_{1}}=[{\bf v}_{s_{(1)}}]_{k}\cdot[{\bf v}^{(1)}_{s}]_{k}. Additional evidence for the desirable correctness of the adjoint program can be obtained by approximating the tangents by finite differences.

Most established AD software tools support user-defined elemental functions. Their built-in elemental functions can typically be expected to be correct leaving user intervention as the most likely source of errors. The sketched debugging algorithm enables the localization and subsequent correction of potential errors.

The formalism extends seamlessly to higher derivative programs. Implementation in the context of AD software yields a number of technical challenges the discussion of which is beyond the scope of this paper.

5 Conclusion

AD as a form of differentiable programming has become an indispensable ingredient of state of the art numerical methods. Software tools for AD provide valuable support for the (semi-)automatic generation of derivative programs. Validation of correctness and debugging of such programs poses a serious challenge. The work presented in this paper aims to set the mathematical stage for the development of corresponding methods and for their highly desirable implementation.

References

  • [1] H. Bischof, M. Bücker, B. Lang, A. Rasch, and A. Vehreschild. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation, pages 65–72.
  • [2] B. Christianson, S. Forth, and A. Griewank, editors. Special issue of Optimization Methods & Software: Advances in Algorithmic Differentiation, 2018.
  • [3] T. Coleman and W. Xu. Automatic Differentiation in MATLAB Using ADMAT with Applications. Number 27 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2016.
  • [4] M. Giles and P. Glasserman. Smoking adjoints: Fast Monte Carlo greeks. Risk, 19:88–92, 2006.
  • [5] A. Griewank, D. Juedes, and J. Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software, 22(2):131–167, 1996.
  • [6] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 105 in Other Titles in Applied Mathematics. SIAM, Philadelphia, PA, 2nd edition, 2008.
  • [7] L. Hascoët and V. Pascual. The Tapenade automatic differentiation tool: Principles, model, and specification. ACM Transactions on Mathematical Software, 39(3):20:1–20:43, 2013.
  • [8] K. Leppkes, J. Lotz, and U. Naumann. dco/c++: Derivative Code by Overloading in C++. Technical Report TR2/20, Numerical Algorithms Group Ltd., 2020.
  • [9] N. Gauger M. Sagebaum, T. Albring. High-performance derivative computations using CoDiPack. ACM Transactions on Mathematical Software, 45(4):1–26, 2019.
  • [10] U. Naumann. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. Number 24 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2012.
  • [11] U. Naumann and J. Riehme. A differentiation-enabled Fortran 95 compiler. ACM Transactions on Mathematical Software, 31(4):458–474, December 2005.
  • [12] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, (323):533–536, 1986.
  • [13] M. Towara and U. Naumann. A discrete adjoint model for OpenFOAM. Procedia Computer Science, 18:429–438, 2013.