Differential Invariants
Abstract
Validation is a major challenge in differentiable programming. The state of the art is based on algorithmic differentiation. Consistency of first-order tangent and adjoint programs is defined by a well-known first-order differential invariant. This paper generalizes the approach through derivation of corresponding differential invariants of arbitrary order.
keywords: differentiable programming, algorithmic differentiation, validation

1 Introduction and State of the Art
We consider implementations of multivariate vector functions
(1) $F : \mathbb{R}^n \to \mathbb{R}^m : \mathbf{y} = F(\mathbf{x})$
over the real (floating-point) numbers as sufficiently often continuously differentiable computer programs. Distinctly named variables (e.g., $\mathbf{x}$ and $\mathbf{y}$) are assumed to be unaliased (they occupy disjoint system memory locations) in the given implementation. We refer to $\mathbf{y} = F(\mathbf{x})$ as the primal program. Its $k$-th derivative (tensor) is denoted as
(2) $F^{(k)} = F^{(k)}(\mathbf{x}) \doteq \frac{d^k F}{d \mathbf{x}^k}(\mathbf{x}) \in \mathbb{R}^{m \times n \times \dots \times n}$
where $k \geq 1$. We set $F' \doteq F^{(1)}$ (Jacobian) and $F'' \doteq F^{(2)}$ (Hessian). Vectors are printed in bold font. Upper case letters denote other matrices and tensors. We use $=$ to denote mathematical equality and $\doteq$ in the sense of "is defined as."
Index notation is used for operations on derivative tensors, with indices $i \in \{1, \dots, m\}$ and $j_1, \dots, j_k \in \{1, \dots, n\}$. Products with tangent vectors $\mathbf{v} \in \mathbb{R}^n$ are defined as contractions over the last index,
(3) $\left[ F^{(k)} \cdot \mathbf{v} \right]_{i, j_1, \dots, j_{k-1}} \doteq \sum_{j_k=1}^{n} F^{(k)}_{i, j_1, \dots, j_k} \cdot v_{j_k} \; .$
Similarly, products with adjoint vectors $\mathbf{w} \in \mathbb{R}^m$ are defined as contractions over the first index,
(4) $\left[ \mathbf{w} \cdot F^{(k)} \right]_{j_1, \dots, j_k} \doteq \sum_{i=1}^{m} w_i \cdot F^{(k)}_{i, j_1, \dots, j_k} \; .$
The following rather obvious observation will be used in upcoming proofs.
Lemma 1.1.
Derivative tensors of $F$ are partially symmetric, i.e., $F^{(k)}_{i, j_{\pi(1)}, \dots, j_{\pi(k)}} = F^{(k)}_{i, j_{\sigma(1)}, \dots, j_{\sigma(k)}}$ for arbitrary permutations $\pi$ and $\sigma$ of $\{1, \dots, k\}$.
Proof 1.2.
Continuous differentiability up to the required order implies partial symmetry of derivative tensors as stated (Schwarz's theorem on the symmetry of higher partial derivatives).
Algorithmic differentiation (AD) [6, 10] of the primal program with respect to (wrt.) $\mathbf{x}$ in tangent (also: forward) mode yields the tangent program
$\mathbf{y}^{(1)} = F^{(1)}(\mathbf{x}, \mathbf{x}^{(1)})$,
which computes the directional derivative of $F$ at $\mathbf{x}$ in a given tangent direction $\mathbf{x}^{(1)} \in \mathbb{R}^n$ as
(5) $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} \; .$
Tangents of primal variables are marked by the superscript $(1)$. Tensors are enclosed in square brackets whenever appropriate for clarification of index notation.
AD of the primal program wrt. $\mathbf{x}$ in adjoint (also: reverse) mode yields the adjoint program
$\mathbf{x}_{(1)} = F_{(1)}(\mathbf{x}, \mathbf{y}_{(1)})$,
which computes, for a given adjoint direction $\mathbf{y}_{(1)} \in \mathbb{R}^m$,
(6) $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x}) \; .$
Adjoints of primal variables are marked by the subscript $(1)$.
AD and adjoint methods in particular play a central role in modern simulation and data science. Key applications include computational fluid dynamics [13], quantitative finance [4] and machine learning [12]. A growing number of AD software tools have been developed. They support a variety of programming languages. Coverage includes C/C++ [5, 9], Fortran [7, 11] and Matlab [1, 3]. The high level of activity in AD research and development is also documented by so far seven international conferences on the subject with associated proceedings / special post-conference collections; see, for example, [2]. Refer to the AD community’s web portal www.autodiff.org for further information on research groups, software tools and applications. The web presence includes a comprehensive bibliography on the subject.
Theorem 1.3.
Tangent and adjoint programs of the primal program satisfy the first-order differential invariant
(7) $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$
for arbitrary $\mathbf{x}^{(1)} \in \mathbb{R}^n$ and $\mathbf{y}_{(1)} \in \mathbb{R}^m$.
Proof 1.4.
Substitution of (5) and (6) yields $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$.
Theorem 1.3 can be used to verify the consistency of tangent and adjoint programs for given $\mathbf{x}^{(1)}$ and $\mathbf{y}_{(1)}$. Shared conceptual errors are not detected, for example, if the same incorrect expression for the derivative of an elemental function enters both the tangent and the adjoint program. (A mistake of this kind was made by the author during the implementation of an early prototype of the AD software dco/c++ [8].) To address this issue the tangent can be approximated by a finite difference quotient. Consistency of finite differences, tangents and adjoints increases the likelihood of correctness of the derivative code. Special care must be taken to control numerical errors inflicted by finite difference approximation.
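As an illustration, the first-order invariant of Theorem 1.3 can be checked mechanically for hand-written tangent and adjoint programs. The following sketch is not taken from the paper: the primal function $F$ and all input values are hypothetical examples. It validates $\mathbf{y}_{(1)} \cdot \mathbf{y}^{(1)} = \mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)}$ and cross-checks the tangent against a finite difference quotient to guard against shared conceptual errors:

```python
import math

def F(x):
    # primal program: F : R^2 -> R^2 (hypothetical example)
    return [x[0] * x[1], math.sin(x[0])]

def tangent(x, dx):
    # tangent program: y^(1) = F'(x) . x^(1), hand-derived Jacobian
    return [x[1] * dx[0] + x[0] * dx[1], math.cos(x[0]) * dx[0]]

def adjoint(x, ybar):
    # adjoint program: x_(1) = y_(1) . F'(x)
    return [x[1] * ybar[0] + math.cos(x[0]) * ybar[1], x[0] * ybar[0]]

x, dx, ybar = [1.3, -0.7], [0.2, 0.5], [1.0, -2.0]
dy, xbar = tangent(x, dx), adjoint(x, ybar)

# first-order differential invariant: y_(1) . y^(1) == x_(1) . x^(1)
lhs = sum(a * b for a, b in zip(ybar, dy))
rhs = sum(a * b for a, b in zip(xbar, dx))
assert abs(lhs - rhs) < 1e-12

# finite difference cross-check of the tangent
h = 1e-7
xp = [x[0] + h * dx[0], x[1] + h * dx[1]]
fd = [(F(xp)[i] - F(x)[i]) / h for i in range(2)]
assert all(abs(fd[i] - dy[i]) < 1e-5 for i in range(2))
```

Note that the invariant alone would not expose an error shared by `tangent` and `adjoint`; the finite difference check provides the independent reference.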
2 Second-Order Differential Invariants
Let us generalize the observations from the previous section for second derivative programs.
2.1 Second Derivative Programs
For primal programs as in (1), second derivative programs are obtained by differentiation of the first-order tangent or adjoint target programs with respect to $\mathbf{x}$ in tangent or adjoint mode. Four variants of second derivative programs can be generated.
Lemma 2.1.
AD of the tangent program
$\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$
wrt. $\mathbf{x}$ in the tangent direction $\mathbf{x}^{(2)}$ yields the tangent of tangent program
for computing
(8) $\mathbf{y}^{(1,2)} = F''(\mathbf{x}) \cdot \mathbf{x}^{(1)} \cdot \mathbf{x}^{(2)} + F'(\mathbf{x}) \cdot \mathbf{x}^{(1,2)} \; .$
Tangents of variables used in the target program (here: the tangent program) are marked by the superscript $(2)$. Chained superscripts are combined as $(1,2)$.
Proof 2.2.
Differentiation of (5) wrt. both arguments in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{x}^{(1,2)}$ yields (8) by the product rule.
Lemma 2.3.
AD of the tangent program
$\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$
wrt. $\mathbf{x}$ in the adjoint direction $\mathbf{y}^{(1)}_{(2)}$ yields the adjoint of tangent program
for computing
(9) $\mathbf{x}_{(2)} = \mathbf{y}^{(1)}_{(2)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}^{(1)} \quad \text{and} \quad \mathbf{x}^{(1)}_{(2)} = \mathbf{y}^{(1)}_{(2)} \cdot F'(\mathbf{x}) \; .$
Adjoints of variables used in the target (tangent) program carry the subscript $(2)$.
Proof 2.4.
Adjoint-mode differentiation of (5) wrt. $\mathbf{x}$ and $\mathbf{x}^{(1)}$ with seed $\mathbf{y}^{(1)}_{(2)}$ yields (9).
Lemma 2.5.
AD of the adjoint program
$\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$
wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{y}_{(1)}^{(2)}$ yields the tangent of adjoint program
for computing
(10) $\mathbf{x}_{(1)}^{(2)} = \mathbf{y}_{(1)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}^{(2)} + \mathbf{y}_{(1)}^{(2)} \cdot F'(\mathbf{x}) \; .$
Proof 2.6.
Differentiation of (6) wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the tangent directions $\mathbf{x}^{(2)}$ and $\mathbf{y}_{(1)}^{(2)}$ yields (10) by the product rule.
Lemma 2.7.
AD of the adjoint program
$\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$
wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ in the adjoint direction $\mathbf{x}_{(1,2)}$ yields the adjoint of adjoint program
for computing
(11) $\mathbf{x}_{(2)} = \mathbf{y}_{(1)} \cdot F''(\mathbf{x}) \cdot \mathbf{x}_{(1,2)} \quad \text{and} \quad \mathbf{y}_{(1,2)} = F'(\mathbf{x}) \cdot \mathbf{x}_{(1,2)} \; .$
Adjoints of variables used in the target (adjoint) program are marked by the subscript $(2)$ as before. Chained subscripts are combined as $(1,2)$.
Proof 2.8.
Adjoint-mode differentiation of (6) wrt. $\mathbf{x}$ and $\mathbf{y}_{(1)}$ with seed $\mathbf{x}_{(1,2)}$ yields (11).
2.2 Differential Invariants
Theorem 2.9.
Tangent of tangent and adjoint of tangent programs satisfy the second-order differential invariant
$\mathbf{y}^{(1)}_{(2)} \cdot \mathbf{y}^{(1,2)} = \mathbf{x}_{(2)} \cdot \mathbf{x}^{(2)} + \mathbf{x}^{(1)}_{(2)} \cdot \mathbf{x}^{(1,2)} \; .$
Theorem 2.9 can be used to verify the consistency of tangent of tangent and adjoint of tangent programs for given $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$, $\mathbf{x}^{(1,2)}$, and $\mathbf{y}^{(1)}_{(2)}$. Tangents can be approximated by finite differences. Potentially serious numerical errors should be expected from second-order finite difference approximation. Careful tuning of perturbations is crucial. High-precision floating-point arithmetic should be considered.
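Such a second-order check can be exercised on a small scalar example. The sketch below is illustrative: the function $f(x_1, x_2) = \sin(x_1 x_2)$, the `Dual` class, and all values are assumptions, not artifacts of the paper. It computes the second directional derivative $\mathbf{x}^{(1)\,T} \nabla^2 f \, \mathbf{x}^{(2)}$ in tangent of tangent mode via nested dual numbers and compares it with $\mathbf{x}^{(2)} \cdot (\nabla^2 f \cdot \mathbf{x}^{(1)})$ obtained from a hand-derived adjoint of tangent program seeded with $1$:

```python
import math

class Dual:
    # forward-mode AD value; nesting Dual(Dual(...)) gives tangent of tangent
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = self._lift(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def sin(x):
    return Dual(sin(x.val), cos(x.val) * x.dot) if isinstance(x, Dual) else math.sin(x)

def cos(x):
    return Dual(cos(x.val), -1.0 * sin(x.val) * x.dot) if isinstance(x, Dual) else math.cos(x)

def f(x1, x2):
    # primal: f(x) = sin(x1 * x2)
    return sin(x1 * x2)

a, v1, v2 = (1.1, 0.4), (0.3, -0.2), (0.5, 0.7)

# tangent of tangent: outer tangent direction v1, inner tangent direction v2
X1 = Dual(Dual(a[0], v2[0]), Dual(v1[0]))
X2 = Dual(Dual(a[1], v2[1]), Dual(v1[1]))
tt = f(X1, X2).dot.dot  # v1^T (nabla^2 f) v2

# hand-derived adjoint of tangent program: Hessian-vector product H v1
u = a[0] * a[1]
s, c = math.sin(u), math.cos(u)
du = a[1] * v1[0] + a[0] * v1[1]
Hv1 = [v1[1] * c - a[1] * s * du, v1[0] * c - a[0] * s * du]
at = v2[0] * Hv1[0] + v2[1] * Hv1[1]  # v2 . (H v1)

assert abs(tt - at) < 1e-12
```

The nested `Dual` construction mirrors the recursive instantiation of derivative types discussed for overloading tools in Section 4.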
Theorem 2.11.
Tangent of adjoint and adjoint of adjoint programs satisfy the second-order differential invariant
$\mathbf{x}_{(1)}^{(2)} \cdot \mathbf{x}_{(1,2)} = \mathbf{x}_{(2)} \cdot \mathbf{x}^{(2)} + \mathbf{y}_{(1,2)} \cdot \mathbf{y}_{(1)}^{(2)} \; .$
Theorem 2.11 can be used to verify the consistency of tangent of adjoint and adjoint of adjoint programs for given $\mathbf{x}^{(2)}$, $\mathbf{y}_{(1)}$, $\mathbf{y}_{(1)}^{(2)}$, and $\mathbf{x}_{(1,2)}$. Tangents can be approximated by finite differences.
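For a scalar primal program the tangent of adjoint program computes Hessian-vector products. The sketch below is again illustrative (function and values are assumptions): it hand-codes the adjoint (gradient) of $f(x) = \sin(x_1 x_2)$ and its tangent in a direction $\mathbf{v}$, checks the symmetry relation $\mathbf{w} \cdot (\nabla^2 f \cdot \mathbf{v}) = \mathbf{v} \cdot (\nabla^2 f \cdot \mathbf{w})$, which follows from Lemma 1.1, and cross-checks the tangent of adjoint program against finite differences of the gradient:

```python
import math

def grad(x):
    # hand-coded adjoint program for f(x) = sin(x1 * x2): x_(1) = y_(1) . F'
    c = math.cos(x[0] * x[1])
    return [x[1] * c, x[0] * c]

def grad_tangent(x, v):
    # hand-derived tangent of adjoint program: directional derivative of grad,
    # i.e. the Hessian-vector product (nabla^2 f) . v
    u = x[0] * x[1]
    s, c = math.sin(u), math.cos(u)
    du = x[1] * v[0] + x[0] * v[1]
    return [v[1] * c - x[1] * s * du, v[0] * c - x[0] * s * du]

x, v, w = [1.1, 0.4], [0.3, -0.2], [0.5, 0.7]
Hv, Hw = grad_tangent(x, v), grad_tangent(x, w)

# symmetry check implied by Lemma 1.1: w . (H v) == v . (H w)
lhs = w[0] * Hv[0] + w[1] * Hv[1]
rhs = v[0] * Hw[0] + v[1] * Hw[1]
assert abs(lhs - rhs) < 1e-12

# finite difference cross-check of the tangent of adjoint program
h = 1e-7
gp = grad([x[0] + h * v[0], x[1] + h * v[1]])
g0 = grad(x)
fd = [(gp[i] - g0[i]) / h for i in range(2)]
assert all(abs(fd[i] - Hv[i]) < 1e-5 for i in range(2))
```

As noted above, first-order finite differences of the gradient already inherit its truncation error; the loose tolerance reflects this.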
3 Higher-Order Differential Invariants
Derivative programs of order $k > 1$ have the form
where is an indexed tangent or adjoint of or and denotes a chained (outer) product of indexed tangents or adjoints of or For example, , and in a tangent of adjoint program. In the following sub- and superscripts of will be appended to the (possibly empty) chains of sub- and superscripts in the expression represented by For example,
3.1 Higher Derivative Programs
Theorem 3.1.
Proof 3.2.
The proof is by induction over the order of differentiation.
- Let the first derivative program be the
  - tangent program, that is, $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$. AD wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of $\mathbf{x}^{(1)}$ from $\mathbf{x}$.
  - adjoint program, that is, $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$. AD wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of $\mathbf{y}_{(1)}$ from $\mathbf{x}$.
- For the induction step, AD of a derivative program of order $k$ wrt. $\mathbf{x}$ in tangent mode yields a program of the asserted form due to independence of the seed directions from $\mathbf{x}$.
Theorem 3.3.
Proof 3.4.
The proof is by induction over the order of differentiation.
- Let the first derivative program be the
  - tangent program, that is, $\mathbf{y}^{(1)} = F'(\mathbf{x}) \cdot \mathbf{x}^{(1)}$. AD wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of $\mathbf{x}^{(1)}$ from $\mathbf{x}$.
  - adjoint program, that is, $\mathbf{x}_{(1)} = \mathbf{y}_{(1)} \cdot F'(\mathbf{x})$. AD wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of $\mathbf{y}_{(1)}$ from $\mathbf{x}$.
- For the induction step, AD of a derivative program of order $k$ wrt. $\mathbf{x}$ in adjoint mode yields a program of the asserted form due to independence of the seed directions from $\mathbf{x}$, the vector resulting from the $k$-th derivative program being indexed by the new adjoint direction.
Examples
A third derivative program is derived in tangent of adjoint of tangent mode recursively as follows:
- first order: tangent mode,
- second order: adjoint mode (see the proof of Theorem 3.3 for the indexing in this case),
- third order: tangent mode.
Eight third derivative programs can be generated in total, by application of either tangent or adjoint mode at each of the three levels of differentiation.
A fifth derivative program is derived in tangent of adjoint of adjoint of tangent of adjoint mode as follows:
- first order: adjoint mode,
- second order: tangent mode,
- third order: adjoint mode,
- fourth order: adjoint mode,
- fifth order: tangent mode.
3.2 Differential Invariants
Proof 3.6.
Let With
it follows that
Note that the different derivative programs of order $k$ yield distinct results. Hence, differential invariants can be derived. One out of six sixth-order differential invariants over the different sixth derivative programs is
4 Discussion
Any AD tool capable of generating derivative programs of arbitrary order can be used to implement the validation of differential invariants. Source code transformation tools such as Tapenade [7] need to be applicable to their own output. While this is typically straightforward for tangent of … of tangent programs, the repeated application of adjoint mode may cause difficulties due to technical details specific to the given AD tool.
Overloading tools such as dco/c++ [8] need to allow for recursive instantiation of their derivative types with derivative types of lower order; dco/c++ supports this feature through nested C++ templates. Derivative programs of arbitrary order can be generated by arbitrary nesting of tangent and adjoint types. From a practical perspective this level of flexibility may not be crucial. Higher-order adjoint programs can always be generated as tangent of … of tangent of adjoint programs, provided continuous differentiability of the primal program up to the required order.
Differential invariants can be used as a debugging criterion for derivative code generated by AD. The primal program evaluates a partially ordered sequence of differentiable elemental functions $\varphi_i$ as a single assignment code (each variable is assumed to be written once)
$v_i = \varphi_i(v_j)_{j \prec i}$ for $i = 1, \dots, q$,
where, adopting the notation from [6], $j \prec i$ if and only if $v_j$ is an argument of $\varphi_i$. We use $=$ to denote assignment as defined by imperative programming languages.
AD of the primal program results in the augmentation of the latter with code for computing tangents or/and adjoints. For example, AD of the single assignment code in tangent mode yields the tangent single assignment code
$v_i^{(1)} = \sum_{j \prec i} \frac{\partial \varphi_i}{\partial v_j} \cdot v_j^{(1)} \; ; \quad v_i = \varphi_i(v_j)_{j \prec i} \; .$
By the chain rule of differentiation the resulting tangent program computes
(12) $\begin{pmatrix} \mathbf{y}^{(1)} \\ \mathbf{y} \end{pmatrix} = \begin{pmatrix} F'(\mathbf{x}) \cdot \mathbf{x}^{(1)} \\ F(\mathbf{x}) \end{pmatrix} \; .$
Given values for the inputs $\mathbf{x}$ and $\mathbf{x}^{(1)}$, evaluation yields values for both outputs $\mathbf{y}$ and $\mathbf{y}^{(1)}$. Obviously, (5) is contained within (12). The adjoint single assignment code can be derived analogously.
Stepping forward through a single assignment code with support for the propagation of both tangents and adjoints enables debugging of derivative code as follows: Initialization of $\mathbf{x}^{(1)}$ (for example, randomly) in addition to $\mathbf{x}$ yields $v_i^{(1)}$ for $i = 1, \dots, q$. For each (or selected) $i$ the initialization of the adjoint of $v_i$ to one followed by backward propagation of adjoints yields $\mathbf{x}_{(1)}$. Consistency of tangents and adjoints up to the current $i$ can be validated by checking the differential invariant $\mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)} = v_i^{(1)}$. Additional evidence for the desirable correctness of the adjoint program can be obtained by approximating the tangents by finite differences.
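The stepping procedure above can be sketched for a small single assignment code. The tape layout, the primal function, and all values below are illustrative assumptions: at every step $i$ a reverse sweep seeded with the adjoint of $v_i$ set to one produces $\mathbf{x}_{(1)}$, and the invariant $\mathbf{x}_{(1)} \cdot \mathbf{x}^{(1)} = v_i^{(1)}$ is checked:

```python
import math

# single assignment code for a hypothetical primal program:
# v2 = v0 * v1, v3 = sin(v2), v4 = v3 + v0, inputs v0 = x1, v1 = x2
OPS = [("in", 0), ("in", 1), ("mul", 0, 1), ("sin", 2), ("add", 3, 0)]

def forward(x, dx):
    # joint propagation of primal values v_i and tangents v_i^(1)
    v, dv = [], []
    for op in OPS:
        if op[0] == "in":
            v.append(x[op[1]]); dv.append(dx[op[1]])
        elif op[0] == "mul":
            i, j = op[1], op[2]
            v.append(v[i] * v[j]); dv.append(v[i] * dv[j] + dv[i] * v[j])
        elif op[0] == "sin":
            i = op[1]
            v.append(math.sin(v[i])); dv.append(math.cos(v[i]) * dv[i])
        elif op[0] == "add":
            i, j = op[1], op[2]
            v.append(v[i] + v[j]); dv.append(dv[i] + dv[j])
    return v, dv

def reverse(v, i):
    # backward propagation of adjoints seeded with the adjoint of v_i = 1;
    # returns the input adjoints x_(1)
    vbar = [0.0] * len(v)
    vbar[i] = 1.0
    xbar = [0.0, 0.0]
    for k in range(i, -1, -1):
        op = OPS[k]
        if op[0] == "in":
            xbar[op[1]] += vbar[k]
        elif op[0] == "mul":
            a, b = op[1], op[2]
            vbar[a] += v[b] * vbar[k]; vbar[b] += v[a] * vbar[k]
        elif op[0] == "sin":
            vbar[op[1]] += math.cos(v[op[1]]) * vbar[k]
        elif op[0] == "add":
            vbar[op[1]] += vbar[k]; vbar[op[2]] += vbar[k]
    return xbar

x, dx = [1.2, 0.8], [0.3, -0.5]
v, dv = forward(x, dx)
for i in range(len(v)):
    xbar = reverse(v, i)
    # differential invariant at step i: x_(1) . x^(1) == v_i^(1)
    assert abs(xbar[0] * dx[0] + xbar[1] * dx[1] - dv[i]) < 1e-12
```

A failing assertion localizes the first elemental whose tangent and adjoint are inconsistent, which is the essence of the debugging algorithm sketched above.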
Most established AD software tools support user-defined elemental functions. Their built-in elemental functions can typically be expected to be correct leaving user intervention as the most likely source of errors. The sketched debugging algorithm enables the localization and subsequent correction of potential errors.
The formalism extends seamlessly to higher derivative programs. Implementation in the context of AD software raises a number of technical challenges, the discussion of which is beyond the scope of this paper.
5 Conclusion
AD as a form of differentiable programming has become an indispensable ingredient of state-of-the-art numerical methods. Software tools for AD provide valuable support for the (semi-)automatic generation of derivative programs. Validation of correctness and debugging of such programs pose a serious challenge. The work presented in this paper aims to set the mathematical stage for the development of corresponding methods and for their highly desirable implementation.
References
- [1] C. Bischof, M. Bücker, B. Lang, A. Rasch, and A. Vehreschild. Combining source transformation and operator overloading techniques to compute derivatives for MATLAB programs. In Proceedings of the Second IEEE International Workshop on Source Code Analysis and Manipulation, pages 65–72, 2002.
- [2] B. Christianson, S. Forth, and A. Griewank, editors. Special issue of Optimization Methods & Software: Advances in Algorithmic Differentiation, 2018.
- [3] T. Coleman and W. Xu. Automatic Differentiation in MATLAB Using ADMAT with Applications. Number 27 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2016.
- [4] M. Giles and P. Glasserman. Smoking adjoints: Fast Monte Carlo greeks. Risk, 19:88–92, 2006.
- [5] A. Griewank, D. Juedes, and J. Utke. Algorithm 755: ADOL-C: A package for the automatic differentiation of algorithms written in C/C++. ACM Transactions on Mathematical Software, 22(2):131–167, 1996.
- [6] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Number 105 in Other Titles in Applied Mathematics. SIAM, Philadelphia, PA, 2nd edition, 2008.
- [7] L. Hascoët and V. Pascual. The Tapenade automatic differentiation tool: Principles, model, and specification. ACM Transactions on Mathematical Software, 39(3):20:1–20:43, 2013.
- [8] K. Leppkes, J. Lotz, and U. Naumann. dco/c++: Derivative Code by Overloading in C++. Technical Report TR2/20, Numerical Algorithms Group Ltd., 2020.
- [9] M. Sagebaum, T. Albring, and N. Gauger. High-performance derivative computations using CoDiPack. ACM Transactions on Mathematical Software, 45(4):1–26, 2019.
- [10] U. Naumann. The Art of Differentiating Computer Programs: An Introduction to Algorithmic Differentiation. Number 24 in Software, Environments, and Tools. SIAM, Philadelphia, PA, 2012.
- [11] U. Naumann and J. Riehme. A differentiation-enabled Fortran 95 compiler. ACM Transactions on Mathematical Software, 31(4):458–474, December 2005.
- [12] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323:533–536, 1986.
- [13] M. Towara and U. Naumann. A discrete adjoint model for OpenFOAM. Procedia Computer Science, 18:429–438, 2013.