

High-order symplectic Lie group methods on $SO(n)$ using the polar decomposition

Xuefeng Shen, Khoa Tran, and Melvin Leok
Abstract.

A variational integrator of arbitrarily high order on the special orthogonal group $SO(n)$ is constructed using the polar decomposition and the constrained Galerkin method. It has the advantage of avoiding the second-order derivative of the exponential map that arises in traditional Lie group variational methods. In addition, a reduced Lie–Poisson integrator is constructed, and the resulting algorithms can naturally be implemented by fixed-point iteration. The proposed methods are validated by numerical simulations on $SO(3)$, which demonstrate that they are comparable to variational Runge–Kutta–Munthe-Kaas methods in terms of computational efficiency. However, the methods we have proposed preserve the Lie group structure much more accurately and exhibit better near energy preservation.

1. Introduction

1.1. Overview

Given a configuration manifold $Q$, variational integrators provide a useful method of deriving symplectic integrators for Lagrangian mechanics on the tangent bundle $TQ$ in terms of the Lagrangian $L$, or for Hamiltonian mechanics on the cotangent bundle $T^{*}Q$ in terms of the Hamiltonian $H$. They involve discretizing Hamilton's principle or Hamilton's phase space principle rather than the Euler–Lagrange or Hamilton's equations. Discrete Lagrangian variational mechanics is described in terms of a discrete Lagrangian $L_{d}(q_{0},q_{1})$, which is an approximation of the exact discrete Lagrangian,

L_{d}^{\text{exact}}(q_{0},q_{1}) = \operatorname{ext}_{\substack{q\in C^{2}([0,h],Q)\\ q(0)=q_{0},\ q(h)=q_{1}}} \int_{0}^{h} L(q(t),\dot{q}(t))\,dt. (1)

The discrete Hamilton’s principle states that the discrete action sum is stationary for variations that vanish at the endpoints. This yields the discrete Euler–Lagrange equations,

D_{2}L_{d}(q_{k-1},q_{k}) + D_{1}L_{d}(q_{k},q_{k+1}) = 0,

where $D_{i}$ denotes the partial derivative with respect to the $i$-th argument. This defines an update map on $Q\times Q$, where $(q_{k-1},q_{k})\mapsto(q_{k},q_{k+1})$. The update map can equivalently be described in terms of the discrete Legendre transforms,

p_{k} = -D_{1}L_{d}(q_{k},q_{k+1}), \qquad p_{k+1} = D_{2}L_{d}(q_{k},q_{k+1}), (2)

which defines an update map $(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1})$ on $T^{*}Q$ that automatically preserves the canonical symplectic structure on $T^{*}Q$.
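On a linear space, this update can be made concrete in a few lines of code. The sketch below is an illustration only, not a method from this paper: it uses the midpoint discrete Lagrangian $L_d(q_0,q_1)=hL\bigl(\tfrac{q_0+q_1}{2},\tfrac{q_1-q_0}{h}\bigr)$ for the harmonic oscillator $L(q,\dot q)=\dot q^{2}/2-q^{2}/2$, a choice for which the discrete Legendre transforms (2) can be solved in closed form.

```python
import math

def step(q0, p0, h):
    """One step of the variational integrator generated by the midpoint
    discrete Lagrangian L_d(q0,q1) = h*L((q0+q1)/2, (q1-q0)/h) for
    L(q, qdot) = qdot**2/2 - q**2/2.  The discrete Legendre transforms
    p0 = -D1 L_d and p1 = D2 L_d are solved for (q1, p1) in closed form."""
    q1 = (p0 + q0 * (1.0 / h - h / 4.0)) / (1.0 / h + h / 4.0)
    p1 = (q1 - q0) / h - h * (q0 + q1) / 4.0
    return q1, p1

q, p, h = 1.0, 0.0, 0.01
E0 = 0.5 * (p * p + q * q)          # energy at t = 0
for _ in range(10000):
    q, p = step(q, p, h)
E1 = 0.5 * (p * p + q * q)          # energy at t = 100
q_exact = math.cos(h * 10000)       # exact solution q(t) = cos(t)
```

Because the Hamiltonian here is quadratic, this particular symplectic map conserves the energy to machine precision; for general Lagrangians one only obtains near energy preservation.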

The order of the variational integrator depends on how accurately $L_{d}(q_{0},q_{1})$ approximates $L_{d}^{\text{exact}}(q_{0},q_{1})$. To derive a high-order discrete Lagrangian, a typical approach is the Galerkin method [14]. This involves considering the definition of the exact discrete Lagrangian, replacing $C^{2}([0,h],Q)$ with a finite-dimensional function space, and approximating the integral with a numerical quadrature formula. When the configuration manifold $Q$ is a linear space and the polynomials of degree less than or equal to $s$ are chosen, the classical symplectic partitioned Runge–Kutta methods are recovered. Subsequently, Leok and Shingel [12] introduced the shooting-based discrete Lagrangian, which allows one to construct a symplectic integrator from an arbitrary one-step method.

When the configuration manifold $Q$ is a Lie group $G$, the construction of the discrete Lagrangian is more complicated than in the linear space case. Leok [11] proposed parametrizing curves on the Lie group using the exponential map, namely representing a curve $g(t)$ connecting $g_{0}$ and $g_{1}$ by

g(t) = g_{0}\cdot\exp(\epsilon(t)),

where $\epsilon(t)\in\mathfrak{g}$ is a curve in the Lie algebra of $G$ with fixed endpoints $\epsilon(0)=0$ and $\epsilon(h)=\log(g_{0}^{-1}g_{1})$. This allows one to replace variations in $g(t)$ by variations in $\epsilon(t)$ on the Lie algebra $\mathfrak{g}$, which is a linear space. This yields the following expression for the exact discrete Lagrangian,

L_{d}^{\text{exact}}(g_{0},g_{1}) = \operatorname{ext}_{\substack{\epsilon\in C^{2}([0,h],\mathfrak{g})\\ \epsilon(0)=0,\ \epsilon(h)=\log(g_{0}^{-1}g_{1})}} \int_{0}^{h} L\bigl(g_{0}\cdot\exp(\epsilon(t)),\, g_{0}\cdot\operatorname{d\exp}_{\epsilon(t)}(\dot{\epsilon}(t))\bigr)\,dt, (3)

where $\operatorname{d\exp}_{\epsilon}(\dot{\epsilon}) = \exp(\epsilon)\cdot\frac{1-e^{-\operatorname{ad}_{\epsilon}}}{\operatorname{ad}_{\epsilon}}(\dot{\epsilon})$ is the tangent lift of the exponential map. If $\epsilon(t)$ is restricted to a finite-dimensional function space and the integral is replaced with a quadrature rule, we obtain the Galerkin Lie group variational integrators. The error analysis and implementation details for such methods can be found in [8, 3]. The above construction can be naturally extended to any retraction [1] on $G$, which is a diffeomorphism from a neighborhood of $0\in\mathfrak{g}$ to a neighborhood of $e\in G$ that satisfies a rigidity condition. The main disadvantage of Galerkin Lie group variational integrators is the term $\operatorname{d\exp}$ in (3). This implies that the resulting discrete Euler–Lagrange equations involve $\operatorname{d^{2}\exp}$, which cannot be calculated exactly in general and requires the truncation of a series expansion.

1.2. Contributions

In this paper, we focus on the Lie group $SO(n)$ as our configuration space. By using the fact that every invertible square matrix can be uniquely decomposed into the product of an orthogonal matrix and a symmetric positive-definite matrix via the polar decomposition, we will circumvent the disadvantage discussed previously: instead of parametrizing curves on $SO(n)$ by the exponential map or a retraction, $SO(n)$ is embedded naturally in the space $GL_{+}(n)=\{A\in\mathbb{R}^{n\times n}\mid\det(A)>0\}$, an open subset of $\mathbb{R}^{n\times n}$. Given fixed endpoints $g_{0}$ and $g_{1}$, we will construct interpolating polynomials in $GL_{+}(n)$ while ensuring that the internal points remain on $SO(n)$ by using the polar decomposition. Furthermore, we do not extend the Lagrangian $L$ to $GL_{+}(n)$, but instead project the trajectory onto $SO(n)$ in the same way. The variational integrator in Lagrangian form is derived following the usual variational approach for the constrained Galerkin method, and the Hamiltonian form is derived using the discrete Legendre transforms.

For a system with rotational symmetry, we obtain a simpler integrator using Lie–Poisson reduction on the Hamiltonian side. Namely, if $L$ is $SO(n)$-invariant, the constructed discrete Lagrangian is also $SO(n)$-invariant, and we can construct a reduced symplectic Lie–Poisson integrator. Lastly, we consider the dipole on a stick problem from [3], conduct numerical experiments using our method, and compare these to the variational Runge–Kutta–Munthe-Kaas (VRKMK) methods from the same reference.

2. Background

2.1. Notation

Recall that the Lie algebra of $SO(n)$ is the set $\operatorname{Skew}(n)=\{\Omega\in\mathbb{R}^{n\times n}\mid\Omega^{T}=-\Omega\}$, with the matrix commutator as the Lie bracket. Here, the inner products on $\mathbb{R}^{n\times n}$ and $\operatorname{Skew}(n)$ are introduced, and we identify these spaces with their dual spaces using the Riesz representations. For any $A,B\in\mathbb{R}^{n\times n}$, the inner product is given by

\operatorname{tr}\left(AB^{T}\right) = \sum_{i,j=1}^{n} a_{ij}b_{ij}.

For any $\Omega,\tilde{\Omega}\in\operatorname{Skew}(n)$, the inner product is defined by

\langle\Omega,\tilde{\Omega}\rangle = \sum_{i<j}\Omega_{ij}\tilde{\Omega}_{ij} = \frac{1}{2}\operatorname{tr}(\Omega\tilde{\Omega}^{T}).

In addition, consider the operator $\operatorname{Asym}\colon\mathbb{R}^{n\times n}\to\operatorname{Skew}(n)$, defined by $\operatorname{Asym}(A)=A-A^{T}$. The following properties can be easily verified:

  (a) For any $A,B\in\mathbb{R}^{n\times n}$, $\operatorname{tr}(AB^{T})=\operatorname{tr}(A^{T}B)$.

  (b) For any $A,B,P,Q\in\mathbb{R}^{n\times n}$, $\operatorname{tr}(A(PBQ)^{T})=\operatorname{tr}((P^{T}AQ^{T})B^{T})$.

  (c) For any $\Omega\in\operatorname{Skew}(n)$ and $A\in\mathbb{R}^{n\times n}$, $\langle\Omega,\operatorname{Asym}(A)\rangle=\sum_{i<j}\Omega_{ij}(A_{ij}-A_{ji})=\operatorname{tr}(\Omega A^{T})$.

In particular, we note that (c) gives a relationship between the two inner products. Lastly, given the choice of inner products, the Riesz representation allows us to identify $(\mathbb{R}^{n\times n})^{*}$ with $\mathbb{R}^{n\times n}$ and $\operatorname{Skew}(n)^{*}$ with $\operatorname{Skew}(n)$.
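These identities are straightforward to verify numerically. The following sketch is illustrative only: it checks property (c), which relates the two inner products, on random matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def asym(A):
    """Asym(A) = A - A^T, mapping R^{n x n} onto Skew(n)."""
    return A - A.T

def skew_inner(X, Y):
    """<X, Y> = (1/2) tr(X Y^T), the inner product on Skew(n)."""
    return 0.5 * np.trace(X @ Y.T)

n = 4
Omega = asym(rng.standard_normal((n, n)))   # a random skew matrix
A = rng.standard_normal((n, n))

lhs = skew_inner(Omega, asym(A))   # <Omega, Asym(A)>
rhs = np.trace(Omega @ A.T)        # tr(Omega A^T)
```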

2.2. Polar Decomposition

We introduce the polar decomposition and the construction of the retraction on $SO(n)$ described in [4]. Given $A\in GL(n)$, it decomposes uniquely as

A = UP, \qquad U\in O(n),\ P\in\operatorname{Sym}^{+}(n),

where $\operatorname{Sym}^{+}(n)$ is the set of $n\times n$ symmetric positive-definite matrices. This is the polar decomposition of $A$, and we denote the associated mapping by $\operatorname{pol}\colon GL(n)\to O(n)\times\operatorname{Sym}^{+}(n)$. The map of interest is the projection $\mathbb{P}=\pi_{1}\circ\operatorname{pol}\colon GL(n)\to O(n)$ defined by

\mathbb{P}(A) = U,

where $\pi_{1}$ is the projection onto the first coordinate. In particular, when $A\in GL_{+}(n)$, we have $U\in SO(n)$. A fast and efficient algorithm for calculating the projection of the polar decomposition is the Newton iteration,

U_{k+1} = \frac{1}{2}\left(U_{k}+U_{k}^{-T}\right), \qquad U_{0} = A. (4)
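A minimal implementation of iteration (4) might look as follows; the function name, tolerance, and iteration cap are our own choices.

```python
import numpy as np

def polar_projection(A, tol=1e-12, max_iter=100):
    """Newton iteration (4): U_{k+1} = (U_k + U_k^{-T}) / 2, U_0 = A.
    Converges quadratically to the orthogonal polar factor of A."""
    U = A.copy()
    for _ in range(max_iter):
        U_next = 0.5 * (U + np.linalg.inv(U).T)
        done = np.linalg.norm(U_next - U, 'fro') < tol
        U = U_next
        if done:
            break
    return U

rng = np.random.default_rng(1)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # det(A) > 0
U = polar_projection(A)
P = U.T @ A   # the symmetric positive-definite factor, so A = U P
```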

Now, this projection can be used to construct a retraction on $SO(n)$ from its Lie algebra $\operatorname{Skew}(n)$ relative to the identity element $I$,

\mathbb{P}(I+\Omega) = U,

where $\Omega\in\operatorname{Skew}(n)$. This provides a diffeomorphism between a neighborhood of $0\in\operatorname{Skew}(n)$ and a neighborhood of $I\in SO(n)$. To calculate its inverse, suppose that $I+\Omega=UP$ and take the transpose on both sides to obtain $I-\Omega=PU^{T}$, which implies that $U^{T}(I+\Omega)=(I-\Omega)U$. Thus, we have that

U^{T}\Omega + \Omega U + U^{T} - U = 0. (5)

This is a Lyapunov equation, and it is well known that matrix equations of the form $AX+XB+C=0$ have a unique solution if and only if $\lambda+\mu\neq 0$ for any $\lambda\in\sigma(A)$ and $\mu\in\sigma(B)$. For $U$ in a neighborhood of the identity, its eigenvalues lie in the open right half-plane, which ensures that a unique solution to (5) exists. In principle, this Lyapunov equation can be solved using classical algorithms [2, 7]. For convenience, we denote the solution to the Lyapunov equation as

X = \operatorname{Lyap}(A,B,C).
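In practice, one need not implement such a solver from scratch. Assuming SciPy is available, its Sylvester solver (which solves $AX+XB=Q$) yields $\operatorname{Lyap}(A,B,C)$ by taking $Q=-C$; the wrapper below is our own sketch.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def lyap(A, B, C):
    """Solve A X + X B + C = 0.  solve_sylvester solves A X + X B = Q,
    so we pass Q = -C."""
    return solve_sylvester(A, B, -C)

rng = np.random.default_rng(2)
# Spectra near 1, i.e. in the open right half-plane, so lambda + mu != 0
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
B = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

X = lyap(A, B, C)
residual = np.linalg.norm(A @ X + X @ B + C)
```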

Next, we introduce the tangent map and its adjoint for $\mathbb{P}$, which are essential for the derivation of the variational integrator.

2.2.1. The Tangent Map

Consider the polar decomposition $A(t)=U(t)P(t)$ and differentiate both sides to yield $\dot{A}=\dot{U}P+U\dot{P}$. By left-trivialization on $SO(n)$, we can write $\dot{U}=U\Omega$, where $\Omega\in\operatorname{Skew}(n)$. Rearranging gives $\dot{P}=U^{T}\dot{A}-\Omega P$, and since $\dot{P}\in\operatorname{Sym}(n)$, we get that $U^{T}\dot{A}-\Omega P=\dot{A}^{T}U+P\Omega$. Consequently, we may write this in the form of a Lyapunov equation,

P\Omega + \Omega P + \dot{A}^{T}U - U^{T}\dot{A} = 0. (6)

We see that the tangent map of the polar projection, $d\mathbb{P}_{A}\colon\mathbb{R}^{n\times n}\to\operatorname{Skew}(n)$, is given by $d\mathbb{P}_{A}(\dot{A})=\Omega$, where $\Omega$ is obtained by solving the Lyapunov equation (6).
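A direct implementation of this tangent map, together with a finite-difference sanity check, is sketched below. It is illustrative only: the function name is ours, and it assumes SciPy's `polar` and `solve_sylvester` routines.

```python
import numpy as np
from scipy.linalg import polar, solve_sylvester

def dP(A, Adot):
    """Tangent map of the polar projection: solve the Lyapunov
    equation (6), P*Om + Om*P + Adot^T U - U^T Adot = 0, for Om."""
    U, P = polar(A)                     # A = U P
    C = Adot.T @ U - U.T @ Adot
    return solve_sylvester(P, P, -C)    # Om lies in Skew(n)

rng = np.random.default_rng(3)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
Adot = rng.standard_normal((3, 3))
Om = dP(A, Adot)

# Finite-difference check: with U(t) the orthogonal factor of A + t*Adot
# and U' = U*Om, we expect U^T U(t) ~ I + t*Om for small t.
t = 1e-7
U, _ = polar(A)
U_t, _ = polar(A + t * Adot)
fd = (U.T @ U_t - np.eye(3)) / t
```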

2.2.2. The Adjoint of the Tangent Map

The adjoint of $d\mathbb{P}_{A}$ can be defined by

\operatorname{tr}(d\mathbb{P}_{A}^{*}(\Omega)B^{T}) = \langle\Omega, d\mathbb{P}_{A}(B)\rangle, \qquad \forall\,\Omega\in\operatorname{Skew}(n),\ B\in\mathbb{R}^{n\times n}.

Recall that computing $d\mathbb{P}_{A}(B)$ involves solving the Lyapunov equation (6). We aim to compute $d\mathbb{P}_{A}^{*}$, so we define the following two maps,

\phi\colon \operatorname{Skew}(n)\longrightarrow\operatorname{Skew}(n), \qquad \Omega\longmapsto \Omega P + P\Omega,
\psi\colon \mathbb{R}^{n\times n}\longrightarrow\operatorname{Skew}(n), \qquad B\longmapsto U^{T}B - B^{T}U,

where $A=UP$ is fixed. Therefore, $d\mathbb{P}_{A}$ can be viewed as the composition of $\psi$ and $\phi^{-1}$,

d\mathbb{P}_{A} = \phi^{-1}\circ\psi \colon \mathbb{R}^{n\times n}\longrightarrow\operatorname{Skew}(n),

and $d\mathbb{P}_{A}^{*}(\Omega)=(\phi^{-1}\circ\psi)^{*}(\Omega)=\psi^{*}\circ(\phi^{*})^{-1}(\Omega)$. We shall derive the expressions for $\phi^{*}$ and $\psi^{*}$ by considering the Riesz representations for our domains and codomains. For the adjoint of $\phi$, let $\Omega,\tilde{\Omega}\in\operatorname{Skew}(n)$; then

\langle\phi^{*}(\Omega),\tilde{\Omega}\rangle = \langle\Omega,\phi(\tilde{\Omega})\rangle = \langle\Omega,\tilde{\Omega}P+P\tilde{\Omega}\rangle
 = \langle\Omega,\operatorname{Asym}(\tilde{\Omega}P)\rangle = \operatorname{tr}(\Omega(\tilde{\Omega}P)^{T})
 = \operatorname{tr}((\Omega P)\tilde{\Omega}^{T})
 = \langle\Omega P+P\Omega,\tilde{\Omega}\rangle.

Thus, $\phi^{*}=\phi$, and $\phi$ is self-adjoint. For $\psi^{*}$, let $\Omega\in\operatorname{Skew}(n)$ and $B\in\mathbb{R}^{n\times n}$; then

\operatorname{tr}(\psi^{*}(\Omega)B^{T}) = \langle\Omega,\psi(B)\rangle = \langle\Omega,U^{T}B-B^{T}U\rangle
 = \langle\Omega,\operatorname{Asym}(U^{T}B)\rangle = \operatorname{tr}(\Omega(U^{T}B)^{T})
 = \operatorname{tr}((U\Omega)B^{T}).

Therefore, $\psi^{*}(\Omega)=U\Omega$, and we obtain

d\mathbb{P}_{A}^{*}(\Omega) = (\psi^{*}\circ(\phi^{*})^{-1})(\Omega) = (\psi^{*}\circ\phi^{-1})(\Omega) = U\operatorname{Lyap}(P,\Omega^{T}),

where $\operatorname{Lyap}(P,\Omega^{T})=\operatorname{Lyap}(P,P,\Omega^{T})$. Finally, we state a lemma that will be used later:

Lemma 1.

$\mathbb{P}(I+S)=I$ if and only if $S\in\operatorname{Sym}(n)$ and $\lambda>-1$ for all $\lambda\in\sigma(S)$.
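Returning to the adjoint: the defining identity $\operatorname{tr}(d\mathbb{P}_A^*(\Omega)B^T)=\langle\Omega,d\mathbb{P}_A(B)\rangle$ can be checked numerically against the closed-form $d\mathbb{P}_A^*(\Omega)=U\operatorname{Lyap}(P,\Omega^T)$. The sketch below is illustrative only; it assumes SciPy, and the function names are ours.

```python
import numpy as np
from scipy.linalg import polar, solve_sylvester

def dP(A, B):
    """Tangent map dP_A(B): solve P*X + X*P = U^T B - B^T U (eq. (6))."""
    U, P = polar(A)
    return solve_sylvester(P, P, U.T @ B - B.T @ U)

def dP_star(A, Om):
    """Adjoint dP*_A(Om) = U * Lyap(P, Om^T), i.e. U X where
    P*X + X*P + Om^T = 0."""
    U, P = polar(A)
    return U @ solve_sylvester(P, P, -Om.T)

rng = np.random.default_rng(4)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
W = rng.standard_normal((3, 3))
Om = W - W.T                                 # random element of Skew(3)

lhs = np.trace(dP_star(A, Om) @ B.T)         # tr(dP*_A(Om) B^T)
rhs = 0.5 * np.trace(Om @ dP(A, B).T)        # <Om, dP_A(B)>
```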

3. Lagrangian Variational Integrators on the Rotation Group $SO(n)$

Let the configuration manifold be the rotation group $G=SO(n)$, and let $L\colon G\times\mathfrak{g}\to\mathbb{R}$ be a left-trivialized Lagrangian. We shall construct a discrete Lagrangian following the approach of constrained Galerkin methods on $GL_{+}(n)$ (see Appendix A). Denote the internal points by $\{U_{i}\}_{i=1}^{s}\subset G$ and the left-trivialized internal tangent vectors by $\{\Omega_{i}\}_{i=1}^{s}\subset\mathfrak{g}$. Fixing the endpoints $g_{0}$ and $g_{1}$, we have

L_{d}(g_{0},g_{1}) = h\sum_{i=1}^{s} b_{i}L(U_{i},\Omega_{i}),

subject to the constraint

g_{1} = \mathbb{P}\left(g_{0} + h\sum_{i=1}^{s} b_{i}U_{i}\Omega_{i}\right), (7)

where the internal points UiU_{i} are represented by

U_{i} = \mathbb{P}\left(g_{0} + h\sum_{j=1}^{s} a_{ij}U_{j}\Omega_{j}\right). (8)

The expressions inside the parentheses in (7) and (8) correspond to a Runge–Kutta method in the embedding space. But, since these points may not lie on the Lie group $G$, they are projected onto $G$ using the polar decomposition.
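Holding the $\Omega_{j}$ fixed, the nonlinear system (8) can be solved by fixed-point iteration, re-projecting with the polar decomposition on every sweep. The sketch below is illustrative only: the 2-stage Gauss–Legendre tableau and the random data are our own choices, not prescribed here, and SciPy's `polar` is assumed.

```python
import numpy as np
from scipy.linalg import polar

# 2-stage Gauss-Legendre coefficients a_ij (our choice of tableau)
A_coef = np.array([[0.25, 0.25 - np.sqrt(3) / 6],
                   [0.25 + np.sqrt(3) / 6, 0.25]])

def internal_points(g0, Omegas, h, A_coef, sweeps=50):
    """Solve U_i = P(g0 + h * sum_j a_ij U_j Omega_j), eq. (8), by
    fixed-point iteration; polar(.)[0] is the orthogonal factor P(.)."""
    s = len(Omegas)
    U = [g0.copy() for _ in range(s)]
    for _ in range(sweeps):
        U = [polar(g0 + h * sum(A_coef[i, j] * U[j] @ Omegas[j]
                                for j in range(s)))[0]
             for i in range(s)]
    return U

rng = np.random.default_rng(5)
g0 = np.eye(3)
Omegas = []
for _ in range(2):
    W = rng.standard_normal((3, 3))
    Omegas.append(W - W.T)                  # random skew "velocities"
U = internal_points(g0, Omegas, h=0.05, A_coef=A_coef)

# residual of the fixed-point equations (8)
res = max(
    np.linalg.norm(U[i] - polar(g0 + 0.05 * sum(A_coef[i, j] * U[j] @ Omegas[j]
                                                for j in range(2)))[0])
    for i in range(2))
```

For small $h$ the map is a contraction, so plain iteration converges; the full integrator couples this with the updates for $\Lambda$ and $g_{1}$.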

Observe that (7) is equivalent to the condition that $\mathbb{P}\left(g_{1}^{T}(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i})\right)=I$. Suppose that $h$ is small, and $g_{0}$ and $g_{1}$ are close enough to each other; then $g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)$ lies in a neighborhood of $I$. By Lemma 1, (7) holds if and only if $g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\in\operatorname{Sym}(n)$, and so it is equivalent to

\operatorname{Asym}\left(g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\right) = 0.

Now, we can construct a discrete Lagrangian with the constraint using a Lagrange multiplier $\Lambda\in\operatorname{Skew}(n)$,

\tilde{F}(g_{0},g_{1},\{\Omega_{i}\}_{i=1}^{s},\Lambda) = h\sum_{i=1}^{s}b_{i}L(U_{i},\Omega_{i}) + \left\langle\Lambda,\operatorname{Asym}\left(g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\right)\right\rangle.

The corresponding variational integrator is given by

0 = \frac{\partial\tilde{F}}{\partial\Omega_{i}}, \qquad i = 1,2,\ldots,s, (9a)
0 = \frac{\partial\tilde{F}}{\partial\Lambda}, (9b)
U_{i} = \mathbb{P}\left(g_{0}+h\sum_{j=1}^{s}a_{ij}U_{j}\Omega_{j}\right), (9c)
-p_{0} = \frac{\partial\tilde{F}}{\partial g_{0}}, (9d)
p_{1} = \frac{\partial\tilde{F}}{\partial g_{1}}. (9e)

We shall compute the above equations more explicitly. It is easy to see that (9b) is equivalent to the constraint (7). Let us turn to (9a), where the main difficulty is the implicit dependence of $\{U_{i}\}_{i=1}^{s}$ on $\{\Omega_{i}\}_{i=1}^{s}$, which involves solving the nonlinear system (8). Suppose that $k\in\{1,2,\ldots,s\}$ is fixed, and we vary $\Omega_{k}$ such that $\Omega_{k}\to\Omega_{k}(t)$ with $\Omega_{k}(0)=\Omega_{k}$ and $\dot{\Omega}_{k}(0)=\delta\Omega_{k}$, while $\{\Omega_{i}\}_{i\neq k}$ remain fixed. Then,

U_{i}(t) = \mathbb{P}\left(g_{0}+h\sum_{j\neq k}a_{ij}U_{j}(t)\Omega_{j}+h\,a_{ik}U_{k}(t)\Omega_{k}(t)\right).

Differentiating both sides and letting $\dot{U}_{i}=U_{i}X_{ik}$, we have that

X_{ik} = d\mathbb{P}_{A_{i}}\left(h\sum_{j=1}^{s}a_{ij}U_{j}X_{jk}\Omega_{j}+h\,a_{ik}U_{k}\,\delta\Omega_{k}\right), (10)

where $A_{i}=g_{0}+h\sum_{j=1}^{s}a_{ij}U_{j}\Omega_{j}$. Then, (10) can be rewritten as

X_{ik} - d\mathbb{P}_{A_{i}}\left(h\sum_{j=1}^{s}a_{ij}U_{j}X_{jk}\Omega_{j}\right) = h\,d\mathbb{P}_{A_{i}}(a_{ik}U_{k}\,\delta\Omega_{k}). (11)

In order to represent $\{X_{ik}\}_{i=1}^{s}$ in terms of $\delta\Omega_{k}$, we define three maps,

\tilde{\psi}_{k}\colon \operatorname{Skew}(n)\longrightarrow\operatorname{Skew}(n)^{s}, \qquad \delta\Omega_{k}\longmapsto\{d\mathbb{P}_{A_{i}}(a_{ik}U_{k}\,\delta\Omega_{k})\}_{i=1}^{s},
\tilde{\phi}\colon \operatorname{Skew}(n)^{s}\longrightarrow\operatorname{Skew}(n)^{s}, \qquad \{X_{ik}\}_{i=1}^{s}\longmapsto\left\{X_{ik}-d\mathbb{P}_{A_{i}}\left(h\sum_{j=1}^{s}a_{ij}U_{j}X_{jk}\Omega_{j}\right)\right\}_{i=1}^{s},
\tilde{\pi}_{i}\colon \operatorname{Skew}(n)^{s}\longrightarrow\operatorname{Skew}(n), \qquad \{\Omega_{i}\}_{i=1}^{s}\longmapsto\Omega_{i}.

Then,

X_{ik} = (\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ h\tilde{\psi}_{k})(\delta\Omega_{k}) = h(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})(\delta\Omega_{k}). (12)

Now, we compute $\frac{\partial\tilde{F}}{\partial\Omega_{k}}$ by evaluating $\left.\frac{d}{dt}\right|_{t=0}\tilde{F}(g_{0},g_{1},\ldots,\Omega_{k}(t),\ldots,\Lambda)$ and expressing $\frac{\partial L}{\partial U}\colon G\times\mathfrak{g}\to\mathfrak{g}^{*}$ as a left-trivialized cotangent vector. Since this is a straightforward calculation of the variation, we present the result here for equation (9a) (see Section C.1 for details),

0 = h\sum_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right) + b_{k}\left(\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname{Asym}(U_{k}^{T}g_{1}\Lambda)\right), (13)

for any $k=1,2,\ldots,s$.

Recall that $\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k}\colon\operatorname{Skew}(n)\to\operatorname{Skew}(n)$, and its dual is given by

(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*} = \tilde{\psi}_{k}^{*}\circ(\tilde{\phi}^{*})^{-1}\circ\tilde{\pi}_{i}^{*}.

Therefore, let us derive the explicit expressions for the adjoints of our three proposed maps, so that we may write $(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}$ explicitly. The adjoint of $\tilde{\pi}_{i}$ is easy to compute: for any $S\in\operatorname{Skew}(n)$,

\tilde{\pi}_{i}^{*}(S) = (0,\ldots,S,\ldots,0), (14)

where $S$ is in the $i$-th position. For the adjoint of $\tilde{\phi}$, we consider the identity

\langle\tilde{\phi}^{*}(S_{1},\ldots,S_{s}),(\tilde{S}_{1},\ldots,\tilde{S}_{s})\rangle = \langle(S_{1},\ldots,S_{s}),\tilde{\phi}(\tilde{S}_{1},\ldots,\tilde{S}_{s})\rangle,

for any $(S_{1},\ldots,S_{s}),(\tilde{S}_{1},\ldots,\tilde{S}_{s})\in\operatorname{Skew}(n)^{s}$. Using the properties of the inner products again, we obtain the explicit expression (see Section C.2 for details),

\tilde{\phi}^{*}(S_{1},\ldots,S_{s}) = \left\{S_{j}-\operatorname{Asym}\left(hU_{j}^{T}\sum_{i=1}^{s}a_{ij}\,d\mathbb{P}_{A_{i}}^{*}(S_{i})\,\Omega_{j}^{T}\right)\right\}_{j=1}^{s}. (15)

Similarly, we consider the same identity and techniques to obtain the explicit expression for $\tilde{\psi}_{k}^{*}$ (see Section C.3 for details),

\tilde{\psi}_{k}^{*}(S_{1},\ldots,S_{s}) = \operatorname{Asym}\left(U_{k}^{T}\sum_{i=1}^{s}a_{ik}\,d\mathbb{P}_{A_{i}}^{*}(S_{i})\right). (16)

Combining (14), (15), and (16), $(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(S)$ for $S\in\operatorname{Skew}(n)$ can be computed as follows,

S_{j} - \operatorname{Asym}\left(hU_{j}^{T}\sum_{l=1}^{s}a_{lj}\,d\mathbb{P}_{A_{l}}^{*}(S_{l})\,\Omega_{j}^{T}\right) = S\cdot\delta_{ij}, \qquad j = 1,2,\ldots,s, (17a)
(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(S) = \operatorname{Asym}\left(U_{k}^{T}\sum_{l=1}^{s}a_{lk}\,d\mathbb{P}_{A_{l}}^{*}(S_{l})\right). (17b)

We can first calculate $\{S_{l}\}_{l=1}^{s}$ from (17a) by using fixed-point iteration, and then substitute the result into (17b) to obtain $(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(S)$. This, combined with (13), gives an explicit formula for (9a).

We now derive an explicit formula for (9d). Notice that $U_{i}$ depends on $g_{0}$ through the nonlinear system (8), so we can use the method of variations again. If we vary $g_{0}\to g_{0}(t)$ such that $g_{0}(0)=g_{0}$ and $\dot{g}_{0}(0)=g_{0}\,\delta g_{0}$, we obtain

U_{i}(t) = \mathbb{P}\left(g_{0}(t)+h\sum_{j=1}^{s}a_{ij}U_{j}(t)\Omega_{j}\right).

Differentiating both sides and letting $\dot{U}_{i}=U_{i}Y_{i}$, where $Y_{i}\in\operatorname{Skew}(n)$ is a left-trivialized tangent vector, we obtain

Y_{i} = d\mathbb{P}_{A_{i}}\left(g_{0}\,\delta g_{0}+h\sum_{j=1}^{s}a_{ij}U_{j}Y_{j}\Omega_{j}\right),

which can be rewritten as

Y_{i} - d\mathbb{P}_{A_{i}}\left(h\sum_{j=1}^{s}a_{ij}U_{j}Y_{j}\Omega_{j}\right) = d\mathbb{P}_{A_{i}}(g_{0}\,\delta g_{0}).

Similar to the approach used for $X_{ik}$, we introduce a new map,

\tilde{\varphi}\colon \operatorname{Skew}(n)\longrightarrow\operatorname{Skew}(n)^{s}, \qquad \delta g_{0}\longmapsto\left\{d\mathbb{P}_{A_{i}}(g_{0}\,\delta g_{0})\right\}_{i=1}^{s},

so that $Y_{i}=(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})(\delta g_{0})$. The explicit expression for $\tilde{\varphi}^{*}$ can be written as (see Section C.4 for details),

\tilde{\varphi}^{*}(S_{1},\ldots,S_{s}) = \operatorname{Asym}\left(g_{0}^{T}\sum_{i=1}^{s}d\mathbb{P}_{A_{i}}^{*}(S_{i})\right). (18)

As such, $(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}(S)$ can be computed as follows,

S_{j} - \operatorname{Asym}\left(hU_{j}^{T}\sum_{l=1}^{s}a_{lj}\,d\mathbb{P}_{A_{l}}^{*}(S_{l})\,\Omega_{j}^{T}\right) = S\cdot\delta_{ij}, \qquad j = 1,2,\ldots,s, (19a)
(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}(S) = \operatorname{Asym}\left(g_{0}^{T}\sum_{l=1}^{s}d\mathbb{P}_{A_{l}}^{*}(S_{l})\right). (19b)

Now, we compute $\frac{\partial\tilde{F}}{\partial g_{0}}$, which gives us (9d) explicitly (see Section C.5 for details),

-p_{0} = h\sum_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right) + \operatorname{Asym}(g_{0}^{T}g_{1}\Lambda). (20)

For equation (9e), it is easy to show that

p_{1} = \operatorname{Asym}\left(g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\Lambda^{T}\right). (21)

Combining (13), (7), (8), (20), and (21), we obtain a Lagrangian variational integrator on $SO(n)$,

0 = h\sum_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right) + b_{k}\left(\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname{Asym}(U_{k}^{T}g_{1}\Lambda)\right), (22a)
g_{1} = \mathbb{P}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right), (22b)
U_{i} = \mathbb{P}\left(g_{0}+h\sum_{j=1}^{s}a_{ij}U_{j}\Omega_{j}\right), (22c)
-p_{0} = h\sum_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right) + \operatorname{Asym}(g_{0}^{T}g_{1}\Lambda), (22d)
p_{1} = \operatorname{Asym}\left(g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\Lambda^{T}\right). (22e)

The integrator gives an update map $(g_{0},p_{0})\mapsto(g_{1},p_{1})$ on the cotangent bundle. In particular, one may solve for $(\{\Omega_{k}\}_{k=1}^{s},g_{1},\{U_{i}\}_{i=1}^{s},\Lambda)$ simultaneously using equations (22a)–(22d). Unfortunately, while (22b)–(22d) can be written in fixed-point form for the variables $g_{1}$, $\{U_{i}\}_{i=1}^{s}$, and $\Lambda$, (22a) is implicit in $\{\Omega_{k}\}_{k=1}^{s}$. However, we can arrive at a fixed-point form for (22a) on the Hamiltonian side if $L$ is hyperregular.

4. Hamiltonian Variational Integrators on the Rotation Group $SO(n)$

It is often desirable to transform a numerical method from the Lagrangian side to the Hamiltonian side, which is possible if $L$ is hyperregular. The same mechanical system can be represented by either a Lagrangian or a Hamiltonian, and the two are related by the Legendre transform. In Euclidean space, this gives

(TQ, L) \;\overset{\mathbb{F}L}{\underset{\mathbb{F}H}{\rightleftarrows}}\; (T^{*}Q, H),

and we have the following relationships,

\frac{\partial L}{\partial\dot{q}}(q,\dot{q}) = p, \qquad \frac{\partial H}{\partial p}(q,p) = \dot{q}, \qquad \frac{\partial L}{\partial q}(q,\dot{q}) = -\frac{\partial H}{\partial q}(q,p).

Given a Lie group $G$, a left-trivialized Lagrangian $L\colon G\times\mathfrak{g}\to\mathbb{R}$, and its corresponding Hamiltonian $H\colon G\times\mathfrak{g}^{*}\to\mathbb{R}$, it is easy to verify that similar relationships hold for these trivializations,

\frac{\partial L}{\partial\epsilon}(g,\epsilon) = \mu, \qquad \frac{\partial H}{\partial\mu}(g,\mu) = \epsilon, \qquad \frac{\partial L}{\partial g}(g,\epsilon) = -\frac{\partial H}{\partial g}(g,\mu). (23)

Using (23) and denoting the corresponding internal cotangent vectors by $\{\mu_{k}\}_{k=1}^{s}$, (22) can be transformed to the Hamiltonian form as

\mu_{k} = -\operatorname{Asym}(U_{k}^{T}g_{1}\Lambda) + h\sum_{i=1}^{s}\frac{b_{i}}{b_{k}}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}\left(\frac{\partial H}{\partial U}(U_{i},\mu_{i})-\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right), (24a)
g_{1} = \mathbb{P}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right), (24b)
U_{i} = \mathbb{P}\left(g_{0}+h\sum_{j=1}^{s}a_{ij}U_{j}\Omega_{j}\right), (24c)
\operatorname{Asym}(g_{0}^{T}g_{1}\Lambda) = -p_{0} + h\sum_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}\left(\frac{\partial H}{\partial U}(U_{i},\mu_{i})-\operatorname{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right), (24d)
p_{1} = \operatorname{Asym}\left(g_{1}^{T}\left(g_{0}+h\sum_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right)\Lambda^{T}\right), (24e)
\Omega_{i} = \frac{\partial H}{\partial\mu}(U_{i},\mu_{i}). (24f)

In the above algorithm, \Omega_{i} is given explicitly by (24f); it is introduced only to reduce redundancy in the computations, since the \Omega_{i} are used numerous times in the other expressions. Similarly, (\tilde{\pi}_{i}\circ\tilde{\phi}^{-1})^{*} appears in both (24a) and (24d), so one can save computational effort by reusing the shared solution to (17a) and (19a). The first four equations can then be solved simultaneously by fixed-point iteration, meaning the variables (\{\mu_{k}\}_{k=1}^{s},g_{1},\{U_{i}\}_{i=1}^{s},\Lambda) are updated concurrently in each iteration. Observe that the fixed-point form for \Lambda in (24d) requires solving a Lyapunov equation. Finally, p_{1} is computed explicitly from (24e) after solving for (\{\mu_{k}\}_{k=1}^{s},g_{1},\{U_{i}\}_{i=1}^{s},\Lambda). We shall call the integrators defined by (24) the variational polar decomposition method, or VPD for short.
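Two computational kernels recur throughout (24): the polar projection \mathbb{P} and Lyapunov solves of the form \operatorname*{Lyap}(P,S). The following Python sketch (our own illustration, not the full integrator) computes both with SciPy, assuming \operatorname*{Lyap}(P,S) denotes the solution X of PX+XP=S; the helper names pol_proj and lyap are ours.

```python
import numpy as np
from scipy.linalg import polar, solve_sylvester

def pol_proj(A):
    """Polar projection P(A): the orthogonal factor U of A = U P."""
    U, _ = polar(A)
    return U

def lyap(P, S):
    """Solve the Lyapunov equation P X + X P = S for X (we assume this
    is what Lyap(P, S) denotes in the text)."""
    return solve_sylvester(P, P, S)

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # a matrix near I
U = pol_proj(A)
P = U.T @ A                       # symmetric positive-definite factor
M = rng.standard_normal((3, 3))
S = 0.5 * (M - M.T)               # a skew-symmetric right-hand side
X = lyap(P, S)
```

Each fixed-point sweep of (24) would call these kernels once per stage.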

4.1. Lie–Poisson Integrator by Reduction

We consider a GG-invariant Hamiltonian system given by HH on the cotangent bundle TGT^{*}G. In this case, Hamilton’s equations can be reduced to a Lie–Poisson system on 𝔤\mathfrak{g}^{*}. If the Hamiltonian is hyperregular, then both the Lagrangian and the corresponding constrained Galerkin discrete Lagrangian Ld(g0,g1)L_{d}(g_{0},g_{1}) will be GG-invariant. As such, (2) naturally reduces to yield a Lie–Poisson integrator (see Appendix B). We only consider the reduction on the Hamiltonian side due to the nature of the constrained Galerkin methods, which give an update map on the cotangent bundle.

The discrete Lagrangian we have constructed becomes

Ld(g0,g1)=ext{Ωi}i=1si=1sbi𝒍(Ωi),L_{d}(g_{0},g_{1})=\operatorname*{ext}_{\{\Omega_{i}\}_{i=1}^{s}}\sum\limits_{i=1}^{s}b_{i}\bm{l}(\Omega_{i}),

where

Ui\displaystyle U_{i} =(g0+hj=1saijUjΩj),\displaystyle=\mathbb{P}\left(g_{0}+h\sum\nolimits_{j=1}^{s}a_{ij}U_{j}\Omega_{j}\right),
g1\displaystyle g_{1} =(g0+hi=1sbiUiΩi),\displaystyle=\mathbb{P}\left(g_{0}+h\sum\nolimits_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right),

and 𝒍:𝔤\bm{l}\colon\mathfrak{g}\to\mathbb{R} is the reduced Lagrangian. It is easy to verify that our system is GG-invariant, meaning

Ld(gg0,gg1)=Ld(g0,g1)=ext{Ωi}i=1si=1sbi𝒍(Ωi),L_{d}(g\cdot g_{0},g\cdot g_{1})=L_{d}(g_{0},g_{1})=\operatorname*{ext}_{\{\Omega_{i}\}_{i=1}^{s}}\sum\limits_{i=1}^{s}b_{i}\bm{l}(\Omega_{i}),

where

gUi\displaystyle g\cdot U_{i} =((gg0)+hj=1saij(gUjΩj)),\displaystyle=\mathbb{P}\left((g\cdot g_{0})+h\sum\nolimits_{j=1}^{s}a_{ij}(g\cdot U_{j}\Omega_{j})\right),
gg1\displaystyle g\cdot g_{1} =((gg0)+hi=1sbi(gUiΩi)).\displaystyle=\mathbb{P}\left((g\cdot g_{0})+h\sum\nolimits_{i=1}^{s}b_{i}(g\cdot U_{i}\Omega_{i})\right).

Therefore, the variational integrator (24) can theoretically be reduced to a Lie–Poisson integrator. By letting g0Tg1=f0g_{0}^{T}g_{1}=f_{0} and UiTg1=ΘiU_{i}^{T}g_{1}=\Theta_{i}, (24) simplifies to

\mu_{k} =-\operatorname*{Asym}(\Theta_{k}\Lambda)-h\sum\limits_{i=1}^{s}\frac{b_{i}}{b_{k}}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(\operatorname*{Asym}(\Theta_{i}\Lambda\Omega_{i}^{T})), (25a)
0\displaystyle 0 =Asym(f0T+hi=1sbiΘiTΩi),\displaystyle=\operatorname*{Asym}\left(f_{0}^{T}+h\sum\nolimits_{i=1}^{s}b_{i}\Theta_{i}^{T}\Omega_{i}\right), (25b)
ΘiT\displaystyle\Theta_{i}^{T} =(f0T+hj=1saijΘjTΩj),\displaystyle=\mathbb{P}\left(f_{0}^{T}+h\sum\nolimits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}\right), (25c)
Asym(f0Λ)\displaystyle\operatorname*{Asym}(f_{0}\Lambda) =p0hi=1sbi(π~iϕ~1φ~)(Asym(ΘiΛΩiT)),\displaystyle=-p_{0}-h\sum\limits_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}(\operatorname*{Asym}(\Theta_{i}\Lambda\Omega_{i}^{T})), (25d)
p1\displaystyle p_{1} =Asym((f0T+hi=1sbiΘiTΩi)ΛT),\displaystyle=\operatorname*{Asym}\left(\left(f_{0}^{T}+h\sum\nolimits_{i=1}^{s}b_{i}\Theta_{i}^{T}\Omega_{i}\right)\Lambda^{T}\right), (25e)
Ωi\displaystyle\Omega_{i} =𝒉μ(μi),\displaystyle=\frac{\partial\bm{h}}{\partial\mu}(\mu_{i}), (25f)

where 𝒉\bm{h} is the reduced Hamiltonian. Multiplying by g1Tg_{1}^{T} on both sides of g1=(g0+hi=1sbiUiΩi)g_{1}=\mathbb{P}\left(g_{0}+h\sum\nolimits_{i=1}^{s}b_{i}U_{i}\Omega_{i}\right) yields

I=(f0T+hi=1sbiΘiTΩi).I=\mathbb{P}\left(f_{0}^{T}+h\sum\nolimits_{i=1}^{s}b_{i}\Theta_{i}^{T}\Omega_{i}\right).

Suppose that h is small and that g_{0} and g_{1} are close; then \left(f_{0}^{T}+h\sum\nolimits_{i=1}^{s}b_{i}\Theta_{i}^{T}\Omega_{i}\right) lies in a neighborhood of I. By Lemma 1, this is equivalent to

Asym(f0T+hi=1sbiΘiTΩi)=0,\operatorname*{Asym}\left(f_{0}^{T}+h\sum\nolimits_{i=1}^{s}b_{i}\Theta_{i}^{T}\Omega_{i}\right)=0,

which can be regarded as the fixed-point equation for f0f_{0}. The first four equations can be solved using fixed-point iteration for the variables ({μk}k=1s,f0,{Θi}i=1s,Λ)(\{\mu_{k}\}_{k=1}^{s},f_{0},\{\Theta_{i}\}_{i=1}^{s},\Lambda) as in our previous discussions. Then, p1p_{1} can be calculated explicitly.
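As a quick numerical sanity check of the equivalence provided by Lemma 1 (near the identity, \mathbb{P}(F)=I exactly when \operatorname*{Asym}(F)=0), one can verify that the orthogonal polar factor of a symmetric positive-definite perturbation of I is I itself; the helper asym is our own.

```python
import numpy as np
from scipy.linalg import polar

def asym(M):
    """Antisymmetric part Asym(M) = (M - M^T)/2."""
    return 0.5 * (M - M.T)

rng = np.random.default_rng(1)
B = 0.05 * rng.standard_normal((3, 3))
F = np.eye(3) + B + B.T   # symmetric positive-definite, near I: Asym(F) = 0
U, _ = polar(F)           # orthogonal polar factor P(F)
```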

In the above algorithm, we also need to determine the reduced versions of (\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*} and (\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}. Note that (17a) and (17b) involve U_{j}^{T}d\mathbb{P}_{A_{i}}^{*}; in addition, g_{0},g_{1},U_{i}, and A_{i}=g_{0}+h\sum\nolimits_{j=1}^{s}a_{ij}U_{j}\Omega_{j}=U_{i}P_{i} are reduced, so we need a reduced version of U_{j}^{T}d\mathbb{P}_{A_{i}}^{*} as well. Multiplying A_{i} on the left by g_{1}^{T}, we obtain

g1TAi=f0T+hj=1saijΘjTΩj=(g1TUi)Pi.g_{1}^{T}A_{i}=f_{0}^{T}+h\sum\limits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}=(g_{1}^{T}U_{i})P_{i}.

Then, (g1TUi)Pi(g_{1}^{T}U_{i})P_{i} is the polar decomposition of f0T+hj=1saijΘjTΩjf_{0}^{T}+h\sum\nolimits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}, and for SSkew(n)S\in\operatorname*{Skew}(n),

UjTdAi(S)\displaystyle U_{j}^{T}d\mathbb{P}_{A_{i}}^{*}(S) =UjTUiLyap(Pi,ST)=Θjg1TUiLyap(Pi,ST),\displaystyle=U_{j}^{T}\cdot U_{i}\operatorname*{Lyap}(P_{i},S^{T})=\Theta_{j}g_{1}^{T}U_{i}\operatorname*{Lyap}(P_{i},S^{T}),
=Θj(f0T+hj=1saijΘjTΩj)Lyap(Pi,ST).\displaystyle=\Theta_{j}\mathbb{P}\left(f_{0}^{T}+h\sum\nolimits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}\right)\operatorname*{Lyap}(P_{i},S^{T}).

This is the reduced version of UjTdAi(S)U_{j}^{T}d\mathbb{P}_{A_{i}}^{*}(S), and so (π~iϕ~1ψ~k)(S)(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(S) can be computed as follows,

S_{j}-\operatorname*{Asym}\left(h\Theta_{j}\sum\nolimits_{l=1}^{s}a_{lj}d\mathbb{P}_{A_{l}}^{*}(S_{l})\Omega_{j}^{T}\right)=S\cdot\delta_{ij},\qquad j=1,2,\ldots,s, (26a)
(π~iϕ~1ψ~k)(S)=Asym(Θkl=1salkdAl(Sl)),\displaystyle(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}(S)=\operatorname*{Asym}\left(\Theta_{k}\sum\nolimits_{l=1}^{s}a_{lk}d\mathbb{P}_{A_{l}}^{*}(S_{l})\right), (26b)

where {Ai}\{A_{i}\} are redefined to be Ai=f0T+hj=1saijΘjTΩjA_{i}=f_{0}^{T}+h\sum\nolimits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}. For (π~iϕ~1φ~)(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}, we need to compute g0TdAi(S)g_{0}^{T}d\mathbb{P}_{A_{i}}^{*}(S) for some SSkew(n)S\in\operatorname*{Skew}(n), which is given by

g0TdAi(S)\displaystyle g_{0}^{T}d\mathbb{P}_{A_{i}}^{*}(S) =g0TUiLyap(Pi,ST)=f0g1TUiLyap(Pi,ST)\displaystyle=g_{0}^{T}\cdot U_{i}\operatorname*{Lyap}(P_{i},S^{T})=f_{0}g_{1}^{T}U_{i}\operatorname*{Lyap}(P_{i},S^{T})
=f0(f0T+hj=1saijΘjTΩj)Lyap(Pi,ST).\displaystyle=f_{0}\mathbb{P}\left(f_{0}^{T}+h\sum\nolimits_{j=1}^{s}a_{ij}\Theta_{j}^{T}\Omega_{j}\right)\operatorname*{Lyap}(P_{i},S^{T}).

Hence, we have

S_{j}-\operatorname*{Asym}\left(h\Theta_{j}\sum\nolimits_{l=1}^{s}a_{lj}d\mathbb{P}_{A_{l}}^{*}(S_{l})\Omega_{j}^{T}\right)=S\cdot\delta_{ij},\qquad j=1,2,\ldots,s, (27a)
(π~iϕ~1φ~)(S)=Asym(f0l=1salkdAl(Sl)).\displaystyle(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}(S)=\operatorname*{Asym}\left(f_{0}\sum\nolimits_{l=1}^{s}a_{lk}d\mathbb{P}_{A_{l}}^{*}(S_{l})\right). (27b)

5. Numerical Experiments

We test our methods and compare them to the variational Runge–Kutta–Munthe-Kaas (VRKMK) methods from Bogfjellmo and Marthinsen [3] on the dipole on a stick problem that they considered. In particular, our configuration space is SO(3)SO(3), and its Lie algebra is identified with 3\mathbb{R}^{3}. We shall only recall the mathematical expressions here, so for a thorough description of the system one should refer to the reference above. The right-trivialized and left-trivialized Hamiltonians, HR,HL:SO(3)×3H^{R},H^{L}\colon SO(3)\times\mathbb{R}^{3}\to\mathbb{R}, can be written as

HR(g,p)\displaystyle H^{R}(g,p) =12pTgJ1gTp+U(g),\displaystyle=\frac{1}{2}p^{T}gJ^{-1}g^{T}p+U(g), (28)
HL(g,p~)\displaystyle H^{L}(g,\tilde{p}) =12p~TJ1p~+U(g),\displaystyle=\frac{1}{2}\tilde{p}^{T}J^{-1}\tilde{p}+U(g), (29)

where U(g)=me3Tge3+qβ(gy+0z21gy0z21)U(g)=me_{3}^{T}ge_{3}+q\beta\left(\lVert gy_{+}^{0}-z\rVert_{2}^{-1}-\lVert gy_{-}^{0}-z\rVert_{2}^{-1}\right). Note that J=mdiag(1+α2,1,α2)J=m\,\text{diag}(1+\alpha^{2},1,\alpha^{2}), with m=1m=1 and α=0.1\alpha=0.1. The constant vectors are y±0=(0,±α,1)Ty_{\pm}^{0}=(0,\pm\alpha,-1)^{T} and z=(0,0,3/2)Tz=(0,0,-3/2)^{T}. Lastly, q=β=1q=\beta=1, and 2\|\cdot\|_{2} is our usual Euclidean norm.
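For concreteness, the left- and right-trivialized Hamiltonians (28)–(29) and the potential U(g) can be coded directly from the problem data above; the function and variable names are our own, but all constants match the text.

```python
import numpy as np

# Problem data for the dipole on a stick (Section 5)
m, alpha, q_c, beta = 1.0, 0.1, 1.0, 1.0
J = m * np.diag([1 + alpha**2, 1.0, alpha**2])
Jinv = np.linalg.inv(J)
e2 = np.array([0.0, 1.0, 0.0])
e3 = np.array([0.0, 0.0, 1.0])
y_plus, y_minus = np.array([0.0, alpha, -1.0]), np.array([0.0, -alpha, -1.0])
z = np.array([0.0, 0.0, -1.5])

def potential(g):
    """U(g) = m e3^T g e3 + q beta (||g y+ - z||^-1 - ||g y- - z||^-1)."""
    return (m * e3 @ g @ e3
            + q_c * beta * (1.0 / np.linalg.norm(g @ y_plus - z)
                            - 1.0 / np.linalg.norm(g @ y_minus - z)))

def H_left(g, p_tilde):
    """Left-trivialized Hamiltonian (29)."""
    return 0.5 * p_tilde @ Jinv @ p_tilde + potential(g)

def H_right(g, p):
    """Right-trivialized Hamiltonian (28)."""
    return 0.5 * p @ g @ Jinv @ g.T @ p + potential(g)

# Initial data of Section 5 and the relation (30)
g0 = np.array([[1.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]])
p0 = g0 @ J @ g0.T @ e2   # p(0) = g(0) J g(0)^T e2
p0_tilde = g0.T @ p0      # p~(0) from (30)
```

Both trivializations give the same energy for corresponding momenta, as (30) requires.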

Both forms of the Hamiltonian are written here because while our method was developed for left-trivialized systems using Hamilton’s principle, their method was developed for right-trivialized systems using the Hamilton–Pontryagin principle. As a result, both discretizations yield symplectic variational integrators for the Hamiltonian with their corresponding choice of trivialization. In particular, we have

p~=gTp\tilde{p}=g^{T}p (30)

as a relationship between the dual elements of the corresponding trivialization: pp is the dual representation of ξ𝔤\xi\in\mathfrak{g} for the right-trivialization g˙=ξg\dot{g}=\xi g, and p~\tilde{p} is the dual representation of ξ~𝔤\tilde{\xi}\in\mathfrak{g} for the left-trivialization g˙=gξ~\dot{g}=g\tilde{\xi}. Consequently, we note that the left-trivialized cotangent vector Hg\frac{\partial H}{\partial g} in VPD (24) is computed as

Hg(g,p~)=Asym(g1gHL(g,p~)),\frac{\partial H}{\partial g}(g,\tilde{p})=\operatorname*{Asym}\left(g^{-1}\nabla_{g}H^{L}(g,\tilde{p})\right),

where gHL(g,p~)\nabla_{g}H^{L}(g,\tilde{p}) is the matrix derivative of HL(g,p~)H^{L}(g,\tilde{p}) with respect to gg. On the other hand, the right-trivialized cotangent vector is computed as

Hg(g,p)=Asym(gHR(g,p)g1),\frac{\partial H}{\partial g}(g,p)=\operatorname*{Asym}\left(\nabla_{g}H^{R}(g,p)g^{-1}\right),

when implementing the VRKMK method.
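As a sanity check of this Asym-based formula for the trivialized derivative, one can verify numerically that \langle\operatorname*{Asym}(g^{-1}\nabla_{g}H),S\rangle recovers the directional derivative of H along the left-trivialized direction S. The toy Hamiltonian H(g)=\operatorname*{tr}(Cg) below is our own illustration, not the dipole Hamiltonian; for orthogonal g we use g^{-1}=g^{T}.

```python
import numpy as np
from scipy.linalg import expm

def asym(M):
    """Antisymmetric part Asym(M) = (M - M^T)/2."""
    return 0.5 * (M - M.T)

rng = np.random.default_rng(2)
C = rng.standard_normal((3, 3))
H = lambda g: np.trace(C @ g)    # toy Hamiltonian in g (our illustration)
grad_H = C.T                     # matrix derivative of tr(C g) w.r.t. g

S = asym(rng.standard_normal((3, 3)))             # skew direction in so(3)
g, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # an orthogonal matrix

t = 1e-6   # central finite difference along the curve g exp(tS)
fd = (H(g @ expm(t * S)) - H(g @ expm(-t * S))) / (2 * t)
pairing = np.trace(asym(g.T @ grad_H).T @ S)      # <Asym(g^{-1} grad H), S>
```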

For our tests, we also have the same initial data from [3],

g(0)\displaystyle g(0) =(100001010),\displaystyle=\begin{pmatrix}1&0&0\\ 0&0&-1\\ 0&1&0\end{pmatrix},
p(0)\displaystyle p(0) =g(0)Jg(0)Te2,\displaystyle=g(0)Jg(0)^{T}e_{2},

for the VRKMK methods, and so (30) gives \tilde{p}(0) to complete the initial data for the VPD methods. Since both methods involve fixed-point iteration, we terminate the process when the norm \|\cdot\|_{2} of the difference between the current and previous iterates is less than 10^{-15} for each variable. In particular, the norm for vectors is the Euclidean norm and for matrices it is the induced matrix norm. Lastly, we ran these implementations in Wolfram Mathematica 12 on a personal computer with the following specifications: Operating system: Windows 10; CPU: 3.7GHz AMD Ryzen 5 5600X; RAM: G.Skill Trident Z Neo Series 32 GB DDR4 3600 C16; Storage: 1TB Rocket NVMe PCIe 4.0 writing/reading up to 5000/4400 MB/s.

5.1. Order Tests

(a) 2nd order Gauss–Legendre method:
\[\begin{array}{c|c}
\frac{1}{2} & \frac{1}{2}\\ \hline
 & 1
\end{array}\]

(b) 3rd order Runge–Kutta method:
\[\begin{array}{c|ccc}
0 & 0 & 0 & 0\\
\frac{1}{2} & \frac{1}{2} & 0 & 0\\
1 & -1 & 2 & 0\\ \hline
 & \frac{1}{6} & \frac{2}{3} & \frac{1}{6}
\end{array}\]

(c) 4th order Gauss–Legendre method:
\[\begin{array}{c|cc}
\frac{1}{2}-\frac{\sqrt{3}}{6} & \frac{1}{4} & \frac{1}{4}-\frac{\sqrt{3}}{6}\\
\frac{1}{2}+\frac{\sqrt{3}}{6} & \frac{1}{4}+\frac{\sqrt{3}}{6} & \frac{1}{4}\\ \hline
 & \frac{1}{2} & \frac{1}{2}
\end{array}\]

(d) 6th order Gauss–Legendre method:
\[\begin{array}{c|ccc}
\frac{1}{2}-\frac{\sqrt{15}}{10} & \frac{5}{36} & \frac{2}{9}-\frac{\sqrt{15}}{15} & \frac{5}{36}-\frac{\sqrt{15}}{30}\\
\frac{1}{2} & \frac{5}{36}+\frac{\sqrt{15}}{24} & \frac{2}{9} & \frac{5}{36}-\frac{\sqrt{15}}{24}\\
\frac{1}{2}+\frac{\sqrt{15}}{10} & \frac{5}{36}+\frac{\sqrt{15}}{30} & \frac{2}{9}+\frac{\sqrt{15}}{15} & \frac{5}{36}\\ \hline
 & \frac{5}{18} & \frac{4}{9} & \frac{5}{18}
\end{array}\]

Table 1. Butcher tableaux for the comparison tests
Figure 1. Order comparison plots between VRKMK and VPD methods: The black-dashed lines are references for the corresponding orders.
Figure 2. Long-term energy error for second order VRKMK/VPD methods.
Figure 3. Long-term energy error for third order VRKMK/VPD methods.
Figure 4. Long-term energy error for fourth order VRKMK/VPD methods.
Figure 5. Long-term energy error for sixth order VRKMK/VPD methods.
Figure 6. Orthogonality error for sixth order VRKMK/VPD methods.
Figure 7. Run-time comparison for fourth and sixth order VRKMK/VPD methods.

We run both methods based on the one-, two-, and three-stage Gauss–Legendre methods and a third-order Runge–Kutta method for comparisons, and these methods are shown as Butcher tableaux in Table 1.

We compute the errors in (g(0.5),p(0.5)) from VRKMK and the errors in (g(0.5),\tilde{p}(0.5)) from VPD with respect to a reference solution; these errors are shown in Figure 1. The reference solution was calculated using the sixth-order VPD method with step size h=0.001. The error for VRKMK is computed as \|g(0.5)^{T}p(0.5)-\tilde{p}_{\text{ref}}\|_{2}+\|g(0.5)-g_{\text{ref}}\|_{2}, while the error for VPD is computed as \|\tilde{p}(0.5)-\tilde{p}_{\text{ref}}\|_{2}+\|g(0.5)-g_{\text{ref}}\|_{2}. The black-dashed lines are reference lines for the appropriate orders. In the sixth order comparison plot, the errors from the fixed-point iteration dominate for smaller step sizes, so the theoretical order is not observed in that regime. Otherwise, both methods achieve their theoretical orders and are quite comparable, with VPD exhibiting a noticeably smaller error constant in the second order method.

5.2. Long-Term Behaviors

The long-term energy behaviors for both methods are presented in Figures 2–5. For second, third, and fourth order, we fixed the step size h=0.01 and ran 10^{5} integration steps to observe the energy errors. For second order, the energy errors have orders of magnitude 10^{-4} for VRKMK and 10^{-5} for VPD; for third order, 10^{-7} for VRKMK and 10^{-6} for VPD; for fourth order, 10^{-9} for both methods.

For the sixth order, we consider step size h=1/26 to avoid the regime where the errors in the fixed-point iteration dominate. This step size corresponds to the third point from the right in the order comparison plots. We again run 10^{5} integration steps and observe the energy errors in Figure 5. The energy error for the VPD method is stable, with an order of magnitude of 10^{-10}. On the other hand, the energy error in VRKMK exhibits a slow increase over the integration time span. We investigated this phenomenon and attribute it to the framework of each method. In VPD, g_{1} in (24b) and the internal points U_{i} in (24c) are updated in each integration step via the polar decomposition. As a result, the Lie group structure is always preserved up to machine precision for both the configuration space elements and the internal points. However, in VRKMK, both g_{1} and the internal points U_{i} are updated by the left action of SO(n). This left action is a matrix multiplication performed to machine precision, but over a sufficient number of multiplications the round-off error still accumulates. Consequently, the Lie group structure is not as well preserved in comparison, as illustrated by the orthogonality errors \|gg^{T}-I_{3}\|_{2} in Figure 6.
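The effect of the polar projection on the orthogonality error can be isolated in a few lines: projecting a matrix that has drifted slightly off SO(3) back onto the group reduces \|gg^{T}-I\|_{2} to machine precision. A minimal sketch of this (our own, with an artificial perturbation standing in for accumulated round-off):

```python
import numpy as np
from scipy.linalg import polar, expm

def orth_err(g):
    """Orthogonality error ||g g^T - I||_2."""
    return np.linalg.norm(g @ g.T - np.eye(3), 2)

rng = np.random.default_rng(3)
w = rng.standard_normal(3)
W = np.array([[0.0, -w[2], w[1]],
              [w[2], 0.0, -w[0]],
              [-w[1], w[0], 0.0]])   # hat map of w
R = expm(W)                          # a rotation in SO(3)

g_bad = R + 1e-8 * rng.standard_normal((3, 3))  # drifted off the group
g_fix, _ = polar(g_bad)                         # project back onto SO(3)
```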

We also observed that projecting the sixth-order VRKMK method onto the rotation group using the polar decomposition at each timestep does not recover the near energy preservation typical of symplectic methods. Presumably, this is because the projection subtly compromises the symplecticity of the method. This is consistent with the observation made in [10] that both the Lie group structure and symplecticity need to be preserved for the methods to exhibit near energy preservation.

5.3. Runtime Comparison

We also have data for the runtime comparison of VRKMK and VPD for the two-stage and three-stage Gauss–Legendre methods in Figure 7. For each step size h, we recorded the runtime of each method in seconds and repeated the execution 256 times to compute the average runtime. The runtime for VPD methods is on average longer than for VRKMK methods. This may be due in part to the Lyapunov solves and the multiple layers of fixed-point iteration required for the polar decomposition in (4) and for (\tilde{\pi}_{i}\circ\tilde{\phi}^{-1})^{*} in (17a) and (19a).

6. Conclusion

By applying the polar decomposition in a constrained Galerkin method, we constructed high-order Lie group symplectic integrators on SO(n)SO(n), which we refer to as the variational polar decomposition method. Furthermore, the integrator can be reduced to a Lie–Poisson integrator when the underlying mechanical system is group-invariant. We performed extensive numerical experiments comparing our proposed method to the variational Runge–Kutta–Munthe-Kaas method from [3]. These comparisons provide insights into each method and highlight an advantage of our proposed method, which is the preservation of Lie group structure up to machine precision for both the configurations and internal points. This appears to be important for high-order methods to exhibit the near energy preservation that one expects for symplectic integrators when applied to dynamics on Lie groups.

For future work, it would be interesting to explore the generalization of the proposed method to symmetric spaces, by applying the generalized polar decomposition [15]. This may be of particular interest in the construction of accelerated optimization algorithms on symmetric spaces, following the use of time-adaptive variational integrators for symplectic optimization [6] based on the discretization of Bregman Lagrangian and Hamiltonian flows [16, 5]. Examples of symmetric spaces include the space of Lorentzian metrics, the space of symmetric positive-definite matrices, and the Grassmannian.

Acknowledgements

This research has been supported in part by NSF under grants DMS-1010687, CMMI-1029445, DMS-1065972, CMMI-1334759, DMS-1411792, DMS-1345013, DMS-1813635, CCF-2112665, by AFOSR under grant FA9550-18-1-0288, and by the DoD under grant HQ00342010023 (Newton Award for Transformative Ideas during the COVID-19 Pandemic). The authors would also like to thank Geir Bogfjellmo and Håkon Marthinsen for sharing the code for their VRKMK method from [3], which we used in the numerical comparisons with our method.

Appendix A Constrained Galerkin Methods

Our Galerkin variational integrator will involve a discrete Lagrangian that differs from the classical construction in [14]. Traditionally in the linear space setting, (1) is approximated with a quadrature rule

L_{d}(q_{0},q_{1})=h\sum\limits_{i=1}^{s}b_{i}L(q(c_{i}h),\dot{q}(c_{i}h))=h\sum\limits_{i=1}^{s}b_{i}L(Q_{i},\dot{Q}_{i}),

and q(t)q(t) is approximated by polynomials with degree less than or equal to ss with fixed endpoints q0q_{0} and q1q_{1}. By choosing interpolation nodes {d0ν}ν=0s\{d_{0}^{\nu}\}_{\nu=0}^{s} with d00=0,d0s=1d_{0}^{0}=0,d_{0}^{s}=1 and interpolation values {q0ν}ν=0s\{q_{0}^{\nu}\}_{\nu=0}^{s} with q00=q0q_{0}^{0}=q_{0} and q0s=q1q_{0}^{s}=q_{1}, q(t)q(t) can be expressed as q(t)=ν=0sq0νϕν(th)q(t)=\sum\nolimits_{\nu=0}^{s}q_{0}^{\nu}\phi_{\nu}\left(\frac{t}{h}\right) on [0,h][0,h], where ϕν(t)\phi_{\nu}(t) are Lagrange polynomials corresponding to the nodes {d0ν}ν=0s\{d_{0}^{\nu}\}_{\nu=0}^{s}. By taking variations with respect to the interpolation values {q0ν}ν=1s1\{q_{0}^{\nu}\}_{\nu=1}^{s-1}, q(t)q(t) is varied over the finite-dimensional function space,

Ms={q(t)q(t)Ps[0,h],q(0)=q0,q(h)=q1}.M^{s}=\{q(t)\mid q(t)\in P_{s}[0,h],q(0)=q_{0},q(h)=q_{1}\}.

Consider the quadrature approximation of the action integral, viewing it as a function of the endpoint and interpolation values,

F(q0,q1,{q0ν}ν=1s1)=hi=1sbiL(q(cih),q˙(cih)),F(q_{0},q_{1},\{q_{0}^{\nu}\}_{\nu=1}^{s-1})=h\sum\limits_{i=1}^{s}b_{i}L(q(c_{i}h),\dot{q}(c_{i}h)),

where q(t)=ν=0sq0νϕν(th)q(t)=\sum\nolimits_{\nu=0}^{s}q_{0}^{\nu}\phi_{\nu}(\frac{t}{h}). Then, a variational integrator (2) can be obtained as follows,

{0=Fq0ν,ν=1,2s1,p0=Fq0,p1=Fq1.\left\{\begin{aligned} 0&=\frac{\partial F}{\partial q_{0}^{\nu}},\qquad\nu=1,2\dots s-1,\\ -p_{0}&=\frac{\partial F}{\partial q_{0}},\\ p_{1}&=\frac{\partial F}{\partial q_{1}}.\end{aligned}\right. (31)

However, (31) is often impractical due to the complexity of evaluating q(c_{i}h) and \dot{q}(c_{i}h), which involve the Lagrange interpolating polynomials. In addition, computing the root of the system of nonlinear equations in (31) can be challenging, because formulating it as a fixed-point problem may be complicated and convergence issues can arise. In contrast, Runge–Kutta methods are already in fixed-point form and are convergent as long as consistency is satisfied.

Now, note that the finite-dimensional function space MsM^{s} does not depend on the choice of nodes {d0ν}ν=1s1\{d_{0}^{\nu}\}_{\nu=1}^{s-1}, and by a tricky elimination of ϕν(t)\phi_{\nu}(t), (31) can be simplified to yield

q1\displaystyle q_{1} =q0+hi=1sbiQ˙i,\displaystyle=q_{0}+h\sum\limits_{i=1}^{s}b_{i}\dot{Q}_{i}, p1\displaystyle\quad p_{1} =p0+hi=1sbiP˙i,\displaystyle=p_{0}+h\sum\limits_{i=1}^{s}b_{i}\dot{P}_{i}, (32a)
Qi\displaystyle Q_{i} =q0+hj=1saijQ˙j,\displaystyle=q_{0}+h\sum\limits_{j=1}^{s}a_{ij}\dot{Q}_{j}, Pi\displaystyle\quad P_{i} =p0+hj=1sa~ijP˙j,\displaystyle=p_{0}+h\sum\limits_{j=1}^{s}\tilde{a}_{ij}\dot{P}_{j}, (32b)
Pi\displaystyle P_{i} =Lq˙(Qi,Q˙i),\displaystyle=\frac{\partial L}{\partial\dot{q}}(Q_{i},\dot{Q}_{i}), P˙i\displaystyle\quad\dot{P}_{i} =Lq(Qi,Q˙i),\displaystyle=\frac{\partial L}{\partial q}(Q_{i},\dot{Q}_{i}), (32c)

where a~ij=bj(1ajibi)\tilde{a}_{ij}=b_{j}(1-\frac{a_{ji}}{b_{i}}). When transformed to the Hamiltonian side, (32) simply recovers the symplectic partitioned Runge–Kutta method.
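The pair (a_{ij},\tilde{a}_{ij}) with \tilde{a}_{ij}=b_{j}(1-\frac{a_{ji}}{b_{i}}) satisfies the standard symplecticity condition b_{i}\tilde{a}_{ij}+b_{j}a_{ji}=b_{i}b_{j} for partitioned Runge–Kutta methods, which is easy to check numerically for, say, the two-stage Gauss–Legendre tableau of Table 1. The check itself is our own addition.

```python
import numpy as np

# Two-stage (fourth order) Gauss-Legendre tableau from Table 1
s3 = np.sqrt(3.0)
A = np.array([[0.25, 0.25 - s3 / 6],
              [0.25 + s3 / 6, 0.25]])
b = np.array([0.5, 0.5])

# Conjugate coefficients a~_ij = b_j (1 - a_ji / b_i)
A_tilde = b[None, :] * (1.0 - A.T / b[:, None])

# Symplecticity condition: b_i a~_ij + b_j a_ji - b_i b_j = 0 entrywise
residual = b[:, None] * A_tilde + b[None, :] * A.T - np.outer(b, b)
```

For Gauss–Legendre methods the conjugate coefficients coincide with the original ones, so the partitioned pair collapses to the usual Gauss method.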

The same variational integrator can be derived in a much simpler way: instead of performing variations on the internal points \{q_{0}^{\nu}\}_{\nu=1}^{s-1}, we perform variations on the internal derivatives \{\dot{Q}_{i}\}_{i=1}^{s}, subject to the constraint q_{1}=q_{0}+h\sum\nolimits_{i=1}^{s}b_{i}\dot{Q}_{i}. Then, the internal points are reconstructed using the fundamental theorem of calculus and the degree of precision of the quadrature rule to obtain Q_{i}=q_{0}+h\sum\nolimits_{j=1}^{s}a_{ij}\dot{Q}_{j}. Now, consider the quadrature approximation of the action integral, viewed as a function of the endpoint values and the internal velocities,

F~(q0,q1,{Q˙i}i=1s,λ)=hi=1sbiL(Qi,Q˙i)+λT(q1q0hi=1sbiQ˙i),\tilde{F}(q_{0},q_{1},\{\dot{Q}_{i}\}_{i=1}^{s},\lambda)=h\sum\limits_{i=1}^{s}b_{i}L(Q_{i},\dot{Q}_{i})+\lambda^{T}\left(q_{1}-q_{0}-h\sum\nolimits_{i=1}^{s}b_{i}\dot{Q}_{i}\right),

where λ\lambda is a Lagrange multiplier that enforces the constraint. Then, a variational integrator (2) can be obtained as follows,

{0=F~Q˙i,i=1,2s,0=F~λ,Qi=q0+hj=1saijQ˙j,p0=F~q0,p1=F~q1.\left\{\begin{aligned} 0&=\frac{\partial\tilde{F}}{\partial\dot{Q}_{i}},\qquad i=1,2\ldots s,\\ 0&=\frac{\partial\tilde{F}}{\partial\lambda},\\ Q_{i}&=q_{0}+h\sum\limits_{j=1}^{s}a_{ij}\dot{Q}_{j},\\ -p_{0}&=\frac{\partial\tilde{F}}{\partial q_{0}},\\ p_{1}&=\frac{\partial\tilde{F}}{\partial q_{1}}.\end{aligned}\right. (33)

Explicitly expanding (33) and eliminating the Lagrange multiplier yields (32) in a much more straightforward manner. This same technique, known as the constrained Galerkin method, is adopted on the rotation group SO(n)SO(n) in order to directly obtain a variational integrator in fixed-point form.
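To make the constrained Galerkin route concrete, the following sketch implements the resulting scheme (32) for the harmonic oscillator L(q,\dot{q})=\dot{q}^{2}/2-q^{2}/2, using the two-stage Gauss–Legendre tableau of Table 1 and fixed-point iteration; since Gauss methods conserve quadratic invariants, the energy is preserved to roughly the fixed-point tolerance. This toy example is our own addition, not code from the paper.

```python
import numpy as np

# Two-stage Gauss-Legendre coefficients; for Gauss methods the conjugate
# coefficients a~_ij = b_j(1 - a_ji/b_i) coincide with a_ij.
s3 = np.sqrt(3.0)
A = np.array([[0.25, 0.25 - s3 / 6], [0.25 + s3 / 6, 0.25]])
b = np.array([0.5, 0.5])
A_tilde = b[None, :] * (1.0 - A.T / b[:, None])

def step(q0, p0, h):
    """One step of the symplectic PRK scheme (32) for L = qdot^2/2 - q^2/2,
    where P_i = Qdot_i and Pdot_i = -Q_i, solved by fixed-point iteration."""
    Qd = np.full(2, p0)    # internal velocities Qdot_i
    Pd = np.full(2, -q0)   # internal forces Pdot_i
    for _ in range(100):
        Qd_new = p0 + h * (A_tilde @ Pd)   # Qdot_i = P_i = p0 + h sum a~_ij Pdot_j
        Pd_new = -(q0 + h * (A @ Qd))      # Pdot_i = -Q_i
        done = max(np.max(np.abs(Qd_new - Qd)),
                   np.max(np.abs(Pd_new - Pd))) < 1e-15
        Qd, Pd = Qd_new, Pd_new
        if done:
            break
    return q0 + h * (b @ Qd), p0 + h * (b @ Pd)

q, p, h = 1.0, 0.0, 0.1
E0 = 0.5 * (p * p + q * q)
for _ in range(1000):
    q, p = step(q, p, h)
E1 = 0.5 * (p * p + q * q)
```

The fixed-point sweep mirrors the structure used for (24): all internal variables are updated concurrently until the iterates stop changing.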

Appendix B Euler–Poincaré & Lie–Poisson Reductions

When the Lagrangian LL or Hamiltonian HH is left-invariant, the mechanical system can be symmetry reduced to evolve on the Lie algebra 𝔤\mathfrak{g} or its dual 𝔤\mathfrak{g}^{*}, respectively, assuming some regularity. On the Lagrangian side, this corresponds to Euler–Poincaré reduction [13], which is described by the Euler–Poincaré equations,

ddt(𝒍ϵ)=adϵ(𝒍ϵ).\frac{d}{dt}\left(\frac{\partial\bm{l}}{\partial\epsilon}\right)=\operatorname*{ad}\nolimits_{\epsilon}^{*}\left(\frac{\partial\bm{l}}{\partial\epsilon}\right).

The above is expressed in terms of the reduced Lagrangian \bm{l}(\epsilon)=\bm{l}(g^{-1}\dot{g})=L(g,\dot{g}). As a result, this can be described in terms of a reduced variational principle \delta\int_{a}^{b}\bm{l}(\epsilon(t))\,dt=0 with respect to constrained variations of the form \delta\epsilon=\dot{\eta}+[\epsilon,\eta], where \eta(t) is an arbitrary path in the Lie algebra \mathfrak{g} that vanishes at the endpoints, namely \eta(a)=\eta(b)=0. The constraint on the variations \delta\epsilon is induced by the condition \epsilon=g^{-1}\dot{g} together with arbitrary variations \delta g that vanish at the endpoints.

On the Hamiltonian side, this corresponds to Lie–Poisson reduction [13]. Recall that the Lie–Poisson structure on 𝔤\mathfrak{g}^{*} is given by

{F,G}(μ)=μ,[Fμ,Gμ],\{F,G\}(\mu)=\left\langle\mu,\left[\frac{\partial F}{\partial\mu},\frac{\partial G}{\partial\mu}\right]\right\rangle,

and together with the reduced Hamiltonian \bm{h}(\mu), this gives the Lie–Poisson equations on \mathfrak{g}^{*},

dμdt=ad𝒉μ(μ).\frac{d\mu}{dt}=\text{ad}_{\frac{\partial\bm{h}}{\partial\mu}}^{*}(\mu).

If the discrete Lagrangian L_{d}(g_{0},g_{1}) is also G-invariant, meaning L_{d}(g\cdot g_{0},g\cdot g_{1})=L_{d}(g_{0},g_{1}) for all g\in G, then (2) can be reduced to a Lie–Poisson integrator [9],

{μ0=𝒍d(f0)f01,μ1=f01μ0f0,\left\{\begin{aligned} {\mu}_{0}&=\bm{l}_{d}^{{}^{\prime}}(f_{0})f_{0}^{-1},\\ {\mu}_{1}&=f_{0}^{-1}\cdot{\mu}_{0}\cdot f_{0},\end{aligned}\right. (34)

where 𝒍d(f0)=Ld(e,f0)\bm{l}_{d}(f_{0})=L_{d}(e,f_{0}). This algorithm preserves the coadjoint orbits and, hence, the Poisson structure on 𝔤\mathfrak{g}^{*}.
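For a matrix group, the second line of (34) is conjugation of \mu_{0} by f_{0}, which makes the preservation of coadjoint orbits visible: for SO(3), the Frobenius norm of \mu (a Casimir) is unchanged, and \mu stays in \mathfrak{so}(3)^{*}, identified with the skew matrices. A quick check with randomly generated f_{0} and \mu_{0} (our own illustration):

```python
import numpy as np
from scipy.linalg import expm

def asym(M):
    """Antisymmetric part Asym(M) = (M - M^T)/2."""
    return 0.5 * (M - M.T)

rng = np.random.default_rng(4)
mu0 = asym(rng.standard_normal((3, 3)))       # mu_0 in so(3)* (skew matrices)
f0 = expm(asym(rng.standard_normal((3, 3))))  # f_0 in SO(3)

mu1 = f0.T @ mu0 @ f0   # mu_1 = f_0^{-1} mu_0 f_0, using f_0^{-1} = f_0^T
```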

Appendix C Detailed Derivations for the Lagrangian Variational Integrators

C.1. Derivations of F~Ωk\frac{\partial\tilde{F}}{\partial\Omega_{k}}

Initially, we have

ddt|t=0F~(g0,g1,,Ωk(t),,Λ)\displaystyle\left.\frac{d}{dt}\right|_{t=0}\tilde{F}(g_{0},g_{1},\ldots,\Omega_{k}(t),\ldots,\Lambda) =hi=1sbiLU(Ui,Ωi),Xik+hbkLΩ(Uk,Ωk),δΩk\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i}),X_{ik}\right\rangle+hb_{k}\left\langle\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k}),\delta\Omega_{k}\right\rangle
+Λ,Asym(hg1Ti=1sbiUiXikΩi+hg1TbkUkδΩk),\displaystyle\qquad+\left\langle\Lambda,\operatorname*{Asym}\left(hg_{1}^{T}\sum\nolimits_{i=1}^{s}b_{i}U_{i}X_{ik}\Omega_{i}+hg_{1}^{T}b_{k}U_{k}\delta\Omega_{k}\right)\right\rangle,

where we can use the properties of the inner products to express the last term as follows,

hi=1sbiAsym(UiTg1ΛΩiT),Xik+hbkAsym(UkTg1Λ),δΩk.h\sum\limits_{i=1}^{s}b_{i}\langle\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T}),X_{ik}\rangle+hb_{k}\langle\operatorname*{Asym}(U_{k}^{T}g_{1}\Lambda),\delta\Omega_{k}\rangle.

Then, we continue using equation (12) to obtain,

ddt|t=0F~(g0,g1,,Ωk(t),,Λ)\displaystyle\left.\frac{d}{dt}\right|_{t=0}\tilde{F}(g_{0},g_{1},\ldots,\Omega_{k}(t),\ldots,\Lambda) =hi=1sbiLU(Ui,Ωi)+Asym(UiTg1ΛΩiT),Xik\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T}),X_{ik}\right\rangle
+hbkLΩ(Uk,Ωk)+Asym(UkTg1Λ),δΩk\displaystyle\qquad+hb_{k}\left\langle\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname*{Asym}(U_{k}^{T}g_{1}\Lambda),\delta\Omega_{k}\right\rangle
=hi=1sbiLU(Ui,Ωi)+Asym(UiTg1ΛΩiT),h(π~iϕ~1ψ~k)(δΩk)\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T}),h(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})(\delta\Omega_{k})\right\rangle
+hbkLΩ(Uk,Ωk)+Asym(UkTg1Λ),δΩk\displaystyle\qquad+hb_{k}\left\langle\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname*{Asym}(U_{k}^{T}g_{1}\Lambda),\delta\Omega_{k}\right\rangle
=h2i=1sbi(π~iϕ~1ψ~k)(LU(Ui,Ωi)+Asym(UiTg1ΛΩiT)),δΩk\displaystyle=h^{2}\sum\limits_{i=1}^{s}b_{i}\left\langle(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right),\delta\Omega_{k}\right\rangle
+hbkLΩ(Uk,Ωk)+Asym(UkTg1Λ),δΩk.\displaystyle\qquad+hb_{k}\left\langle\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname*{Asym}(U_{k}^{T}g_{1}\Lambda),\delta\Omega_{k}\right\rangle.

We finally have

F~Ωk=h2i=1sbi(π~iϕ~1ψ~k)(LU(Ui,Ωi)+Asym(UiTg1ΛΩiT))+hbk(LΩ(Uk,Ωk)+Asym(UkTg1Λ)).\frac{\partial\tilde{F}}{\partial\Omega_{k}}=h^{2}\sum\limits_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\psi}_{k})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right)+hb_{k}\left(\frac{\partial L}{\partial\Omega}(U_{k},\Omega_{k})+\operatorname*{Asym}(U_{k}^{T}g_{1}\Lambda)\right).

C.2. Explicit Expression for ϕ~\tilde{\phi}^{*}: A Derivation

Let us consider (S1,,Ss),(S~1,,S~s)Skew(n)s(S_{1},\ldots,S_{s}),(\tilde{S}_{1},\ldots,\tilde{S}_{s})\in\operatorname*{Skew}(n)^{s}, and so

\displaystyle\langle\tilde{\phi}^{*}(S_{1},\ldots,S_{s}),(\tilde{S}_{1},\ldots,\tilde{S}_{s})\rangle \displaystyle=\langle(S_{1},\ldots,S_{s}),\tilde{\phi}(\tilde{S}_{1},\ldots,\tilde{S}_{s})\rangle
\displaystyle=\left\langle(S_{1},\ldots,S_{s}),\left\{\tilde{S}_{i}-d\mathbb{P}_{A_{i}}\left(h\sum\nolimits_{j=1}^{s}a_{ij}U_{j}\tilde{S}_{j}\Omega_{j}\right)\right\}_{i=1}^{s}\right\rangle
\displaystyle=\sum\limits_{i=1}^{s}\left\langle S_{i},\tilde{S}_{i}-d\mathbb{P}_{A_{i}}\left(h\sum\nolimits_{j=1}^{s}a_{ij}U_{j}\tilde{S}_{j}\Omega_{j}\right)\right\rangle
\displaystyle=\sum\limits_{i=1}^{s}\langle S_{i},\tilde{S}_{i}\rangle-\sum\limits_{i=1}^{s}\operatorname*{tr}\left(d\mathbb{P}_{A_{i}}^{*}(S_{i})\left(h\sum\nolimits_{j=1}^{s}a_{ij}U_{j}\tilde{S}_{j}\Omega_{j}\right)^{T}\right)
\displaystyle=\sum\limits_{i=1}^{s}\langle S_{i},\tilde{S}_{i}\rangle-\sum\limits_{i,j=1}^{s}\operatorname*{tr}\left(\left(ha_{ij}U_{j}^{T}d\mathbb{P}_{A_{i}}^{*}(S_{i})\Omega_{j}^{T}\right)\tilde{S}_{j}^{T}\right)
\displaystyle=\sum\limits_{i=1}^{s}\langle S_{i},\tilde{S}_{i}\rangle-\sum\limits_{j=1}^{s}\operatorname*{tr}\left(\left(hU_{j}^{T}\sum\nolimits_{i=1}^{s}a_{ij}d\mathbb{P}_{A_{i}}^{*}(S_{i})\Omega_{j}^{T}\right)\tilde{S}_{j}^{T}\right)
\displaystyle=\sum\limits_{j=1}^{s}\left\langle S_{j}-\operatorname*{Asym}\left(hU_{j}^{T}\sum\nolimits_{i=1}^{s}a_{ij}d\mathbb{P}_{A_{i}}^{*}(S_{i})\Omega_{j}^{T}\right),\tilde{S}_{j}\right\rangle.

This gives us equation (15).
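The passage from the d\mathbb{P}_{A_{i}}^{*} term to the coefficient of \tilde{S}_{j}^{T} above rests only on transposition and the cyclic invariance of the trace, i.e., tr(X(U\tilde{S}\Omega)^T) = tr((U^T X \Omega^T)\tilde{S}^T). A throwaway numerical check of this identity, with all matrix names illustrative stand-ins for the factors in the derivation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# random stand-ins for dP*_{A_i}(S_i), U_j, S~_j, Omega_j respectively
X, U, St, Om = (rng.standard_normal((n, n)) for _ in range(4))

# tr(X (U St Om)^T) = tr(X Om^T St^T U^T) = tr((U^T X Om^T) St^T)
lhs = np.trace(X @ (U @ St @ Om).T)
rhs = np.trace((U.T @ X @ Om.T) @ St.T)
assert np.isclose(lhs, rhs)
```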

C.3. Explicit Expression for \tilde{\psi}_{k}^{*}: A Derivation

Consider (S_{1},\ldots,S_{s})\in\operatorname*{Skew}(n)^{s} and \tilde{S}\in\operatorname*{Skew}(n), then

\displaystyle\langle\tilde{\psi}_{k}^{*}(S_{1},\ldots,S_{s}),\tilde{S}\rangle \displaystyle=\langle(S_{1},\ldots,S_{s}),\tilde{\psi}_{k}(\tilde{S})\rangle
\displaystyle=\left\langle(S_{1},\ldots,S_{s}),\left\{d\mathbb{P}_{A_{i}}(a_{ik}U_{k}\tilde{S})\right\}_{i=1}^{s}\right\rangle
\displaystyle=\sum\limits_{i=1}^{s}\operatorname*{tr}\left(d\mathbb{P}_{A_{i}}^{*}(S_{i})(a_{ik}U_{k}\tilde{S})^{T}\right)
\displaystyle=\sum\limits_{i=1}^{s}\operatorname*{tr}\left((a_{ik}U_{k}^{T}d\mathbb{P}_{A_{i}}^{*}(S_{i}))\tilde{S}^{T}\right)
\displaystyle=\left\langle\operatorname*{Asym}\left(U_{k}^{T}\sum\nolimits_{i=1}^{s}a_{ik}d\mathbb{P}_{A_{i}}^{*}(S_{i})\right),\tilde{S}\right\rangle.

This gives us equation (16).
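This computation mirrors a general fact: with respect to the Frobenius pairing, the adjoint of a linear map of the form X \mapsto \sum_{i} A_{i}XB_{i} is Y \mapsto \sum_{i} A_{i}^{T}YB_{i}^{T}, which is the structural pattern behind \tilde{\psi}_{k} and \tilde{\psi}_{k}^{*} (with the d\mathbb{P}_{A_{i}} maps folded into the factors). A numerical sketch of this pattern under illustrative names:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3

def inner(A, B):
    # Frobenius pairing <A, B> = tr(A B^T)
    return float(np.trace(A @ B.T))

# a random linear map L(X) = sum_i A_i X B_i and its claimed adjoint
A = [rng.standard_normal((n, n)) for _ in range(3)]
B = [rng.standard_normal((n, n)) for _ in range(3)]

def L(X):
    return sum(Ai @ X @ Bi for Ai, Bi in zip(A, B))

def L_star(Y):
    # adjoint: transpose each left/right factor
    return sum(Ai.T @ Y @ Bi.T for Ai, Bi in zip(A, B))

X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
# defining property of the adjoint: <Y, L(X)> = <L*(Y), X>
assert np.isclose(inner(Y, L(X)), inner(L_star(Y), X))
```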

C.4. Explicit Expression for \tilde{\varphi}^{*}: A Derivation

Let us derive \tilde{\varphi}^{*} by considering (S_{1},\ldots,S_{s})\in\operatorname*{Skew}(n)^{s} and \tilde{S}\in\operatorname*{Skew}(n), and so

\displaystyle\langle\tilde{\varphi}^{*}(S_{1},\ldots,S_{s}),\tilde{S}\rangle \displaystyle=\left\langle(S_{1},\ldots,S_{s}),\{d\mathbb{P}_{A_{i}}(g_{0}\tilde{S})\}_{i=1}^{s}\right\rangle
\displaystyle=\sum\limits_{i=1}^{s}\operatorname*{tr}\left(d\mathbb{P}_{A_{i}}^{*}(S_{i})(g_{0}\tilde{S})^{T}\right)
\displaystyle=\sum\limits_{i=1}^{s}\operatorname*{tr}\left((g_{0}^{T}d\mathbb{P}_{A_{i}}^{*}(S_{i}))\tilde{S}^{T}\right)
\displaystyle=\left\langle\operatorname*{Asym}\left(g_{0}^{T}\sum\nolimits_{i=1}^{s}d\mathbb{P}_{A_{i}}^{*}(S_{i})\right),\tilde{S}\right\rangle.

This gives us equation (18).

C.5. Derivation of \frac{\partial\tilde{F}}{\partial g_{0}}

We compute

\displaystyle\left.\frac{d}{dt}\right|_{t=0}\tilde{F}(g_{0}(t),g_{1},\{\Omega_{i}\}_{i=1}^{s},\Lambda) \displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i}),Y_{i}\right\rangle+\left\langle\Lambda,\operatorname*{Asym}\left(g_{1}^{T}\left(g_{0}\delta g_{0}+h\sum\nolimits_{i=1}^{s}b_{i}U_{i}Y_{i}\Omega_{i}\right)\right)\right\rangle
\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i}),Y_{i}\right\rangle+\operatorname*{tr}\left(\Lambda(g_{1}^{T}g_{0}\delta g_{0})^{T}\right)+h\sum\limits_{i=1}^{s}\operatorname*{tr}\left(\Lambda(b_{i}g_{1}^{T}U_{i}Y_{i}\Omega_{i})^{T}\right)
\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i}),Y_{i}\right\rangle+\operatorname*{tr}\left((g_{0}^{T}g_{1}\Lambda)\delta g_{0}^{T}\right)+h\sum\limits_{i=1}^{s}\operatorname*{tr}\left(b_{i}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})Y_{i}^{T}\right)
\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T}),Y_{i}\right\rangle+\langle\operatorname*{Asym}(g_{0}^{T}g_{1}\Lambda),\delta g_{0}\rangle
\displaystyle=h\sum\limits_{i=1}^{s}b_{i}\left\langle\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T}),(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})(\delta g_{0})\right\rangle
\displaystyle\qquad+\langle\operatorname*{Asym}(g_{0}^{T}g_{1}\Lambda),\delta g_{0}\rangle.

Thus, we have

\frac{\partial\tilde{F}}{\partial g_{0}}=h\sum\limits_{i=1}^{s}b_{i}(\tilde{\pi}_{i}\circ\tilde{\phi}^{-1}\circ\tilde{\varphi})^{*}\left(\frac{\partial L}{\partial U}(U_{i},\Omega_{i})+\operatorname*{Asym}(U_{i}^{T}g_{1}\Lambda\Omega_{i}^{T})\right)+\operatorname*{Asym}(g_{0}^{T}g_{1}\Lambda).

References

  • Absil et al. [2008] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
  • Bartels and Stewart [1972] R. H. Bartels and G. W. Stewart. Solution of the matrix equation AX + XB = C [F4]. Communications of the ACM, 15(9):820–826, 1972.
  • Bogfjellmo and Marthinsen [2016] G. Bogfjellmo and H. Marthinsen. High-order symplectic partitioned Lie group methods. Foundations of Computational Mathematics, 16:493–530, 2016.
  • Celledoni and Owren [2002] E. Celledoni and B. Owren. A class of intrinsic schemes for orthogonal integration. SIAM Journal on Numerical Analysis, 40(6):2069–2084, 2002.
  • Duruisseaux and Leok [2021] V. Duruisseaux and M. Leok. A variational formulation of accelerated optimization on Riemannian manifolds. 2021.
  • Duruisseaux et al. [2021] V. Duruisseaux, J. Schmitt, and M. Leok. Adaptive Hamiltonian variational integrators and applications to symplectic accelerated optimization. SIAM Journal on Scientific Computing, 43(4):A2949–A2980, 2021. doi: 10.1137/20M1383835.
  • Golub et al. [1979] G. Golub, S. Nash, and C. Van Loan. A Hessenberg–Schur method for the problem AX + XB = C. IEEE Transactions on Automatic Control, 24(6):909–913, 1979.
  • Hall and Leok [2017] J. Hall and M. Leok. Lie group spectral variational integrators. Foundations of Computational Mathematics, 17:199–257, 2017.
  • Marsden et al. [1999] J. E. Marsden, S. Pekarsky, and S. Shkoller. Discrete Euler–Poincaré and Lie–Poisson equations. Nonlinearity, 12:1647–1662, 1999.
  • Lee et al. [2007] T. Lee, M. Leok, and N. H. McClamroch. Lie group variational integrators for the full body problem in orbital mechanics. Celest. Mech. Dyn. Astr., 98(2):121–144, 2007.
  • Leok [2005] M. Leok. Generalized Galerkin variational integrators. arXiv:math/0508360, 2005.
  • Leok and Shingel [2012] M. Leok and T. Shingel. General techniques for constructing variational integrators. Frontiers of Mathematics in China, 7:273–303, 2012.
  • Marsden and Ratiu [1999] J. E. Marsden and T. S. Ratiu. Introduction to Mechanics and Symmetry, volume 17 of Texts in Applied Mathematics. Springer-Verlag, second edition, 1999.
  • Marsden and West [2001] J. E. Marsden and M. West. Discrete mechanics and variational integrators. Acta Numer., 10:357–514, 2001.
  • Munthe-Kaas et al. [2001] H. Z. Munthe-Kaas, G. R. W. Quispel, and A. Zanna. Generalized polar decompositions on Lie groups with involutive automorphisms. Foundations of Computational Mathematics, 1(3):297–324, 2001.
  • Wibisono et al. [2016] A. Wibisono, A. Wilson, and M. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351–E7358, 2016.