

On the differential equation $\dot{\Theta}=(\Theta^{\mathsf{T}}-\Theta)\Theta$ with $\Theta\in SO(n)$

Gerd S. Schmidt    Christian Ebenbauer    Frank Allgöwer
Institute for Systems Theory and Automatic Control
Abstract

In this note we consider the global convergence properties of the differential equation $\dot{\Theta}=(\Theta^{\mathsf{T}}-\Theta)\Theta$ with $\Theta\in SO(n)$, which is a gradient flow of the function $f:SO(n)\rightarrow\mathbb{R},\ \Theta\mapsto 2n-2\,\mathrm{tr}(\Theta)$. Many of the presented results are not new, but scattered throughout the literature. The motivation of this note is to summarize and extend the convergence results known from the literature. Rather than giving an exhaustive list of references, the results are presented in a self-contained fashion.

In this note, we discuss the properties of a function and a differential equation on a smooth manifold. If we speak about a manifold $\mathcal{M}$ of dimension $m$ we always mean a smooth manifold in the sense of [1], i.e. $\mathcal{M}$ is a subset of some $\mathbb{R}^{k}$ with $k\geq m$ and $\mathcal{M}$ is locally diffeomorphic to $\mathbb{R}^{m}$. In the context of this note, we need the notions of measure zero and dense. A subset $A\subset\mathcal{M}$ of a manifold $\mathcal{M}$ is a set of measure zero if there is a collection of smooth charts $\{U_{l},\phi_{l}\}$ whose domains cover $A$ and such that the sets $\phi_{l}(A\cap U_{l})$ have measure zero in $\mathbb{R}^{m}$, i.e. each $\phi_{l}(A\cap U_{l})$ can be covered for any $\varepsilon>0$ by a countable collection of open balls whose volumes sum up to less than $\varepsilon$; for details see e.g. [2, Chapter 10]. To define dense, we need the topological closure $\overline{A}$ of a set $A\subset\mathcal{M}$, i.e. the intersection of all closed sets in $\mathcal{M}$ that contain $A$. A dense subset of a smooth manifold $\mathcal{M}$ is a set $A\subset\mathcal{M}$ such that the topological closure fulfills $\overline{A}=\mathcal{M}$, see e.g. [2, Appendix, Topology]. $A$ is dense if and only if every nonempty open subset of $\mathcal{M}$ has non-empty intersection with $A$. The complement $\mathbb{R}^{n}\setminus A$ of a set of measure zero $A\subset\mathbb{R}^{n}$ is dense in $\mathbb{R}^{n}$: if there were a point $x\in\mathbb{R}^{n}$ and an open set $U\subset\mathbb{R}^{n}$ with $x\in U$ and $U\cap(\mathbb{R}^{n}\setminus A)=\emptyset$, then $A$ would contain the nonempty open set $U$ and could not have measure zero, see also [3, Chapter 2].

Here, we consider a function and a differential equation on the set of special orthogonal matrices $SO(n)=\{\Theta\in\mathbb{R}^{n\times n}\mid\Theta^{-1}=\Theta^{\mathsf{T}},\ \det(\Theta)=1\}$. $SO(n)$ is a smooth manifold of dimension $\tfrac{n(n-1)}{2}$ with the subspace topology induced by $\mathbb{R}^{n\times n}$. The tangent space $T_{\Theta}SO(n)$ at $\Theta$ is given by

(1) $$T_{\Theta}SO(n)=\{X\in\mathbb{R}^{n\times n}\mid X=\Omega\Theta,\ \Omega\in\mathbb{R}^{n\times n},\ \Omega=-\Omega^{\mathsf{T}}\}\text{.}$$

The Riemannian metric $g:T_{\Theta}SO(n)\times T_{\Theta}SO(n)\rightarrow\mathbb{R}$ induced on $SO(n)$ by the standard Euclidean metric is given by

(2) $$g(\Omega_{1}\Theta,\Omega_{2}\Theta)=\mathrm{tr}\left((\Omega_{1}\Theta)^{\mathsf{T}}\Omega_{2}\Theta\right)=\mathrm{tr}\left(\Omega_{1}^{\mathsf{T}}\Omega_{2}\right)\text{.}$$
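The tangent space (1) and the metric (2) can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `random_skew`, `random_rotation` and `metric` are illustrative and not part of the original note.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def random_skew(n, rng):
    """Return a random skew-symmetric n x n matrix (Omega = -Omega^T)."""
    a = rng.standard_normal((n, n))
    return a - a.T

def random_rotation(n, rng):
    """Return an element of SO(n) obtained from a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column so that det(q) = +1
    return q

def metric(x, y):
    """Metric (2): g(X, Y) = tr(X^T Y)."""
    return np.trace(x.T @ y)

theta = random_rotation(n, rng)
omega1, omega2 = random_skew(n, rng), random_skew(n, rng)

# Tangent vectors at theta have the form Omega @ theta, see (1).
x1, x2 = omega1 @ theta, omega2 @ theta

lhs = metric(x1, x2)                # tr((Omega1 Theta)^T Omega2 Theta)
rhs = np.trace(omega1.T @ omega2)   # tr(Omega1^T Omega2)

# X Theta^T recovers Omega, so it must be skew-symmetric.
tangency_defect = np.linalg.norm(x1 @ theta.T + (x1 @ theta.T).T)
```

Both sides of (2) agree up to rounding because $\Theta\Theta^{\mathsf{T}}=I$ cancels under the trace.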

In the following, we define the differential and the Hessian of a function $f:SO(n)\rightarrow\mathbb{R}$ at a point $\Theta_{0}\in SO(n)$. Let $\Gamma:(-\varepsilon,\varepsilon)\rightarrow SO(n)$ be a smooth curve with $\dot{\Gamma}(t)=\Omega(t)\Gamma(t)$, $\Gamma(0)=\Theta_{0}$ and $\dot{\Gamma}(0)=\Omega_{0}\Theta_{0}$ with $\Omega_{0}\in\mathbb{R}^{n\times n}$ and $\Omega_{0}=-\Omega_{0}^{\mathsf{T}}$. The differential $df_{\Theta_{0}}:T_{\Theta_{0}}SO(n)\rightarrow T_{f(\Theta_{0})}\mathbb{R}\cong\mathbb{R}$ of a function $f:SO(n)\rightarrow\mathbb{R}$ at a point $\Theta_{0}$ evaluated at $\Omega_{0}\Theta_{0}\in T_{\Theta_{0}}SO(n)$ is defined by

(3) $$df_{\Theta_{0}}(\Omega_{0}\Theta_{0})=\tfrac{d}{dt}\Big\rvert_{t=0}(f\circ\Gamma)(t)\text{.}$$

The critical points of $f$ are the points $\Theta_{0}$ where $df_{\Theta_{0}}$ is not surjective. Because of $\dim(T_{f(\Theta_{0})}\mathbb{R})=1$, this means that these are the points $\Theta_{0}$ where $df_{\Theta_{0}}=0$. The gradient of $f$ is defined as the unique vector field $\mathrm{grad}f$ with

(4) $$df_{\Theta_{0}}(\Omega_{0}\Theta_{0})=g(\mathrm{grad}f(\Theta_{0}),\Omega_{0}\Theta_{0})\text{,}$$

see e.g. [2, Chapter 11]. The Hessian $H_{f}(\Theta_{0})$ of $f$ at a critical point $\Theta_{0}$ evaluated at $(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})$ is defined by

(5) $$H_{f}(\Theta_{0})(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})=\tfrac{d^{2}}{dt^{2}}\Big\rvert_{t=0}(f\circ\Gamma)(t)\text{.}$$

Since the Hessian at a critical point is bilinear and symmetric, we have for $(\Omega_{1}+\Omega_{2})\Theta_{0}\in T_{\Theta_{0}}SO(n)$ with $\Omega_{1,2}=-\Omega_{1,2}^{\mathsf{T}}$ the equality

(6) $$\begin{aligned}&H_{f}(\Theta_{0})((\Omega_{1}+\Omega_{2})\Theta_{0},(\Omega_{1}+\Omega_{2})\Theta_{0})\\&\quad=H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{1}\Theta_{0})+2H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{2}\Theta_{0})+H_{f}(\Theta_{0})(\Omega_{2}\Theta_{0},\Omega_{2}\Theta_{0})\text{.}\end{aligned}$$

As a consequence, the value $H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{2}\Theta_{0})$ can be computed utilizing the values $H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{1}\Theta_{0})$, $H_{f}(\Theta_{0})(\Omega_{2}\Theta_{0},\Omega_{2}\Theta_{0})$, $H_{f}(\Theta_{0})((\Omega_{1}+\Omega_{2})\Theta_{0},(\Omega_{1}+\Omega_{2})\Theta_{0})$ and (6). For details on the Hessian at a critical point, see [4, Appendix C.5].

Lemma 1

Consider the function $f:SO(n)\rightarrow\mathbb{R},\ \Theta\mapsto n-\mathrm{tr}(\Theta)$.

  1. a)

    The differential $df_{\Theta_{0}}$ of $f$ at $\Theta_{0}$ is given for any $\Omega_{0}\Theta_{0}\in T_{\Theta_{0}}SO(n)$ by

    $$df_{\Theta_{0}}(\Omega_{0}\Theta_{0})=-\frac{1}{2}\mathrm{tr}\left(\Theta_{0}^{\mathsf{T}}(\Theta_{0}-\Theta_{0}^{\mathsf{T}})\Omega_{0}\Theta_{0}\right)$$

    and the critical points of $f$ are given by

    (7) $$\mathcal{F}=\{\Theta_{0}\in SO(n)\mid\Theta_{0}^{\mathsf{T}}=\Theta_{0}\}\text{.}$$

    Furthermore, the gradient $\mathrm{grad}f(\Theta_{0})$ at $\Theta_{0}$ is given by

    (8) $$\mathrm{grad}f(\Theta_{0})=\frac{1}{2}(\Theta_{0}-\Theta_{0}^{\mathsf{T}})\Theta_{0}\text{.}$$
  2. b)

    The Hessian $H_{f}(\Theta_{0})$ at a critical point $\Theta_{0}$ is given by

    $$H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{2}\Theta_{0})=\frac{1}{2}\mathrm{tr}\left(\Omega_{1}^{\mathsf{T}}\Theta_{0}\Omega_{2}+\Omega_{2}^{\mathsf{T}}\Theta_{0}\Omega_{1}\right)\text{.}$$
  3. c)

    The set of critical points $\mathcal{F}$ has the following properties:

    1. i)

      $\mathcal{F}=\cup_{k=0}^{\lfloor n/2\rfloor}\mathcal{F}_{k}$ where

      (9) $$\mathcal{F}_{k}=\{\Theta_{0}\in SO(n)\mid\Theta_{0}=\Theta_{0}^{\mathsf{T}},\ \mathrm{tr}(\Theta_{0})=n-4k\}\text{.}$$
    2. ii)

      Each $\mathcal{F}_{k}$ is connected and isolated, i.e. there exists a neighborhood $U$ of each $\mathcal{F}_{k}$ such that $U\cap\mathcal{F}_{l}=\emptyset$ for all $l\neq k$.

    3. iii)

      The $\mathcal{F}_{k}$ are compact submanifolds of dimension $2k(n-2k)$ and the tangent space at $\Theta_{0}\in\mathcal{F}_{k}$ is $T_{\Theta_{0}}\mathcal{F}_{k}=\{\Sigma\in\mathbb{R}^{n\times n}\mid\Sigma=\Sigma^{\mathsf{T}}\}\cap T_{\Theta_{0}}SO(n)$.

    4. iv)

      For every $k\in\{0,\ldots,\lfloor n/2\rfloor\}$ and every $\Theta_{0}\in\mathcal{F}_{k}$ we have

      $$\ker H_{f}(\Theta_{0})=\{X\in T_{\Theta_{0}}SO(n)\mid H_{f}(\Theta_{0})(X,Y)=0\text{ for all }Y\in T_{\Theta_{0}}SO(n)\}=T_{\Theta_{0}}\mathcal{F}_{k}\text{.}$$
  4. d)

    $f$ has a unique minimum at $\Theta_{0}=I$; the other critical points are saddle points.

Corollary 2

The differential equation

(10) $$\dot{\Theta}=(\Theta^{\mathsf{T}}-\Theta)\Theta$$

is the gradient flow $\dot{\Theta}=-\mathrm{grad}f(\Theta)$ of $f:SO(n)\rightarrow\mathbb{R},\ \Theta\mapsto 2n-2\,\mathrm{tr}(\Theta)$ with respect to the Riemannian metric (2).
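To illustrate Corollary 2, the flow (10) can be integrated numerically. The sketch below is only one possible discretization and is not taken from the note: it assumes NumPy and uses a Lie-group Euler step in which the Cayley transform of the skew-symmetric matrix $h(\Theta^{\mathsf{T}}-\Theta)$ replaces the matrix exponential, so that every iterate stays in $SO(n)$. The step size, the initial rotation and all function names are illustrative choices.

```python
import numpy as np

def cayley(a):
    """Cayley transform (I - a/2)^{-1} (I + a/2); orthogonal if a is skew."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - 0.5 * a, eye + 0.5 * a)

def f(theta):
    """Cost function of Corollary 2: f(Theta) = 2n - 2 tr(Theta)."""
    return 2 * theta.shape[0] - 2 * np.trace(theta)

def flow_step(theta, h):
    """One Lie-group Euler step for Theta_dot = (Theta^T - Theta) Theta."""
    return cayley(h * (theta.T - theta)) @ theta

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

theta = rot_z(2.0) @ rot_x(1.0)   # an initial condition in SO(3)
h = 0.05                          # step size (illustrative)

f_values = [f(theta)]
for _ in range(400):
    theta = flow_step(theta, h)
    f_values.append(f(theta))

orthogonality_defect = np.linalg.norm(theta.T @ theta - np.eye(3))
distance_to_identity = np.linalg.norm(theta - np.eye(3))
```

For this initial condition the recorded values of $f$ decrease along the iteration and the iterates approach the identity matrix, as expected for a gradient flow whose only minimum is $I$.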

In the following we prove Lemma 1.
Proof. a) As stated above, $\Gamma:(-\varepsilon,\varepsilon)\rightarrow SO(n)$ is a differentiable curve with $\dot{\Gamma}(t)=\Omega(t)\Gamma(t)$, $\Gamma(0)=\Theta_{0}$ and $\dot{\Gamma}(0)=\Omega_{0}\Theta_{0}$ with $\Omega_{0}\in\mathbb{R}^{n\times n}$ and $\Omega_{0}=-\Omega_{0}^{\mathsf{T}}$. Then

(11) $$\begin{aligned}df_{\Theta_{0}}(\Omega_{0}\Theta_{0})&=\frac{d}{dt}\Big\rvert_{t=0}\left(n-\mathrm{tr}(\Gamma(t))\right)=-\mathrm{tr}\left(\Omega_{0}\Theta_{0}\right)\\&=-\frac{1}{2}\mathrm{tr}\left(\Theta_{0}^{\mathsf{T}}\Omega_{0}^{\mathsf{T}}+\Omega_{0}\Theta_{0}\right)\\&=-\frac{1}{2}\mathrm{tr}\left((\Theta_{0}-\Theta_{0}^{\mathsf{T}})\Omega_{0}\right)\\&=-\frac{1}{2}\mathrm{tr}\left(\Theta_{0}^{\mathsf{T}}(\Theta_{0}-\Theta_{0}^{\mathsf{T}})\Omega_{0}\Theta_{0}\right)\text{.}\end{aligned}$$

The third line vanishes for all skew-symmetric $\Omega_{0}$ if and only if the skew-symmetric matrix $\Theta_{0}-\Theta_{0}^{\mathsf{T}}$ is zero. Therefore, the critical points of $f$ are given by

(12) $$\{\Theta_{0}\in SO(n)\mid\Theta_{0}=\Theta_{0}^{\mathsf{T}}\}\text{.}$$

Comparing the last line of (11) with (4) and the definition of the Riemannian metric by (2), the gradient $\mathrm{grad}f(\Theta_{0})$ at $\Theta_{0}$ is given by

(13) $$\mathrm{grad}f(\Theta_{0})=\frac{1}{2}(\Theta_{0}-\Theta_{0}^{\mathsf{T}})\Theta_{0}\text{.}$$
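The formulas for the differential and the gradient in Lemma 1 a) can be compared with a finite-difference derivative along a curve in $SO(n)$. The following sketch assumes NumPy; as an illustrative choice it uses the curve $\Gamma(t)=\mathrm{cay}(t\Omega_{0})\Theta_{0}$ built from the Cayley transform, which satisfies $\Gamma(0)=\Theta_{0}$ and $\dot{\Gamma}(0)=\Omega_{0}\Theta_{0}$. The gradient is implemented as stated in (8).

```python
import numpy as np

def f(theta):
    """f(Theta) = n - tr(Theta)."""
    return theta.shape[0] - np.trace(theta)

def grad_f(theta):
    """Gradient (8): 1/2 (Theta - Theta^T) Theta."""
    return 0.5 * (theta - theta.T) @ theta

def cayley(a):
    """Cayley transform (I - a/2)^{-1} (I + a/2); orthogonal if a is skew."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - 0.5 * a, eye + 0.5 * a)

rng = np.random.default_rng(1)
n = 4

# Random element of SO(n) and random skew-symmetric direction.
theta0, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(theta0) < 0:
    theta0[:, 0] = -theta0[:, 0]
a = rng.standard_normal((n, n))
omega0 = a - a.T

# Central difference of t -> f(Gamma(t)) at t = 0.
h = 1e-5
finite_difference = (f(cayley(h * omega0) @ theta0)
                     - f(cayley(-h * omega0) @ theta0)) / (2 * h)

closed_form = -np.trace(omega0 @ theta0)                        # first line of (11)
via_gradient = np.trace(grad_f(theta0).T @ (omega0 @ theta0))   # g(grad f, Omega0 Theta0), see (4)
```

All three numbers should agree up to discretization and rounding errors.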

b) Let $\Theta_{0}$ denote a critical point of $f$. As stated above, $\Gamma:(-\varepsilon,\varepsilon)\rightarrow SO(n)$ is a differentiable curve with $\dot{\Gamma}(t)=\Omega(t)\Gamma(t)$, $\Gamma(0)=\Theta_{0}$ and $\dot{\Gamma}(0)=\Omega_{0}\Theta_{0}$ with $\Omega_{0}\in\mathbb{R}^{n\times n}$ and $\Omega_{0}=-\Omega_{0}^{\mathsf{T}}$. Then

(14) $$\begin{aligned}H_{f}(\Theta_{0})(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})&=\frac{d^{2}}{dt^{2}}\Big\rvert_{t=0}f(\Gamma(t))=-\mathrm{tr}\left(\ddot{\Gamma}(t)\right)\Big\rvert_{t=0}\\&=-\mathrm{tr}\left(\dot{\Omega}(t)\Gamma(t)+\Omega(t)\dot{\Gamma}(t)\right)\Big\rvert_{t=0}\\&=-\mathrm{tr}\left(\Omega_{0}^{2}\Theta_{0}\right)=\mathrm{tr}\left(\Omega_{0}^{\mathsf{T}}\Theta_{0}\Omega_{0}\right)=\mathrm{tr}\left(\Theta_{0}^{\mathsf{T}}\Omega_{0}^{\mathsf{T}}\Theta_{0}\Omega_{0}\Theta_{0}\right)\text{,}\end{aligned}$$

where we utilized that $\mathrm{tr}(\dot{\Omega}(0)\Theta_{0})=0$, since $\dot{\Omega}(t)$ is skew symmetric for all $t$ and $\Theta_{0}=\Theta_{0}^{\mathsf{T}}$ because $\Theta_{0}$ is a critical point. Utilizing (6) we get

(15) $$\begin{aligned}H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{2}\Theta_{0})&=\frac{1}{2}\Bigl(H_{f}(\Theta_{0})((\Omega_{1}+\Omega_{2})\Theta_{0},(\Omega_{1}+\Omega_{2})\Theta_{0})\\&\qquad-H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{1}\Theta_{0})-H_{f}(\Theta_{0})(\Omega_{2}\Theta_{0},\Omega_{2}\Theta_{0})\Bigr)\\&=\frac{1}{2}\mathrm{tr}\left((\Omega_{1}+\Omega_{2})^{\mathsf{T}}\Theta_{0}(\Omega_{1}+\Omega_{2})-\Omega_{1}^{\mathsf{T}}\Theta_{0}\Omega_{1}-\Omega_{2}^{\mathsf{T}}\Theta_{0}\Omega_{2}\right)\\&=\frac{1}{2}\mathrm{tr}\left(\Omega_{1}^{\mathsf{T}}\Theta_{0}\Omega_{2}+\Omega_{2}^{\mathsf{T}}\Theta_{0}\Omega_{1}\right)\text{.}\end{aligned}$$
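The Hessian formula of Lemma 1 b) and the polarization step (15) can be tested against second differences of $f$ along curves through a critical point. The sketch below assumes NumPy and again uses the illustrative Cayley curve $\Gamma(t)=\mathrm{cay}(t\Omega)\Theta_{0}$; the critical point is built as $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$ with an arbitrarily chosen $D=\mathrm{diag}(1,1,-1,-1)$.

```python
import numpy as np

def f(theta):
    """f(Theta) = n - tr(Theta)."""
    return theta.shape[0] - np.trace(theta)

def cayley(a):
    """Cayley transform (I - a/2)^{-1} (I + a/2); orthogonal if a is skew."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - 0.5 * a, eye + 0.5 * a)

def hessian(theta0, omega1, omega2):
    """Closed form of Lemma 1 b), valid at a critical point theta0."""
    return 0.5 * np.trace(omega1.T @ theta0 @ omega2 + omega2.T @ theta0 @ omega1)

def unit_skew(n, rng):
    """Random skew-symmetric matrix with Frobenius norm one."""
    a = rng.standard_normal((n, n))
    s = a - a.T
    return s / np.linalg.norm(s)

rng = np.random.default_rng(2)
n = 4
pi_mat, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal matrix
d = np.diag([1.0, 1.0, -1.0, -1.0])
theta0 = pi_mat.T @ d @ pi_mat   # symmetric element of SO(4): a critical point

omega1, omega2 = unit_skew(n, rng), unit_skew(n, rng)
h = 1e-3

def second_difference(omega):
    """Central second difference of t -> f(cay(t*omega) theta0) at t = 0."""
    plus = f(cayley(h * omega) @ theta0)
    minus = f(cayley(-h * omega) @ theta0)
    return (plus - 2.0 * f(theta0) + minus) / h**2

quad1 = second_difference(omega1)
quad2 = second_difference(omega2)
# Polarization as in (15): recover the mixed value from three quadratic values.
polarized = 0.5 * (second_difference(omega1 + omega2) - quad1 - quad2)
```

The values `quad1` and `polarized` should match `hessian(theta0, omega1, omega1)` and `hessian(theta0, omega1, omega2)` up to the discretization error of the second difference.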

c)i) Since $\Theta_{0}$ is symmetric, $\Theta_{0}$ is orthogonally diagonalizable, i.e. $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$ for some diagonal $D$ and orthogonal $\Pi$, where the columns of $\Pi^{\mathsf{T}}$ are eigenvectors of $\Theta_{0}$. Since $\Theta_{0}^{\mathsf{T}}\Theta_{0}=I$, we get $D^{2}=I$ and consequently the eigenvalues are $\pm 1$. Since $\det(\Theta_{0})=1$, the number of negative eigenvalues is always even. A similarity transformation leaves the trace invariant, hence a critical point $\Theta_{0}$ fulfills

(16) $$\mathrm{tr}(\Theta_{0})=n-4k\text{,}$$

where $k\in\{0,\ldots,\lfloor n/2\rfloor\}$ is the number of eigenvalue pairs which are $-1$.
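As a small worked example of (16), one can enumerate the diagonal critical points for a fixed $n$. The sketch below assumes NumPy and the arbitrary choice $n=5$; it lists all diagonal matrices with entries $\pm 1$ and determinant one and collects their traces together with the corresponding values of $f=n-\mathrm{tr}$.

```python
import itertools
import numpy as np

n = 5   # illustrative dimension

traces = set()
for signs in itertools.product([1, -1], repeat=n):
    if np.prod(signs) == 1:           # det(D) = +1, i.e. an even number of -1 entries
        traces.add(int(sum(signs)))   # tr(D)

# Traces predicted by (16) for k = 0, ..., floor(n/2).
expected_traces = {n - 4 * k for k in range(n // 2 + 1)}

# Corresponding values of f(Theta) = n - tr(Theta) on the sets F_k.
f_values = {n - t for t in traces}
```

For $n=5$ this yields the traces $5$, $1$ and $-3$, so $f$ takes the values $0$, $4$ and $8$ on $\mathcal{F}_{0}$, $\mathcal{F}_{1}$ and $\mathcal{F}_{2}$.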
c)ii) We start by showing that each $\mathcal{F}_{k}$ is path connected and thus connected. Let $k\in\{0,\ldots,\lfloor n/2\rfloor\}$ be arbitrary but fixed and let $\Theta_{1},\Theta_{2}\in\mathcal{F}_{k}$. Then there are orthogonal $\Pi_{1},\Pi_{2}$ such that $\Theta_{1}=\Pi_{1}^{\mathsf{T}}D\Pi_{1}$ and $\Theta_{2}=\Pi_{2}^{\mathsf{T}}D\Pi_{2}$, where $D$ is diagonal with $2k$ entries equal to $-1$ and, after changing the sign of one row of $\Pi_{i}$ if necessary, $\det(\Pi_{i})=1$. Furthermore, there are real skew-symmetric matrices $\Omega_{1},\Omega_{2}$ such that $\Pi_{1}=\exp(\Omega_{1})$ and $\Pi_{2}=\exp(\Omega_{2})$ with $\exp$ denoting the matrix exponential. Then $\alpha:[0,1]\rightarrow SO(n)$ defined by

(17) $$t\mapsto\exp\left(\Omega_{1}^{\mathsf{T}}(1-t)\right)\exp\left(\Omega_{2}^{\mathsf{T}}t\right)D\exp\left(\Omega_{2}t\right)\exp\left(\Omega_{1}(1-t)\right)$$

is a smooth curve in $\mathcal{F}_{k}$ which connects $\Theta_{1}$ and $\Theta_{2}$. Since $\Theta_{1},\Theta_{2}\in\mathcal{F}_{k}$ were arbitrary, this implies the path-connectedness of $\mathcal{F}_{k}$. To show that $\mathcal{F}_{k}$ is isolated, we utilize that $f$ is constant on each $\mathcal{F}_{l}$ with $f\rvert_{\mathcal{F}_{l}}=n-(n-4l)=4l$, so that $4l=f\rvert_{\mathcal{F}_{l}}\neq f\rvert_{\mathcal{F}_{k}}=4k$ for $l\neq k$. Then there is an $\varepsilon(l)>0$ with $(4l-\varepsilon(l),4l+\varepsilon(l))\cap(4k-\varepsilon(l),4k+\varepsilon(l))=\emptyset$. As a consequence, the intersection of the preimages of these sets under $f$ is empty. Since $f$ is continuous and both intervals are open, their preimages are open and contain $\mathcal{F}_{l}$ and $\mathcal{F}_{k}$, respectively. With $U_{k}(l)=f^{-1}\bigl((4k-\varepsilon(l),4k+\varepsilon(l))\bigr)$ we thus have $\mathcal{F}_{k}\subset U_{k}(l)$ and $U_{k}(l)\cap\mathcal{F}_{l}=\emptyset$. Since this is possible for every $l\in\{0,\ldots,\lfloor n/2\rfloor\}$ with $l\neq k$ and since a finite intersection of open sets is an open set, $U=\bigcap_{l\neq k}U_{k}(l)$ is an open neighborhood of $\mathcal{F}_{k}$ such that $U\cap\mathcal{F}_{l}=\emptyset$ for all $l\neq k$.
c)iii) The property that the $\mathcal{F}_{k}$ are submanifolds is given in [5]. The tangent space follows from (7).
c)iv) Let $k\in\{0,\ldots,\lfloor n/2\rfloor\}$ be arbitrary but fixed and $\Theta_{0}\in\mathcal{F}_{k}$. Since $\ker H_{f}(\Theta_{0})\supset T_{\Theta_{0}}\mathcal{F}_{k}$ is always true, we have to check $\ker H_{f}(\Theta_{0})\subset T_{\Theta_{0}}\mathcal{F}_{k}$. Since every critical point is symmetric, there is an orthogonal $\Pi$ such that $\Pi\Theta_{0}\Pi^{\mathsf{T}}=D$, i.e. $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$, where $D$ is a diagonal matrix with non-zero diagonal elements. Furthermore, we know that

$$\begin{aligned}H_{f}(\Theta_{0})(\Omega_{1}\Theta_{0},\Omega_{2}\Theta_{0})&=\frac{1}{2}\mathrm{tr}\left(\Omega_{1}^{\mathsf{T}}\Theta_{0}\Omega_{2}+\Omega_{2}^{\mathsf{T}}\Theta_{0}\Omega_{1}\right)\\&=\frac{1}{2}\mathrm{tr}\left(\Theta_{0}^{\mathsf{T}}\Omega_{1}^{\mathsf{T}}\Theta_{0}\Omega_{2}\Theta_{0}+\Theta_{0}^{\mathsf{T}}\Omega_{2}^{\mathsf{T}}\Theta_{0}\Omega_{1}\Theta_{0}\right)\\&=\frac{1}{2}\mathrm{tr}\left((\Omega_{1}\Pi^{\mathsf{T}}D\Pi)^{\mathsf{T}}\Pi^{\mathsf{T}}D\Pi(\Omega_{2}\Pi^{\mathsf{T}}D\Pi)\right)\\&\quad+\frac{1}{2}\mathrm{tr}\left((\Omega_{2}\Pi^{\mathsf{T}}D\Pi)^{\mathsf{T}}\Pi^{\mathsf{T}}D\Pi(\Omega_{1}\Pi^{\mathsf{T}}D\Pi)\right)\\&=\frac{1}{2}\mathrm{tr}\left((\Pi\Omega_{1}\Pi^{\mathsf{T}}D)^{\mathsf{T}}D(\Pi\Omega_{2}\Pi^{\mathsf{T}}D)\right)\\&\quad+\frac{1}{2}\mathrm{tr}\left((\Pi\Omega_{2}\Pi^{\mathsf{T}}D)^{\mathsf{T}}D(\Pi\Omega_{1}\Pi^{\mathsf{T}}D)\right)\\&=\frac{1}{2}\mathrm{tr}\left((\tilde{\Omega}_{1}D)^{\mathsf{T}}D(\tilde{\Omega}_{2}D)+(\tilde{\Omega}_{2}D)^{\mathsf{T}}D(\tilde{\Omega}_{1}D)\right)\\&=H_{f}(D)(\tilde{\Omega}_{1}D,\tilde{\Omega}_{2}D)\end{aligned}$$

where $\tilde{\Omega}_{1}=\Pi\Omega_{1}\Pi^{\mathsf{T}}$ and $\tilde{\Omega}_{2}=\Pi\Omega_{2}\Pi^{\mathsf{T}}$. As a consequence

$$\begin{aligned}\ker H_{f}(\Theta_{0})&=\{X\in T_{\Theta_{0}}SO(n)\mid H_{f}(\Theta_{0})(X,Y)=0\text{ for all }Y\in T_{\Theta_{0}}SO(n)\}\\&=\{X\in T_{D}SO(n)\mid\mathrm{tr}\left(X^{\mathsf{T}}D\tilde{\Omega}_{2}D-D\tilde{\Omega}_{2}DX\right)=0\text{ for all }\tilde{\Omega}_{2}D\in T_{D}SO(n)\}\\&=\{X\in T_{D}SO(n)\mid\mathrm{tr}\left(\tilde{\Omega}_{2}D(X^{\mathsf{T}}-X)D\right)=0\text{ for all }\tilde{\Omega}_{2}D\in T_{D}SO(n)\}\text{.}\end{aligned}$$

Observe that $\tilde{\Omega}_{2}$ is skew symmetric. Furthermore, $X^{\mathsf{T}}-X$ is skew symmetric and since $D$ is diagonal with non-zero entries, $D(X^{\mathsf{T}}-X)D$ is skew symmetric. Because the equation $\mathrm{tr}\left(\tilde{\Omega}_{2}D(X^{\mathsf{T}}-X)D\right)=0$ has to hold for all skew-symmetric $\tilde{\Omega}_{2}$, we obtain $D(X^{\mathsf{T}}-X)D=0$. With the non-singular $D$ this implies $X^{\mathsf{T}}-X=0$, which is equivalent to $X=X^{\mathsf{T}}$. In c)iii) we stated that $T_{\Theta_{0}}\mathcal{F}_{k}$ is $\{\Sigma\in\mathbb{R}^{n\times n}\mid\Sigma=\Sigma^{\mathsf{T}}\}\cap T_{\Theta_{0}}SO(n)$, thus the previous calculation shows $\ker H_{f}(\Theta_{0})\subset T_{\Theta_{0}}\mathcal{F}_{k}$.
d) Since $\Theta_{0}=\Theta_{0}^{\mathsf{T}}$, $\Theta_{0}$ is orthogonally diagonalizable, i.e. $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$ for some diagonal $D$ and orthogonal $\Pi$. Therefore

(18) $$H_{f}(\Theta_{0})(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})=\mathrm{tr}\left(\Omega_{0}^{\mathsf{T}}\Theta_{0}\Omega_{0}\right)=-\mathrm{tr}\left(\tilde{\Omega}_{0}^{2}D\right)\text{,}$$

where $\tilde{\Omega}_{0}=\Pi\Omega_{0}\Pi^{\mathsf{T}}$ is skew symmetric. Consequently, $\mathrm{tr}\left(\Omega_{0}^{\mathsf{T}}\Theta_{0}\Omega_{0}\right)$ is definite for all skew symmetric $\Omega_{0}$ at a critical point $\Theta_{0}$ if and only if $\mathrm{tr}\left(\tilde{\Omega}_{0}^{2}D\right)$ is definite for all skew symmetric $\tilde{\Omega}_{0}$, where $D=\Pi\Theta_{0}\Pi^{\mathsf{T}}$ is diagonal. Thus, we have to consider $H_{f}$ only for diagonal $D$, i.e.

(19) $$\begin{aligned}H_{f}(D)(\Omega_{0}D,\Omega_{0}D)&=-\mathrm{tr}\left(\Omega_{0}^{2}D\right)=-\sum_{1\leq k\leq n}\left(\Omega_{0}^{2}D\right)_{kk}\\&=-\sum_{1\leq k\leq n}(\Omega_{0}^{2})_{kk}D_{kk}=-\sum_{1\leq k\leq n}\sum_{1\leq l\leq n}(\Omega_{0})_{kl}(\Omega_{0})_{lk}D_{kk}\\&=\sum_{1\leq k,l\leq n}((\Omega_{0})_{kl})^{2}D_{kk}=\sum_{\substack{1\leq k,l\leq n\\k\neq l}}((\Omega_{0})_{kl})^{2}D_{kk}\\&=\sum_{1\leq k<l\leq n}(D_{kk}+D_{ll})((\Omega_{0})_{kl})^{2}\text{.}\end{aligned}$$

The critical points $\Theta_{0}\in SO(n)$ are symmetric, therefore all eigenvalues of $\Theta_{0}$ are real. Observe now that $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$ with orthogonal $\Pi$ implies $D^{2}=I$. Hence, the eigenvalues are $D_{kk}\in\{-1,1\}$ and since $\Theta_{0}\in SO(n)$ the number of eigenvalues equal to $-1$ is even. Consequently we have to determine the definiteness of $H_{f}$ by considering (19) for all diagonal matrices $D$ with $\pm 1$ on the diagonal where the number of $-1$ entries is zero or even.
Suppose first that all $D_{kk}$ are equal to $1$, i.e. $D=I$. The associated $\Theta_{0}$ is $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi=\Pi^{\mathsf{T}}\Pi=I$. Then (19) implies $H_{f}(\Theta_{0})(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})=\sum_{1\leq k<l\leq n}2(\Omega_{0})_{kl}^{2}$, i.e. $H_{f}>0$ for all nonzero skew-symmetric $\Omega_{0}$. Thus $H_{f}$ is positive definite if $\Theta_{0}=I$. Suppose now there is a nonzero even number of eigenvalues $D_{kk}$ equal to $-1$. Then, there are indices $l,k$ with $l\neq k$ such that $D_{kk}=-1$ and $D_{ll}=-1$, and therefore there are skew symmetric $\Omega_{0}$ such that $H_{f}(\Theta_{0})(\Omega_{0}\Theta_{0},\Omega_{0}\Theta_{0})<0$. As a consequence, $H_{f}$ is indefinite at a critical point $\Theta_{0}$ where $\Theta_{0}$ has an even number of negative eigenvalues $D_{kk}=-1$. Therefore, $\Theta_{0}=I$ is the only local (global) minimum of $f$. All other critical points are saddle points.
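The sum formula (19) is easy to evaluate on elementary skew-symmetric matrices. The following sketch assumes NumPy; the function names and the choice $D=\mathrm{diag}(1,1,-1,-1)$ are illustrative. It compares $-\mathrm{tr}(\Omega_{0}^{2}D)$ with the last line of (19) and evaluates the quadratic form in three coordinate directions.

```python
import numpy as np

def hessian_trace(d, omega):
    """First expression in (19): -tr(Omega^2 D)."""
    return -np.trace(omega @ omega @ d)

def hessian_sum(d, omega):
    """Last expression in (19): sum over k < l of (D_kk + D_ll) Omega_kl^2."""
    n = d.shape[0]
    return sum((d[k, k] + d[l, l]) * omega[k, l] ** 2
               for k in range(n) for l in range(k + 1, n))

def elementary_skew(n, k, l):
    """Skew-symmetric matrix with +1 at (k, l) and -1 at (l, k)."""
    e = np.zeros((n, n))
    e[k, l], e[l, k] = 1.0, -1.0
    return e

n = 4
d_saddle = np.diag([1.0, 1.0, -1.0, -1.0])   # critical point in F_1 of SO(4)

value_pp = hessian_trace(d_saddle, elementary_skew(n, 0, 1))   # D_kk = D_ll = +1
value_mm = hessian_trace(d_saddle, elementary_skew(n, 2, 3))   # D_kk = D_ll = -1
value_pm = hessian_trace(d_saddle, elementary_skew(n, 0, 2))   # D_kk = +1, D_ll = -1

# Random skew-symmetric matrix to compare both expressions in (19).
rng = np.random.default_rng(4)
a = rng.standard_normal((n, n))
omega = a - a.T
difference = abs(hessian_trace(d_saddle, omega) - hessian_sum(d_saddle, omega))
```

For this $D$ the quadratic form takes the values $2$, $-2$ and $0$ in the three directions, i.e. a positive direction, a negative direction and a direction in the kernel.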

Definition 3

[4, on p.21] Let $\mathcal{M}$ be a smooth Riemannian manifold and $f:\mathcal{M}\rightarrow\mathbb{R}$ be a smooth function. Denote the set of critical points of $f$ by $C(f)$. $f$ is called a Morse-Bott function provided the following conditions are satisfied:

  1. a)

    $f$ has compact sublevel sets.

  2. b)

    $C(f)=\cup_{j=1}^{k}\mathcal{N}_{j}$ where the $\mathcal{N}_{j}$ are disjoint, closed and connected submanifolds of $\mathcal{M}$ and $f$ is constant on $\mathcal{N}_{j}$ for $j=1,\ldots,k$.

  3. c)

    $\ker H_{f}(x)=T_{x}\mathcal{N}_{j}$ for all $x\in\mathcal{N}_{j}$ and all $j=1,\ldots,k$.

Lemma 4

$f:SO(n)\rightarrow\mathbb{R},\ \Theta\mapsto n-\mathrm{tr}(\Theta)$ is a Morse-Bott function.

Proof. We show only Definition 3a) since b) and c) were shown in Lemma 1. $SO(n)$ is compact, hence $f$ attains its minimal and its maximal value on $SO(n)$. The minimal value of $f$ is zero, the maximal value is $2n-2$ for $n$ odd and $2n$ for $n$ even. If $n$ is odd we thus have

$$L_{c}=\{\Theta\in SO(n)\mid f(\Theta)\leq c\}=\begin{cases}f^{-1}([0,c])&\text{for }c\leq 2n-2\\f^{-1}([0,2n-2])&\text{for }c>2n-2\text{.}\end{cases}$$

If $n$ is even we have

$$L_{c}=\{\Theta\in SO(n)\mid f(\Theta)\leq c\}=\begin{cases}f^{-1}([0,c])&\text{for }c\leq 2n\\f^{-1}([0,2n])&\text{for }c>2n\text{.}\end{cases}$$

Since $f$ is continuous, the preimage of a closed set is a closed set and since $SO(n)$ is bounded, its subsets are bounded as well. Since $SO(n)\subset\mathbb{R}^{n\times n}$, the boundedness and closedness of the sublevel sets implies their compactness.
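The extreme values of $f$ used in the proof of Lemma 4 can be illustrated numerically. The sketch below assumes NumPy; `f_max`, `maximizer` and `random_rotation` are hypothetical helper names. It evaluates $f$ at a diagonal matrix with as many entries $-1$ as the constraint $\det=1$ allows and checks the bounds on randomly sampled rotations.

```python
import numpy as np

def f(theta):
    """f(Theta) = n - tr(Theta)."""
    return theta.shape[0] - np.trace(theta)

def f_max(n):
    """Maximal value of f stated in the proof: 2n for even n, 2n - 2 for odd n."""
    return 2 * n if n % 2 == 0 else 2 * n - 2

def maximizer(n):
    """Diagonal element of SO(n) with as many -1 entries as possible."""
    d = -np.ones(n)
    if n % 2 == 1:
        d[-1] = 1.0   # keep det = +1 for odd n
    return np.diag(d)

def random_rotation(n, rng):
    """Element of SO(n) obtained from a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(3)
sampled = {n: [f(random_rotation(n, rng)) for _ in range(200)] for n in (3, 4)}
```

All sampled values lie in the interval $[0,2n-2]$ for $n=3$ and in $[0,2n]$ for $n=4$, which is consistent with the description of the sublevel sets above.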
The convergence properties of the gradient flow associated with $\Theta\mapsto n-\mathrm{tr}(\Theta)$ are thus determined by the following proposition.

Proposition 5

[4, Proposition 3.9] Let $f:\mathcal{M}\rightarrow\mathbb{R}$ be a Morse-Bott function on a Riemannian manifold $\mathcal{M}$. The $\omega$-limit set $\omega(x)$ of $x\in\mathcal{M}$ with respect to the gradient flow of $f$ is a single critical point of $f$. Every solution of the gradient flow converges to an equilibrium point.

To give a more detailed specification of the convergence behavior of the gradient flow of $\Theta\mapsto n-\mathrm{tr}(\Theta)$ we need the following result.

Lemma 6

Let $\mathcal{M}$ be a smooth and compact Riemannian manifold of dimension $m$, $f:\mathcal{M}\rightarrow\mathbb{R}$ be a Morse-Bott function and denote the set of critical points of $f$ by $C(f)$. Let $\mathcal{N}$ be a fixed connected component of $C(f)$ of dimension $n$. If at least one of the $m-n$ eigenvalues with nonzero real part of the linearization of $\mathrm{grad}f$ at some $x\in\mathcal{N}$ has a real part greater than zero, then the set $A$ of initial conditions $x_{0}\in\mathcal{M}$ for which the solutions $t\mapsto\phi(t,x_{0})$ of the gradient flow $\dot{x}=-\mathrm{grad}f(x)$ converge towards $\mathcal{N}$, i.e.

(20) $$A=\{x_{0}\in\mathcal{M}\mid\lim_{t\rightarrow\infty}\phi(t,x_{0})\in\mathcal{N}\}\text{,}$$

has measure zero. Furthermore $\mathcal{M}\setminus A$ is dense in $\mathcal{M}$, i.e. $\overline{\mathcal{M}\setminus A}=\mathcal{M}$.
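For the flow (10) on $SO(3)$, the statement of Lemma 6 can be illustrated by a small experiment. This is a sketch under stated assumptions, not part of the original argument: it assumes NumPy and discretizes (10) with a Cayley-based Lie-group Euler step (an illustrative choice). The rotation by $\pi$ about the third axis is a critical point in $\mathcal{F}_{1}$; a starting point rotated by $\pi-10^{-3}$ instead leaves its neighborhood and approaches the identity.

```python
import numpy as np

def cayley(a):
    """Cayley transform (I - a/2)^{-1} (I + a/2); orthogonal if a is skew."""
    eye = np.eye(a.shape[0])
    return np.linalg.solve(eye - 0.5 * a, eye + 0.5 * a)

def integrate(theta, h, steps):
    """Lie-group Euler discretization of Theta_dot = (Theta^T - Theta) Theta."""
    for _ in range(steps):
        theta = cayley(h * (theta.T - theta)) @ theta
    return theta

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Exact critical point in F_1: rotation by pi about the third axis.
equilibrium = np.diag([-1.0, -1.0, 1.0])
# Nearby initial condition that is not a critical point.
perturbed = rot_z(np.pi - 1e-3)

h, steps = 0.05, 800   # illustrative step size and horizon
end_from_equilibrium = integrate(equilibrium, h, steps)
end_from_perturbed = integrate(perturbed, h, steps)

distance_to_identity = np.linalg.norm(end_from_perturbed - np.eye(3))
```

The critical point is reproduced unchanged, whereas the perturbed initial condition ends up at the identity, in line with the claim that only a set of measure zero converges to the non-minimal critical components.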

Proof. The goal of the proof is to show that A{A} has measure zero and that β„³βˆ–A\mathcal{M}\setminus{A} is dense. We show this in the following way. First, we consider the set of points lying in a suitable neighborhood of 𝒩\mathcal{N} and which contains the orbits of the solutions of the gradient flow xΛ™=βˆ’gradf​(x)\dot{x}=-\mathop{\mathrm{grad}}{f}(x) which eventually converge towards 𝒩\mathcal{N}. We utilize a result from [6] to conclude that this set has measure zero and β„³\mathcal{M} without this set is dense. Then, we utilize this set to derive the same result for A{A} utilizing the properties of the flow of the gradient vector field on β„³\mathcal{M}.

In the following, we apply [6, Proposition 4.1]. This proposition concerns the case of a a three times continuously differentiable vector field v:ℝl→ℝlv:\mathbb{R}^{l}\rightarrow\mathbb{R}^{l} together with submanifold of equilibria 𝒩¯\overline{\mathcal{N}} in ℝl\mathbb{R}^{l} under the assumption that 𝒩¯\overline{\mathcal{N}} is normally hyperbolic with respect to vv. Normal hyperbolicity of 𝒩¯\overline{\mathcal{N}} means that the linearization of the vector field vv at xβˆˆπ’©Β―x\in\overline{\mathcal{N}} has nβˆ’dim𝒩¯n-\dim{\overline{\mathcal{N}}} eigenvalues with real parts different from zero. Under these assumptions, there exists a neighborhood 𝒰¯\overline{\mathcal{U}} of 𝒩¯\overline{\mathcal{N}} such that any solution t↦ϕ​(t,x0)t\mapsto\phi(t,x_{0}) of xΛ™=v​(x)\dot{x}=v(x) with initial condition x0x_{0} and with a forward orbit ϕ​([0,∞);x0)\phi([0,\infty);x_{0}) in 𝒰¯\overline{\mathcal{U}} lies on the stable of manifold Wlocs​(p)W^{s}_{\text{loc}}(p) of a point pβˆˆπ’©Β―p\in\overline{\mathcal{N}}. Wlocs​(p)W^{s}_{\text{loc}}(p) is defined by

(21) Wlocs(p)={xβˆˆπ’°|limtβ†’βˆžΟ•(t,x)=p}.W^{s}_{\text{loc}}(p)=\{x\in\mathcal{U}\rvert{\lim_{t\rightarrow\infty}\phi(t,x)=p}\}\text{.}

We can always embed β„³\mathcal{M} into ℝl\mathbb{R}^{l} for ll large enough, see e.g. [2, Chapter 10], therefore we can utilize [6] also for our case of a vector field on a manifold β„³\mathcal{M}.

Since $\mathcal{M}$ is compact and $f$ is smooth, we have a global flow $\phi:\mathbb{R}\times\mathcal{M}\rightarrow\mathcal{M}$, which means that $t\mapsto\phi(t,x_{0})$ is a solution of the gradient flow $\dot{x}=-\mathop{\mathrm{grad}}{f}(x)$ defined for all $t\in\mathbb{R}$ and with $\phi(0,x_{0})=x_{0}$, see e.g. [2, Chapter 17]. Furthermore, $\phi(t,\cdot):\mathcal{M}\rightarrow\mathcal{M}$ is a diffeomorphism for every $t\in\mathbb{R}$. Since $f$ is a Morse-Bott function, $\mathcal{N}$ is normally hyperbolic, i.e. the linearization of the gradient flow at any $x\in\mathcal{N}$ has exactly $m-n$ eigenvalues with real parts different from zero, see [7, p. 183, Morse-Bott functions]. According to [6, Proposition 4.1], there is a neighborhood $\mathcal{U}$ of $\mathcal{N}$ such that every solution $\phi(\cdot,x)$ with $x\in\mathcal{M}$ whose forward orbit $\phi([0,\infty),x)$ stays in $\mathcal{U}$ has to lie in one $W^{s}_{\text{loc}}(p)$ with $p\in\mathcal{N}$. We know from [8, Proposition 3.2] that, if we choose $\mathcal{U}$ small enough, then the local stable manifold $W^{s}_{\text{loc}}(\mathcal{N})$ of $\mathcal{N}$ given by

\[
W^{s}_{\text{loc}}(\mathcal{N})=\bigcup_{p\in\mathcal{N}}W^{s}_{\text{loc}}(p)
\tag{22}
\]

is a smooth submanifold of dimension $n+k$, where $k<m-n$ is the number of eigenvalues with real part smaller than zero. Since the stable manifold $W^{s}_{\text{loc}}(\mathcal{N})$ is a submanifold of $\mathcal{M}$ with smaller dimension than $\mathcal{M}$, the stable manifold has measure zero and $\mathcal{M}\setminus W^{s}_{\text{loc}}(\mathcal{N})$ is dense in $\mathcal{M}$, see [2, Theorem 10.5].
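As an illustration of this dimension count (a sketch added for orientation, not part of the original argument), take $\mathcal{M}=SO(3)$ and let $\mathcal{N}$ be the set of rotations by the angle $\pi$, which has dimension two since such a rotation is determined by its axis. The eigenvalues $1,0,0$ used here are an assumption at this point of the text: they follow from the linearization (29) in the proof of Lemma 7, evaluated at $\Theta_{0}=\mathrm{diag}(-1,-1,1)$.

\[
\begin{aligned}
\dim\mathcal{M}-\dim\mathcal{N} &= 3-2 = 1 &&\text{(one eigenvalue with nonzero real part, namely } 1\text{)},\\
k &= 0 &&\text{(no eigenvalue with negative real part)},\\
\dim W^{s}_{\text{loc}}(\mathcal{N}) &= \dim\mathcal{N}+k = 2 < 3 = \dim\mathcal{M}\text{.}
\end{aligned}
\]

In this example the local stable manifold is just $\mathcal{N}$ itself, a two-dimensional subset of the three-dimensional manifold $SO(3)$, and hence of measure zero.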

Let $A$ be defined by (20). Define $A_{1}$ by

\[
A_{1}=\{x\in A \mid \forall t\geq 1:\ \phi(t,x)\in\mathcal{U}\}
\tag{23}
\]

and let $A_{k}$ for $k\geq 2$ be defined by

\[
A_{k}=\{x\in A\setminus\bigl(A_{1}\cup\ldots\cup A_{k-1}\bigr) \mid \forall t\geq k:\ \phi(t,x)\in\mathcal{U}\}\text{.}
\tag{24}
\]

If $x\in A$, then there is an integer $k\in\mathbb{N}$ such that $x\in A_{k}$. As a consequence

\[
A=\bigcup_{k\in\mathbb{N}}A_{k}\text{.}
\tag{25}
\]

Because of (23) and (24), $\phi(k,A_{k})\subset\mathcal{U}$ for every $A_{k}$. Moreover, [6, Proposition 4.1] implies that

\[
\phi(k,A_{k})\subset W^{s}_{\text{loc}}(\mathcal{N})\text{.}
\tag{26}
\]

As a subset of a set of measure zero, $\phi(k,A_{k})$ has measure zero, see e.g. [2, Lemma A.60(b)]. Since $\phi(k,\cdot):\mathcal{M}\rightarrow\mathcal{M}$ is a diffeomorphism, this means that $A_{k}$ also has measure zero, see e.g. [2, Lemma 10.1]. According to (25), $A$ is a countable union of the $A_{k}$, i.e. $A$ is a countable union of sets of measure zero. Therefore, $A$ has measure zero and as a consequence $\mathcal{M}\setminus A$ is dense.


To finally derive the global stability properties of the identity matrix $I\in\mathbb{R}^{n\times n}$ for the gradient flow of $\Theta\mapsto n-\mathop{\mathrm{tr}}(\Theta)$ and thus also for the differential equation (10), we linearize the gradient flow around the equilibria.

Lemma 7

The convergence properties of the gradient flow of $f:SO(n)\rightarrow\mathbb{R},\ \Theta\mapsto n-\mathop{\mathrm{tr}}(\Theta)$ are the following:

  a) The $\omega$-limit set of any solution is contained in the set of equilibria given by (7), i.e.

     \[
     \mathcal{F}=\{\Theta_{0}\in SO(n) \mid \Theta_{0}^{\mathsf{T}}=\Theta_{0}\}\text{.}
     \]

  b) The equilibrium $I$ is locally exponentially stable and all other equilibria are unstable.

  c) The set of initial conditions for which the solutions of the gradient flow $\dot{x}=-\mathop{\mathrm{grad}}{f}(x)$ of $f$ converge towards $I$ is dense in $SO(n)$, and the set of initial conditions for which the solutions of the gradient flow converge to the other equilibria has measure zero.

Corollary 8

The identity matrix $I$ is an almost globally asymptotically stable equilibrium for the differential equation

\[
\dot{\Theta}=(\Theta^{\mathsf{T}}-\Theta)\Theta\text{.}
\tag{27}
\]
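The statement of Corollary 8 can be explored numerically. The following is a minimal sketch, not the authors' method: it integrates (27) on $SO(3)$ with a Cayley-transform step, chosen here because $\Theta^{\mathsf{T}}-\Theta$ is skew-symmetric and the Cayley transform of a skew-symmetric matrix is orthogonal, so the iterates stay on $SO(n)$. The names `cost`, `cayley_step`, `simulate`, `rot_z`, the step size and the initial rotation angle are all illustrative assumptions.

```python
import numpy as np


def cost(theta):
    """f(Theta) = n - tr(Theta), the function from Lemma 7."""
    return theta.shape[0] - np.trace(theta)


def cayley_step(theta, h):
    """One step of size h for dTheta/dt = (Theta^T - Theta) Theta.

    A = Theta^T - Theta is skew-symmetric, so (I - h/2 A)^{-1} (I + h/2 A)
    is orthogonal with determinant one and the iterate stays on SO(n).
    """
    eye = np.eye(theta.shape[0])
    a = theta.T - theta
    q = np.linalg.solve(eye - 0.5 * h * a, eye + 0.5 * h * a)
    return q @ theta


def simulate(theta0, h=0.01, steps=3000):
    """Integrate from theta0 and record the cost along the trajectory."""
    theta = np.array(theta0, dtype=float)
    costs = [cost(theta)]
    for _ in range(steps):
        theta = cayley_step(theta, h)
        costs.append(cost(theta))
    return theta, np.array(costs)


def rot_z(angle):
    """Rotation about the z-axis by the given angle (illustrative helper)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Start close to (but not at) a rotation by pi, i.e. near an unstable equilibrium.
theta_final, costs = simulate(rot_z(3.0))
print("distance to I:", np.linalg.norm(theta_final - np.eye(3)))
print("cost never increases:", bool(np.all(np.diff(costs) <= 1e-12)))
```

Starting near a rotation by $\pi$ makes the trajectory leave the neighborhood of the unstable equilibria slowly at first before it approaches $I$, which is the behavior the corollary predicts for initial conditions outside the measure-zero set.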

In the following we prove Lemma 7.
Proof. a) is a consequence of Lemma 4 and Proposition 5.
b) To prove property b), we linearize the gradient flow with the vector field $\mathop{\mathrm{grad}}{f}$ defined by $\Theta\mapsto\tfrac{1}{2}(\Theta^{\mathsf{T}}-\Theta)\Theta$ around the equilibria. To do this directly, we compute $d\mathop{\mathrm{grad}}{f}_{\Theta_{0}}(X)=\tfrac{d}{dt}\big\rvert_{t=0}((\mathop{\mathrm{grad}}{f})\circ\Gamma)(t)$, where $\Theta_{0}$ is an equilibrium and $\Gamma:(-\varepsilon,\varepsilon)\rightarrow SO(n)$ is smooth with $\Gamma(0)=\Theta_{0}$, $\dot{\Gamma}(0)=X$ and $X\in T_{\Theta_{0}}SO(n)$. This yields

\[
\begin{aligned}
d\mathop{\mathrm{grad}}{f}_{\Theta_{0}}(X) &= \frac{1}{2}\frac{d}{dt}\Big\rvert_{t=0}\bigl(\Gamma^{\mathsf{T}}(t)-\Gamma(t)\bigr)\Gamma(t)\\
&= \frac{1}{2}\left(\dot{\Gamma}^{\mathsf{T}}(t)\Gamma(t)+\Gamma^{\mathsf{T}}(t)\dot{\Gamma}(t)-\dot{\Gamma}(t)\Gamma(t)-\Gamma(t)\dot{\Gamma}(t)\right)\Big\rvert_{t=0}\\
&= \frac{1}{2}\left(X^{\mathsf{T}}\Theta_{0}+\Theta_{0}^{\mathsf{T}}X-X\Theta_{0}-\Theta_{0}X\right)\\
&= -\frac{1}{2}\left(\Theta_{0}X+X\Theta_{0}\right)\text{,}
\end{aligned}
\tag{28}
\]

since $\Theta_{0}=\Theta_{0}^{\mathsf{T}}$ and $X=\Omega_{0}\Theta_{0}$ for an $\Omega_{0}\in\mathbb{R}^{n\times n}$ with $\Omega_{0}=-\Omega_{0}^{\mathsf{T}}$. Consequently, the linearization of the gradient flow at an equilibrium $\Theta_{0}$ is given by

\[
\dot{X} = -\frac{1}{2}\left(\Theta_{0}^{\mathsf{T}}X+X\Theta_{0}\right)\text{,}
\tag{29}
\]

where $X\in T_{\Theta_{0}}SO(n)$. Note that due to the simple nature of the Riemannian metric (2) and the connection of the linearization of a gradient flow to the Hessian, we could have obtained (29) directly from (14). More precisely, utilize

\[
\begin{aligned}
H_{f}(\Theta_{0})(X,X) &= \mathop{\mathrm{tr}}\left(X^{\mathsf{T}}\Theta_{0}X\right)=\mathop{\mathrm{vec}}(\Theta_{0}X^{\mathsf{T}})\mathop{\mathrm{vec}}(X)\\
&= \mathop{\mathrm{vec}}(X)^{\mathsf{T}}\bigl(I\otimes\Theta_{0}+(I\otimes\Theta_{0})^{\mathsf{T}}\bigr)\mathop{\mathrm{vec}}(X)\\
&= -\frac{1}{2}\mathop{\mathrm{vec}}(X)^{\mathsf{T}}\mathop{\mathrm{vec}}(\Theta_{0}^{\mathsf{T}}X+X\Theta_{0})\text{.}
\end{aligned}
\tag{30}
\]
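The directional derivative in (28) can be checked by finite differences. The sketch below is an illustration under stated assumptions: the curve $\Gamma(t)$ is built from a Cayley transform (any smooth curve in $SO(n)$ with $\Gamma(0)=\Theta_{0}$ and $\dot{\Gamma}(0)=X$ would do), the equilibrium $\Theta_{0}=\mathrm{diag}(-1,-1,1)$ and the random skew-symmetric $\Omega_{0}$ are arbitrary choices, and all function names are hypothetical.

```python
import numpy as np


def vector_field(theta):
    """The vector field Theta -> 1/2 (Theta^T - Theta) Theta used in (28)."""
    return 0.5 * (theta.T - theta) @ theta


def cayley(b):
    """Cayley transform of a skew-symmetric b; its derivative at b = 0 is the identity map."""
    eye = np.eye(b.shape[0])
    return np.linalg.solve(eye - 0.5 * b, eye + 0.5 * b)


def linearization(theta0, x):
    """Right-hand side of (28): -1/2 (Theta0 X + X Theta0)."""
    return -0.5 * (theta0 @ x + x @ theta0)


rng = np.random.default_rng(0)
m = rng.standard_normal((3, 3))
omega0 = m - m.T                      # skew-symmetric
theta0 = np.diag([-1.0, -1.0, 1.0])   # symmetric element of SO(3), an equilibrium
x = omega0 @ theta0                   # tangent vector X = Omega0 Theta0


def gamma(t):
    """Curve in SO(3) with gamma(0) = theta0 and derivative x at t = 0."""
    return cayley(t * omega0) @ theta0


h = 1e-5
finite_diff = (vector_field(gamma(h)) - vector_field(gamma(-h))) / (2.0 * h)
error = np.linalg.norm(finite_diff - linearization(theta0, x))
print("finite-difference error:", error)
```

A small error here only confirms the algebra in (28) for one equilibrium and one direction; it is a sanity check, not a proof.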

If $\Theta_{0}=I$, then the linearization is

\[
\dot{X} = -X\text{,}
\tag{31}
\]

which shows that the equilibrium $I$ is locally exponentially stable. Now consider the linearization at the other equilibrium points, i.e. $\Theta_{0}\neq I$ and $\Theta_{0}=\Theta_{0}^{\mathsf{T}}$. Since $\Theta_{0}$ is symmetric, $\Theta_{0}$ is orthogonally diagonalizable, i.e. $\Theta_{0}=\Pi^{\mathsf{T}}D\Pi$ for some diagonal $D$ and orthogonal $\Pi$, where the rows of $\Pi$ are eigenvectors of $\Theta_{0}$. Since $\Theta_{0}^{\mathsf{T}}\Theta_{0}=I$, we get $D^{2}=I$ and consequently the eigenvalues are $\pm 1$. Since $\Theta_{0}\neq I$ and $\det(\Theta_{0})=1$, we always have an even number (at least two) of negative eigenvalues with associated orthonormal eigenvectors $v_{1},\ldots,v_{k}$. Set $\overline{U}=v_{1}v_{2}^{\mathsf{T}}-v_{2}v_{1}^{\mathsf{T}}$ and $X=\overline{U}\Theta_{0}\in T_{\Theta_{0}}SO(n)$. Using $\Theta_{0}v_{i}=-v_{i}$ for $i=1,2$, we obtain

\[
\begin{aligned}
-\frac{1}{2}(\Theta_{0}X+X\Theta_{0}) &= -\frac{1}{2}\left(\Theta_{0}^{\mathsf{T}}\overline{U}\Theta_{0}+\overline{U}\Theta_{0}\Theta_{0}\right)\\
&= -\frac{1}{2}\left(\Theta_{0}(v_{1}v_{2}^{\mathsf{T}}-v_{2}v_{1}^{\mathsf{T}})\Theta_{0}+(v_{1}v_{2}^{\mathsf{T}}-v_{2}v_{1}^{\mathsf{T}})\Theta_{0}\Theta_{0}\right)\\
&= -\frac{1}{2}\left((-v_{1}v_{2}^{\mathsf{T}}+v_{2}v_{1}^{\mathsf{T}})\Theta_{0}+(-v_{1}v_{2}^{\mathsf{T}}+v_{2}v_{1}^{\mathsf{T}})\Theta_{0}\right)=\overline{U}\Theta_{0}\text{.}
\end{aligned}
\tag{32}
\]

Therefore, $X=\overline{U}\Theta_{0}$ is an eigenvector of the operator defined by the right hand side of (29). Since the associated eigenvalue is one and thus positive, the linearization (29) is unstable. Consequently, the linearization of the gradient flow at the equilibria $\Theta_{0}$ with $\Theta_{0}\neq I$ and $\Theta_{0}=\Theta_{0}^{\mathsf{T}}$ is unstable, which proves b).
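The eigenvalue computation in (31) and (32) can be reproduced numerically. The following sketch, under the assumption that $\Theta_{0}$ is a diagonal equilibrium with entries $\pm 1$, represents the operator $X\mapsto-\tfrac{1}{2}(\Theta_{0}^{\mathsf{T}}X+X\Theta_{0})$ from (29) as a matrix in the basis $X=E_{ij}\Theta_{0}$ of $T_{\Theta_{0}}SO(n)$, where $E_{ij}=e_{i}e_{j}^{\mathsf{T}}-e_{j}e_{i}^{\mathsf{T}}$. The helper names `skew_basis`, `linearization_matrix` and `spectrum` are hypothetical.

```python
import itertools

import numpy as np


def skew_basis(n):
    """Basis E_ij = e_i e_j^T - e_j e_i^T (i < j) of the skew-symmetric matrices."""
    basis = []
    for i, j in itertools.combinations(range(n), 2):
        e = np.zeros((n, n))
        e[i, j], e[j, i] = 1.0, -1.0
        basis.append(e)
    return basis


def linearization_matrix(theta0):
    """Matrix of X -> -1/2 (Theta0^T X + X Theta0) in the basis X = E_ij Theta0."""
    n = theta0.shape[0]
    pairs = list(itertools.combinations(range(n), 2))
    basis = skew_basis(n)
    mat = np.zeros((len(basis), len(basis)))
    for col, omega in enumerate(basis):
        x = omega @ theta0
        lx = -0.5 * (theta0.T @ x + x @ theta0)
        # The image is again tangent: lx = omega_img @ theta0 with omega_img skew.
        omega_img = lx @ theta0.T
        for row, (i, j) in enumerate(pairs):
            mat[row, col] = omega_img[i, j]
    return mat


def spectrum(theta0):
    """Sorted real parts of the eigenvalues of the linearization at theta0."""
    return np.sort(np.linalg.eigvals(linearization_matrix(theta0)).real)


print("at I:               ", spectrum(np.eye(3)))
print("at diag(-1,-1,1):   ", spectrum(np.diag([-1.0, -1.0, 1.0])))
print("at diag(-1,-1,1,1): ", spectrum(np.diag([-1.0, -1.0, 1.0, 1.0])))
```

The positive eigenvalue at every equilibrium other than $I$ belongs to the direction $\overline{U}\Theta_{0}$ from (32); the zero eigenvalues correspond to directions tangent to the set of equilibria.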
c) Denote the flow of $\dot{\Theta}=-\mathop{\mathrm{grad}}{f}(\Theta)$ by $\phi:\mathbb{R}\times SO(n)\rightarrow SO(n)$ and by $B_{k}$ the set

\[
B_{k}=\{x_{0}\in SO(n) \mid \lim_{t\rightarrow\infty}\phi(t,x_{0})\in\mathcal{F}_{k}\}\text{,}
\tag{33}
\]

i.e. the set of initial conditions whose solutions converge to the connected component $\mathcal{F}_{k}$ of the set of critical points $\mathcal{F}$ given in Lemma 1c). Because of Proposition 5, any solution of the gradient flow converges to the critical set of $f$, and as a consequence, $SO(n)=\bigcup_{k=0}^{\lfloor\tfrac{n}{2}\rfloor}B_{k}$. Then $B_{0}$ is the set of initial conditions for which the flow converges to $I$, and $B=\bigcup_{k=1}^{\lfloor\tfrac{n}{2}\rfloor}B_{k}$ is the set of initial conditions for which the flow converges to any of the other critical points. In Lemma 6 we showed that, for every $k\geq 1$, $B_{k}$ has measure zero and $SO(n)\setminus B_{k}$ is dense in $SO(n)$. Since $B$ is the union of a finite number of sets of measure zero, it has measure zero, see e.g. [2, Lemma 10.1]. In particular, $B_{0}=SO(n)\setminus B$ is dense.
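Part c) can be illustrated, though of course not proven, by sampling. The sketch below draws a few random rotations in $SO(3)$, integrates (27) with the same kind of Cayley-transform step as a stand-in integrator, and checks that every sample ends up near $I$, while a start exactly at the unstable equilibrium $\mathrm{diag}(-1,-1,1)$ stays put. The sampler `random_rotation` (QR factorization of a Gaussian matrix with sign corrections), the seed, the step size and the horizon are all assumptions made for this example.

```python
import numpy as np


def cayley_step(theta, h):
    """One orthogonality-preserving step for dTheta/dt = (Theta^T - Theta) Theta."""
    eye = np.eye(theta.shape[0])
    a = theta.T - theta  # skew-symmetric
    return np.linalg.solve(eye - 0.5 * h * a, eye + 0.5 * h * a) @ theta


def flow(theta0, h=0.02, steps=2000):
    """Approximate the flow at time h * steps, starting from theta0."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        theta = cayley_step(theta, h)
    return theta


def random_rotation(rng, n):
    """Random element of SO(n) from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q = q * np.sign(np.diag(r))   # normalize column signs
    if np.linalg.det(q) < 0:      # enforce determinant +1
        q[:, 0] = -q[:, 0]
    return q


rng = np.random.default_rng(1)
samples = [random_rotation(rng, 3) for _ in range(10)]
converged = [bool(np.allclose(flow(s), np.eye(3), atol=1e-6)) for s in samples]
print("all samples converge to I:", all(converged))

# An initial condition inside the measure-zero set B: it is an equilibrium.
saddle = np.diag([-1.0, -1.0, 1.0])
print("equilibrium stays fixed:", bool(np.array_equal(flow(saddle, steps=100), saddle)))
```

Randomly drawn initial conditions hit the measure-zero set $B$ with probability zero, which is why every sample is expected to converge to $I$ even though $B$ is nonempty.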

References

  • [1] V. Guillemin, Differential Topology, Prentice-Hall, Inc., 1974.
  • [2] J. M. Lee, Introduction to Smooth Manifolds, Vol. 218 of Graduate Texts in Mathematics, Springer, 2006.
  • [3] J. Milnor, Topology from the Differentiable Viewpoint, Princeton Landmarks in Mathematics, Princeton University Press, 1997, originally published by University Press of Virginia, 1965.
  • [4] U. Helmke, J. B. Moore, Optimization and Dynamical Systems, Springer, 1994.
  • [5] T. Frankel, Critical submanifolds of the classical groups and Stiefel manifolds, in: S. S. Cairns (Ed.), Differential and Combinatorial Topology: A Symposium in Honor of Marston Morse, Princeton University Press, 1965, pp. 37–54.
  • [6] B. Aulbach, Continuous and Discrete Dynamics near Manifolds of Equilibria, Vol. 1058 of Lecture Notes in Mathematics, Springer, 1984.
  • [7] D. McDuff, D. Salamon, Introduction to Symplectic Topology, Clarendon Press, 1998.
  • [8] D. M. Austin, P. J. Braam, Morse-Bott theory and equivariant cohomology, in: H. Hofer, C. H. Taubes, A. Weinstein, E. Zehnder (Eds.), The Floer Memorial Volume, Vol. 133 of Progress in Mathematics, Birkhäuser, 1995, pp. 123–184.