

Semi-Riemannian Manifold Optimization

Tingran Gao, Department of Statistics and Committee on Computational and Applied Mathematics (CCAM), The University of Chicago, Chicago, IL (tingrangao@galton.uchicago.edu)

Lek-Heng Lim, Department of Statistics and Committee on Computational and Applied Mathematics (CCAM), The University of Chicago, Chicago, IL (lekheng@galton.uchicago.edu)

Ke Ye, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China (keyk@amss.ac.cn)

Abstract

We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a semi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.

keywords:
manifold optimization, semi-Riemannian geometry, degenerate submanifolds, Lorentzian geometry, steepest descent, conjugate gradient, Newton’s method, trust region method
AMS subject classifications:

90C30, 53C50, 53B30, 49M05, 49M15

1 Introduction

Manifold optimization [12, 2] is a class of techniques for solving optimization problems of the form

\[\min_{x\in\mathcal{M}}f\left(x\right) \tag{1}\]

where $\mathcal{M}$ is a (typically nonlinear and nonconvex) manifold and $f:\mathcal{M}\rightarrow\mathbb{R}$ is a smooth function over $\mathcal{M}$. These techniques generally begin by endowing the manifold $\mathcal{M}$ with a Riemannian structure, which amounts to specifying a smooth family of inner products on the tangent spaces of $\mathcal{M}$, with which analogues of differential quantities such as the gradient and Hessian can be defined on $\mathcal{M}$ in parallel with their well-known counterparts on Euclidean spaces. This geometric perspective enables us to tackle the constrained optimization problem Eq. 1 using methodologies of unconstrained optimization, which becomes particularly beneficial when the constraints (expressed in $\mathcal{M}$) appear highly nonlinear and nonconvex.

The optimization problem Eq. 1 is certainly independent of the choice of Riemannian structure on $\mathcal{M}$; in fact, all critical points of $f$ on $\mathcal{M}$ are metric independent. From a differential geometric perspective, equipping the manifold with a Riemannian structure and studying the critical points of a generic smooth function is highly reminiscent of classical Morse theory [27, 33], for which the main interest is to understand the topology of the underlying manifold; the topological information needs to be extracted using tools from differential geometry, but is certainly independent of the choice of Riemannian structure. It is thus natural to inquire into the influence of different choices of Riemannian metrics on manifold optimization algorithms, which to our knowledge has never been explored in the existing literature. This paper stems from our attempts at understanding the dependence of manifold optimization on the Riemannian structure. It turns out that most technical tools for optimization on Riemannian manifolds can be extended to a larger class of metric structures on manifolds, namely, semi-Riemannian structures. Just as a Riemannian metric is a smooth assignment of inner products to tangent spaces, a semi-Riemannian metric smoothly assigns to each tangent space a scalar product, which is a non-degenerate symmetric bilinear form without the constraint of positive definiteness. Our major technical contribution in this paper is an optimization framework built upon the rich differential geometry of such weaker but more general metric structures, of which standard unconstrained optimization on Euclidean spaces and Riemannian manifold optimization are special cases. Though semi-Riemannian geometry has attracted generations of mathematical physicists for its effectiveness in providing spacetime models in general relativity [35, 9], to the best of our knowledge, the link with manifold optimization has never been explored.

A different yet strong motivation for investigating optimization problems on semi-Riemannian manifolds arises from the Riemannian geometric interpretation of interior point methods [31, 41]. For a twice differentiable and strongly convex function $f$ defined over an open convex domain $Q$ in a Euclidean space, denote by $\nabla f$ and $\nabla^{2}f$ the gradient and Hessian of $f$, respectively. The strong convexity of $f$ ensures $\nabla^{2}f\left(x\right)\succ 0$, which defines a local inner product $g_{x}\left(\cdot,\cdot\right):T_{x}Q\times T_{x}Q\rightarrow\mathbb{R}$ by

\[g_{x}\left(v,w\right):=v^{\top}\left[\nabla^{2}f\left(x\right)\right]w,\quad\forall v,w\in T_{x}Q.\]

With respect to this class of new local inner products, which can be interpreted as turning $Q$ into a Riemannian manifold $\left(Q,g\right)$, the gradient of $f$ takes the form

\[\tilde{\nabla}f\left(x\right)=\left[\nabla^{2}f\left(x\right)\right]^{-1}\nabla f\left(x\right).\]

The negative manifold gradient $-\tilde{\nabla}f\left(x\right)=-\left[\nabla^{2}f\left(x\right)\right]^{-1}\nabla f\left(x\right)$ coincides with the descent direction $\eta_{x}$ satisfying Newton's equation

\[\left[\nabla^{2}f\left(x\right)\right]\eta_{x}=-\nabla f\left(x\right) \tag{2}\]

at $x\in Q$. In other words, Newton's method, which is second order, can be interpreted as a first order method in the Riemannian setting. Such an equivalence between first and second order methods under coordinate transformation is also known in other contexts such as natural gradient descent in information geometry; see [40] and the references therein. Extending this geometric picture beyond the relatively well-understood case of strongly convex functions requires understanding optimization on semi-Riemannian manifolds as a first step; we expect the theoretical foundation laid out in this paper to shed light on the convergence of non-convex optimization algorithms from a geometric viewpoint.
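The identification of the Newton direction with a negative gradient under the Hessian metric can be checked numerically. The following is a minimal sketch, not part of the original paper: the test function $f(x)=\sum_i e^{x_i}+\tfrac{1}{2}\|x\|^2$ and the names `grad_f`, `hess_f`, and `hessian_metric_gradient` are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical strongly convex test function (an assumption, not from the paper):
# f(x) = sum_i exp(x_i) + 0.5 * ||x||^2 on R^n.
def grad_f(x):
    return np.exp(x) + x

def hess_f(x):
    return np.diag(np.exp(x) + 1.0)

def hessian_metric_gradient(x):
    """Gradient of f with respect to g_x(v, w) = v^T [hess f(x)] w.

    It is defined by g_x(grad, v) = df(v) for all v,
    i.e. hess_f(x) @ grad = grad_f(x).
    """
    return np.linalg.solve(hess_f(x), grad_f(x))

x = np.array([0.3, -1.2, 2.0])
newton_direction = np.linalg.solve(hess_f(x), -grad_f(x))  # solves Eq. 2
metric_gradient = hessian_metric_gradient(x)

# The Newton direction equals the negative gradient in the Hessian metric.
print(np.allclose(newton_direction, -metric_gradient))
```

Solving the linear system instead of forming the inverse Hessian is a design choice for numerical stability; it does not change the identity being illustrated.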

The rest of this paper is organized as follows. In Section 2 we provide a brief but self-contained introduction to Riemannian optimization and semi-Riemannian geometry. Section 3 details the algorithmic framework of semi-Riemannian optimization, and proposes semi-Riemannian analogues of the Riemannian steepest descent and conjugate gradient algorithms; the metric independence of some second-order algorithms is also investigated. We specialize the general geometric framework to submanifolds in Section 4, in which we characterize the phenomenon (which does not exist in Riemannian geometry) of degeneracy for induced semi-Riemannian structures, and identify several (nearly) non-degenerate examples to which our general algorithmic framework applies. We illustrate the utility of the proposed framework with several examples in Section 5 and conclude with Section 6. More examples and some omitted proofs are deferred to the Supplementary Materials.

2 Preliminaries

2.1 Notations

We denote a smooth manifold by $M$ or $\mathcal{M}$. Lower case letters such as $a,b,c$ or $x,y,z$ will be used to denote vectors or points on a manifold, depending on the context. We write $TM$ and $T^{*}M$ for the tangent and cotangent bundles of $M$, respectively. For a fibre bundle $E$, $\Gamma\left(E\right)$ will be used to denote the smooth sections of this bundle. Unless otherwise specified, we use $\left\langle\cdot,\cdot\right\rangle$ or $g\in\Gamma\left(T^{*}M\otimes T^{*}M\right)$ to denote a semi-Riemannian metric. For a smooth function $f$, the notations $Df$ and $D^{2}f$ stand for semi-Riemannian gradients and Hessians, respectively, when they exist; $\nabla f$ and $\nabla^{2}f$ will be reserved for Riemannian gradients and Hessians, respectively. More generally, $D$ will be used to denote the Levi-Civita connection on a semi-Riemannian manifold, while $\nabla$ denotes the Levi-Civita connection on a Riemannian manifold. We denote anti-symmetric (i.e. skew-symmetric) matrices and symmetric matrices of size $n$-by-$n$ by $\mathrm{Skew}\left(\mathbb{R}^{n\times n}\right)$ and $\mathrm{Sym}\left(\mathbb{R}^{n\times n}\right)$, respectively. For a vector space $V$, $\bigwedge^{k}V$ and $S^{k}V$ stand for the alternating and symmetrized products of $k$ copies of $V$, respectively.

2.2 Riemannian Manifold Optimization

As stated at the beginning of this paper, manifold optimization is a type of nonlinear optimization problem taking the form of Eq. 1. The methodology of Riemannian optimization is to equip the smooth manifold $\mathcal{M}$ with a Riemannian metric structure, i.e. positive definite bilinear forms $\left\langle\cdot,\cdot\right\rangle$ on the tangent spaces of $\mathcal{M}$ that vary smoothly on the manifold [28, 10, 38]. The differentiable structure on $\mathcal{M}$ facilitates generalizing the concept of differentiable functions from Euclidean spaces to these nonlinear objects; in particular, notions such as gradient and Hessian are available on Riemannian manifolds and play the same role as their Euclidean space counterparts.

The algorithmic framework of Riemannian manifold optimization has been established and investigated in a sequence of works [13, 44, 12, 2]. These algorithms typically build upon the concepts of gradient, the first-order differential operator $\nabla:C^{1}\left(\mathcal{M}\right)\rightarrow\Gamma\left(TM\right)$ defined by

\[\left\langle\nabla f\left(x\right),X\right\rangle=Xf\left(x\right)\quad\forall X\in T_{x}M,\]

and Hessian, the covariant derivative of the gradient operator defined by

\[\nabla^{2}f\left(X,Y\right)=XYf-\left(\nabla_{X}Y\right)f\quad\forall X,Y\in\Gamma\left(TM\right),\]

as well as a retraction $\mathrm{Retr}_{x}:T_{x}\mathcal{M}\rightarrow\mathcal{M}$ from each tangent space $T_{x}\mathcal{M}$ to the manifold $\mathcal{M}$ such that (1) $\mathrm{Retr}_{x}\left(0\right)=x$ for all $x\in\mathcal{M}$, and (2) the differential of $\mathrm{Retr}_{x}$ is the identity at $0\in T_{x}\mathcal{M}$. On Riemannian manifolds it is natural to use the exponential map as the retraction, but any smooth map from tangent spaces to the manifold satisfying (1) and (2) suffices; in fact, the only requirement implied by conditions (1) and (2) is that the retraction coincides with the exponential map up to first order.
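The two retraction conditions can be checked numerically on a concrete case. The sketch below is an illustration only: it assumes the unit sphere in $\mathbb{R}^{3}$ with the normalization map as retraction (a common choice, but not one specified in the text above), and the names `retract` and `fd_derivative` are hypothetical.

```python
import numpy as np

def retract(x, v):
    """Assumed example: retraction on the unit sphere by normalizing x + v."""
    y = x + v
    return y / np.linalg.norm(y)

x = np.array([0.0, 0.0, 1.0])       # a point on the sphere
v = np.array([0.3, -0.4, 0.0])      # tangent at x, since x @ v == 0

# Condition (1): Retr_x(0) = x.
print(np.allclose(retract(x, np.zeros(3)), x))

# Condition (2): the differential at 0 is the identity, checked along v
# with a central finite difference of t -> Retr_x(t v) at t = 0.
eps = 1e-6
fd_derivative = (retract(x, eps * v) - retract(x, -eps * v)) / (2 * eps)
print(np.allclose(fd_derivative, v, atol=1e-8))
```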

The optimality conditions for unconstrained optimization on Euclidean spaces in terms of gradients and Hessians can be naturally translated into the Riemannian manifold setting:

Proposition 2.1 ([8], Proposition 1.1).

A local optimum $x\in\mathcal{M}$ of Problem Eq. 1 satisfies the following necessary conditions:

  (i) $\nabla f\left(x\right)=0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is first-order differentiable;

  (ii) $\nabla f\left(x\right)=0$ and $\nabla^{2}f\left(x\right)\succeq 0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is second-order differentiable.

Following [8], we call $x\in\mathcal{M}$ satisfying condition (i) in Proposition 2.1 a (first-order) critical point or stationary point, and a point satisfying condition (ii) in Proposition 2.1 a second-order critical point.

The heart of Riemannian manifold optimization is to transform the nonlinear constrained optimization problem Eq. 1 into an unconstrained problem on the manifold $\mathcal{M}$. Following this methodology, classical unconstrained optimization algorithms such as gradient descent, conjugate gradients, Newton's method, and trust region methods have been generalized to Riemannian manifolds; see [2, Chapter 8]. For instance, the iterates $x_{0},x_{1},\cdots,x_{k},\cdots$ generated by the gradient descent algorithm on Riemannian manifolds are obtained by replacing the descent step $x_{k+1}=x_{k}-\nabla f\left(x_{k}\right)$ with its Riemannian counterpart $x_{k+1}=\mathrm{Retr}_{x_{k}}\left(-\nabla f\left(x_{k}\right)\right)$. Other differential geometric objects such as parallel-transport, Hessian, and curvature arise naturally en route to adapting other unconstrained optimization algorithms to the manifold setting. We refer interested readers to [2] for more details.
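As a concrete illustration of the retraction-based descent step, the following sketch minimizes $f(x)=x^{\top}Ax$ over the unit sphere. Everything specific here is an assumption made for illustration: the choice of manifold, the normalization retraction, the fixed step size `step` (the update written above carries no explicit step size), and the function name `riemannian_gradient_descent`.

```python
import numpy as np

def riemannian_gradient_descent(A, x0, step=0.1, iters=500):
    """Sketch: minimize f(x) = x^T A x over the unit sphere.

    Uses the metric induced from R^n, so the Riemannian gradient is the
    tangential projection of the Euclidean gradient, and the (assumed)
    retraction is normalization.
    """
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                  # Euclidean gradient
        rgrad = egrad - (x @ egrad) * x      # projection onto the tangent space at x
        y = x - step * rgrad                 # step along the negative gradient
        x = y / np.linalg.norm(y)            # retract back onto the sphere
    return x

A = np.diag([1.0, 2.0, 5.0])
x_star = riemannian_gradient_descent(A, np.array([1.0, 1.0, 1.0]))
print(x_star, x_star @ A @ x_star)
```

For this diagonal `A` the iterates approach the eigenvector of the smallest eigenvalue, which is the minimizer of the Rayleigh quotient on the sphere.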

2.3 Semi-Riemannian Geometry

Semi-Riemannian geometry differs from Riemannian geometry in that the bilinear form equipped on each tangent space can be indefinite. Classical examples include Lorentzian spaces and De Sitter spaces in general relativity; see e.g. [35, 9]. Although one may think of Riemannian geometry as a special case of semi-Riemannian geometry as all Riemannian metric tensors are automatically semi-Riemannian, the existence of a semi-Riemannian metric with nontrivial index (see definition below) actually imposes additional constraints on the tangent bundle of the manifold and is thus often more restrictive—the tangent bundle should admit a non-trivial splitting into the direct sum of “positive definite” and “negative definite” sub-bundles. Nevertheless, such metric structures have found vast applications in and beyond understanding the geometry of spacetime, for instance, in the study of the regularity of optimal transport maps [21, 20, 3].

Definition 2.2.

A symmetric bilinear form $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ on a vector space $V$ is non-degenerate if

\[\left\langle v,w\right\rangle=0\,\,\textrm{for all}\,\,w\in V\quad\Leftrightarrow\quad v=0.\]

The index $\nu\in\mathbb{Z}_{\geq 0}$ of a symmetric bilinear form on $V$ is the dimension of a maximal negative definite subspace of $V$; similarly, we denote by $\pi\in\mathbb{Z}_{\geq 0}$ the dimension of a maximal positive definite subspace of $V$. A scalar product on a vector space $V$ is a non-degenerate symmetric bilinear form on $V$. The signature of a scalar product on $V$ with index $\nu$ is a vector of length $\mathrm{dim}\left(V\right)$ with the first $\nu$ entries equal to $-1$ and the remaining entries equal to $1$. A subspace $W\subset V$ is said to be non-degenerate if the restriction of the scalar product to $W$ is non-degenerate.

The main difference between a scalar product and an inner product is that the former need not be positive definite. The main issue with this lack of positivity is the consequent lack of a meaningful notion of "orthogonality": a vector subspace may well be its own orthogonal complement. Consider for example the subspace spanned by $\left(1,1\right)$ in $\mathbb{R}^{2}$ equipped with a scalar product of signature $\left(-,+\right)$. The same example illustrates that non-degeneracy is not always inherited by subspaces. Nonetheless, the following is true:

Lemma 2.3 (Chapter 2, Lemma 23, [35]).

A subspace $W$ of a vector space $V$ is non-degenerate if and only if $V=W\oplus W^{\perp}$.
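The self-orthogonal line in the example preceding Lemma 2.3 can be verified directly. This is a small sketch under stated assumptions: the helper names `scalar_product` and `is_nondegenerate_subspace` are hypothetical, and the test used (invertibility of the Gram matrix of a basis of $W$) is the standard linear-algebra criterion for the restricted form to be non-degenerate.

```python
import numpy as np

G = np.diag([-1.0, 1.0])   # scalar product of signature (-, +) on R^2

def scalar_product(u, v):
    return float(u @ G @ v)

def is_nondegenerate_subspace(basis, G, tol=1e-12):
    """basis: matrix whose (linearly independent) columns span W.

    The restriction of the scalar product to W is non-degenerate
    exactly when the Gram matrix basis^T G basis is invertible.
    """
    gram = basis.T @ G @ basis
    return abs(np.linalg.det(gram)) > tol

w = np.array([1.0, 1.0])
null_line = w.reshape(2, 1)             # W = span{(1, 1)}
time_axis = np.array([[1.0], [0.0]])    # W = span{(1, 0)}

print(scalar_product(w, w))                      # w is orthogonal to itself
print(is_nondegenerate_subspace(null_line, G))   # degenerate line
print(is_nondegenerate_subspace(time_axis, G))   # non-degenerate line
```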

Definition 2.4 (Semi-Riemannian Manifolds).

A metric tensor $g$ on a smooth manifold $M$ is a symmetric non-degenerate $\left(0,2\right)$ tensor field on $M$ of constant index. A semi-Riemannian manifold is a smooth manifold $M$ equipped with a metric tensor.

Example 2.5 (Minkowski Spaces $\mathbb{R}^{p,q}$).

Consider the Euclidean space $\mathbb{R}^{n}$ and denote by $\operatorname{I}_{p,q}$ the $n$-by-$n$ diagonal matrix with the first $p$ diagonal entries equal to $-1$ and the remaining $q=n-p$ entries equal to $1$, where $0\leq p\leq n$ and $n\geq 1$. For arbitrary $u,w\in\mathbb{R}^{n}$, define the bilinear form

\[\left\langle u,w\right\rangle:=u^{\top}\operatorname{I}_{p,q}w.\]

It is straightforward to verify that this bilinear form is non-degenerate on $\mathbb{R}^{n}$, and that $\left(\mathbb{R}^{n},\left\langle\cdot,\cdot\right\rangle\right)$ so defined is a semi-Riemannian manifold. This space is known as the Minkowski space of signature $\left(p,q\right)$.

Example 2.6.

Consider the vector space of matrices $\mathbb{R}^{n\times n}$, where $n\in\mathbb{N}$ and $n=p+q$, $p,q\in\mathbb{N}$. Define a bilinear form on $\mathbb{R}^{n\times n}$ by

\[\left\langle A,B\right\rangle:=\mathrm{Tr}\left(A^{\top}\operatorname{I}_{p,q}B\right),\quad\forall A,B\in\mathbb{R}^{n\times n}.\]

This bilinear form is non-degenerate on $\mathbb{R}^{n\times n}$, because for any $A,B\in\mathbb{R}^{n\times n}$ we have

\[\mathrm{Tr}\left(A^{\top}\operatorname{I}_{p,q}B\right)=\mathrm{vec}\left(A\right)^{\top}\left(I_{n}\otimes\operatorname{I}_{p,q}\right)\mathrm{vec}\left(B\right)\]

where $I_{n}$ is the identity matrix of size $n$-by-$n$, $\otimes$ denotes the Kronecker product, and $\mathrm{vec}:\mathbb{R}^{n\times n}\rightarrow\mathbb{R}^{n^{2}}$ is the vectorization operator that vertically stacks the columns of a matrix in $\mathbb{R}^{n\times n}$. The non-degeneracy then follows from Example 2.5. This example gives rise to a semi-Riemannian structure for matrices in $\mathbb{R}^{n\times n}$.
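The vectorization identity used in Example 2.6 is easy to confirm numerically. In the sketch below the helper names `I_pq` and `vec` and the particular sizes $p=2$, $q=3$ are assumptions for illustration; `vec` uses column-major ordering so that it stacks columns, as in the text.

```python
import numpy as np

def I_pq(p, q):
    """Diagonal matrix with p entries equal to -1 followed by q entries equal to +1."""
    return np.diag(np.concatenate([-np.ones(p), np.ones(q)]))

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

p, q = 2, 3
n = p + q
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

lhs = np.trace(A.T @ I_pq(p, q) @ B)
K = np.kron(np.eye(n), I_pq(p, q))     # I_n (Kronecker) I_{p,q}
rhs = vec(A) @ K @ vec(B)

print(np.isclose(lhs, rhs))
# K is diagonal with entries +-1, so the form is non-degenerate, with index n * p.
print(int(np.count_nonzero(np.diag(K) < 0)))
```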

The non-degeneracy of the semi-Riemannian metric tensor ensures that most classical constructions on Riemannian manifolds have their analogues on a semi-Riemannian manifold. Most fundamentally, the "miracle of Riemannian geometry", namely the existence and uniqueness of a canonical connection, holds on semi-Riemannian manifolds as well. Quoting [35, Theorem 11], on a semi-Riemannian manifold $M$ there is a unique connection $D:\Gamma\left(M,TM\right)\rightarrow\Gamma\left(M,T^{\otimes 2}M\right)$ such that

\[\left[V,W\right]=D_{V}W-D_{W}V \tag{3}\]

and

\[X\left\langle V,W\right\rangle=\left\langle D_{X}V,W\right\rangle+\left\langle V,D_{X}W\right\rangle \tag{4}\]

for all $X,V,W\in\Gamma\left(M,TM\right)$. This connection is called the Levi-Civita connection of $M$ and is characterized by the Koszul formula

\[\begin{aligned}
2\left\langle D_{V}W,X\right\rangle={}&V\left\langle W,X\right\rangle+W\left\langle X,V\right\rangle-X\left\langle V,W\right\rangle\\
&-\left\langle V,\left[W,X\right]\right\rangle+\left\langle W,\left[X,V\right]\right\rangle+\left\langle X,\left[V,W\right]\right\rangle\quad\forall X,V,W\in\Gamma\left(M,TM\right).
\end{aligned} \tag{5}\]

Geodesics, parallel-transport, and curvature of $M$ can be defined via the Levi-Civita connection on $M$ in an entirely analogous manner as on Riemannian manifolds.

Differential operators can be defined on semi-Riemannian manifolds in much the same way as on Riemannian manifolds. For any $f\in C^{1}\left(M\right)$, where $M$ is a semi-Riemannian manifold, the gradient of $f$, denoted by $Df\in\Gamma\left(M,TM\right)$, is defined by the equality (c.f. [35, Definition 47])

\[\left\langle Df,X\right\rangle=Xf,\quad\forall X\in\Gamma\left(M,TM\right). \tag{6}\]

The Hessian of $f\in C^{2}\left(M\right)$ is defined, as in the Riemannian case ([35, Definition 48, Lemma 49]), by $D^{2}f=D\left(Df\right)\in\Gamma\left(M,T^{*}M\otimes T^{*}M\right)$, or equivalently

\[D^{2}f\left(X,Y\right)=XYf-\left(D_{X}Y\right)f,\quad\forall X,Y\in\Gamma\left(M,TM\right). \tag{7}\]

Since the Levi-Civita connection on $M$ is torsion-free, $D^{2}f$ is a symmetric $\left(0,2\right)$ tensor field on $M$, i.e.,

\[D^{2}f\left(X,Y\right)=D^{2}f\left(Y,X\right),\quad\forall X,Y\in\Gamma\left(M,TM\right).\]

One way to compare the semi-Riemannian and Riemannian gradients and Hessians, when both metric structures exist on the same smooth manifold, is through their local coordinate expressions. In fact, the local coordinate expressions for the two types (Riemannian/semi-Riemannian) of differential operators can be unified as follows. Let $\left\{x^{1},\cdots,x^{n}\right\}$ be a local coordinate system around an arbitrary point $x\in\mathcal{M}$, and denote by $g_{ij}$ and $h_{ij}$ the components of the Riemannian and semi-Riemannian metric tensors, respectively; the corresponding Christoffel symbols will be denoted by ${}^{g}\Gamma_{ij}^{k}$ and ${}^{h}\Gamma_{ij}^{k}$, respectively. Direct computation reveals

\[\begin{aligned}
\nabla f&=g^{ij}\partial_{j}f\,\partial_{i},\qquad&\nabla^{2}f&=\left(\partial_{ij}^{2}f-{}^{g}\Gamma_{ij}^{k}\partial_{k}f\right)\mathrm{d}x^{i}\otimes\mathrm{d}x^{j},\\
Df&=h^{ij}\partial_{j}f\,\partial_{i},\qquad&D^{2}f&=\left(\partial_{ij}^{2}f-{}^{h}\Gamma_{ij}^{k}\partial_{k}f\right)\mathrm{d}x^{i}\otimes\mathrm{d}x^{j}.
\end{aligned} \tag{8}\]

Using the musical isomorphism induced from the (Riemannian or semi-Riemannian) metric, the Hessians can be cast in the form of $\left(2,0\right)$-tensors in $\Gamma\left(TM\otimes TM\right)$ as

\[\begin{aligned}
\left(\nabla^{2}f\right)^{\sharp}&=g^{i\ell}g^{jm}\left(\partial_{ij}^{2}f-{}^{g}\Gamma_{ij}^{k}\partial_{k}f\right)\partial_{\ell}\otimes\partial_{m},\\
\left(D^{2}f\right)^{\sharp}&=h^{i\ell}h^{jm}\left(\partial_{ij}^{2}f-{}^{h}\Gamma_{ij}^{k}\partial_{k}f\right)\partial_{\ell}\otimes\partial_{m}.
\end{aligned}\]

Remark 2.7.

Notably, for any $x\in\mathcal{M}$, if we compute the Hessians $D^{2}f\left(x\right)$ and $\nabla^{2}f\left(x\right)$ in the corresponding geodesic normal coordinates centered at $x$, Eq. 8 implies that the two Hessians take the same coordinate form $\left(\partial_{ij}^{2}f\right)_{1\leq i,j\leq n}$, since both ${}^{g}\Gamma_{ij}^{k}$ and ${}^{h}\Gamma_{ij}^{k}$ vanish at $x$. For instance, $\mathbb{R}^{n}$ has the same geodesics (straight lines) under the Euclidean and Lorentzian metrics, and the standard coordinate system serves as a geodesic normal coordinate system for both metrics; see Example 2.10. In particular, the notion of geodesic convexity [39, 46] is equivalent for the two different types of metrics; this equivalence does not follow immediately from the well-known first and second order characterizations (see e.g. [46, Theorem 5.1] and [46, Theorem 6.1]), since geodesics need not be the same under different metrics.

Proposition 2.8.

On a smooth manifold $\mathcal{M}$ admitting two different Riemannian or semi-Riemannian structures, an optimization problem is geodesically convex with respect to one metric if and only if it is also geodesically convex with respect to the other.

Proof 2.9.

Denote the two metric tensors on $\mathcal{M}$ by $g$ and $h$, respectively. Each of $g$ and $h$ may be Riemannian or semi-Riemannian. For any $x\in\mathcal{M}$, let $x^{1},\cdots,x^{n}$ and $y^{1},\cdots,y^{n}$ be the geodesic normal coordinates around $x$ with respect to $g$ and $h$, respectively. Denote by $J=\left(\partial y_{j}/\partial x_{i}\right)_{1\leq i,j\leq n}$ the Jacobian of the coordinate transformation between the two normal coordinate systems. The coordinate expressions of a tangent vector $v\in T_{x}\mathcal{M}$ in the two normal coordinate systems are linked by (Einstein summation convention adopted)

\[v=v^{i}\,\partial/\partial x_{i}=\tilde{v}^{j}\,\partial/\partial y_{j}\quad\Leftrightarrow\quad v^{i}=\tilde{v}^{j}\,\partial x_{i}/\partial y_{j}.\]

Therefore

\[\begin{aligned}
&\left[\nabla^{2}f\left(x\right)\right]\left(v,v\right)\geq 0\quad\forall v\in T_{x}\mathcal{M}\\
\Leftrightarrow\quad&v^{i}v^{j}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\left(x\right)\geq 0\quad\forall v^{1},\cdots,v^{n}\in\mathbb{R}\\
\Leftrightarrow\quad&\tilde{v}^{\ell}\frac{\partial x_{i}}{\partial y_{\ell}}\tilde{v}^{m}\frac{\partial x_{j}}{\partial y_{m}}\frac{\partial^{2}f}{\partial y_{i}\partial y_{j}}\geq 0\quad\forall\tilde{v}^{1},\cdots,\tilde{v}^{n}\in\mathbb{R}\\
\Leftrightarrow\quad&\left[D^{2}f\left(x\right)\right]\left(v,v\right)\geq 0\quad\forall v\in T_{x}\mathcal{M},
\end{aligned}\]

which establishes the desired equivalence.

Example 2.10 (Gradient and Hessian in Minkowski Spaces).

Consider the Euclidean space $\mathbb{R}^{n}$. Denote by $\operatorname{I}_{p,q}\in\mathbb{R}^{n\times n}$ the $n$-by-$n$ diagonal matrix with the first $p$ diagonal entries equal to $-1$ and the remaining $q=n-p$ diagonal entries equal to $1$. We compute and compare in this example the gradients and Hessians of differentiable functions on $\mathbb{R}^{n}$. We take the Riemannian metric to be the standard Euclidean metric, and the semi-Riemannian metric to be the one given by $\operatorname{I}_{p,q}$. For any $f\in C^{2}\left(\mathbb{R}^{n}\right)$, the gradient of $f$ is determined by

\[\begin{aligned}
&\left(Df\right)^{\top}\operatorname{I}_{p,q}X=Xf=\left(\nabla f\right)^{\top}X,\quad\forall X\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)\\
\Leftrightarrow\quad&Df=\operatorname{I}_{p,q}\nabla f\qquad\textrm{where $\nabla f=\left(\partial_{1}f,\cdots,\partial_{n}f\right)\in\mathbb{R}^{n}$.}
\end{aligned}\]

Furthermore, since in this case the semi-Riemannian metric tensor is constant on $\mathbb{R}^{n}$, the Christoffel symbols vanish (c.f. [35, Chap 3. Proposition 13 and Lemma 14]), and thus $D_{X}Df=\operatorname{I}_{p,q}\nabla_{X}\nabla f=\operatorname{I}_{p,q}\left(\nabla^{2}f\right)X$ for all $X\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)$, where

\[\nabla^{2}f=\left(\partial_{i}\partial_{j}f\right)_{1\leq i,j\leq n}\in\mathbb{R}^{n\times n}.\]

By the definition of the Hessian, for all $X,Y\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)$ we have

\[D^{2}f\left(X,Y\right)=\left\langle D_{X}Df,Y\right\rangle=Y^{\top}\operatorname{I}_{p,q}\cdot\operatorname{I}_{p,q}\left(\nabla^{2}f\right)X=Y^{\top}\left(\nabla^{2}f\right)X\]

from which we deduce the equality $D^{2}f=\nabla^{2}f$. In fact, the equivalence of the two Hessians also follows directly from Remark 2.7, since the geodesics under the Riemannian and semi-Riemannian metrics coincide in this example (see e.g. [35, Chapter 3 Example 25]). In particular, the equivalence between the two types of geodesics and Hessians implies the equivalence of geodesic convexity for the two metrics.
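The two identities of Example 2.10, $Df=\operatorname{I}_{p,q}\nabla f$ and $D^{2}f=\nabla^{2}f$, can be checked on a quadratic function. The sketch below is only an illustration: the quadratic $f(x)=\tfrac{1}{2}x^{\top}Qx+b^{\top}x$, the values of `Q` and `b`, the signature $(p,q)=(1,2)$, and the function names are all assumptions.

```python
import numpy as np

p, q = 1, 2
I_pq = np.diag([-1.0] * p + [1.0] * q)

# Hypothetical quadratic test function f(x) = 0.5 x^T Q x + b^T x,
# whose Euclidean gradient is Q x + b and whose Euclidean Hessian is Q.
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])
b = np.array([1.0, -2.0, 0.5])

def euclid_grad(x):
    return Q @ x + b

def semi_grad(x):
    return I_pq @ euclid_grad(x)          # Df = I_{p,q} grad f

def semi_hessian(X, Y):
    # <D_X Df, Y> with D_X Df = I_{p,q} Q X, since the Christoffel symbols vanish.
    return Y @ I_pq @ (I_pq @ Q @ X)

rng = np.random.default_rng(1)
x, X, Y = rng.standard_normal((3, 3))     # a point and two constant vector fields

# Defining property of the gradient: <Df, X> = Xf = (grad f)^T X.
print(np.isclose(semi_grad(x) @ I_pq @ X, euclid_grad(x) @ X))
# The semi-Riemannian Hessian agrees with the Euclidean one as a bilinear form.
print(np.isclose(semi_hessian(X, Y), Y @ Q @ X))
```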

3 Semi-Riemannian Optimization Framework

This section introduces the algorithmic framework of semi-Riemannian optimization. To begin with, we point out that the first- and second-order necessary conditions for optimality in unconstrained optimization and Riemannian optimization can be directly generalized to semi-Riemannian manifolds. We then generalize several Riemannian manifold optimization algorithms to their semi-Riemannian counterparts, and illustrate the difference with a few numerical examples. We end this section by showing global and local convergence results for semi-Riemannian optimization.

3.1 Optimality Conditions

The following Proposition 3.1 should be considered as the semi-Riemannian analogue of the optimality conditions in Proposition 2.1.

Proposition 3.1 (Semi-Riemannian First- and Second-Order Necessary Conditions for Optimality).

Let $\mathcal{M}$ be a semi-Riemannian manifold. A local optimum $x\in\mathcal{M}$ of Problem Eq. 1 satisfies the following necessary conditions:

  (i) $Df\left(x\right)=0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is first-order differentiable;

  (ii) $Df\left(x\right)=0$ and $D^{2}f\left(x\right)\succeq 0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is second-order differentiable.

Proof 3.2.

  (i) If $x\in\mathcal{M}$ is a local optimum of Eq. 1, then for any $X\in\Gamma\left(TM\right)$ we have $Xf\left(x\right)=0$, which, by definition Eq. 6 and the non-degeneracy of the semi-Riemannian metric, implies that $Df\left(x\right)=0$.

  (ii) If $x\in\mathcal{M}$ is a local optimum of Eq. 1, then there exists a local neighborhood $U\subset\mathcal{M}$ of $x$ such that $f\left(y\right)\geq f\left(x\right)$ for all $y\in U$. Without loss of generality we can assume that $U$ is sufficiently small so as to be geodesically convex (see e.g. [10, §3.4]). Denote by $\gamma:\left[-1,1\right]\rightarrow U$ a constant-speed geodesic segment connecting $\gamma\left(0\right)=x$ to $\gamma\left(1\right)=y$ that lies entirely in $U$. The one-variable function $t\mapsto f\circ\gamma\left(t\right)$ admits the Taylor expansion, for some $\xi\in\left(0,1\right)$,

    \[\begin{aligned}
    f\left(y\right)&=f\circ\gamma\left(1\right)=f\circ\gamma\left(0\right)+\left(f\circ\gamma\right)^{\prime}\left(0\right)+\frac{1}{2}\left(f\circ\gamma\right)^{\prime\prime}\left(\xi\right)\\
    &=f\left(x\right)+\left\langle Df\left(x\right),\gamma^{\prime}\left(0\right)\right\rangle+\frac{1}{2}D_{\gamma^{\prime}\left(\xi\right)}\left\langle Df\left(\gamma\left(\xi\right)\right),\gamma^{\prime}\left(\xi\right)\right\rangle\\
    &=f\left(x\right)+\frac{1}{2}\left[D^{2}f\left(\gamma\left(\xi\right)\right)\right]\left(\gamma^{\prime}\left(\xi\right),\gamma^{\prime}\left(\xi\right)\right)
    \end{aligned}\]

    where the last equality used $Df\left(x\right)=0$ and the fact that $\gamma$ is a geodesic. Letting $y\rightarrow x$ on $\mathcal{M}$, the smoothness of $D^{2}f$ ensures that

    \[D^{2}f\left(x\right)\left[V,V\right]\geq 0\qquad\forall V\in T_{x}\mathcal{M}\]

    which establishes $D^{2}f\left(x\right)\succeq 0$.

The formal similarity between Proposition 3.1 and Proposition 2.1 is not entirely surprising. As can be seen from the proofs, both optimality conditions are based on geometric interpretations of the same Taylor expansion; the metrics affect the specific forms of the gradient and Hessian, but the optimality conditions are essentially derived from the Taylor expansions only. Completely parallel to the Riemannian setting, we can also translate the second-order sufficient conditions [26, §7.3] into the semi-Riemannian setting without much difficulty. The proof essentially follows [26, §7.3 Proposition 3], with the Taylor expansion replaced with the expansion along geodesics in Proposition 3.1 (ii); we omit the proof since it is straightforward, but document the result in Proposition 3.3 below for future reference. Recall from [26, §7.1] that $x\in\mathcal{M}$ is a strict relative minimum point of $f$ on $\mathcal{M}$ if there is a local neighborhood $U$ of $x$ on $\mathcal{M}$ such that $f\left(y\right)>f\left(x\right)$ for all $y\in U\backslash\left\{x\right\}$.

Proposition 3.3 (Semi-Riemannian Second-Order Sufficient Conditions).

Let $f$ be a twice differentiable function on a semi-Riemannian manifold $\mathcal{M}$, and let $x\in\mathcal{M}$ be an interior point. If $Df\left(x\right)=0$ and $D^{2}f\left(x\right)\succ 0$, then $x$ is a strict relative minimum point of $f$.

The formal similarity between the Riemannian and semi-Riemannian optimality conditions indicates that it might be possible to transfer many techniques in manifold optimization from the Riemannian to the semi-Riemannian setting. For instance, the equivalence of the first-order necessary conditions implies that, in order to search for a first-order stationary point on a semi-Riemannian manifold, we should look for points at which the semi-Riemannian gradient $Df$ vanishes, just as in the Riemannian realm we look for points at which the Riemannian gradient $\nabla f$ vanishes. However, extra care has to be taken regarding the influence different metric structures have on the induced topology of the underlying manifold. For Riemannian manifolds, it is straightforward to check that the induced topology coincides with the original topology of the underlying manifold (see e.g. [10, Chap 7 Proposition 2.6]), whereas the "topology" induced by a semi-Riemannian structure is generally quite pathological: for instance, two distinct points connected by a light-like geodesic (a geodesic along which all tangent vectors are null vectors, c.f. Definition 4.1) have zero distance. An exemplary consequence is that, in search of a first-order stationary point, we should not be looking for points at which $\left\|Df\right\|^{2}$ vanishes, since this does not imply $Df=0$.
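The last point can be made concrete in the Minkowski plane. The sketch below assumes the linear function $f(x)=x_{1}+x_{2}$ on $\mathbb{R}^{2}$ with the metric $\mathrm{diag}(-1,1)$; the function and the helper name `semi_riemannian_gradient` are hypothetical choices for illustration.

```python
import numpy as np

G = np.diag([-1.0, 1.0])     # metric of signature (-, +) on R^2

def semi_riemannian_gradient(egrad):
    """Solve <Df, X> = df(X) for all X, i.e. G @ Df = egrad."""
    return np.linalg.solve(G, egrad)

egrad = np.array([1.0, 1.0])            # Euclidean gradient of f(x) = x_1 + x_2
Df = semi_riemannian_gradient(egrad)    # equals (-1, 1)
squared_norm = float(Df @ G @ Df)       # <Df, Df> in the indefinite metric

print(Df, squared_norm)
# <Df, Df> is zero although Df is nonzero, so f has no stationary point here
# even though the indefinite "squared norm" of its gradient vanishes everywhere.
```

A stopping test in a numerical routine would therefore have to monitor $Df$ itself (for example through a fixed auxiliary norm on its components) rather than $\left\langle Df,Df\right\rangle$.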

3.2 Determining the “Steepest Descent Direction”

As long as gradients, Hessians, retractions, and parallel-transports can be properly defined, one might think there exists no essential difficulty in generalizing any Riemannian optimization algorithms to the semi-Riemannian setup, with the Riemannian geometric quantities replaced with their semi-Riemannian counterparts, mutatis mutandis. It is tempting to apply this methodology to all standard manifold optimization algorithms, including but not limited to first-order methods such as steepest descent, conjugate gradient descent, and quasi-Newton methods, or second-order methods such as Newton’s method and trust region methods. We discuss in this subsection how to determine a proper descent direction for steepest-descent-type algorithms on a semi-Riemannian manifold. Some exemplary first- and second-order methods will be discussed in the next subsection.

As one of the prototypical first-order optimization algorithms, gradient descent is known for its simplicity yet surprisingly powerful theoretical guarantees under mild technical assumptions. A plausible “Semi-Riemannian Gradient Descent” algorithm that naïvely follows the paradigm of Riemannian gradient descent could be designed as simply replacing the Riemannian gradient f\nabla f with the semi-Riemannian gradient DfDf defined in Eq. 6, as listed in Algorithm 1. Of course, a key step in Algorithm 1 is to determine the descent direction ηk\eta_{k} in each iteration. However, while negative gradient is an obvious choice in Riemannian manifold optimization, the “steepest descent direction” is a slightly more subtle notion in semi-Riemannian geometry, as will be demonstrated shortly in this section.

A first difficulty with replacing $-\nabla f(x)$ by $-Df(x)$ is that $-Df(x)$ need not be a descent direction at all. Consider, for instance, optimization in the Minkowski space (Euclidean space equipped with the standard semi-Riemannian metric): the first-order Taylor expansion at $x$ gives, for any small $t>0$,

(9) $$f(x-tDf(x))\approx f(x)-t\langle Df(x),Df(x)\rangle$$

but in the semi-Riemannian setting the scalar product term $\langle Df(x),Df(x)\rangle$ may well be negative, unlike the Riemannian case. In order for the value of the objective function to decrease (at least to first order), we have to pick the descent direction to be either $-Df(x)$ or $Df(x)$: the former when $\langle Df(x),Df(x)\rangle>0$, the latter when $\langle Df(x),Df(x)\rangle<0$.

Though the quick fix of replacing $Df(x)$ with $\pm Df(x)$ would work generically in many problems of practical interest, a second and more serious issue with choosing $\pm Df(x)$ as the descent direction lies in the indefiniteness of the metric tensor. For standard gradient descent algorithms (e.g. on Euclidean spaces with the standard metric, or more generally on Riemannian manifolds), the algorithm terminates once $\|\nabla f\|$ becomes smaller than a predefined threshold; for norms induced from positive definite metric tensors, $\|\nabla f\|\approx 0$ is equivalent to $\nabla f\approx 0$, implying that the sequence $\{x_{k}\mid k=0,1,\cdots\}$ is truly approaching a first-order stationary point. This intuition breaks down for indefinite metric tensors, as $\|Df\|\approx 0$ no longer implies that $Df$ is close to $0$. Even though one can fix this ill-defined termination condition by introducing an auxiliary Riemannian metric (which always exists on a smooth manifold), when $Df$ is a null vector (i.e. $\|Df\|=0$, see Definition 4.1), the gradient algorithm loses the first-order decrease in the objective function value (see Eq. 9); the validity of the algorithm then relies upon second-order information, with which we lose the benefits of first-order methods. As a concrete example, consider the unconstrained optimization problem on the Minkowski space $\mathbb{R}^{2}$ equipped with a metric of signature $(-1,1)$:

minx,yf(x,y)=12(xy)2.\min_{x,y\in\mathbb{R}}f\left(x,y\right)=\frac{1}{2}\left(x-y\right)^{2}.

Recall from Example 2.10 that

Df(x,y)=I1,1f(x,y)=(xy)(1,1)Df\left(x,y\right)=\operatorname{I}_{1,1}\nabla f\left(x,y\right)=-\left(x-y\right)\cdot\left(1,1\right)^{\top}

which is a direction parallel to the isolines of the objective function ff. Thus the semi-Riemannian gradient descent will never decrease the objective function value.
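The failure in this example can be checked numerically. The following is a minimal sketch in plain Python, assuming the metric $\operatorname{I}_{1,1}=\mathrm{diag}(-1,1)$ used above; the helper names (`euclidean_grad`, `semi_riemannian_grad`, `scalar_product`) and the test point $(2,-1)$ are illustrative choices, not part of the original text.

```python
def f(x, y):
    """Objective from the example: f(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2

def euclidean_grad(x, y):
    """Ordinary (Euclidean) gradient of f."""
    return (x - y, -(x - y))

def semi_riemannian_grad(x, y):
    """Df = I_{1,1} * grad f, with I_{1,1} = diag(-1, 1) (assumed signature (-, +))."""
    gx, gy = euclidean_grad(x, y)
    return (-gx, gy)

def scalar_product(u, v):
    """Indefinite scalar product <u, v> = -u0*v0 + u1*v1."""
    return -u[0] * v[0] + u[1] * v[1]

x0, y0 = 2.0, -1.0                      # illustrative starting point
d = semi_riemannian_grad(x0, y0)        # a multiple of (1, 1)
self_product = scalar_product(d, d)     # Df is a null vector here

# Objective values along x - t * Df for several step sizes t
values = [f(x0 - t * d[0], y0 - t * d[1]) for t in (0.0, 0.5, 1.0, 8.0)]
print(d, self_product, values)
```

At this point $Df=(-3,-3)^{\top}$ has $\left\langle Df,Df\right\rangle=0$, and every entry of `values` equals $f(2,-1)=4.5$, so no step size along $\pm Df$ changes the objective.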

Algorithm 1 Semi-Riemannian Steepest Descent
1:Manifold MM, semi-Riemannian metric ,\left\langle\cdot,\cdot\right\rangle, objective function ff, retraction Retr:TMM\mathrm{Retr}:T\!M\rightarrow M, initial value x0Mx_{0}\in M, parameters for Linesearch, gradient DfDf
2:x0x_{0}\leftarrow Initiate
3:k0k\leftarrow 0
4:while not converge do
5:  $\eta_{k}\leftarrow$ FindDescentDirection$(x_{k},M,Df(x_{k}))$ $\triangleright$ cf. Algorithm 4
6:  0<tk0<t_{k}\leftarrow LineSearch(f,xk,ηk)\left(f,x_{k},\eta_{k}\right) \triangleright tkt_{k} is the Armijo step size
7:  Choose xk+1x_{k+1} such that \triangleright c(0,1)c\in\left(0,1\right) is a parameter
f(xk)f(xk+1)>c[f(xk)f(Retrxk(tkηk))]f\left(x_{k}\right)-f\left(x_{k+1}\right)>c\left[f\left(x_{k}\right)-f\left(\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)\right)\right]
8:  kk+1k\leftarrow k+1
9:end while
10:return Sequence of iterates {xk}\left\{x_{k}\right\}
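To make the control flow of Algorithm 1 concrete, the following is a minimal sketch for the flat space $\mathbb{R}^{1,1}$ only. It assumes the retraction $\mathrm{Retr}_{x}(t\eta)=x+t\eta$, a backtracking Armijo line search, a fixed orthonormal basis used at every point, and the quadratic objective of Example 3.4; the descent direction is computed as $-\sum_{i}\left\langle Df,e_{i}\right\rangle e_{i}$ (Eq. 13). All function names and parameter values are hypothetical.

```python
import math

def sp(u, v):
    """Lorentzian scalar product on R^{1,1}, signature (-, +)."""
    return -u[0] * v[0] + u[1] * v[1]

def boosted_basis(s):
    """An orthonormal basis w.r.t. sp (a Lorentz boost of the standard basis)."""
    return [(math.cosh(s), math.sinh(s)), (math.sinh(s), math.cosh(s))]

def f(p):
    return 0.5 * (p[0] ** 2 + (p[1] + 1.0) ** 2)

def semi_grad(p):
    """Df = diag(-1, 1) * (Euclidean gradient of f)."""
    return (-p[0], p[1] + 1.0)

def find_descent_direction(Df, basis):
    """Return eta = -sum_i <Df, e_i> e_i and the first-order decrease sum_i <Df, e_i>^2."""
    coeffs = [sp(Df, e) for e in basis]
    eta = tuple(-sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(2))
    return eta, sum(c * c for c in coeffs)

def steepest_descent(p, basis, c=1e-4, tol=1e-10, max_iter=1000):
    for _ in range(max_iter):
        eta, decrease = find_descent_direction(semi_grad(p), basis)
        if math.sqrt(decrease) < tol:      # positive definite surrogate for ||Df||
            break
        t = 1.0
        while t > 1e-16:                   # Armijo backtracking; slope along eta is -decrease
            q = (p[0] + t * eta[0], p[1] + t * eta[1])   # flat retraction (assumption)
            if f(q) <= f(p) - c * t * decrease:
                break
            t *= 0.5
        p = q
    return p

result_standard = steepest_descent((3.0, 2.0), boosted_basis(0.0))
result_boosted = steepest_descent((3.0, 2.0), boosted_basis(0.3))
print(result_standard, result_boosted)
```

Both runs approach the minimizer $(0,-1)$. With the standard basis the iteration is ordinary gradient descent; with the boosted basis the iterates follow a different path, since the direction depends on the chosen orthonormal basis.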

To rectify these issues, it is necessary to revisit the motivating, geometric interpretation of the negative gradient direction as the direction of “steepest descent,” i.e. for any Riemannian manifold (M,g)\left(M,g\right) and function ff on MM differentiable at xMx\in M, we know from vector arithmetic that

(10) $$-\frac{\nabla f(x)}{\sqrt{g(\nabla f(x),\nabla f(x))}}=\operatorname*{argmin}_{V\in T_{x}M,\ g(V,V)=1}g(V,\nabla f(x))=\operatorname*{argmin}_{V\in T_{x}M,\ g(V,V)=1}Vf(x).$$

In the semi-Riemannian setting, assuming MM is equipped with a semi-Riemannian metric hh, we can also set the descent direction leading to the steepest decrease of the objective function value. It is not hard to see that in general

(11) $$\pm\frac{Df(x)}{\sqrt{|h(Df(x),Df(x))|}}\neq\operatorname*{argmin}_{V\in T_{x}M,\ |h(V,V)|=1}h(V,Df(x))=\operatorname*{argmin}_{V\in T_{x}M,\ |h(V,V)|=1}Vf(x).$$

In fact, in both versions the search for the “steepest descent direction” is guided by making the directional derivative Vf(x)Vf\left(x\right) as negative as possible, but constrained on different unit spheres. The precise relation between the two steepest descent directions is not readily visible, for the two unit spheres could differ drastically in geometry. In fact, for cases in which the unit ball {vTxM|h(v,v)|=1}\left\{v\in T_{x}M\mid\left|h\left(v,v\right)\right|=1\right\} is noncompact, the “steepest descent direction” so defined may not even exist.

Example 3.4.

Consider the optimization problem over the Minkowski space 1,1\mathbb{R}^{1,1} equipped with a metric of signature (,+)\left(-,+\right)

minx,yf(x,y)=12[x2+(y+1)2].\min_{x,y\in\mathbb{R}}f\left(x,y\right)=\frac{1}{2}\left[x^{2}+\left(y+1\right)^{2}\right].

At $(x,y)=(0,0)$, recall from Example 2.10 that $\nabla f(0,0)=(0,1)^{\top}=Df(0,0)$. Over the unit ball $\{(u,v)^{\top}\in\mathbb{R}^{2}\mid u^{2}-v^{2}=\pm 1\}\subset T_{(0,0)}\mathbb{R}^{2}$ under this Lorentzian metric, the scalar product $\langle Df(0,0),(u,v)^{\top}\rangle=v\rightarrow-\infty$ as $(u,v)\rightarrow(\infty,-\infty)$. Even worse, since the scalar product approaches $-\infty$, it is not possible to find a descent direction $\eta$ with $\langle Df(0,0),\eta\rangle\leq\gamma\min_{V\in T_{x}M,\,|\langle V,V\rangle|=1}Vf(0,0)$ for some pre-set threshold $\gamma>0$.
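The unboundedness in Example 3.4 can be observed by following one branch of the "unit ball". The sketch below assumes the parametrization $(u,v)=(\cosh s,-\sinh s)$ of the branch $u^{2}-v^{2}=1$; this parametrization and the sample values of $s$ are our own illustrative choices.

```python
import math

def scalar_product(u, v):
    """Lorentzian scalar product with signature (-, +)."""
    return -u[0] * v[0] + u[1] * v[1]

Df = (0.0, 1.0)   # semi-Riemannian gradient at (0, 0) in Example 3.4

def unit_vector(s):
    """A point on the hyperbola u^2 - v^2 = 1, so that |<V, V>| = 1."""
    return (math.cosh(s), -math.sinh(s))

params = (0.0, 1.0, 3.0, 5.0)
norms = [abs(scalar_product(unit_vector(s), unit_vector(s))) for s in params]
slopes = [scalar_product(Df, unit_vector(s)) for s in params]   # equals -sinh(s)
print(norms, slopes)
```

Every sampled vector satisfies the constraint $\left|\left\langle V,V\right\rangle\right|=1$, yet the directional derivative $-\sinh s$ decreases without bound as $s$ grows, so the minimum over the constraint set is not attained.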

One way to fix this non-compactness issue is to restrict the candidate tangent vectors $V$ in the minimization of $Vf(x)$ to lie in a compact subset of the tangent space $T_{x}M$. For instance, one can consider the unit sphere in $T_{x}M$ under a Riemannian metric. Comparing the right hand sides of Eq. 10 and Eq. 11, descent directions determined in this manner will be the negative gradient directions under the Riemannian metric, and thus in general have nothing to do with the semi-Riemannian metric; moreover, if a Riemannian metric has to be defined laboriously in addition to the semi-Riemannian one, in principle we can already employ well-established, fully-functioning Riemannian optimization techniques, thus bypassing the semi-Riemannian setup entirely. While this argument might well render first-order semi-Riemannian optimization futile, we emphasize here that one can define steepest descent directions with the aid of “Riemannian structures” that arise naturally from the semi-Riemannian structure, and thus there is no need to specify a separate Riemannian structure in parallel to the semi-Riemannian one, though this affiliated “Riemannian structure” is highly local.

The key observation here is that one does not need to consistently specify a Riemannian structure over the entire manifold, if the only goal is to find one steepest descent direction in that tangent space — in other words, when we search for the steepest descent direction in the tangent space TxMT_{x}M of a semi-Riemannian manifold MM, it suffices to specify a Riemannian structure locally around xx, or more extremely, only on the tangent space TxMT_{x}M, in order for the “steepest descent direction” to be well-defined over a compact subset of TxMT_{x}M. These local inner products do not have to “patch together” to give rise to a globally defined Riemannian structure. A very handy way to find local inner products is through the help of geodesic normal coordinates that reduce the local calculation to the Minkowski spaces. For any xMx\in M, there is a normal neighborhood UMU\subset M containing xx such that the exponential map expx:TxMM\mathrm{exp}_{x}:T_{x}M\rightarrow M is a diffeomorphism when restricted to UU, and one can pick an orthonormal basis (with respect to the semi-Riemannian metric on MM), denoted as {e1,,en}\left\{e_{1},\cdots,e_{n}\right\}, such that ei,ejx=δijϵj\left\langle e_{i},e_{j}\right\rangle_{x}=\delta_{ij}\epsilon_{j}, where 1i,jn1\leq i,j\leq n, n=dim(M)n=\mathrm{dim}\left(M\right), δij\delta_{ij} are the Kronecker delta’s, and ϵj{±1}\epsilon_{j}\in\left\{\pm 1\right\}. Without loss of generality, assume MM is a semi-Riemannian manifold of order pp, where 0pn0\leq p\leq n, and that ϵ1==ϵp=1\epsilon_{1}=\cdots=\epsilon_{p}=-1, ϵp+1==ϵn=1\epsilon_{p+1}=\cdots=\epsilon_{n}=1. The normal coordinates of any yUy\in U are determined by the coefficients of expx1yTxM\exp_{x}^{-1}y\in T_{x}M with respect to the orthonormal basis {e1,,en}\left\{e_{1},\cdots,e_{n}\right\}. It is straightforward (see [35, Proposition 33]) to verify that

gij(x)=δijϵj,Γijk(x)=01i,j,kng_{ij}\left(x\right)=\delta_{ij}\epsilon_{j},\quad\Gamma_{ij}^{k}\left(x\right)=0\qquad\forall 1\leq i,j,k\leq n

where {gij}\left\{g_{ij}\right\} denotes the semi-Riemannian metric tensor components and {Γijk}\left\{\Gamma_{ij}^{k}\right\} stands for the Christoffel symbols. Under this coordinate system, it is straightforward to verify that the scalar product between tangent vectors u,vTxMu,v\in T_{x}M can be written as

u,v=i=1nϵiuivi\left\langle u,v\right\rangle=\sum_{i=1}^{n}\epsilon_{i}u^{i}v^{i}

where u=uieiu=u^{i}e_{i} and v=vjejv=v^{j}e_{j} (Einstein’s summation convention implicitly invoked). The local Riemannian structure can thus be defined as

(12) g(u,v)=i=1nuivi.g\left(u,v\right)=\sum_{i=1}^{n}u^{i}v^{i}.

Essentially, such a local inner product is defined by imposing orthogonality between positive and negative definite subspaces of $T_{x}M$ and “reversing the sign” of the negative definite component of the scalar product. Making such a modification consistently and smoothly over the entire manifold is certainly subject to topological obstructions; nevertheless, locally (in fact, pointwise) defined Riemannian structures suffice for our purposes, and in practical applications we can simplify the workflow by choosing an arbitrary orthonormal basis in the tangent space in place of the geodesic frame. The orthonormalization process, of course, is adapted for the semi-Riemannian setting; see [35, Chapter 2, Lemma 24 and Lemma 25] or Algorithm 2. The output set of vectors $\{e_{1},\cdots,e_{n}\}$ satisfies

ei,ej=δijϵi\left\langle e_{i},e_{j}\right\rangle=\delta_{ij}\epsilon_{i}

where δij\delta_{ij} are the Kronecker symbols, and ϵi=ei,ei=±1\epsilon_{i}=\left\langle e_{i},e_{i}\right\rangle=\pm 1. A generic approach which works with high probability is to pick a random linearly independent set of vectors and apply a (pivoted) Gram-Schmidt orthogonalization process with respect to the indefinite scalar product; see Algorithm 3.

Algorithm 2 Finding an Orthonormal Basis with respect to a Nondegenerate Indefinite Scalar Product
1:Vector space VV of finite dimension nn\in\mathbb{N}, scalar product ,:V×V\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R} of type (p,q)\left(p,q\right) with p+q=np+q=n
2:function FindONBasis(VV)
3:  Find vVv\in V with v,v0\left\langle v,v\right\rangle\neq 0\triangleright vv exists by nondegeneracy
4:  e1v/|v,v|e_{1}\leftarrow v/\sqrt{\left|\left\langle v,v\right\rangle\right|}
5:  for k=2,,nk=2,\cdots,n do
6:   Vkspan{e1,,ek1}V_{k}\leftarrow\mathrm{span}\left\{e_{1},\cdots,e_{k-1}\right\}
7:   WkVkW_{k}\leftarrow V_{k}^{\perp} \triangleright V=VkVkV=V_{k}\oplus V_{k}^{\perp} by [35, Lemma 3.19 and 3.23]
8:   Find wkWkw_{k}\in W_{k} with wk,wk0\left\langle w_{k},w_{k}\right\rangle\neq 0\triangleright wkw_{k} exists by nondegeneracy of WkW_{k}
9:   ekwk/|wk,wk|e_{k}\leftarrow w_{k}/\sqrt{\left|\left\langle w_{k},w_{k}\right\rangle\right|}
10:  end for
11:  return {e1,,en}\left\{e_{1},\cdots,e_{n}\right\}
12:end function
Algorithm 3 Gram-Schmidt for an Indefinite Scalar Product
1:Vector space VV of finite dimension nn\in\mathbb{N}, scalar product ,:V×V\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R} of type (p,q)\left(p,q\right) with p+q=np+q=n, input linearly independent vectors {v1,,vn}\left\{v_{1},\cdots,v_{n}\right\}
2:function IndefGramSchmidt({v1,,vn}\left\{v_{1},\cdots,v_{n}\right\})
3:  e1v1/|v1,v1|e_{1}\leftarrow v_{1}/\sqrt{\left|\left\langle v_{1},v_{1}\right\rangle\right|} \triangleright w.l.o.g. assume v1,v10\left\langle v_{1},v_{1}\right\rangle\neq 0
4:  for k=2,,nk=2,\cdots,n do
5:   $\displaystyle w_{k}\leftarrow v_{k}-\sum_{\ell=1}^{k-1}\epsilon_{\ell}\langle v_{k},e_{\ell}\rangle e_{\ell}$, where $\epsilon_{\ell}=\langle e_{\ell},e_{\ell}\rangle$ $\triangleright$ w.l.o.g. assume $\langle w_{k},w_{k}\rangle\neq 0$ after pivoting
6:   ekwk/|wk,wk|e_{k}\leftarrow w_{k}/\sqrt{\left|\left\langle w_{k},w_{k}\right\rangle\right|}
7:  end for
8:  return {e1,,en}\left\{e_{1},\cdots,e_{n}\right\}
9:end function
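The following is a minimal sketch of the indefinite Gram-Schmidt process in plain Python. It projects against the already normalized vectors $e_{\ell}$ with the sign factor $\epsilon_{\ell}=\left\langle e_{\ell},e_{\ell}\right\rangle$, uses the modified (sequential) variant for numerical stability, and raises an error instead of pivoting when a null vector is met; these choices, the function names, and the example in $\mathbb{R}^{1,2}$ are assumptions made for illustration.

```python
def indef_gram_schmidt(vectors, sp, tol=1e-12):
    """Orthonormalize `vectors` w.r.t. a nondegenerate indefinite scalar product `sp`.

    Returns (basis, signs) with sp(e_i, e_j) = signs[i] * delta_ij.
    No pivoting is performed: a (near-)null intermediate vector raises ValueError.
    """
    basis, signs = [], []
    for v in vectors:
        w = list(v)
        for e, eps in zip(basis, signs):
            c = eps * sp(w, e)                 # projection coefficient <w, e> / <e, e>
            w = [wi - c * ei for wi, ei in zip(w, e)]
        q = sp(w, w)
        if abs(q) < tol:
            raise ValueError("null vector encountered; reorder (pivot) the input")
        scale = abs(q) ** 0.5
        basis.append([wi / scale for wi in w])
        signs.append(1 if q > 0 else -1)
    return basis, signs

def minkowski_sp(u, v):
    """Scalar product on R^{1,2}: one negative direction, two positive ones."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

basis, signs = indef_gram_schmidt(
    [(2.0, 1.0, 0.0), (1.0, 3.0, 0.0), (1.0, 1.0, 1.0)], minkowski_sp
)
gram = [[minkowski_sp(a, b) for b in basis] for a in basis]
print(signs, gram)
```

For this input the Gram matrix is $\mathrm{diag}(-1,1,1)$ up to rounding, in agreement with the signature of the scalar product.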

In geodesic normal coordinates, the gradient DfDf takes the form

Df(x)=i=1nϵiif(x)i|xDf\left(x\right)=\sum_{i=1}^{n}\epsilon_{i}\partial_{i}f\left(x\right)\partial_{i}\big{|}_{x}

and choosing the steepest descent direction reduces to the problem

maxv1,,vn(v1)2++(vn)2=1i=1nϵiviif(x)\max_{v^{1},\cdots,v^{n}\in\mathbb{R}\atop\left(v^{1}\right)^{2}+\cdots+\left(v^{n}\right)^{2}=1}\sum_{i=1}^{n}\epsilon_{i}v^{i}\partial_{i}f\left(x\right)

of which the optimum is obviously attained at

$$(v^{1},\cdots,v^{n})=\frac{1}{\sqrt{\displaystyle\sum_{i=1}^{n}(\partial_{i}f(x))^{2}}}\left(\epsilon_{1}\partial_{1}f(x),\cdots,\epsilon_{n}\partial_{n}f(x)\right).$$

For the simplicity of statement, we introduce the notation

[X]+:=i=1nX,eiei\left[X\right]^{+}:=\sum_{i=1}^{n}\left\langle X,e_{i}\right\rangle e_{i}

for XTxMX\in T_{x}M, where {e1,,en}\left\{e_{1},\cdots,e_{n}\right\} is an orthonormal basis for the semi-Riemannian metric tensor ,\left\langle\cdot,\cdot\right\rangle on TxMT_{x}M. Using this notation, the descent direction we will choose can be written as

(13) [Df(x)]+=i=1nDf(x),eiei.-\left[Df\left(x\right)\right]^{+}=-\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}.

Note that, by [35, Lemma 3.25], with respect to an orthonormal basis {e1,,en}\left\{e_{1},\cdots,e_{n}\right\} we have in general

Df(x)=i=1nϵiDf(x),eieii=1nDf(x),eiei=[Df(x)]+Df\left(x\right)=\sum_{i=1}^{n}\epsilon_{i}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}\neq\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}=\left[Df\left(x\right)\right]^{+}

which is consistent with our previous discussion that the steepest descent direction in the semi-Riemannian setting is not $-Df(x)$ in general. Intuitively, the “steepest descent direction” is obtained by reversing the signs of the components of the gradient that correspond to the negative definite subspace, and then rescaling according to the induced Riemannian metric. This leads to the routine Algorithm 4 for finding descent directions.

Remark 3.5.

The definition of $[X]^{+}$ certainly depends on the choice of the orthonormal basis with respect to the semi-Riemannian metric tensor. In other words, if we choose a different orthonormal basis with respect to the same semi-Riemannian metric on $T_{x}M$, the resulting descent direction will also be different. In practical computations, we could pre-compute an orthonormal basis at every point of the manifold, but that would complicate the proofs of convergence, since the amounts of descent would be incomparable to each other across tangent vectors. A compromise is to cover the entire semi-Riemannian manifold with an atlas consisting of geodesic normal neighborhoods, and to extend the definition Eq. 13 from a single point to the geodesic normal neighborhood around each point, with the orthonormal basis given by geodesic normal frame fields [35, pp.84-85] defined over each normal neighborhood. Under suitable compactness assumptions, this construction essentially defines a Riemannian structure on the semi-Riemannian manifold by means of partition of unity and

(14) g(X,Y):=X,[Y]+=i=1nX,eiY,ei.g\left(X,Y\right):=\left\langle X,\left[Y\right]^{+}\right\rangle=\sum_{i=1}^{n}\left\langle X,e_{i}\right\rangle\left\langle Y,e_{i}\right\rangle.

The arbitrariness of the choice of geodesic normal frame fields makes this Riemannian structure non-canonical, but the bilinear form g(,)g\left(\cdot,\cdot\right) is symmetric and coercive, and can thus be used for performing steepest descent in the semi-Riemannian setting.

Algorithm 4 Finding Semi-Riemannian Descent Direction
1:function FindDescentDirection(x,M,Df(x)x,M,Df\left(x\right))
2:  {e1,,en}\left\{e_{1},\cdots,e_{n}\right\}\leftarrowFindONBasis(TxM,,)\left(T_{x}M,\left\langle\cdot,\cdot\right\rangle\right)
3:  η[Df(x)]+=i=1nDf(x),eiei\displaystyle\eta\leftarrow-\left[Df\left(x\right)\right]^{+}=-\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}
4:  return η\eta
5:end function
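As a hedged illustration of Algorithm 4, the sketch below evaluates $-\left[Df\right]^{+}$ in $\mathbb{R}^{1,1}$ for the objective $f(x,y)=\frac{1}{2}(x-y)^{2}$ at the point $(2,-1)$, where $Df=(-3,-3)^{\top}$ is a null vector. The orthonormal bases are Lorentz boosts of the standard basis, an illustrative stand-in for FindONBasis; all names are hypothetical.

```python
import math

def sp(u, v):
    """Lorentzian scalar product with signature (-, +)."""
    return -u[0] * v[0] + u[1] * v[1]

def boosted_basis(s):
    """Orthonormal basis w.r.t. sp; s = 0 gives the standard basis."""
    return [(math.cosh(s), math.sinh(s)), (math.sinh(s), math.cosh(s))]

def find_descent_direction(Df, basis):
    """eta = -[Df]^+ = -sum_i <Df, e_i> e_i, together with the coefficients <Df, e_i>."""
    coeffs = [sp(Df, e) for e in basis]
    eta = tuple(-sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(len(Df)))
    return eta, coeffs

euclid_grad = (3.0, -3.0)   # Euclidean gradient of f at (2, -1)
Df = (-3.0, -3.0)           # semi-Riemannian gradient at (2, -1), a null vector

results = {}
for s in (0.0, 0.5, -1.0):
    eta, coeffs = find_descent_direction(Df, boosted_basis(s))
    slope = euclid_grad[0] * eta[0] + euclid_grad[1] * eta[1]   # directional derivative
    results[s] = (eta, slope, -sum(c * c for c in coeffs))
print(results)
```

In each case the directional derivative along $\eta$ equals $-\sum_{i}\left\langle Df,e_{i}\right\rangle^{2}<0$, so $\eta$ is a descent direction even though $Df$ itself is null; for $s=0$ the output is $\eta=(-3,3)=-\nabla f$, and other bases rescale it.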
Remark 3.6.

For Minkowski spaces, it is easy to check that the descent direction output from Algorithm 4 coincides with f(x)-\nabla f\left(x\right) exactly. In this sense Algorithm 1 can be viewed as a generalization of the Riemannian steepest descent algorithm. In fact, the pointwise construction of positive-definite scalar products in each tangent space Eq. 12 indicates that the methodology of Riemannian manifold optimization can be carried over to settings with weaker geometric assumptions, namely, when the inner product structure on the tangent spaces need not vary smoothly from point to point. From this perspective, we can also view semi-Riemannian optimization as a type of manifold optimization with weaker geometric assumptions.

Remark 3.7.

Algorithm 1 can indeed be viewed as an instance of a more general paradigm of line-search based optimization on manifolds [42, §3]. Our choice of the descent direction in Algorithm 4 ensures that the objective function value indeed decreases, at least for sufficiently small step size, which further facilitates convergence.

Example 3.8 (Semi-Riemannian Gradient Descent for Minkowski Spaces).

Recall from Example 2.10 that the semi-Riemannian gradient of a differentiable function on Minkowski space p,q\mathbb{R}^{p,q} is Df(x)=Ip,qf(x)Df\left(x\right)=\operatorname{I}_{p,q}\nabla f\left(x\right). If we choose the standard canonical basis for p,q\mathbb{R}^{p,q}, the descent direction [Df(x)]+\left[Df\left(x\right)\right]^{+} produced by Algorithm 4 and needed for Algorithm 1 is

[Df(x)]+=InIp,qInIp,qf(x)=f(x)\left[Df\left(x\right)\right]^{+}=\operatorname{I}_{n}\cdot\operatorname{I}_{p,q}\cdot\operatorname{I}_{n}\cdot\operatorname{I}_{p,q}\nabla f\left(x\right)=\nabla f\left(x\right)

and thus the semi-Riemannian gradient descent coincides with the standard gradient descent algorithm on the Euclidean space if the standard orthonormal basis is used at every point of p,q\mathbb{R}^{p,q}. Of course, if we use a randomly generated orthonormal basis (under the semi-Riemannian metric) at each point, the semi-Riemannian gradient descent will be drastically different from standard gradient descent on Euclidean spaces; see Section 5.1 for an illustration.

When studying self-concordant barrier functions for interior point methods, a useful guiding principle is to consider the Riemannian geometry defined by the Hessian of a strictly convex self-concordant barrier function [31, 11, 41, 32]; in this setting, descent directions produced from Newton’s method can be equivalently viewed as gradients with respect to the Riemannian structure. When the barrier function is non-convex, however, the Hessians are no longer positive definite, and the Riemannian geometry is replaced with semi-Riemannian geometry. It is well known that the direction computed from Newton’s equation Eq. 2 may not always be a descent direction if $\nabla^{2}f$ is not positive definite [48, §3.3], which is consistent with our observation in this subsection that semi-Riemannian gradients need not be descent directions in general. In this particular case, our modification Eq. 13 can also be interpreted as a novel variant of the Hessian modification strategy [48, §3.4], as follows. Denote the function under consideration as $f:Q\rightarrow\mathbb{R}$, where $Q\subset\mathbb{R}^{n}$ is a connected, closed convex subset that has non-empty interior and contains no straight lines. Assume $\nabla^{2}f$ is non-degenerate on $Q$, which necessarily implies that $\nabla^{2}f$ is of constant signature on $Q$. At any $x\in Q$, the negative gradient of $f$ with respect to the semi-Riemannian metric defined by the Hessian of $f$ is $-Df(x)=-[\nabla^{2}f(x)]^{-1}\nabla f(x)$, where $\nabla f$ and $\nabla^{2}f$ stand for the gradient and Hessian of $f$ with respect to the Euclidean geometry of $Q$. Our proposed modification first finds a matrix $U\in\mathbb{R}^{n\times n}$ satisfying

U[2f(x)]U=Ip,qU^{\top}\left[\nabla^{2}f\left(x\right)\right]U=\operatorname{I}_{p,q}

where (p,q)\left(p,q\right) is the constant signature of 2f\nabla^{2}f on QQ, and then set

(15) [Df(x)]+=UU[2f(x)]Df(x)=UUf(x)-\left[Df\left(x\right)\right]^{+}=-UU^{\top}\left[\nabla^{2}f\left(x\right)\right]Df\left(x\right)=-UU^{\top}\nabla f\left(x\right)

which is guaranteed to be a descent direction since

$$-[\nabla f(x)]^{\top}[Df(x)]^{+}=-\|U^{\top}\nabla f(x)\|^{2}\leq 0.$$

From Eq. 15 it is evident that the semi-Riemannian descent direction $-[Df(x)]^{+}$ is obtained from $-Df(x)$ by replacing the inverse Hessian with $UU^{\top}$. This is close to Hessian modification in spirit, but also drastically different from common Hessian modification techniques that add a correction matrix to the true Hessian $\nabla^{2}f(x)$; see [48, §3.4] for a more detailed explanation.
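A small numerical check of Eq. 15 may be helpful. The sketch below assumes a $2\times 2$ indefinite Hessian $H$ and gradient $g$ chosen purely for illustration, and builds the columns of $U$ by applying indefinite Gram-Schmidt to the standard basis under the scalar product $u^{\top}Hv$ (one possible way to obtain $U$, not necessarily the one used in practice). The columns are not sorted by sign, which does not affect $UU^{\top}$.

```python
def h_product(H, u, v):
    """Scalar product <u, v> = u^T H v defined by the (indefinite) Hessian H."""
    n = len(u)
    return sum(u[i] * H[i][j] * v[j] for i in range(n) for j in range(n))

def h_orthonormal_basis(H):
    """Vectors u_i with u_i^T H u_j = +-delta_ij (columns of U, up to ordering).

    Assumes no null pivot occurs when starting from the standard basis.
    """
    n = len(H)
    basis = []
    for k in range(n):
        w = [1.0 if i == k else 0.0 for i in range(n)]
        for e in basis:
            c = h_product(H, w, e) / h_product(H, e, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        q = h_product(H, w, w)
        if abs(q) < 1e-12:
            raise ValueError("null pivot; reorder the starting vectors")
        basis.append([wi / abs(q) ** 0.5 for wi in w])
    return basis

def modified_direction(H, g):
    """-U U^T g, written as -sum_i (u_i . g) u_i."""
    basis = h_orthonormal_basis(H)
    coeffs = [sum(ei * gi for ei, gi in zip(e, g)) for e in basis]
    return [-sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(len(g))]

def newton_direction_2x2(H, g):
    """-H^{-1} g for a 2x2 matrix, via the explicit inverse."""
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
            -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]

H = [[1.0, 2.0], [2.0, 1.0]]   # illustrative indefinite Hessian (eigenvalues 3 and -1)
g = [1.0, 0.0]                 # illustrative Euclidean gradient

newton = newton_direction_2x2(H, g)
modified = modified_direction(H, g)
newton_slope = sum(a * b for a, b in zip(g, newton))
modified_slope = sum(a * b for a, b in zip(g, modified))
print(newton, newton_slope, modified, modified_slope)
```

Here the Newton direction $(1/3,-2/3)$ has positive slope $1/3$ and is therefore an ascent direction, whereas the modified direction $(-7/3,2/3)$ has slope $-7/3=-\|U^{\top}g\|^{2}$.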

3.3 Semi-Riemannian Conjugate Gradient

Using the same steepest descent directions and line search strategy, we can also adapt conjugate gradient methods to the semi-Riemannian setting. See Algorithm 5 for the algorithm description. Note that in Algorithm 5 we used the Polak-Ribière formula to determine $\beta_{k}$, but alternatives such as the Hestenes-Stiefel or Fletcher-Reeves methods (see e.g. [12, §2.6] or [42]) can be easily adapted to the semi-Riemannian setting as well, since none of the major steps in the Riemannian conjugate gradient algorithm relies essentially on the positive-definiteness of the metric tensor, except that the (steepest) descent direction needs to be modified according to Eq. 13. We noticed in practice that the Polak-Ribière formula tends to be more robust and efficient than the Fletcher-Reeves formula for the choice of $\beta_{k}$, which is consistent with general observations of nonlinear conjugate gradient methods [48, §5.2].

Algorithm 5 Semi-Riemannian Conjugate Gradient (Polak-Ribière)
1:Manifold MM, objective function ff, retraction Retr\mathrm{Retr}, parallel transport PP, initial value x0Mx_{0}\in M, parameters for Linesearch, gradient DfDf and Hessian D2fD^{2}f
2:k0k\leftarrow 0
3:x0x_{0}\leftarrow Initiate
4:η0\eta_{0}\leftarrowFindDescentDirection(x0,M,Df(x0))\left(x_{0},M,Df\left(x_{0}\right)\right) \triangleright c.f. Algorithm 4
5:while not converge do
6:  0<tk0<t_{k}\leftarrow LineSearch(f,xk,ηk)\left(f,x_{k},\eta_{k}\right) \triangleright tkt_{k} is the Armijo step size
7:  xk+1Retrxk(tkηk)x_{k+1}\leftarrow\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)
8:  ξk+1\xi_{k+1}\leftarrowFindDescentDirection(xk+1,M,Df(xk+1))\left(x_{k+1},M,Df\left(x_{k+1}\right)\right)
9:  ηk+1=ξk+1+βkPηk\eta_{k+1}=\xi_{k+1}+\beta_{k}P\eta_{k}, where \triangleright P:TxkMTxk+1MP:T_{x_{k}}\!M\rightarrow T_{x_{k+1}}\!M
βk:=max{0,Df(xk+1)P[Df(xk)],[Df(xk+1)]+Df(xk),[Df(xk)]+}\beta_{k}:=\max\left\{0,\frac{\left\langle Df\left(x_{k+1}\right)-P\left[Df\left(x_{k}\right)\right],\left[Df\left(x_{k+1}\right)\right]^{+}\right\rangle}{\left\langle Df\left(x_{k}\right),\left[Df\left(x_{k}\right)\right]^{+}\right\rangle}\right\}
10:  kk+1k\leftarrow k+1
11:end while
12:return Sequence of iterates {xk}\left\{x_{k}\right\}
Remark 3.9.

For Minkowski spaces (including Lorentzian spaces) with the standard orthonormal basis, both steepest descent and conjugate gradient methods coincide with their counterparts on standard Euclidean spaces, since they share identical descent directions, parallel-transports, and Hessians of the objective function.

Remark 3.10.

Algorithm 5 can also be applied to self-concordant barrier functions for interior point methods, when the objective function is not necessarily strictly convex but has non-degenerate Hessians. In this context, where the semi-Riemannian metric tensor is given by the Hessian of the objective function, Algorithm 5 can be viewed as a hybrid of Newton and conjugate gradient methods, in the sense that the “steepest descent directions” are determined by the Newton equations but the actual descent directions are combined using the methodology of conjugate gradient methods. To the best of our knowledge, such a hybrid algorithm has not been investigated in existing literature.

3.4 Metric Independence of Second Order Methods

In this subsection we consider two prototypical second-order optimization methods on semi-Riemannian manifolds, namely, Newton’s method and trust region method. Surprisingly, both methods turn out to produce descent directions that are independent of the choice of scalar products on tangent spaces. We give a geometric interpretation of this independence from the perspective of jets in Section 3.4.2.

3.4.1 Semi-Riemannian Newton’s Method

As an archetypal second-order method, Newton’s method on Riemannian manifolds has already been developed in detail in the early literature of Riemannian optimization [2, Chap 6]. The rationale behind Newton’s method is that the first-order stationary points of a differentiable function $f:M\rightarrow\mathbb{R}$ are exactly the zeros, and hence the global minimizers, of $\|\nabla f\|^{2}=\langle\nabla f,\nabla f\rangle$ when the metric is positive-definite (i.e., when $M$ is a Riemannian manifold). Thus by choosing the direction $V$ to satisfy the Newton equation $\nabla_{V}\nabla f=-\nabla f$ we ensure that $V$ is a descent direction for $\|\nabla f\|^{2}$:

Vf,f=2Vf,f=2f,f=2f2V\left\langle\nabla f,\nabla f\right\rangle=2\left\langle\nabla_{V}\nabla f,\nabla f\right\rangle=-2\left\langle\nabla f,\nabla f\right\rangle=-2\left\|\nabla f\right\|^{2}

and the right hand side is strictly negative as long as $\nabla f\neq 0$. The main difficulty in generalizing this procedure to the semi-Riemannian setting is similar to the difficulty we faced in Section 3.2: when the metric is indefinite, $\|Df\|=0$ does not imply $Df=0$, and thus one can no longer find the stationary points of $f$ by minimizing $\|Df\|^{2}$. The approach we adopt to fix this issue is also similar to that in Section 3.2: instead of minimizing $\langle Df(x),Df(x)\rangle$, we focus on the coercive bilinear form $\langle Df(x),[Df(x)]^{+}\rangle$.

Let E1,,EnE_{1},\cdots,E_{n} be a local geodesic normal coordinate frame centered at xMx\in M, i.e. for any 1i,jn1\leq i,j\leq n

Ei(x),Ej(x)=ϵiδij,EiEj(x)=0.\left\langle E_{i}\left(x\right),E_{j}\left(x\right)\right\rangle=\epsilon_{i}\delta_{ij},\quad\nabla_{E_{i}}E_{j}\left(x\right)=0.

Then we have

(16) Df(x),[Df(x)]+=i=1n|Df(x),Ei(x)|2\left\langle Df\left(x\right),\left[Df\left(x\right)\right]^{+}\right\rangle=\sum_{i=1}^{n}\left|\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\right|^{2}

and thus for any tangent vector VTxMV\in T_{x}M we have

$$\begin{aligned}
V\langle Df(x),[Df(x)]^{+}\rangle&=2\sum_{i=1}^{n}\langle Df(x),E_{i}(x)\rangle\,V\langle Df(x),E_{i}(x)\rangle\\
&=2\sum_{i=1}^{n}\langle Df(x),E_{i}(x)\rangle\left[\langle D_{V}Df(x),E_{i}(x)\rangle+\langle Df(x),D_{V}E_{i}(x)\rangle\right]\\
&=2\sum_{i=1}^{n}\langle Df(x),E_{i}(x)\rangle\langle D_{V}Df(x),E_{i}(x)\rangle
\end{aligned}$$

where in the last equality we used the fact that EiEj(x)=0\nabla_{E_{i}}E_{j}\left(x\right)=0 for all 1i,jn1\leq i,j\leq n. Therefore, as long as we pick VV to satisfy Newton’s equation

(17) [D2f(x)](V)=DVDf(x)=Df(x)\left[D^{2}f\left(x\right)\right]\left(V\right)=D_{V}Df\left(x\right)=-Df\left(x\right)

we can ensure decrease in the value of Eq. 16. In other words, we can obtain a descent direction for semi-Riemannian optimization using the same Newton’s equation as for Riemannian optimization, with the only difference that Riemannian gradient and Hessian get replaced with their semi-Riemannian counterparts.

Algorithm 6 Semi-Riemannian Newton’s Method
1:Manifold MM, objective function ff, retraction Retr\mathrm{Retr}, initial value x0Mx_{0}\in M, parameters for Linesearch, gradient DfDf and Hessian D2fD^{2}f
2:while not converged do
3:  Obtain the descent direction by solving the Newton equation
[D2f(xk)](ηk)=Df(xk)\left[D^{2}f\left(x_{k}\right)\right]\left(\eta_{k}\right)=-Df\left(x_{k}\right)
4:  0<tk0<t_{k}\leftarrow LineSearch(f,xk,ηk)\left(f,x_{k},\eta_{k}\right) \triangleright tkt_{k} is the Armijo step size
5:  xk+1Retrxk(tkηk)x_{k+1}\leftarrow\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)
6:  kk+1k\leftarrow k+1
7:end while
8:return Sequence of iterates {xk}\left\{x_{k}\right\}

Given that our semi-Riemannian Newton’s method builds upon the “Riemannian surrogate” Eq. 16, it is not surprising that the semi-Riemannian Newton’s method reduces to the ordinary Newton’s method on Minkowski spaces, and that the geodesics and parallel-transports stay the same as their Riemannian counterparts (i.e. when the scalar product is positive definite). This is best illustrated in the following calculation.

Example 3.11 (Semi-Riemannian Newton’s Method for Minkowski Spaces).

Recalling the definitions of semi-Riemannian gradient and Hessians from Example 2.10, the descent direction ηk\eta_{k} needed in Algorithm 6 is determined by

Ip,q2f(xk)ηk=Ip,qf(xk)ηk=[2f(xk)]1f(xk)\operatorname{I}_{p,q}\nabla^{2}f\left(x_{k}\right)\eta_{k}=-\operatorname{I}_{p,q}\nabla f\left(x_{k}\right)\quad\Leftrightarrow\quad\eta_{k}=-\left[\nabla^{2}f\left(x_{k}\right)\right]^{-1}\nabla f\left(x_{k}\right)

for all $k=0,1,2,\cdots$. This calculation makes it clear that the semi-Riemannian Newton’s method coincides with the standard Newton’s method.
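The coincidence in Example 3.11 can also be checked numerically. The sketch below is not the authors' implementation: the test function, the signature $(p,q)=(2,1)$, the convention $\operatorname{I}_{p,q}=\mathrm{diag}(I_{p},-I_{q})$, the flat retraction $x+t\eta$, and the backtracking Armijo rule are all assumptions made for illustration. It solves the Newton equation with the semi-Riemannian gradient and Hessian and compares the result with the Euclidean Newton direction.

```python
# Sketch of Algorithm 6 on the Minkowski space R^{2,1} (all choices are illustrative).

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

SIGNS = [1.0, 1.0, -1.0]  # assumed diagonal of I_{p,q} with p = 2, q = 1

def f(x):  # hypothetical strictly convex test function, minimizer (0, 0, 0.5)
    return x[0]**4 + x[0]**2 + 2*x[1]**2 + x[0]*x[1] + x[2]**2 - x[2]

def grad(x):  # Euclidean gradient of f
    return [4*x[0]**3 + 2*x[0] + x[1], 4*x[1] + x[0], 2*x[2] - 1.0]

def hess(x):  # Euclidean Hessian of f
    return [[12*x[0]**2 + 2.0, 1.0, 0.0], [1.0, 4.0, 0.0], [0.0, 0.0, 2.0]]

def semi_newton_direction(x):
    # Newton equation with semi-Riemannian quantities:
    # (I_{p,q} Hess f) eta = -(I_{p,q} grad f), i.e. both sides scaled row-wise.
    g, H = grad(x), hess(x)
    A = [[SIGNS[i] * H[i][j] for j in range(3)] for i in range(3)]
    b = [-SIGNS[i] * g[i] for i in range(3)]
    return solve(A, b)

def euclidean_newton_direction(x):
    return solve(hess(x), [-gi for gi in grad(x)])

def newton(x, iters=20, beta=0.5, sigma=1e-4):
    for _ in range(iters):
        eta = semi_newton_direction(x)
        slope = sum(gi * ei for gi, ei in zip(grad(x), eta))
        t = 1.0  # backtracking Armijo step size (an assumed line search rule)
        while t > 1e-12 and f([xi + t*ei for xi, ei in zip(x, eta)]) > f(x) + sigma*t*slope:
            t *= beta
        x = [xi + t*ei for xi, ei in zip(x, eta)]  # flat retraction
    return x

x0 = [1.0, -2.0, 3.0]
d_semi = semi_newton_direction(x0)
d_eucl = euclidean_newton_direction(x0)
x_star = newton(x0)
```

The two directions agree because multiplying both sides of a linear system by the invertible matrix $\operatorname{I}_{p,q}$ does not change its solution, which is exactly the equivalence displayed in Example 3.11.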

The metric independence demonstrated in Example 3.11 reflects a more general phenomenon of metric independence in Newton’s method as formulated in [41, §1.6]. Though the discussion in [41, §1.6] is restricted to the Riemannian case (scalar product required to be positive definite), it is straightforward to see that the metric independence persists under non-degenerate change of semi-Riemannian structures. In fact, if we write $J\left(x_{k}\right)$ for the Jacobian matrix of a non-degenerate coordinate transformation at $x_{k}$, it is straightforward to check from the coordinate expressions of semi-Riemannian gradient and Hessian Eq. 8 that the Newton equation Eq. 17 in the new coordinate system takes the form $J\left(x_{k}\right)\left[D^{2}f\left(x_{k}\right)\right]\left(V\right)=-J\left(x_{k}\right)\nabla f\left(x_{k}\right)$, which yields the same descent direction as Eq. 17. In the Riemannian regime, this metric independence is often attributed to the fact that second-order approximation is independent of inner products (see e.g. [41, §1.6]); we provide a general and unified differential geometric interpretation of this independence in terms of jets in Section 3.4.2.

3.4.2 Jets and the Metric Independence of Trust Region Method

It is well known that first-order and Newton’s methods suffer from various drawbacks from a numerical optimization perspective, such as slow local convergence and/or prohibitive computational cost in determining the descent direction. It is thus argued (cf. [2], [1]) that it could be more efficient to consider successive optimization of local models of the cost function on the domain of the problem. Trust region methods, which consider quadratic local models through approximate Taylor expansions of the cost function, fall into this category (see e.g. [48] and the references therein). This methodology has also been generalized to Riemannian manifolds for manifold optimization [1, 2, 19, 18]. In a nutshell, at each point $x\in M$ the Riemannian trust-region method strives to find the descent direction by solving locally the quadratic optimization problem on the tangent space $T_{x}M$:

(18) minηTxMηΔ0mx(η)=f(x)+f(x),η+12[2f(x)](η,η)\min_{\eta\in T_{x}M\atop\left\|\eta\right\|\leq\Delta_{0}}m_{x}\left(\eta\right)=f\left(x\right)+\left\langle\nabla f\left(x\right),\eta\right\rangle+\frac{1}{2}\left[\nabla^{2}f\left(x\right)\right]\left(\eta,\eta\right)

where ,\left\langle\cdot,\cdot\right\rangle is the inner product specified by the Riemannian metric tensor, \left\|\cdot\right\| is the induced norm, and Δ0\Delta_{0} is the radius of the trust region which is updated through the iterations according to certain technical criteria (e.g. the geometry of the manifold, the approximation quality of the local model, etc.).

When generalizing trust region methods to semi-Riemannian optimization, we again face the difficulties encountered for the other methods discussed previously, such as the non-compactness of the “metric ball” of bounded radius $\Delta_{0}>0$. This can be resolved by introducing a positive definite inner product accompanying the indefinite metric tensor as in Section 3.2 and Section 3.4.1, and then restricting the search for the descent direction to a bounded domain defined by the norm induced from the inner product. Denoting by $\left\|\cdot\right\|_{+}$ the induced norm on $T_{x}M$, the local quadratic optimization problem in the semi-Riemannian setting can be written as

(19) minηTxMη+Δ0mxsemi(η)=f(x)+Df(x),η+12[D2f(x)](η,η).\min_{\eta\in T_{x}M\atop\left\|\eta\right\|_{+}\leq\Delta_{0}}m^{\textrm{semi}}_{x}\left(\eta\right)=f\left(x\right)+\left\langle Df\left(x\right),\eta\right\rangle+\frac{1}{2}\left[D^{2}f\left(x\right)\right]\left(\eta,\eta\right).

We argue that this local quadratic model coincides with the Riemannian model Eq. 18 with the (frame-field-dependent) Riemannian structure Eq. 14. In fact, the verification is straightforward by picking geodesic normal coordinate systems under the Riemannian and semi-Riemannian metric (which ensures the Christoffel symbols vanish at xx) and a change-of-coordinate argument as in the proof of Proposition 2.8, together with the coordinate expressions Eq. 8. This implies that a trust region method based on Eq. 19 for the semi-Riemannian manifold MM can be interpreted and analyzed using more or less the same techniques in existing literature of Riemannian trust region methods. The only subtlety here is the frame dependence of locality of the Riemannian structures accompanying the semi-Riemannian metric; nevertheless, this technicality can be resolved by noticing the direct dependence of the local Riemannian structure with the smooth semi-Riemannian structure.
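In the flat case the coincidence of the two local models can be verified directly. The sketch below works on the Minkowski space $\mathbb{R}^{1,2}$ and makes several assumptions that are not taken from the text: the convention $\operatorname{I}_{p,q}=\mathrm{diag}(I_{p},-I_{q})$, the reading $\left[D^{2}f\right](\eta,\eta)=\left\langle D^{2}f(\eta),\eta\right\rangle$, the Euclidean norm in the role of $\left\|\cdot\right\|_{+}$, and made-up values for $f(x)$, $\nabla f(x)$ and $\nabla^{2}f(x)$. It compares the model of Eq. 19 with the Euclidean quadratic model and computes a Cauchy point inside the trust region.

```python
import random

SIGNS = [1.0, -1.0, -1.0]  # assumed diagonal of I_{p,q} with p = 1, q = 2

def sp(u, v):
    """Minkowski scalar product <u, v> = u^T I_{p,q} v."""
    return sum(s * a * b for s, a, b in zip(SIGNS, u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

# Hypothetical local data of a cost function at a point x.
f_x = 0.7
g = [1.0, -2.0, 0.5]                                      # Euclidean gradient
H = [[2.0, 0.3, 0.0], [0.3, -1.0, 0.4], [0.0, 0.4, 3.0]]  # Euclidean Hessian

Df = [s * gi for s, gi in zip(SIGNS, g)]   # semi-Riemannian gradient I_{p,q} grad f

def D2f(v):                                # semi-Riemannian Hessian I_{p,q} Hess f
    return [s * w for s, w in zip(SIGNS, matvec(H, v))]

def m_semi(eta):
    """Local model of Eq. 19 written with the indefinite scalar product."""
    return f_x + sp(Df, eta) + 0.5 * sp(D2f(eta), eta)

def m_euclid(eta):
    """Usual Euclidean quadratic model."""
    return f_x + dot(g, eta) + 0.5 * dot(eta, matvec(H, eta))

def cauchy_point(delta):
    """Minimizer of the model along -g inside the Euclidean ball of radius delta."""
    gnorm = dot(g, g) ** 0.5
    curv = dot(g, matvec(H, g))
    tau = 1.0 if curv <= 0 else min(1.0, gnorm**3 / (delta * curv))
    return [-tau * delta / gnorm * gi for gi in g]

random.seed(0)
samples = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
gaps = [abs(m_semi(e) - m_euclid(e)) for e in samples]
eta_c = cauchy_point(0.5)
```

The sign factors cancel in pairs, so the indefinite scalar product never affects the value of the model; only the choice of the bounded search region involves a positive definite norm.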

The argument we gave in this section can be carried out to establish the “metric independence” of trust region methods on manifolds. While it is certainly desirable to pick a metric on the manifold so as to enable numerical implementations of the optimization algorithms, at the end of the day the metric enters the trust region methods only through the choice of the size $\Delta_{0}$ of the trust region, which eventually does not matter after the region radius update rules are carried out (these ultimately depend on the value distribution of the cost function only). One geometric explanation for this phenomenon is through the notion of jets (see e.g. [43, 47, 36]), which characterizes the manifold analogue of “polynomial approximation” for smooth functions. Though the formal invariance under change of coordinates breaks down for derivatives of order greater than or equal to two, it turns out that one can define equivalence classes of “Taylor polynomial expansion modulo higher order terms” by the matching of a fixed number of lower order derivatives at a fixed point. More concretely, consider an arbitrary point $q\in M$ and denote by $\left(U,\left(x^{1},\cdots,x^{d}\right)\right)$ a coordinate system around $q$, and assume without loss of generality that $x^{j}\left(q\right)=0$ for all $j=1,\cdots,d$. By a direct calculation, one can verify that the second order Taylor expansion

(20) f(x)=f(0)+xiif(0)+12xjxkjk2f(0)+O(x3),xUf\left(x\right)=f\left(0\right)+x^{i}\partial_{i}f\left(0\right)+\frac{1}{2}x^{j}x^{k}\partial^{2}_{jk}f\left(0\right)+O\left(\left\|x\right\|^{3}\right),\quad x\in U

is formally preserved under change of coordinates up to cubic polynomials. This indicates that, as long as we interpret the big-OO notation in Eq. 20 as containing not only “metrically” O(x3)O\left(\left\|x\right\|^{3}\right) terms (characterized by the local smooth structure or the metric tensor thereof) but also polynomials of degree 3\geq 3 in the components of xdx\in\mathbb{R}^{d}, then the expansion Eq. 20 makes sense geometrically as an element in the polynomial ring modulo ideals generated by cubic polynomials. (In fact, for fixed kk\in\mathbb{N}, the union of kk-jets over all points on the manifold form a fibre bundle often referred to as a jet bundle.) For the purpose of trust region methods this equivalence relation suffices for specifying local models, as equivalent polynomials (as the same jet) give rise to local models of the same order (see e.g. [2, Proposition 7.1.3]). It then follows that, for distinct Riemannian or semi-Riemannian metrics on the same smooth manifold and under geodesic normal coordinates chosen respectively with respect to the metric structures, the local models Eqs. 18 and 19 correspond to the same jet and will metrically differ from each other in terms of cubic geodesic distances only, whenever the metrics involved are all Riemannian. When at least one of the metric tensors involved is semi-Riemannian, the metric comparison has to be carried out with extra caution (e.g. with respect to the metric structure induced by another Riemannian structure) since coordinate polynomials are no longer bounded by “semi-Riemannian norms” of the same order, again due to the indefiniteness of the semi-Riemannian metric tensor.
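The formal preservation of Eq. 20 can be spelled out as follows. This is a sketch of the direct calculation; the change of coordinates $x=\varphi(y)$ with $\varphi(0)=0$ and the symbols $A^{i}_{a}$, $B^{i}_{ab}$ for its first and second derivatives at $0$ are notation introduced here for illustration.

```latex
\begin{aligned}
x^{i} &= \varphi^{i}(y) = A^{i}_{a}\,y^{a} + \tfrac{1}{2}B^{i}_{ab}\,y^{a}y^{b}
         + O\left(\left\|y\right\|^{3}\right),
\qquad A^{i}_{a}=\frac{\partial\varphi^{i}}{\partial y^{a}}(0),\quad
       B^{i}_{ab}=\frac{\partial^{2}\varphi^{i}}{\partial y^{a}\partial y^{b}}(0),\\
f\left(\varphi(y)\right) &= f(0) + \partial_{i}f(0)\,A^{i}_{a}\,y^{a}
 + \tfrac{1}{2}\left[\partial^{2}_{jk}f(0)\,A^{j}_{a}A^{k}_{b}
 + \partial_{i}f(0)\,B^{i}_{ab}\right]y^{a}y^{b}
 + O\left(\left\|y\right\|^{3}\right),\\
\frac{\partial (f\circ\varphi)}{\partial y^{a}}(0) &= \partial_{i}f(0)\,A^{i}_{a},
\qquad
\frac{\partial^{2}(f\circ\varphi)}{\partial y^{a}\partial y^{b}}(0)
 = \partial^{2}_{jk}f(0)\,A^{j}_{a}A^{k}_{b} + \partial_{i}f(0)\,B^{i}_{ab}.
\end{aligned}
```

The second line is obtained by substituting the first into Eq. 20, and the third line is the chain rule; comparing them shows that the expansion in the $y$ coordinates has the same form as Eq. 20, with all discrepancies absorbed into terms of degree at least three.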

4 Semi-Riemannian Optimization on Submanifolds

Submanifolds of Euclidean spaces are most often encountered in practical applications of manifold optimization. A key difference between Riemannian and semi-Riemannian geometry is that the non-degeneracy of the metric tensor cannot be inherited by submanifolds as easily from semi-Riemannian ambient manifolds: for a submanifold $X$ of $M$, any Riemannian metric $g$ on $M$ induces a Riemannian metric on $X$ since $g$ is positive definite at every point $x\in X$, but a semi-Riemannian metric on $M$ could become degenerate when restricted to $X$; this degeneracy is the main obstruction to finding a well-defined “orthogonal projection” which is essential for (i) relating gradients on the manifold with gradients in the ambient space, and (ii) defining geodesics on submanifolds. Semi-Riemannian manifolds with degeneracy are of interest to the theory of general relativity and mathematical physics; see [22, 23, 24, 25, 45] and the references therein. This section provides some characterization of degenerate semi-Riemannian manifolds (see Definition 4.3) in terms of their degenerate bundles (see Definition 4.4). The goal is to identify non-degenerate semi-Riemannian submanifolds of Minkowski spaces for which our algorithmic framework in Section 3 applies. As demonstrated in the computation in this section and Appendix B, unfortunately, semi-Riemannian structures inherited from the ambient Minkowski space are degenerate for most matrix Lie groups. Nonetheless, many interesting hypersurfaces (co-dimension one submanifolds) of Minkowski spaces admit non-degenerate induced semi-Riemannian structures, or degenerate ones but with degeneracy contained in a set of measure zero; the semi-Riemannian optimization framework introduced in Section 3 applies seamlessly to these examples, some of which we illustrate in Section 5.

4.1 Degeneracy of Semi-Riemannian Submanifolds

Theories of Riemannian and semi-Riemannian geometry build upon the non-degeneracy of metric tensors. However, physical models of spacetime lend themselves naturally to the occurrence of singularities, as pointed out in general relativity [37, 15, 17, 16]. A lot of work in semi-Riemannian geometry is thus devoted to the development of singular semi-Riemannian geometry, i.e. the geometry of semi-Riemannian manifolds with degeneracy in their metric tensors, either with constant signature [22, 23, 24] or more generally, with possibly variable signature [25, 45]. In special cases such as null hypersurfaces of Lorentzian manifolds, specific techniques such as rigging [14, 6] have been developed, but generalizing these special constructions to other degenerate semi-Riemannian submanifolds is much less straightforward, if possible at all. For simplicity of exposition, we confine our discussion to the constant signature scenario regardless of whether singularities occur.

Definition 4.1.

A symmetric bilinear form $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ on a vector space $V$ is said to have signature $\left(\kappa,\nu,\pi\right)$ if a maximal positive definite subspace is of dimension $\pi\in\mathbb{Z}_{\geq 0}$, a maximal negative definite subspace is of dimension $\nu\in\mathbb{Z}_{\geq 0}$, and the degenerate subspace with respect to this bilinear form

V:={vVv,u=0uV}V^{\perp}:=\left\{v\in V\mid\left\langle v,u\right\rangle=0\quad\forall u\in V\right\}

is of dimension $\kappa\in\mathbb{Z}_{\geq 0}$. A vector $0\neq v\in V$ is said to be (1) degenerate if $v\in V^{\perp}$; (2) null if $\left\langle v,v\right\rangle=0$ but $v\notin V^{\perp}$; (3) timelike if $\left\langle v,v\right\rangle<0$; (4) spacelike if $\left\langle v,v\right\rangle>0$.

Definition 4.2.

Let WW be a subspace of a vector space VV equipped with a bilinear form ,:V×V\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}. Denote (κ,ν,π)\left(\kappa,\nu,\pi\right) for the type of the bilinear form on WW obtained from restricting ,\left\langle\cdot,\cdot\right\rangle to WW. We say that WW is (1) degenerate if κ1\kappa\geq 1; (2) nondegenerate if κ=0\kappa=0; (3) timelike if κ=0\kappa=0 and ν1\nu\geq 1; (4) spacelike if κ=0\kappa=0, ν=0\nu=0, and π1\pi\geq 1.
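Definitions 4.1 and 4.2 can be made concrete by computing the signature of the form restricted to a subspace. The sketch below assumes the ambient space is a Minkowski space with diagonal form $\mathrm{diag}(I_{p},-I_{q})$, that the subspace is given by a linearly independent basis, and that eigenvalues below a small threshold count as zero; the helper names are hypothetical. It forms the Gram matrix of the restricted form and counts the signs of its eigenvalues, which by Sylvester's law of inertia gives $(\kappa,\nu,\pi)$.

```python
import math

def jacobi_eigenvalues(S, sweeps=100, tol=1e-22):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(S)
    A = [list(map(float, row)) for row in S]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle that annihilates the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [A[i][i] for i in range(n)]

def restricted_gram(basis, signs):
    """Gram matrix of <u, v> = sum_i signs[i] u_i v_i on span(basis)."""
    return [[sum(s * a * b for s, a, b in zip(signs, u, v)) for v in basis]
            for u in basis]

def signature(gram, eps=1e-9):
    """Return (kappa, nu, pi): numbers of zero, negative, positive eigenvalues."""
    eig = jacobi_eigenvalues(gram)
    kappa = sum(1 for e in eig if abs(e) <= eps)
    nu = sum(1 for e in eig if e < -eps)
    pi = sum(1 for e in eig if e > eps)
    return (kappa, nu, pi)

SIGNS = [1.0, 1.0, -1.0]  # assumed form diag(1, 1, -1) on R^{2,1}

spacelike = signature(restricted_gram([[1, 0, 0], [0, 1, 0]], SIGNS))
timelike = signature(restricted_gram([[1, 0, 0], [1, 0, 1]], SIGNS))
degenerate = signature(restricted_gram([[1, 1, 1], [1, 0, 0]], SIGNS))
```

The third subspace contains the vector $(0,1,1)$, which is orthogonal to the whole subspace under the indefinite form, so the restriction picks up $\kappa=1$ even though the ambient form is non-degenerate.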

Definition 4.3 (Degenerate Semi-Riemannian Manifolds).

A degenerate semi-Riemannian manifold is a smooth manifold equipped with a possibly degenerate (0,2)\left(0,2\right) tensor field. This tensor field will be referred to as the degenerate metric tensor of the degenerate semi-Riemannian manifold; the signature of the degenerate metric tensor will also be referred to as the signature of the manifold when no confusion exists. Unless otherwise specified, the degenerate metric tensor is of constant signature in the rest of this paper.

When the context is clear, we will occasionally omit the adjective “degenerate” when referring to degenerate semi-Riemannian manifolds and the degenerate metric tensor on it, since non-degenerate semi-Riemannian manifolds are special cases of degenerate ones with κ=0\kappa=0.

Definition 4.4 (Degenerate Bundle, [24] Definition 3.1).

The degenerate bundle of a (possibly degenerate) semi-Riemannian manifold (M,,)\left(M,\left\langle\cdot,\cdot\right\rangle\right) is defined as the distribution

(21) M:=xM{uTxMu,v=0vTxM}.M^{\perp}:=\bigcup_{x\in M}\left\{u\in T_{x}M\mid\left\langle u,v\right\rangle=0\quad\forall v\in T_{x}M\right\}.

We say $M$ is integrable if the distribution $M^{\perp}$ is integrable. We denote by $M^{\perp}_{x}$ the linear space $\{u\in\operatorname{T}_{x}M:\langle u,v\rangle=0,~\forall v\in\operatorname{T}_{x}M\}$ and we call the set of points $x\in M$ such that $M_{x}^{\perp}\neq\{0\}$ the degenerate locus of $M$.

As in the setup of Riemannian manifold optimization, in practice it is of primary interest to understand the differential geometry of submanifolds of an ambient manifold for which most differential geometric quantities can be characterized explicitly. In the context of semi-Riemannian geometry, a first technical subtlety with the notion of “semi-Riemannian submanifolds” is that the induced semi-Riemannian metric tensor may well suffer from certain degeneracy even when the ambient semi-Riemannian geometry is non-degenerate. The main difficulty lies in the non-existence of a canonical “orthogonal projection” from the ambient to the submanifold tangent spaces; this complicates the definitions of normal bundles, second fundamental forms, as well as extrinsic characterizations of intrinsic geometric concepts such as affine connections, geodesics, and parallel-translates. For instance, it is well-known that covariant derivatives on a semi-Riemannian submanifold can be obtained from calculating the covariant derivatives on the ambient semi-Riemannian manifold and then projecting the result to the tangent spaces of the submanifold (see e.g. [35, Chapter 4, Lemma 3]), but this characterization breaks down if the projection operator cannot be properly defined. In fact, on a degenerate semi-Riemannian manifold there does not exist in general a semi-Riemannian analogue of the Levi-Civita (metric-compatible and torsion-free) connection, even for a degenerate semi-Riemannian submanifold of a non-degenerate semi-Riemannian manifold. Such an analogue, if it exists, is called a Koszul derivative of the degenerate semi-Riemannian manifold; a semi-Riemannian manifold admitting a Koszul derivative is called a singular semi-Riemannian manifold in [22, 23, 24].
In general, a singular semi-Riemannian manifold $M$ admits more than one Koszul derivative, and any two Koszul derivatives on $M$ differ from each other by a map from $\Gamma\left(TM\right)\times\Gamma\left(TM\right)$ to the degenerate bundle $M^{\perp}$; see e.g. [22, Proposition 3.5]. Note that though it is tempting to define a connection on a degenerate semi-Riemannian manifold through the Koszul formula Eq. 5, the formula defines a Koszul derivative if and only if the metric tensor is Lie parallel along all sections of the degenerate bundle ([24, Theorem 3.4]). Another useful (necessary but not sufficient) criterion for the existence of a Koszul derivative on a degenerate semi-Riemannian manifold is the integrability of the degenerate bundle: as shown in [24, Corollary 3.6], if a semi-Riemannian manifold $M$ admits a Koszul derivative, then $M^{\perp}$ is integrable.

A large class of examples of semi-Riemannian manifolds commonly encountered in scientific computation is given by matrix Lie groups. They admit semi-Riemannian structures of arbitrary signature since tangent bundles of Lie groups are trivial. For instance, it is straightforward to verify that the semi-Riemannian structure on $\mathbb{R}^{n\times n}$ specified in Example 2.6 induces a non-degenerate semi-Riemannian structure on the general linear group $\mathrm{GL}\left(n,\mathbb{R}\right)$, though degeneracy becomes evident for almost all interesting matrix subgroups of $\mathrm{GL}\left(n,\mathbb{R}\right)$. We demonstrate the ubiquity of such degeneracy in the following two examples; more examples of matrix Lie groups are deferred to Section B.2.

Example 4.5 (Indefinite Orthogonal Group).

Let n=p+q\mathbb{N}\ni n=p+q, 0pn0\leq p\leq n, and p,qp,q\in\mathbb{N}. Define the indefinite orthogonal group of signature (p,q)\left(p,q\right) as

(22) O(p,q):={An×nAIp,qA=Ip,q}O\left(p,q\right):=\left\{A\in\mathbb{R}^{n\times n}\mid A^{\top}\operatorname{I}_{p,q}A=\operatorname{I}_{p,q}\right\}

where Ip,q\operatorname{I}_{p,q} is defined in Example 2.6. The Lie algebra of this Lie group can be easily verified as

(23) 𝔬(p,q){Xn×n:XIp,q+Ip,qX=0}.\mathfrak{o}(p,q)\coloneqq\left\{X\in\mathbb{R}^{n\times n}:X^{\top}\operatorname{I}_{p,q}+\operatorname{I}_{p,q}X=0\right\}.

The tangent space at an arbitrary AO(p,q)A\in O\left(p,q\right) is thus

(24) TAO(p,q)\displaystyle T_{A}O\left(p,q\right) ={AXXn×n,XIp,q+Ip,qX=0}\displaystyle=\left\{AX\mid X\in\mathbb{R}^{n\times n},X^{\top}\operatorname{I}_{p,q}+\operatorname{I}_{p,q}X=0\right\}
={AIp,qYYn×n,Y+Y=0}.\displaystyle=\left\{A\operatorname{I}_{p,q}Y\mid Y\in\mathbb{R}^{n\times n},Y^{\top}+Y=0\right\}.

Equipping 𝔬(p,q)\mathfrak{o}\left(p,q\right) with bilinear form specified in Example 2.6, the Lie group structure on O(p,q)O\left(p,q\right) induces a left-invariant semi-Riemannian metric on O(p,q)O\left(p,q\right) by

(25) \begin{aligned}\left\langle X,Y\right\rangle_{A}:={}&\mathrm{Tr}\left(\left(A^{-1}X\right)^{\top}\operatorname{I}_{p,q}A^{-1}Y\right)\quad\forall X,Y\in T_{A}O\left(p,q\right)\\
={}&\mathrm{Tr}\left(X^{\top}\operatorname{I}_{p,q}Y\right)\qquad\textrm{since $A^{\top}\operatorname{I}_{p,q}A=\operatorname{I}_{p,q}$.}\end{aligned}

For ease of notation, we shall drop the subscript $A$ unless there is a potential risk of confusion. This semi-Riemannian metric will be referred to as the natural semi-Riemannian metric on $O\left(p,q\right)$. The degenerate bundle of this semi-Riemannian structure can be easily determined as follows. Let $\Delta\in\mathbb{R}^{n\times n}$ be a skew-symmetric matrix such that $A\operatorname{I}_{p,q}\Delta\in T_{A}O\left(p,q\right)$ for an arbitrary $A\in O\left(p,q\right)$. Setting

0=Tr(XAIp,qA1Δ)=Tr(XIp,qΔ)Xn×n,X+X=00=\mathrm{Tr}\left(X^{\top}A^{-\top}\operatorname{I}_{p,q}A^{-1}\Delta\right)=\mathrm{Tr}\left(X^{\top}\operatorname{I}_{p,q}\Delta\right)\quad\forall X\in\mathbb{R}^{n\times n},X^{\top}+X=0

we have that Ip,qΔ\operatorname{I}_{p,q}\Delta must be symmetric, i.e.

(26) Ip,qΔ=ΔIp,q(=ΔIp,q)\operatorname{I}_{p,q}\Delta=\Delta^{\top}\operatorname{I}_{p,q}(=-\Delta\operatorname{I}_{p,q})

Writing Δ\Delta in the partitioned form

Δ=[Δ1Δ2Δ2Δ3]\Delta=\begin{bmatrix}\Delta_{1}&\Delta_{2}\\ -\Delta_{2}^{\top}&\Delta_{3}\end{bmatrix}

where

Δ1p×p,Δ2p×q,Δ3q×q\Delta_{1}\in\mathbb{R}^{p\times p},\quad\Delta_{2}\in\mathbb{R}^{p\times q},\quad\Delta_{3}\in\mathbb{R}^{q\times q}

satisfying

Δ1+Δ1=0,Δ3+Δ3=0.\Delta_{1}+\Delta_{1}^{\top}=0,\,\,\Delta_{3}+\Delta_{3}^{\top}=0.

Plugging this partitioned form into Eq. 26 gives

Δ1=0,Δ3=0\Delta_{1}=0,\quad\Delta_{3}=0

from which it follows that the degenerate bundle of O(p,q)O\left(p,q\right) takes the form

\begin{aligned}O\left(p,q\right)^{\perp}&=\bigcup_{A\in O\left(p,q\right)}\left\{A\operatorname{I}_{p,q}\begin{bmatrix}&\Delta_{2}\\ -\Delta_{2}^{\top}&\end{bmatrix}\,\Bigg|\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}\\
&=\bigcup_{A\in O\left(p,q\right)}\left\{A\begin{bmatrix}&\Delta_{2}\\ \Delta_{2}^{\top}&\end{bmatrix}\,\Bigg|\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}\end{aligned}

In particular, this indicates that the natural semi-Riemannian structure on $O\left(p,q\right)$ is degenerate. By checking at the identity it is clear that the fibre $O\left(p,q\right)^{\perp}_{I_{n}}$ of the degenerate bundle is not closed under the Lie bracket, i.e. $\left[O\left(p,q\right)^{\perp}_{I_{n}},O\left(p,q\right)^{\perp}_{I_{n}}\right]\nsubseteq O\left(p,q\right)^{\perp}_{I_{n}}$, hence the degenerate bundle $O\left(p,q\right)^{\perp}$ is not integrable. It then follows from [24, Corollary 3.6] that $O\left(p,q\right)$ equipped with the natural semi-Riemannian metric does not admit a Koszul derivative.
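The computation in Example 4.5 can be checked numerically at the identity. The sketch below fixes $(p,q)=(2,1)$ and assumes $\operatorname{I}_{p,q}=\mathrm{diag}(I_{p},-I_{q})$; all helper names are hypothetical. It verifies that the block matrices with off-diagonal blocks $\Delta_{2}$ and $\Delta_{2}^{\top}$ pair to zero with every tangent vector, and that the Lie bracket of two such matrices has non-zero pairing with itself, so it leaves the degenerate fibre.

```python
P, Q = 2, 1  # illustrative signature
N = P + Q
I_PQ = [[(1.0 if i < P else -1.0) if i == j else 0.0 for j in range(N)]
        for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def metric(X, Y):
    """Natural bilinear form Tr(X^T I_{p,q} Y) at the identity, cf. Eq. 25."""
    return trace(matmul(matmul(transpose(X), I_PQ), Y))

def tangent_basis():
    """Basis {I_{p,q} Y : Y skew-symmetric} of the tangent space at the identity."""
    basis = []
    for i in range(N):
        for j in range(i + 1, N):
            Y = [[0.0] * N for _ in range(N)]
            Y[i][j], Y[j][i] = 1.0, -1.0
            basis.append(matmul(I_PQ, Y))
    return basis

def degenerate_vector(D2):
    """Block matrix [[0, D2], [D2^T, 0]] with D2 of size p x q."""
    Z = [[0.0] * N for _ in range(N)]
    for i in range(P):
        for j in range(Q):
            Z[i][P + j] = D2[i][j]
            Z[P + j][i] = D2[i][j]
    return Z

def bracket(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] - YX[i][j] for j in range(N)] for i in range(N)]

Z1 = degenerate_vector([[1.0], [0.0]])
Z2 = degenerate_vector([[0.0], [1.0]])
# Each degenerate vector should be orthogonal to the whole tangent space.
pairings = [metric(T, Z) for T in tangent_basis() for Z in (Z1, Z2)]
# The bracket of two degenerate vectors is not degenerate.
W = bracket(Z1, Z2)
self_pairing = metric(W, W)
```

For this choice of $\Delta_{2}$ blocks the bracket is a block-diagonal skew-symmetric matrix, which is why it cannot lie in the fibre of off-diagonal block matrices.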

Example 4.6 (Orthogonal Group).

Let n=p+q\mathbb{N}\ni n=p+q, 0pn0\leq p\leq n, and p,qp,q\in\mathbb{N}. The manifold structure on the orthogonal group O(n)O\left(n\right) is well-known:

O(n)\displaystyle O\left(n\right) ={An×nAA=AA=In}\displaystyle=\left\{A\in\mathbb{R}^{n\times n}\mid AA^{\top}=A^{\top}A=I_{n}\right\}
𝔬(n)\displaystyle\mathfrak{o}\left(n\right) ={Xn×nX+X=0}\displaystyle=\left\{X\in\mathbb{R}^{n\times n}\mid X+X^{\top}=0\right\}
TAO(n)\displaystyle T_{A}O\left(n\right) ={AXXn×n,X+X=0}=A𝔬(n),AO(n).\displaystyle=\left\{AX\mid X\in\mathbb{R}^{n\times n},X+X^{\top}=0\right\}=A\mathfrak{o}\left(n\right),\quad\forall A\in O\left(n\right).

Equip O(n)O\left(n\right) with the same left-invariant semi-Riemannian metric as in Example 4.5:

U,VA:=Tr((A1U)Ip,qA1V)U,VTAO(n).\left\langle U,V\right\rangle_{A}:=\mathrm{Tr}\left(\left(A^{-1}U\right)^{\top}\operatorname{I}_{p,q}A^{-1}V\right)\quad\forall U,V\in T_{A}O\left(n\right).

In this example, again the semi-Riemannian metric is degenerate. In fact, by a similar argument as in Example 4.5 one has

(27) O\left(n\right)^{\perp}=\bigcup_{A\in O\left(n\right)}\left\{A\begin{bmatrix}&\Delta_{2}\\ -\Delta_{2}^{\top}&\end{bmatrix}\,\Bigg|\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}

and again, $O\left(n\right)^{\perp}$ is not integrable. In fact, one can also easily verify that the Lie bracket $\left[O\left(n\right)^{\perp},O\left(n\right)^{\perp}\right]$ is orthogonal to $O\left(n\right)^{\perp}$ with respect to the natural Riemannian (not semi-Riemannian!) metric on $O\left(n\right)$. Again, from [24, Corollary 3.6] we know that $O\left(n\right)$ with the semi-Riemannian metric defined above does not admit a Koszul derivative.

Remark 4.7.

Example 4.6 is a special case of a more general practice: one can equip O(p,q)O\left(p,q\right) with a semi-Riemannian structure with Ip,q\operatorname{I}_{p,q} replaced with Ip,q\operatorname{I}_{p^{\prime},q^{\prime}} in Example 4.5, where p+q=p+qp+q=p^{\prime}+q^{\prime} but ppp\neq p^{\prime} and qqq\neq q^{\prime}. Again this is due to the triviality of the tangent bundle of the Lie group O(p,q)O\left(p,q\right) for any integers pp and qq.

Remark 4.8.

It is natural to ask at this point whether a given manifold of interest, such as $\operatorname{O}\left(p,q\right)$ or $\operatorname{O}\left(n\right)$, admits a semi-Riemannian structure of a particular type for which a Koszul derivative exists. We are not aware of general results of this sort. Some related work (e.g. [34, 7, 4]) has been devoted to the existence of left-invariant Lorentz metrics satisfying certain curvature sign conditions, following the seminal work of Milnor [29]. Bi-invariant semi-Riemannian metrics on Lie groups have also been widely explored since the 1910s; see [30, §1.4] for a brief survey.

Remark 4.9.

Though the notion of orthogonality breaks down for degenerate semi-Riemannian submanifolds, the tangent bundle of any semi-Riemannian manifold of type (κ,ν,π)\left(\kappa,\nu,\pi\right) admits a direct sum decomposition TM=MHT\!M=M^{\perp}\oplus H, where HH is a sub-bundle of TMT\!M with rank ν+π\nu+\pi. In this case, the restriction of the semi-Riemannian metric on HH gives rise to a non-degenerate semi-Riemannian metric of type (0,ν,π)\left(0,\nu,\pi\right). We will fully leverage this partial non-degeneracy in the semi-Riemannian optimization algorithm presented in this paper.

4.1.1 Gradient and Hessian of Submanifolds of Minkowski Spaces

When $M$ is a non-degenerate semi-Riemannian submanifold of a Minkowski space $\mathbb{R}^{p+q}$, the gradient and Hessian of a twice differentiable function $f$ on $M$ can be computed explicitly from the gradient and Hessian of $f$ on the ambient Minkowski space $\mathbb{R}^{p+q}$, thanks to the non-degeneracy, which ensures for any $x\in M$ that the tangent space $T_{x}M$ has an orthogonal complement in $\mathbb{R}^{p+q}$, and thus $Df$ and $D^{2}f$ on $\mathbb{R}^{p+q}$ can be orthogonally projected onto $T_{x}M$. Specifically, the same argument as in [2, §3.6.1] indicates that the semi-Riemannian gradient of $f$ on $M$ is exactly the orthogonal projection to the tangent space of $M$ of the semi-Riemannian gradient of $f$ as a function defined on the Minkowski space $\mathbb{R}^{p+q}$; a similar argument shows that the Hessian of $f$ on $M$ is the Hessian of $f$ on the Minkowski space $\mathbb{R}^{p+q}$ composed with the orthogonal projection from $\mathbb{R}^{p+q}$ to the tangent space of $M$.

Example 4.10 (Euclidean Sphere in Minkowski Spaces).

Consider the standard Euclidean sphere

𝕊p+q1={xp,qx12++xp+q2=1}\mathbb{S}^{p+q-1}=\left\{x\in\mathbb{R}^{p,q}\mid x_{1}^{2}+\cdots+x_{p+q}^{2}=1\right\}

as a submanifold of $\mathbb{R}^{p,q}$, the Minkowski space equipped with the inner product given by $\operatorname{I}_{p,q}\in\mathbb{R}^{(p+q)\times(p+q)}$ as defined in Example 2.5. For any $x\in\mathbb{S}^{p+q-1}$, the tangent space $T_{x}\mathbb{S}^{p+q-1}$ can be specified as

Tx𝕊p+q1={vp,qvx=v,Ip,qx=0}T_{x}\mathbb{S}^{p+q-1}=\left\{v\in\mathbb{R}^{p,q}\mid v^{\top}x=\left\langle v,\operatorname{I}_{p,q}x\right\rangle=0\right\}

and thus the projection from p,q\mathbb{R}^{p,q} to Tx𝕊p+q1T_{x}\mathbb{S}^{p+q-1} is

Px(v):=vv,Ip,qxIp,qx,Ip,qxIp,qx=vvxxIp,qxIp,qx,xIp,qx0.P_{x}\left(v\right):=v-\frac{\left\langle v,\operatorname{I}_{p,q}x\right\rangle}{\left\langle\operatorname{I}_{p,q}x,\operatorname{I}_{p,q}x\right\rangle}\operatorname{I}_{p,q}x=v-\frac{v^{\top}x}{x^{\top}\operatorname{I}_{p,q}x}\operatorname{I}_{p,q}x,\quad\forall x^{\top}\operatorname{I}_{p,q}x\neq 0.

For $x\in\mathbb{S}^{p+q-1}$ with $x^{\top}\operatorname{I}_{p,q}x=0$, the projection operator $P_{x}$ is not defined since $x$ is a null vector. Nevertheless, this occurs only on a set of measure zero in $\mathbb{S}^{p+q-1}$, so such points are almost never encountered in practice. In our numerical experiments on $\mathbb{S}^{p+q-1}$ (see Section 5.2), we simply perturb the point $x$ randomly so that the optimization trajectory stays away from the degenerate locus. This works perfectly as long as the optimum is not on the degenerate locus. If unavoidable, we can also temporarily resort to the Riemannian orthogonal projection for $x\in\mathbb{S}^{p+q-1}$ with $x^{\top}\operatorname{I}_{p,q}x=0$. For a twice differentiable function $f:\mathbb{S}^{p+q-1}\rightarrow\mathbb{R}$, if we denote by $Df$ and $D^{2}f$ the semi-Riemannian gradient and Hessian of $f$ on the ambient Minkowski space (following Example 2.10), then the semi-Riemannian gradient and Hessian of $f$ on $\mathbb{S}^{p+q-1}$ are $P_{x}\left(Df\left(x\right)\right)$ and $P_{x}\left(D^{2}f\left(x\right)\right)$, respectively.
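The projection $P_{x}$ of Example 4.10 translates directly into code. The sketch below is illustrative only: it fixes $(p,q)=(2,1)$, assumes $\operatorname{I}_{p,q}=\mathrm{diag}(I_{p},-I_{q})$ and that the ambient semi-Riemannian gradient is $\operatorname{I}_{p,q}\nabla f$, and raises an error on the degenerate locus instead of perturbing the point; the function names and the tolerance are hypothetical.

```python
SIGNS = [1.0, 1.0, -1.0]  # assumed diagonal of I_{p,q} with p = 2, q = 1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def minkowski(u, v):
    """Scalar product <u, v> = u^T I_{p,q} v."""
    return sum(s * a * b for s, a, b in zip(SIGNS, u, v))

def project(x, v, tol=1e-12):
    """P_x(v) = v - (v^T x / x^T I_{p,q} x) I_{p,q} x, for x on the unit sphere."""
    ix = [s * xi for s, xi in zip(SIGNS, x)]
    denom = dot(x, ix)  # x^T I_{p,q} x
    if abs(denom) < tol:
        raise ValueError("x lies on the degenerate locus: x^T I_{p,q} x = 0")
    coef = dot(v, x) / denom
    return [vi - coef * w for vi, w in zip(v, ix)]

def sphere_gradient(x, euclid_grad):
    """Project the ambient semi-Riemannian gradient I_{p,q} grad f onto T_x S."""
    Df = [s * g for s, g in zip(SIGNS, euclid_grad)]
    return project(x, Df)

x = [0.6, 0.0, 0.8]  # a point on the unit sphere with x^T I_{p,q} x = -0.28
v = [1.0, 2.0, 3.0]
pv = project(x, v)
normal_part = [a - b for a, b in zip(v, pv)]
g = sphere_gradient(x, [1.0, -1.0, 2.0])
```

The removed component `normal_part` is a multiple of $\operatorname{I}_{p,q}x$, which is orthogonal to the tangent space with respect to the indefinite scalar product; a caller that wants the random perturbation described above could catch the `ValueError` and retry at a nearby point.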

4.1.2 Geodesics and Parallel-Transports

Regardless of whether the semi-Riemannian submanifold under consideration is degenerate, we can define analogues of geodesics and parallel-transports on it by means of its semi-normal bundle. To this end, for a semi-Riemannian manifold $M$ and its submanifold $X$ we denote by $TM$ and $TX$ the tangent bundles of $M$ and $X$, respectively. For $x\in X$, we define the semi-normal space of $X$ in $M$ at $x$ to be

SNx(X,M){uTxM:g~x(u,v)=0,vTxX}.\operatorname{SN}_{x}(X,M)\coloneqq\{u\in T_{x}M:\widetilde{g}_{x}(u,v)=0,~\forall v\in T_{x}X\}.

We also define the semi-normal distribution of XX in MM to be

SN(X,M)xXSNx(X,M).\operatorname{SN}(X,M)\coloneqq\bigsqcup_{x\in X}\operatorname{SN}_{x}(X,M).

Consider the linear map

SNx(X,M)TxMTxM/TxX\operatorname{SN}_{x}(X,M)\hookrightarrow\operatorname{T}_{x}M\to\operatorname{T}_{x}M/\operatorname{T}_{x}X

where the first map is the inclusion and the second map is the quotient map. The following observation is straightforward by definition.

Lemma 4.11.

Fibres of the degenerate bundle of XX (c.f. Definition 4.4) at xXx\in X can be written as

Xx=SNx(X,M)TxX.X_{x}^{\perp}=\operatorname{SN}_{x}(X,M)\cap\operatorname{T}_{x}X.

In particular, if XX is an open submanifold of MM, then XX is a non-degenerate semi-Riemannian submanifold of MM.

Hence we have an injective map

SNx(X,M)/XxTxM/TxX\operatorname{SN}_{x}(X,M)/X_{x}^{\perp}\hookrightarrow\operatorname{T}_{x}M/\operatorname{T}_{x}X

and thus $\operatorname{SN}(X,M)/X^{\perp}$ is a sub-distribution of the normal bundle $\operatorname{N}(X,M)\coloneqq\operatorname{T}M|_{X}/\operatorname{T}X$. If $\dim\operatorname{SN}_{x}(X,M)-\dim X_{x}^{\perp}$ is constant with respect to $x\in X$, then $\operatorname{SN}(X,M)$ is a sub-bundle of $\operatorname{N}(X,M)$ and will be referred to as the semi-normal bundle of $X$ with respect to $M$. We define the analogue of geodesics on (possibly degenerate) semi-Riemannian submanifolds as curves with accelerations in the semi-normal bundle; when the semi-Riemannian submanifold is non-degenerate, these geodesics reduce to standard semi-Riemannian geodesics.

Definition 4.12.

For a given xXx\in X and ΔTxX\Delta\in\operatorname{T}_{x}X, if a smooth curve γ:[ϵ,ϵ]X\gamma:[-\epsilon,\epsilon]\to X satisfies γ(0)=x,γ˙(0)=Δ\gamma(0)=x,\dot{\gamma}(0)=\Delta and

Ddt(γ˙(t))SNγ(t)(X,M)\frac{D}{dt}(\dot{\gamma}(t))\in\operatorname{SN}_{\gamma(t)}(X,M)

for all t[ϵ,ϵ]t\in[-\epsilon,\epsilon], then γ\gamma is called an embedded geodesic curve on XX passing through xx with the tangent direction Δ\Delta. Here Ddt(γ˙(t))\frac{D}{dt}(\dot{\gamma}(t)) is the covariant derivative of γ˙(t)\dot{\gamma}(t) along γ(t)\gamma(t) on the ambient semi-Riemannian manifold (M,g)(M,g).

Definition 4.13.

Let γ(t)\gamma(t) be a curve passing through x=γ(0)x=\gamma(0) on XX and let ΔTxX\Delta\in\operatorname{T}_{x}X be a given tangent vector. A parallel transportation of Δ\Delta along the curve γ(t)\gamma(t) is a vector field Δ(t)\Delta(t) such that Δ(0)=Δ\Delta(0)=\Delta and

Ddt(Δ(t))SNγ(t)(X,M).\frac{D}{dt}\left(\Delta(t)\right)\in\operatorname{SN}_{\gamma(t)}(X,M).

We remark that on a (semi-)Riemannian manifold (Z,g)(Z,g), a geodesic τ\tau passing through zZz\in Z with the tangent direction UTzZU\in\operatorname{T}_{z}Z is traditionally defined by the second order ODE with initial condition:

(28) {τ˙(t)τ˙(t)=0,τ(0)=z,τ˙(0)=U\begin{cases}\nabla_{\dot{\tau}(t)}\dot{\tau}(t)=0,\\ \tau(0)=z,\,\,\dot{\tau}(0)=U\end{cases}

where \nabla is the covariant derivative uniquely determined by the metric gg. Meanwhile, a well-known fact (cf. [35, Corollary 10]) is that if (Z,g)(Z,g) is isometrically embedded in a (semi-)Riemannian manifold (Z¯,g¯)(\overline{Z},\overline{g}), then a curve γ\gamma on ZZ satisfies the geodesic equation in Eq. 28 if and only if D¯/dt(γ˙(t))\overline{D}/dt(\dot{\gamma}(t)) is always perpendicular to ZZ, i.e.

g¯(D¯dt(γ˙(t)),V)=0\overline{g}\left(\frac{\overline{D}}{dt}\left(\dot{\gamma}(t)\right),V\right)=0

for all VTγ(t)ZV\in\operatorname{T}_{\gamma(t)}Z. Here D¯\overline{D} is the covariant derivative on Z¯\overline{Z} along the curve γ(t)\gamma(t). From this second perspective, Definition 4.12 and Definition 4.13 are natural generalizations of geodesics and parallel-transports from nondegenerate to degenerate semi-Riemannian geometry. Of course, it is in general not possible to obtain closed-form expressions for the embedded geodesic curves and parallel-transports; see Section B.2 for some examples. These definitions apply to the particular case when the semi-Riemannian structure under consideration is actually Riemannian, and thus the optimization methods are also applicable to degenerate Riemannian manifolds.

4.2 Semi-Riemannian Hypersurfaces of Minkowski Spaces

In this subsection we describe the semi-Riemannian geometry of submanifolds of codimension one in the Minkowski space p,q\mathbb{R}^{p,q} (see Example 2.5), which are prototypical examples of semi-Riemannian manifolds. Throughout this subsection XX denotes a submanifold of p,q\mathbb{R}^{p,q}. Unraveling the definition of semi-normal and normal bundles yields:

Proposition 4.14.

For each xXx\in X, we have

SNx(X,p,q)=Ip,qNx(X,p+q)\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})

where

Nx(X,p+q)={vTxp+qv1w1++vp+qwp+q=0 for all wTxX}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})=\left\{v\in T_{x}\mathbb{R}^{p+q}\mid v_{1}w_{1}+\cdots+v_{p+q}w_{p+q}=0\textrm{ for all }w\in T_{x}X\right\}

and

Ip,qNx(X,p+q)={Ip,qvvNx(X,p+q)}.\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})=\left\{\operatorname{I}_{p,q}v\mid v\in N_{x}(X,\mathbb{R}^{p+q})\right\}.

In particular, SN(X,p,q)\operatorname{SN}(X,\mathbb{R}^{p,q}) is a vector bundle on XX of rank (p+qdimX)(p+q-\dim X).

Corollary 4.15.

Let Xp,qX\subseteq\mathbb{R}^{p,q} be a hypersurface (a submanifold of codimension one), and let xXx\in X. Then either Xx={0}X_{x}^{\perp}=\{0\} or Xx=SNx(X,p,q)=Ip,qNx(X,p+q)X_{x}^{\perp}=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q}).

Proof 4.16.

By Lemma 4.11 we have Xx=SNx(X,p,q)TxXX_{x}^{\perp}=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})\cap T_{x}X, and by Proposition 4.14 the space SNx(X,p,q)=Ip,qNx(X,p+q)\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q}) is one-dimensional. A subspace of a one-dimensional space is either zero or the whole space, which yields the two cases.

Example 4.17 (Euclidean Spheres in Minkowski Spaces).

Consider as in Example 4.10 the hypersurface

𝕊p+q1={xp,qx12++xp+q2=1}p,q.\mathbb{S}^{p+q-1}=\left\{x\in\mathbb{R}^{p,q}\mid x_{1}^{2}+\cdots+x_{p+q}^{2}=1\right\}\subseteq\mathbb{R}^{p,q}.

Direct calculation yields

Nx(𝕊p+q1,p+q)\displaystyle\operatorname{N}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p+q}) ={λx:λ},\displaystyle=\left\{\lambda x:\lambda\in\mathbb{R}\right\},
SNx(𝕊p+q1,p,q)\displaystyle\operatorname{SN}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p,q}) =Ip,qNx(𝕊p+q1,p+q)={λIp,qx:λ}\displaystyle=\operatorname{I}_{p,q}\operatorname{N}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p+q})=\left\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\right\}
Tx𝕊p+q1\displaystyle T_{x}\mathbb{S}^{p+q-1} ={vp,qv1x1++vp+qxp+q=0}\displaystyle=\left\{v\in\mathbb{R}^{p,q}\mid v_{1}x_{1}+\cdots+v_{p+q}x_{p+q}=0\right\}

and thus

(𝕊p+q1)x\displaystyle(\mathbb{S}^{p+q-1})_{x}^{\perp} =SNx(𝕊p+q1,p,q)Tx𝕊p+q1\displaystyle=\operatorname{SN}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p,q})\cap T_{x}\mathbb{S}^{p+q-1}
={λIp,qxλ,x12++xp2=xp+12++xp+q2}\displaystyle=\left\{\lambda\operatorname{I}_{p,q}x\mid\lambda\in\mathbb{R},\,\,x_{1}^{2}+\cdots+x_{p}^{2}=x_{p+1}^{2}+\dots+x_{p+q}^{2}\right\}
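\displaystyle=\begin{cases}\operatorname{SN}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p+q}),&\text{if}~x^{\top}\operatorname{I}_{p,q}x=0,\\ \{0\},&\text{otherwise}.\end{cases}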
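The case distinction in this example can be checked numerically. The following is a minimal Python sketch, assuming the sign convention that I_{p,q} is diagonal with p entries equal to -1 followed by q entries equal to +1 (consistent with Proof 4.21); the helper names `apply_I`, `euclid`, `minkowski`, and `is_degenerate_point` are illustrative and are not taken from the authors' software.

```python
import math

def apply_I(v, p):
    """Apply I_{p,q} = diag(-1, ..., -1, +1, ..., +1) with p negative signs (assumed convention)."""
    return [-vi if i < p else vi for i, vi in enumerate(v)]

def euclid(u, v):
    """Euclidean inner product on R^{p+q}."""
    return sum(a * b for a, b in zip(u, v))

def minkowski(u, v, p):
    """Minkowski inner product u^T I_{p,q} v on R^{p,q}."""
    return euclid(apply_I(u, p), v)

def is_degenerate_point(x, p, tol=1e-12):
    """For x on the Euclidean unit sphere, the semi-normal line is spanned by I_{p,q} x.
    That line is tangent to the sphere exactly when x^T I_{p,q} x = 0."""
    return abs(minkowski(x, x, p)) < tol

p = 1  # the sphere S^2 viewed inside R^{1,2}
x_null = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]  # x_1^2 = x_2^2 + x_3^2
x_regular = [0.0, 0.0, 1.0]

# The semi-normal vector I_{p,q} x is Minkowski-orthogonal to any Euclidean tangent vector w.
w = [1.0, 0.0, 0.0]  # Euclidean-orthogonal to x_regular, hence tangent there
semi_normal = apply_I(x_regular, p)
orthogonality_residual = minkowski(semi_normal, w, p)

degenerate_at_null = is_degenerate_point(x_null, p)
degenerate_at_regular = is_degenerate_point(x_regular, p)
```

Here `degenerate_at_null` flags a point on the degenerate locus, while `x_regular` lies off the locus, so the fibre of the degenerate bundle is trivial there.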

It is conceivable that hypersurfaces, and in particular the linear ones, known as hyperplanes, play an important role in semi-Riemannian geometry as they can provide rich yet elementary examples of non-degenerate semi-Riemannian sub-manifolds. In fact, generically speaking, hyperplanes inherit non-degenerate semi-Riemannian structures from the ambient Minkowski spaces; we defer a simple proof to Appendix A. It makes use of a handy criterion for the non-degeneracy of semi-Riemannian structures on hypersurfaces which we establish as follows. First of all, we point out that Proposition 4.14 can be equivalently interpreted in terms of Gauss maps: for closed sub-manifolds Xp,qX\subset\mathbb{R}^{p,q} with dimX<p+q\operatorname{dim}X<p+q, denote m:=p+qdim(X)m:=p+q-\operatorname{dim}\left(X\right) and define the Gauss map

N:XGr(m,p+q),N(x)=Nx(X,p+q).\operatorname{N}:X\to\operatorname{Gr}(m,p+q),\quad\operatorname{N}(x)=\operatorname{N}_{x}(X,\mathbb{R}^{p+q}).

and the semi-Gauss map

SN:XGr(m,p+q),SN(x)=SNx(X,p,q).\operatorname{SN}:X\to\operatorname{Gr}(m,p+q),\quad\operatorname{SN}(x)=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q}).

Proposition 4.14 states essentially the commutativity of the following diagram:

(29) X{X}Gr(m,p+q){\operatorname{Gr}(m,p+q)}Gr(m,p+q){\operatorname{Gr}(m,p+q)}SN\scriptstyle{\operatorname{SN}}N\scriptstyle{\operatorname{N}}Ip,q\scriptstyle{\operatorname{I}_{p,q}}

Denote by 𝒱\mathcal{V} the quadratic hypersurface in p+q\mathbb{P}\mathbb{R}^{p+q} defined by

𝒱:={(x1,,xp,y1,,yq)p+q|j=1pxj2j=1qyj2=0}.\mathcal{V}:=\left\{\left(x_{1},\cdots,x_{p},y_{1},\cdots,y_{q}\right)\in\mathbb{P}\mathbb{R}^{p+q}\,\Bigg{|}\,\sum_{j=1}^{p}x_{j}^{2}-\sum_{j=1}^{q}y_{j}^{2}=0\right\}.

The degeneracy of semi-Riemannian structures on hypersurfaces is totally determined by the intersections of semi-normal bundles with 𝒱\mathcal{V}. More concretely, it follows directly from the definitions that

Proposition 4.18.

If dimX=p+q1\dim X=p+q-1, then N1(𝒱)\operatorname{N}^{-1}(\mathcal{V}) is the degenerate locus of XX. In particular, XX is non-degenerate if and only if 𝒱N(X)=\mathcal{V}\cap\operatorname{N}(X)=\emptyset, where N(X)\operatorname{N}(X) is the image of the Gauss map of XX.

In the remainder of this section we provide two classes of hypersurfaces, namely, pseudo-spheres and pseudo-hyperbolic spaces, in the Minkowski space p,q\mathbb{R}^{p,q} that are different from hyperplanes. For both examples we obtain closed form expressions for embedded geodesic curves and parallel-transports (see Definition 4.12 and Definition 4.13) needed for implementing the algorithmic framework proposed in Section 3. Numerical experiments demonstrating the efficacy of the semi-Riemannian optimization framework on these hypersurfaces can be found in Section 5.

4.2.1 Pseudo-spheres

Let 𝕊p,q\mathbb{S}^{p,q} be the hypersurface in p,q\mathbb{R}^{p,q} defined by the equation

j=1pxj2+j=1qyj2=1.-\sum_{j=1}^{p}x_{j}^{2}+\sum_{j=1}^{q}y_{j}^{2}=1.

Here we write xp,qx\in\mathbb{R}^{p,q} as x=(x1,,xp,y1,,yq)x=(x_{1},\dots,x_{p},y_{1},\dots,y_{q}). In the literature, 𝕊p,q\mathbb{S}^{p,q} is called the unit pseudo-sphere in p,q\mathbb{R}^{p,q}, and 𝕊1,q\mathbb{S}^{1,q} (resp. 𝕊p,1\mathbb{S}^{p,1}) is known as the de Sitter (resp. anti-de Sitter) space. The tangent space Tx𝕊p,q\operatorname{T}_{x}\mathbb{S}^{p,q} is characterized by

Tx𝕊p,q={(u,v)p,q:j=1pxjuj+j=1qyjvj=0}\operatorname{T}_{x}\mathbb{S}^{p,q}=\left\{(u,v)\in\mathbb{R}^{p,q}:-\sum_{j=1}^{p}x_{j}u_{j}+\sum_{j=1}^{q}y_{j}v_{j}=0\right\}

for each x=(x1,,xp,y1,,yq)𝕊p,qx=(x_{1},\dots,x_{p},y_{1},\dots,y_{q})\in\mathbb{S}^{p,q}. Hence we also have

Nx(𝕊p,q,p+q)={λIp,qx:λ},SNx(𝕊p,q,p,q)={λx:λ}.\operatorname{N}_{x}(\mathbb{S}^{p,q},\mathbb{R}^{p+q})=\left\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\right\},\quad\operatorname{SN}_{x}(\mathbb{S}^{p,q},\mathbb{R}^{p,q})=\left\{\lambda x:\lambda\in\mathbb{R}\right\}.

This together with Proposition 4.18 implies the following

Lemma 4.19.

For any positive integers pp and qq, 𝕊p,q\mathbb{S}^{p,q} is a non-degenerate semi-Riemannian sub-manifold of p,q\mathbb{R}^{p,q}.
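The step from Proposition 4.18 to this lemma can be spelled out as a short verification. It is sketched here under the assumption that I_{p,q} is the diagonal sign matrix with I_{p,q}^2 equal to the identity, as used in Proof 4.21: the Euclidean normal line at x is spanned by I_{p,q}x, and this vector never satisfies the defining equation of the quadric V.

```latex
\begin{aligned}
\operatorname{N}(x) &= [\operatorname{I}_{p,q}x]\in\mathbb{P}\mathbb{R}^{p+q},\\
(\operatorname{I}_{p,q}x)^{\mathsf{T}}\operatorname{I}_{p,q}(\operatorname{I}_{p,q}x)
  &= x^{\mathsf{T}}\operatorname{I}_{p,q}x = 1 \neq 0
  \quad\text{for all } x\in\mathbb{S}^{p,q},\\
\text{hence}\quad \mathcal{V}\cap\operatorname{N}(\mathbb{S}^{p,q}) &= \emptyset .
\end{aligned}
```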

We now turn to investigating the embedded geodesics on 𝕊p,q\mathbb{S}^{p,q}.

Proposition 4.20.

The embedded geodesic passing through x𝕊p,qx\in\mathbb{S}^{p,q} with tangent direction XTx𝕊p,qX\in T_{x}\mathbb{S}^{p,q} is

(30) γ(t)={xcos(tX)+(X/X)sin(tX),ifX,X>0,xcosh(tX)+(X/X)sinh(tX),ifX,X<0,x+tX,otherwise\gamma(t)=\begin{cases}\displaystyle x\cos(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sin(t\left\|X\right\|),&\text{if}~\langle X,X\rangle>0,\\ \displaystyle x\cosh(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sinh(t\left\|X\right\|),&\text{if}~\langle X,X\rangle<0,\\ \displaystyle x+tX,&\text{otherwise}\end{cases}

where X=|X,Xx|\left\|X\right\|=\sqrt{\left|\langle X,X\rangle_{x}\right|}.

Proof 4.21.

First, it is straightforward to verify that γ(0)=x\gamma(0)=x and γ˙(0)=X\dot{\gamma}(0)=X. Next we notice that

γ(t)𝖳Ip,qγ(t)=1\gamma(t)^{\mathsf{T}}\operatorname{I}_{p,q}\gamma(t)=1

since x𝖳Ip,qX=0x^{\mathsf{T}}\operatorname{I}_{p,q}X=0. This implies that γ(t)\gamma(t) is indeed a curve on 𝕊p,q\mathbb{S}^{p,q}. Lastly, by taking second derivative, we have

γ¨(t)={X,Xγ(t),ifX,X0,0,otherwise\ddot{\gamma}(t)=\begin{cases}-\langle X,X\rangle\gamma(t),~\text{if}~\langle X,X\rangle\neq 0,\\ 0,~\text{otherwise}\\ \end{cases}

and hence γ¨(t)SNγ(t)(𝕊p,q,p,q)\ddot{\gamma}(t)\in\operatorname{SN}_{\gamma(t)}(\mathbb{S}^{p,q},\mathbb{R}^{p,q}). Therefore, γ(t)\gamma(t) is the geodesic curve passing through xx with tangent direction XX.
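The three branches of Eq. 30 can be evaluated directly. The sketch below is a minimal pure-Python illustration, assuming the first p coordinates carry the minus sign in the Minkowski inner product; the function names and the tolerance `tol` used to detect null directions are choices made here for illustration, not part of the paper.

```python
import math

def minkowski(u, v, p):
    """u^T I_{p,q} v, with a minus sign on the first p coordinates (assumed convention)."""
    return sum(-a * b if i < p else a * b for i, (a, b) in enumerate(zip(u, v)))

def pseudo_sphere_geodesic(x, X, t, p, tol=1e-14):
    """Evaluate Eq. 30 at time t for a base point x on S^{p,q} and a tangent vector X."""
    s = minkowski(X, X, p)
    if s > tol:            # <X, X> > 0: trigonometric branch
        n = math.sqrt(s)
        a, b = math.cos(t * n), math.sin(t * n) / n
    elif s < -tol:         # <X, X> < 0: hyperbolic branch
        n = math.sqrt(-s)
        a, b = math.cosh(t * n), math.sinh(t * n) / n
    else:                  # null direction: straight line
        a, b = 1.0, t
    return [a * xi + b * Xi for xi, Xi in zip(x, X)]

p = 1                      # S^{1,2}: -x_1^2 + y_1^2 + y_2^2 = 1
x = [0.0, 1.0, 0.0]
directions = {
    "positive": [0.0, 0.0, 1.0],   # <X, X> = 1
    "negative": [1.0, 0.0, 0.0],   # <X, X> = -1
    "null": [1.0, 0.0, 1.0],       # <X, X> = 0
}
t = 0.7
points = {name: pseudo_sphere_geodesic(x, X, t, p) for name, X in directions.items()}
# Each point should still satisfy the defining equation <gamma, gamma> = 1.
residuals = {name: abs(minkowski(g, g, p) - 1.0) for name, g in points.items()}
```

The `residuals` dictionary measures how far each computed point is from the pseudo-sphere; all three values should be at the level of floating-point round-off.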

We now compute the parallel translation on 𝕊p,q\mathbb{S}^{p,q}. Let xx be a point on 𝕊p,q\mathbb{S}^{p,q} and let Δ\Delta be a tangent vector on 𝕊p,q\mathbb{S}^{p,q} at xx. We denote by γ(t)\gamma(t) the geodesic curve passing through xx with tangent direction XX. Let Δ(t)\Delta(t) be the parallel transportation of Δ\Delta along γ\gamma. By definition, we must have that Δ(t)Tγ(t)𝕊p,q\Delta(t)\in\operatorname{T}_{\gamma(t)}\mathbb{S}^{p,q} and Δ˙(t)SNγ(t)(𝕊p,q,p,q)\dot{\Delta}(t)\in\operatorname{SN}_{\gamma(t)}(\mathbb{S}^{p,q},\mathbb{R}^{p,q}). This implies

(31) Δ(t),γ(t)\displaystyle\langle\Delta(t),\gamma(t)\rangle =0,\displaystyle=0,
(32) Δ˙(t),γ(t)γ(t)\displaystyle\langle\dot{\Delta}(t),\gamma(t)\rangle\gamma(t) =Δ˙(t).\displaystyle=\dot{\Delta}(t).

Differentiating Eq. 31, we obtain Δ˙(t),γ=Δ(t),γ˙(t)\langle\dot{\Delta}(t),\gamma\rangle=-\langle\Delta(t),\dot{\gamma}(t)\rangle and hence

Δ˙(t)=Δ(t),γ˙(t)γ(t).\dot{\Delta}(t)=-\langle\Delta(t),\dot{\gamma}(t)\rangle\gamma(t).

Since parallel translation preserves inner product, we see that Δ(t),γ˙(t)=Δ,X\langle\Delta(t),\dot{\gamma}(t)\rangle=\langle\Delta,X\rangle and

(33) Δ˙(t)=Δ,Xγ(t).\dot{\Delta}(t)=-\langle\Delta,X\rangle\gamma(t).

Integrating Eq. 33 and using the initial condition Δ(0)=Δ\Delta(0)=\Delta, we obtain the following.

Proposition 4.22.

Let γ(t)\gamma(t) be the geodesic passing through x𝕊p,qx\in\mathbb{S}^{p,q} with tangent direction XTx𝕊p,qX\in\operatorname{T}_{x}\mathbb{S}^{p,q}. The parallel transport of ΔTx𝕊p,q\Delta\in\operatorname{T}_{x}\mathbb{S}^{p,q} along γ(t)\gamma(t) is

Δ(t)=Δ,X0tγ(τ)𝑑τ+Δ.\Delta(t)=-\langle\Delta,X\rangle\int_{0}^{t}\gamma(\tau)d\tau+\Delta.

More precisely, we have

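\Delta(t)=\begin{cases}\displaystyle-\frac{\langle\Delta,X\rangle}{\left\|X\right\|}\left[x\sin(t\left\|X\right\|)-\frac{X}{\left\|X\right\|}\cos(t\left\|X\right\|)\right]+\left(\Delta-\frac{\langle\Delta,X\rangle}{\left\|X\right\|^{2}}X\right),&\text{if}~\langle X,X\rangle>0,\\ \displaystyle-\frac{\langle\Delta,X\rangle}{\left\|X\right\|}\left[x\sinh(t\left\|X\right\|)+\frac{X}{\left\|X\right\|}\cosh(t\left\|X\right\|)\right]+\left(\Delta+\frac{\langle\Delta,X\rangle}{\left\|X\right\|^{2}}X\right),&\text{if}~\langle X,X\rangle<0,\\ \displaystyle-\langle\Delta,X\rangle\left(tx+\frac{1}{2}t^{2}X\right)+\Delta,&\text{otherwise}.\end{cases}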
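As a sanity check on the parallel transport formula, the sketch below integrates the geodesic of Eq. 30 in closed form and verifies the two properties used in the derivation: the transported vector stays tangent along the geodesic, and its Minkowski norm is preserved. This is an illustrative pure-Python sketch with assumed helper names and an assumed sign convention (minus on the first p coordinates), not the authors' implementation.

```python
import math

def minkowski(u, v, p):
    """u^T I_{p,q} v, with a minus sign on the first p coordinates (assumed convention)."""
    return sum(-a * b if i < p else a * b for i, (a, b) in enumerate(zip(u, v)))

def geodesic(x, X, t, p, tol=1e-14):
    """Eq. 30 on the pseudo-sphere S^{p,q}."""
    s = minkowski(X, X, p)
    if s > tol:
        n = math.sqrt(s)
        a, b = math.cos(t * n), math.sin(t * n) / n
    elif s < -tol:
        n = math.sqrt(-s)
        a, b = math.cosh(t * n), math.sinh(t * n) / n
    else:
        a, b = 1.0, t
    return [a * xi + b * Xi for xi, Xi in zip(x, X)]

def transport(x, X, D, t, p, tol=1e-14):
    """Delta(t) = Delta - <Delta, X> * integral_0^t gamma, with the integral in closed form."""
    s = minkowski(X, X, p)
    c = minkowski(D, X, p)
    if s > tol:            # integral = x sin(tn)/n + X (1 - cos(tn))/n^2
        n = math.sqrt(s)
        a, b = math.sin(t * n) / n, (1.0 - math.cos(t * n)) / (n * n)
    elif s < -tol:         # integral = x sinh(tn)/n + X (cosh(tn) - 1)/n^2
        n = math.sqrt(-s)
        a, b = math.sinh(t * n) / n, (math.cosh(t * n) - 1.0) / (n * n)
    else:                  # integral = t x + t^2 X / 2
        a, b = t, 0.5 * t * t
    return [Di - c * (a * xi + b * Xi) for Di, xi, Xi in zip(D, x, X)]

p, t = 1, 0.5
x = [0.0, 1.0, 0.0]        # point on S^{1,2}
D = [2.0, 0.0, 3.0]        # tangent vector at x, <D, D> = 5
checks = []
for X in ([0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]):
    g = geodesic(x, X, t, p)
    Dt = transport(x, X, D, t, p)
    # (tangency residual, change in Minkowski norm)
    checks.append((abs(minkowski(Dt, g, p)),
                   abs(minkowski(Dt, Dt, p) - minkowski(D, D, p))))
```

Each pair in `checks` should be close to zero for the positive, negative, and null directions, respectively.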

4.2.2 Pseudo-hyperbolic Spaces

The unit pseudo-hyperbolic space p,q\mathbb{H}^{p,q} in p,q\mathbb{R}^{p,q} is defined by the equation

j=1pxj2+j=1qyj2=1.-\sum_{j=1}^{p}x_{j}^{2}+\sum_{j=1}^{q}y_{j}^{2}=-1.

The tangent space of p,q\mathbb{H}^{p,q} at a point x=(x1,,xp,y1,,yq)x=(x_{1},\dots,x_{p},y_{1},\dots,y_{q}) is

Txp,q={(u,v)p,q:j=1pxjuj+j=1qyjvj=0}\operatorname{T}_{x}\mathbb{H}^{p,q}=\{(u,v)\in\mathbb{R}^{p,q}:-\sum_{j=1}^{p}x_{j}u_{j}+\sum_{j=1}^{q}y_{j}v_{j}=0\}

Let σp,q:p,qq,p\sigma_{p,q}:\mathbb{R}^{p,q}\to\mathbb{R}^{q,p} be the map defined by

σp,q(x1,,xp,y1,,yq)=(y1,,yq,x1,,xp).\sigma_{p,q}(x_{1},\dots,x_{p},y_{1},\dots,y_{q})=(y_{1},\dots,y_{q},x_{1},\dots,x_{p}).

It is straightforward ([35, Lemma 24]) to verify that σp,q\sigma_{p,q} is an anti-isometry between p,q\mathbb{H}^{p,q} and 𝕊q,p\mathbb{S}^{q,p}, whose inverse is σq,p\sigma_{q,p}. Therefore we have

Nx(p,q,p+q)={λIp,qx:λ},SNx(p,q,p,q)={λx:λ}.\operatorname{N}_{x}(\mathbb{H}^{p,q},\mathbb{R}^{p+q})=\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\},\quad\operatorname{SN}_{x}(\mathbb{H}^{p,q},\mathbb{R}^{p,q})=\{\lambda x:\lambda\in\mathbb{R}\}.

Corollary 4.23.

For any positive integers p,qp,q, the pseudo-hyperbolic space p,q\mathbb{H}^{p,q} is a non-degenerate semi-Riemannian sub-manifold of p,q\mathbb{R}^{p,q}.

Moreover, geodesics and parallel transports on p,q\mathbb{H}^{p,q} can be easily obtained from those on 𝕊q,p\mathbb{S}^{q,p} via the anti-isometry σp,q\sigma_{p,q}.

Corollary 4.24.

Let xx be a point on p,q\mathbb{H}^{p,q} and let XX be a tangent direction of p,q\mathbb{H}^{p,q} at xx. The geodesic curve γ(t)\gamma(t) passing through xx with tangent direction XX is

γ(t)={xcosh(tX)+(X/X)sinh(tX),ifX,X>0,xcos(tX)+(X/X)sin(tX),ifX,X<0,x+tX,otherwise.\gamma(t)=\begin{cases}x\cosh(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sinh(t\left\|X\right\|),&\text{if}~\langle X,X\rangle>0,\\ x\cos(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sin(t\left\|X\right\|),&\text{if}~\langle X,X\rangle<0,\\ x+tX,&\text{otherwise}.\end{cases}

Corollary 4.25.

Let γ(t)\gamma(t) be the geodesic on p,q\mathbb{H}^{p,q} passing through xp,qx\in\mathbb{H}^{p,q} with tangent direction XTxp,qX\in\operatorname{T}_{x}\mathbb{H}^{p,q}. The parallel transport of ΔTxp,q\Delta\in\operatorname{T}_{x}\mathbb{H}^{p,q} along γ(t)\gamma(t) is

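\Delta(t)=\begin{cases}\displaystyle\frac{\langle\Delta,X\rangle}{\left\|X\right\|}\left[x\sinh(t\left\|X\right\|)+\frac{X}{\left\|X\right\|}\cosh(t\left\|X\right\|)\right]+\left(\Delta-\frac{\langle\Delta,X\rangle}{\left\|X\right\|^{2}}X\right),&\text{if}~\langle X,X\rangle>0,\\ \displaystyle\frac{\langle\Delta,X\rangle}{\left\|X\right\|}\left[x\sin(t\left\|X\right\|)-\frac{X}{\left\|X\right\|}\cos(t\left\|X\right\|)\right]+\left(\Delta+\frac{\langle\Delta,X\rangle}{\left\|X\right\|^{2}}X\right),&\text{if}~\langle X,X\rangle<0,\\ \displaystyle\langle\Delta,X\rangle\left(tx+\frac{1}{2}t^{2}X\right)+\Delta,&\text{otherwise}.\end{cases}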
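The reduction to pseudo-spheres via the anti-isometry can be illustrated in code: map the point and the tangent vector with sigma_{p,q}, evaluate the pseudo-sphere geodesic on S^{q,p}, and map back with sigma_{q,p}. The sketch below is an assumption-laden illustration in pure Python (function names are hypothetical, and the minus sign is assumed to sit on the first p coordinates); it compares the result with the closed form of Corollary 4.24 in two small cases.

```python
import math

def minkowski(u, v, p):
    """u^T I_{p,q} v, with a minus sign on the first p coordinates (assumed convention)."""
    return sum(-a * b if i < p else a * b for i, (a, b) in enumerate(zip(u, v)))

def sigma(v, p):
    """sigma_{p,q}: (x_1..x_p, y_1..y_q) -> (y_1..y_q, x_1..x_p)."""
    return v[p:] + v[:p]

def sphere_geodesic(x, X, t, p, tol=1e-14):
    """Geodesic on the pseudo-sphere S^{p,q} (Eq. 30)."""
    s = minkowski(X, X, p)
    if s > tol:
        n = math.sqrt(s)
        a, b = math.cos(t * n), math.sin(t * n) / n
    elif s < -tol:
        n = math.sqrt(-s)
        a, b = math.cosh(t * n), math.sinh(t * n) / n
    else:
        a, b = 1.0, t
    return [a * xi + b * Xi for xi, Xi in zip(x, X)]

def hyperbolic_geodesic(x, X, t, p):
    """Geodesic on H^{p,q}, obtained by transporting the problem to S^{q,p} and back."""
    q = len(x) - p
    y = sphere_geodesic(sigma(x, p), sigma(X, p), t, q)  # computed on S^{q,p}
    return sigma(y, q)                                   # sigma_{q,p} inverts sigma_{p,q}

t = 0.3
# H^{1,2}: -x_1^2 + y_1^2 + y_2^2 = -1, direction with <X, X> = 1 > 0.
g_pos = hyperbolic_geodesic([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], t, 1)
# H^{2,1}: -x_1^2 - x_2^2 + y_1^2 = -1, direction with <X, X> = -1 < 0.
g_neg = hyperbolic_geodesic([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], t, 2)
```

In the first case the curve follows the hyperbolic branch (cosh, sinh), and in the second the trigonometric branch (cos, sin), matching the swap of branches between Eq. 30 and Corollary 4.24.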

5 Numerical Experiments

We demonstrate in this section the feasibility of the proposed semi-Riemannian optimization framework through various conceptual or numerical experiments.

5.1 Minkowski Spaces

Although we know from Example 3.8 that the semi-Riemannian descent direction coincides with the negative Riemannian gradient when the standard orthonormal basis is chosen and fixed at every point of 1,1\mathbb{R}^{1,1}, the two types of gradients nevertheless differ from each other if we follow the random orthonormal basis construction Algorithm 2. To illustrate the difference between Riemannian and semi-Riemannian optimization on Minkowski spaces, we solve a simple quadratic convex optimization problem

(34) minx2xAx\min_{x\in\mathbb{R}^{2}}x^{\top}Ax

on 1,1\mathbb{R}^{1,1} equipped with the standard semi-Riemannian metric of signature (,+)\left(-,+\right). Here A2×2A\in\mathbb{R}^{2\times 2} is a randomly generated symmetric positive definite matrix, and we apply both steepest descent Algorithm 1 and conjugate gradient Algorithm 5, using random orthonormal bases in subroutine Algorithm 4 for finding descent directions and Armijo’s rule for line search. The semi-Riemannian optimization trajectories vary from instance to instance due to the randomness in basis construction, but convergence to the global minimum x=(0,0)x=\left(0,0\right)^{\top} is empirically observed. Fig. 1 compares the trajectories of Riemannian and semi-Riemannian steepest descent and conjugate gradient algorithms for one random instance.

Refer to caption
Refer to caption
Figure 1: Riemannian and semi-Riemannian steepest descent Algorithm 1 and conjugate gradient Algorithm 5 optimization on the Minkowski Space 1,1\mathbb{R}^{1,1} for an instance of the quadratic convex problem Eq. 34, where A=[0.3649,0.1065;0.1065,1.7427]A=[0.3649,-0.1065;-0.1065,1.7427] and the initial point is chosen as x0=(0.7285,0.0230)x_{0}=\left(-0.7285,0.0230\right)^{\top}.
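For orientation, the simplest configuration discussed above can be sketched in a few lines. With the standard orthonormal basis fixed at every point, the text notes that the semi-Riemannian descent direction coincides with the negative Euclidean gradient, and straight lines serve as geodesics of the flat space. The sketch below uses the matrix A and the initial point reported in the caption of Fig. 1; the Armijo constants `beta` and `sigma`, the iteration cap, and all function names are assumptions made here, and the random-basis construction of Algorithm 2 is not reproduced.

```python
def f(x, A):
    """Objective x^T A x for a 2-by-2 matrix A."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def grad(x, A):
    """Euclidean gradient 2 A x (A is symmetric)."""
    return [2.0 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def steepest_descent(A, x0, max_iter=200, beta=0.5, sigma=1e-4, gtol=1e-12):
    """Steepest descent with Armijo backtracking along straight lines.
    With the standard basis, the descent direction is the negative Euclidean gradient."""
    x = list(x0)
    values = [f(x, A)]
    for _ in range(max_iter):
        g = grad(x, A)
        g2 = g[0] ** 2 + g[1] ** 2
        if g2 ** 0.5 < gtol:
            break
        d = [-g[0], -g[1]]
        t, fx = 1.0, values[-1]
        # Armijo rule: shrink t until f(x + t d) <= f(x) - sigma * t * |g|^2.
        while t > 1e-16 and f([x[0] + t * d[0], x[1] + t * d[1]], A) > fx - sigma * t * g2:
            t *= beta
        x = [x[0] + t * d[0], x[1] + t * d[1]]
        values.append(f(x, A))
    return x, values

A = [[0.3649, -0.1065], [-0.1065, 1.7427]]   # instance from the caption of Fig. 1
x0 = [-0.7285, 0.0230]
x_final, values = steepest_descent(A, x0)
```

The list `values` records the objective along the trajectory, which is non-increasing by construction of the Armijo rule.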

5.2 Euclidean Spheres in Minkowski Spaces

The calculations in Example 4.17 imply that the unit Euclidean sphere 𝕊p+q1\mathbb{S}^{p+q-1} is nondegenerate as a semi-Riemannian submanifold in p,q\mathbb{R}^{p,q} except for a degenerate locus of measure zero. Let f:𝕊p+q1f:\mathbb{S}^{p+q-1}\rightarrow\mathbb{R} be a differentiable function on 𝕊p+q1\mathbb{S}^{p+q-1}, and denote by f\nabla f the gradient of ff in the ambient Euclidean space. As shown in Example 2.10, the semi-Riemannian gradient of ff in the Minkowski space can be written as Df=Ip,qfDf=I_{p,q}\nabla f, and the descent directions in the ambient space take the form [Df]+=f-\left[Df\right]^{+}=-\nabla f. Recall from Example 4.10 and Example 4.17 that, unless xx is a null vector (such points form a set of measure zero), the fibre of the degenerate bundle (𝕊p+q1)\left(\mathbb{S}^{p+q-1}\right)^{\perp} vanishes at xx and thus we can project DfDf to a unique tangent vector in Tx𝕊p+q1T_{x}\mathbb{S}^{p+q-1} by Lemma 2.3. This suggests that the optimization trajectory falls outside of the degenerate locus with probability 11, as long as the optimum is not inside the degenerate locus.

To illustrate the feasibility of our proposed semi-Riemannian optimization framework, we solve the problem

(35) maxx12++xp+q2=1xAx\max_{x_{1}^{2}+\cdots+x_{p+q}^{2}=1}x^{\top}Ax

using semi-Riemannian steepest descent and conjugate gradient methods, where A(p+q)×(p+q)A\in\mathbb{R}^{\left(p+q\right)\times\left(p+q\right)} is a randomly generated symmetric (but not necessarily positive definite) matrix. The maximum of Eq. 35 is attained at a unit eigenvector associated with the largest eigenvalue of the matrix AA, and hence we can visualize and compare the convergence dynamics of Riemannian and semi-Riemannian optimization schemes by keeping track of the L2L^{2}-discrepancy between solutions obtained at each iteration and the true maximizer. As there do not seem to exist explicit expressions for the semi-Riemannian geodesic and parallel-transport on 𝕊p+q1\mathbb{S}^{p+q-1} (see Table 1), we use the Riemannian geodesic and parallel transport on 𝕊p+q1\mathbb{S}^{p+q-1}; generically, these choices do not essentially affect the convergence of manifold optimization algorithms, which allow for arbitrary retractions [2, 8] and general parallel-transports [42, 19]. Apart from the random basis generation inherent to the local semi-Riemannian Gram-Schmidt orthonormalization Algorithm 2, for p+q>2p+q>2 there also exist multiple semi-Riemannian structures on p+q\mathbb{R}^{p+q} which induce distinct semi-Riemannian structures on 𝕊p+q1\mathbb{S}^{p+q-1}; our experimental results in Fig. 2 suggest that all semi-Riemannian structures ensure convergence, though the convergence rates may differ. A deeper investigation of the dependence of the convergence rate on the choice of semi-Riemannian structure appears highly intriguing but is beyond the scope of this paper; we defer such exploration to future work.

Refer to caption
Refer to caption
Figure 2: Semi-log convergence plots for Riemannian and semi-Riemannian steepest descent Algorithm 1 and conjugate gradient Algorithm 5 applied to a random instance of optimization problem Eq. 35 on the Euclidean unit sphere in Minkowski Space p,q\mathbb{R}^{p,q} with p+q=10p+q=10. In this example, AA is a 1010-by-1010 symmetric matrix (not necessarily positive definite), and the optimum is attained at the eigenvector associated with the largest eigenvalue of AA; the vertical axes show the squared Euclidean distance between the iterate xkx_{k} and the true optimum (obtained using Riemannian trust region method). In both subplots, each curve represents a different Minkowski space p,q\mathbb{R}^{p,q}, i.e., the same base space p+q=10\mathbb{R}^{p+q}=\mathbb{R}^{10} but endowed with a different semi-Riemannian structure; the two blue curves corresponding to the case p=0p=0 and q=10q=10 are Riemannian steepest descent and conjugate gradient algorithms, which appear to require the fewest iterations in both subplots. These figures suggest that the semi-Riemannian optimization algorithms converge on this instance regardless of the specific semi-Riemannian structure imposed on the manifold, though the convergence rates vary.

5.3 Pseudo-spheres in Minkowski Spaces

Since the pseudo-spheres (Section 4.2.1) and pseudo-hyperbolic spaces (Section 4.2.2) differ from each other only by an anti-isometry [35, Lemma 24], we will only consider pseudo-spheres in this numerical experiment. Note that, given an arbitrary point x𝕊p,qx\in\mathbb{S}^{p,q} and a tangent direction in Tx𝕊p,q\operatorname{T}_{x}\mathbb{S}^{p,q}, it is generally difficult to calculate Riemannian geodesics on pseudo-spheres explicitly (except for some particular cases where e.g. Clairaut’s relation holds, see [10, Chapter 3 Ex. 1]), but semi-Riemannian geodesics admit the closed-form expression Eq. 30 and thus can be used as retractions for semi-Riemannian optimization algorithms. We consider the problem

(36) minx𝕊p,qxξ22\min_{x\in\mathbb{S}^{p,q}}\left\|x-\xi\right\|_{2}^{2}

where xξ2\left\|x-\xi\right\|_{2} is the Euclidean distance between x𝕊p,qx\in\mathbb{S}^{p,q} and an arbitrarily chosen point ξp+q\xi\in\mathbb{R}^{p+q} that does not lie on 𝕊p,q\mathbb{S}^{p,q}. An illustration of the convergence of semi-Riemannian steepest descent and conjugate gradient methods (in semi-log scale) for a random instance of Eq. 36 with p=3p=3 and q=12q=12 can be found in Fig. 3, where the vertical axis marks the squared Euclidean distance between xkx_{k} and the ground truth solution xtruex_{\textrm{true}} computed using the constrained optimization routine fmincon provided in the Matlab optimization toolbox. This numerical experiment indicates that the convergence rates of both semi-Riemannian first-order methods are linear for Eq. 36.

Refer to caption
Figure 3: Semi-log convergence plot for semi-Riemannian steepest descent Algorithm 1 and conjugate gradient Algorithm 5 applied to a random instance of optimization problem Eq. 36 on the unit pseudo-sphere 𝕊p,q\mathbb{S}^{p,q} in Minkowski Space p,q\mathbb{R}^{p,q} with p=3p=3 and q=12q=12. The convergence is measured with respect to the ground truth solution xtruex_{\mathrm{true}} computed directly using the constrained optimization functionality provided in the Matlab optimization toolbox. Both semi-Riemannian optimization algorithms exhibit linear convergence.

6 Conclusion

Motivated by the metric independence of Riemannian optimization algorithms and the Riemannian geometry of self-concordant barrier functions, we developed an algorithmic framework for optimization on semi-Riemannian manifolds in this paper, which includes Riemannian manifold optimization and standard unconstrained optimization in Euclidean spaces as special cases. We proposed a modification to the semi-Riemannian gradients for obtaining descent directions, and used this methodology to devise steepest descent and conjugate gradient algorithms for semi-Riemannian manifold optimization. We also showed that second-order methods such as Newton’s method and trust region methods are invariant with respect to different choices of semi-Riemannian (including Riemannian) metrics. We provided numerical experiments to demonstrate the feasibility of the proposed algorithmic framework on non-degenerate semi-Riemannian submanifolds of Minkowski spaces. We defer a more rigorous theoretical analysis of semi-Riemannian manifold optimization, as well as a broader range of applications, to future work.

Software

MATLAB code for the surface registration algorithm is publicly available at https://github.com/trgao10/SemiRiem.

Acknowledgments

The authors would like to thank Lin Lin (UC Berkeley) for inspirational discussions.

References

  • [1] P.-A. Absil, C. G. Baker, and K. A. Gallivan, Trust-Region Methods on Riemannian Manifolds, Foundations of Computational Mathematics, 7 (2007), p. 303–330.
  • [2] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2009.
  • [3] N. Ahmad, H. K. Kim, and R. J. McCann, Optimal Transportation, Topology and Uniqueness, Bulletin of Mathematical Sciences, 1 (2011), p. 13–32.
  • [4] R. Albuquerque, On Lie Groups with Left Invariant Semi-Riemannian Metric, in Proceedings 1st International Meeting on Geometry and Topology, 1998, p. 1–13.
  • [5] E. Andruchow, G. Larotonda, L. Recht, and A. Varela, The Left Invariant Metric in the General Linear Group, Journal of Geometry and Physics, 86 (2014), p. 241–257.
  • [6] C. Atindogbe, M. Gutiérrez, and R. Hounnonkpe, New Properties on Normalized Null Hypersurfaces, Mediterranean Journal of Mathematics, 15 (2018), p. 166, https://doi.org/10.1007/s00009-018-1210-0.
  • [7] F. Barnet et al., On Lie Groups That Admit Left-Invariant Lorentz Metrics of Constant Sectional Curvature, Illinois Journal of Mathematics, 33 (1989), p. 631–642.
  • [8] N. Boumal, P.-A. Absil, and C. Cartis, Global Rates of Convergence for Nonconvex Optimization on Manifolds, IMA Journal of Numerical Analysis, (2016).
  • [9] S. M. Carroll, Spacetime and Geometry: An Introduction to General Relativity, San Francisco, USA: Addison-Wesley (2004), 2004.
  • [10] M. P. Do Carmo, Riemannian Geometry, Springer, 1992.
  • [11] J. J. Duistermaat, On Hessian Riemannian Structures, Asian Journal of Mathematics, 5 (2001), pp. 79–91.
  • [12] A. Edelman, T. A. Arias, and S. T. Smith, The Geometry of Algorithms with Orthogonality Constraints, SIAM Journal on Matrix Analysis and Applications, 20 (1998), p. 303–353.
  • [13] D. Gabay, Minimizing a Differentiable Function over a Differential Manifold, Journal of Optimization Theory and Applications, 37 (1982), p. 177–219.
  • [14] M. Gutiérrez and B. Olea, Induced Riemannian Structures on Null Hypersurfaces, Mathematische Nachrichten, 289 (2016), pp. 1219–1236.
  • [15] S. W. Hawking, The Occurrence of Singularities in Cosmology, Proc. R. Soc. Lond. A, 294 (1966), p. 511–521.
  • [16] S. W. Hawking and G. F. R. Ellis, The Large Scale Structure of Space-Time, vol. 1, Cambridge University Press, 1973.
  • [17] S. W. Hawking and R. Penrose, The Singularities of Gravitational Collapse and Cosmology, Proc. R. Soc. Lond. A, 314 (1970), p. 529–548.
  • [18] G. Heidel and V. Schulz, A Riemannian Trust-Region Method for Low-Rank Tensor Completion, Numerical Linear Algebra with Applications, (2017), p. e2175.
  • [19] W. Huang, P.-A. Absil, and K. A. Gallivan, A Riemannian Symmetric Rank-One Trust-Region Method, Mathematical Programming, 150 (2015), p. 179–216.
  • [20] Y. Kim and R. McCann, Continuity, Curvature, and the General Covariance of Optimal Transportation, Journal of the European Mathematical Society, 12 (2010), p. 1009–1040.
  • [21] Y.-H. Kim, R. J. McCann, and M. Warren, Pseudo-Riemannian Geometry Calibrates Optimal Transportation, Mathematical Research Letters, 17 (2010), p. 1183–1197, https://doi.org/10.4310/MRL.2010.v17.n6.a16.
  • [22] D. N. Kupeli, Degenerate Manifolds, Geometriae Dedicata, 23 (1987), p. 259–290.
  • [23] D. N. Kupeli, Degenerate Submanifolds in Semi-Riemannian Geometry, Geometriae Dedicata, 24 (1987), p. 337–361.
  • [24] D. N. Kupeli, On Null Submanifolds in Spacetimes, Geometriae Dedicata, 23 (1987), p. 33–51.
  • [25] J. C. Larsen, Singular Semi-Riemannian Geometry, Journal of Geometry and Physics, 9 (1992), p. 3–23.
  • [26] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, vol. 2, Springer, 1984.
  • [27] J. Milnor, Morse Theory, vol. 51, Princeton university press, 1963.
  • [28] J. Milnor, Morse Theory, vol. 51 of Annals of Mathematic Studies, Princeton University Press, 1973.
  • [29] J. Milnor, Curvatures of Left Invariant Metrics on Lie Groups, Advances in Mathematics, 21 (1976), p. 293–329, https://doi.org/10.1016/S0001-8708(76)80002-3.
  • [30] N. Miolane and X. Pennec, Computing Bi-Invariant Pseudo-Metrics on Lie Groups for Consistent Statistics, Entropy, 17 (2015), p. 1850–1881.
  • [31] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, vol. 13, Siam, 1994.
  • [32] Y. E. Nesterov and M. J. Todd, On the Riemannian Geometry Defined by Self-Concordant Barriers and Interior-Point Methods, Foundations of Computational Mathematics, 2 (2002), pp. 333–361.
  • [33] L. Nicolaescu, An Invitation to Morse Theory, Springer Science & Business Media, 2011.
  • [34] K. Nomizu, Left-Invariant Lorentz Metrics on Lie Groups, Osaka Journal of Mathematics, 16 (1979), p. 143–150.
  • [35] B. O’Neill, Semi-Riemannian Geometry with Applications to Relativity, vol. 103 of Pure and Applied Mathematics, Academic Press, 1983.
  • [36] R. S. Palais, Seminar on Atiyah-Singer Index Theorem.(AM-57), vol. 57, Princeton University Press, 2016.
  • [37] R. Penrose, Gravitational Collapse and Space-Time Singularities, Physical Review Letters, 14 (1965), p. 57.
  • [38] P. Petersen, Riemannian Geometry, vol. 171 of Graduate Texts in Mathematics, Springer New York, 2006.
  • [39] T. Rapcsak, Geodesic Convexity in Nonlinear Optimization, Journal of Optimization Theory and Applications, 69 (1991), p. 169–183.
  • [40] G. Raskutti and S. Mukherjee, The Information Geometry of Mirror Descent, IEEE Transactions on Information Theory, 61 (2015), pp. 1451–1457.
  • [41] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, vol. 3, Siam, 2001.
  • [42] W. Ring and B. Wirth, Optimization Methods on Riemannian Manifolds and Their Application to Shape Space, SIAM Journal on Optimization, 22 (2012), p. 596–627, https://doi.org/10.1137/11082885X.
  • [43] D. J. Saunders, The Geometry of Jet Bundles, vol. 142, Cambridge University Press, 1989.
  • [44] S. T. Smith, Optimization Techniques on Riemannian Manifolds, Fields institute communications, 3 (1994), p. 113–135.
  • [45] O. C. Stoica, On Singular Semi-Riemannian Manifolds, International Journal of Geometric Methods in Modern Physics, 11 (2014), p. 1450041.
  • [46] C. Udriste, Convex Functions and Optimization Methods on Riemannian Manifolds, vol. 297, Springer Science & Business Media, 1994.
  • [47] R. Vakil, A Beginner’s Guide to Jet Bundles from the Point of View of Algebraic Geometry, Notes, (1998).
  • [48] S. Wright and J. Nocedal, Numerical Optimization, Springer Science, 35 (1999), p. 7.

Appendix A Generic Non-degeneracy of Semi-Riemannian Structures on Hyperplanes of Minkowski Spaces

We begin with a brief discussion of the Gauss map defined in Section 4.2. Consider $Z=\{(x,W)\in X\times\operatorname{Gr}(m,p+q):W=\operatorname{N}_{x}(X,\mathbb{R}^{p+q})\}$. Since $W=\operatorname{N}_{x}(X,\mathbb{R}^{p+q})$ if and only if $W$ is perpendicular to $\operatorname{T}_{x}X$ with respect to the Euclidean metric on $\mathbb{R}^{p+q}$, $Z$ is a closed subset of $X\times\operatorname{Gr}(m,p+q)$. More precisely, we have

Proposition A.1.

Let $\pi:X\times\operatorname{Gr}(m,p+q)\to\operatorname{Gr}(m,p+q)$ be the canonical projection onto the second factor. The following facts hold:

  1. $\pi(Z)=\operatorname{N}(X)$;

  2. If $X$ is compact, then $\pi$ is a closed map (i.e., it maps closed sets to closed sets). In particular, $\operatorname{N}(X)\subseteq\operatorname{Gr}(m,p+q)$ is a closed subset if $X$ is compact.

The non-degeneracy of a generic hyperplane follows as a corollary of Proposition 4.18. Recall that a hyperplane $H\subset\mathbb{R}^{p+q}$ can be characterized as

\[
H:=\left\{x=\left(x_{1},\cdots,x_{p},y_{1},\cdots,y_{q}\right)\in\mathbb{R}^{p+q}\,\Bigg|\,\sum_{j=1}^{p}a_{j}x_{j}+\sum_{j=1}^{q}b_{j}y_{j}=0\right\},
\]

and we denote by

\[
\mathbf{n}\coloneqq(a_{1},\dots,a_{p},b_{1},\dots,b_{q})\in\mathbb{R}^{p+q}
\]

the normal vector of $H$. The vector $\operatorname{I}_{p,q}\mathbf{n}$ lies in $H$ if and only if $\mathbf{n}$ satisfies the equation $\langle\mathbf{n},\mathbf{n}\rangle=0$, or equivalently, the equation $\sum_{j=1}^{p}a_{j}^{2}=\sum_{j=1}^{q}b_{j}^{2}$.

Corollary A.2.

A generic hyperplane in $\mathbb{R}^{p,q}$ is non-degenerate.

Proof A.3.

If $H$ is a hyperplane, then the image of its Gauss map is a single point $\mathbf{n}$, which is the line determined by the normal vector of $H$. Since $\mathcal{V}$ is a hypersurface in $\mathbb{P}\mathbb{R}^{p+q}$, we see that the normal vector of a generic hyperplane does not lie on $\mathcal{V}$ and hence a generic hyperplane is non-degenerate.
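The degeneracy criterion stated before Corollary A.2 can be checked mechanically. The following is a minimal Python sketch, not part of the original text: the function name `is_degenerate_hyperplane` and the tolerance `tol` are our own hypothetical choices, and the hyperplane is assumed to be given by its coefficient lists `a` (length $p$) and `b` (length $q$).

```python
def is_degenerate_hyperplane(a, b, tol=1e-12):
    """Return True if the hyperplane sum_j a_j x_j + sum_j b_j y_j = 0
    in R^{p,q} is degenerate, i.e. if sum_j a_j^2 == sum_j b_j^2
    (up to the hypothetical tolerance tol)."""
    if not any(a) and not any(b):
        raise ValueError("the normal vector must be nonzero")
    gap = sum(x * x for x in a) - sum(y * y for y in b)
    return abs(gap) <= tol


# Normal vector (1, 0 | 1) in R^{2,1}: 1 + 0 = 1, so the hyperplane is degenerate.
print(is_degenerate_hyperplane([1.0, 0.0], [1.0]))   # True
# Normal vector (1, 2 | 1): 1 + 4 != 1, so the hyperplane is non-degenerate.
print(is_degenerate_hyperplane([1.0, 2.0], [1.0]))   # False
```

Degenerate normals form the zero set of a single quadratic equation, which is why a randomly chosen hyperplane is expected to pass this test with the answer `False`.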

Appendix B Additional Examples

The following Table 1 summarizes the examples computed in this paper.

Table 1: Explicit Examples Calculated
Manifolds Non-degenerate Geodesics Paral. Transp.
(generic) hyperplane yes
sphere no
pseudo-sphere yes
pseudo-hyperbolic space yes
indefinite orthogonal group no
orthogonal group no
special linear group yes ($p\neq q$)
symplectic group no
SPD matrices no

B.1 Semi-Riemannian Geometry of Symmetric Positive Definite Matrices

Let $\mathbb{S}^{n}_{++}$ be the manifold consisting of all $n\times n$ symmetric positive definite matrices. A matrix $A\in\mathbb{R}^{n\times n}$ is symmetric positive definite if and only if there exists some $M\in\mathrm{GL}(n,\mathbb{R})$ such that $A=M^{\mathsf{T}}M$. The tangent space of $\mathbb{S}^{n}_{++}$ at $A$ is $\operatorname{S}^{2}\mathbb{R}^{n}$. Hence if we regard $\mathbb{S}^{n}_{++}$ as a semi-Riemannian sub-manifold of $\mathrm{GL}(n,\mathbb{R})$ with respect to the semi-Riemannian metric $\langle\cdot,\cdot\rangle$ with signature $(p,q)=(p,n-p)$, then the semi-normal space of $\mathbb{S}^{n}_{++}$ at $A$ is

\[
\operatorname{SN}_{A}(\mathbb{S}^{n}_{++},\mathrm{GL}(n,\mathbb{R}))=\left\{\operatorname{I}_{p,q}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{n}\right\}.
\]

It is straightforward to compute the intersection of $\operatorname{T}_{A}\mathbb{S}^{n}_{++}$ and $\operatorname{SN}_{A}(\mathbb{S}^{n}_{++},\mathrm{GL}(n,\mathbb{R}))$. Hence we obtain the following:

Proposition B.1.

The degenerate bundle of $\mathbb{S}^{n}_{++}$ in $\mathrm{GL}(n,\mathbb{R})$ is

\[
(\mathbb{S}^{n}_{++})_{A}^{\perp}=\left\{\begin{bmatrix}0&X^{\mathsf{T}}\\ X&0\end{bmatrix}:X\in\mathbb{R}^{q\times p}\right\}.
\]

In particular, $\mathbb{S}^{n}_{++}$ is a degenerate semi-Riemannian sub-manifold of $\mathrm{GL}(n,\mathbb{R})$.
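The intersection behind Proposition B.1 can be written out blockwise. The following is a sketch under the assumption that $\operatorname{I}_{p,q}=\operatorname{diag}(-\operatorname{I}_{p},\operatorname{I}_{q})$ (the sign convention suggested by the explicit $2\times 2$ computation in Section B.2.3); the opposite convention leads to the same subspace.

```latex
\[
\Delta=\begin{bmatrix}\Delta_{1}&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}\in\bigwedge^{2}\mathbb{R}^{n}
\quad\Longrightarrow\quad
\operatorname{I}_{p,q}\Delta=\begin{bmatrix}-\Delta_{1}&\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}.
\]
% The off-diagonal blocks are already transposes of each other, so
% I_{p,q}\Delta is symmetric iff the skew-symmetric blocks \Delta_1 and
% \Delta_3 are also symmetric, i.e. iff \Delta_1 = 0 and \Delta_3 = 0:
\[
\operatorname{I}_{p,q}\Delta\in\operatorname{S}^{2}\mathbb{R}^{n}
\iff
\operatorname{I}_{p,q}\Delta=\begin{bmatrix}0&\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&0\end{bmatrix},
\qquad \Delta_{2}\in\mathbb{R}^{q\times p}.
\]
```

Since $\Delta_{2}$ is arbitrary, the intersection is nonzero whenever $p,q\geq 1$, which is the degeneracy asserted in the proposition.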

Proposition B.2.

If a geodesic passing through $A\in\mathbb{S}^{n}_{++}$ with the tangent direction $\operatorname{I}_{p,q}\Delta$ exists, then $\Delta=\begin{bmatrix}0&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&0\end{bmatrix}$ and $\gamma(t)$ can be written as

\[
\gamma(t)=\int_{0}^{t}\begin{bmatrix}0&U(\tau)^{\mathsf{T}}\\ U(\tau)&0\end{bmatrix}d\tau+A,
\]

for some suitable $q\times p$ matrix-valued function $U(t)$ such that $U(0)=\Delta_{2}$ and

\[
t\begin{bmatrix}0&U(t)^{\mathsf{T}}\\ U(t)&0\end{bmatrix}+A\succ 0
\]

for each $t$.

Proof B.3.

Let $\gamma(t)$ be a geodesic curve passing through $A$ with the tangent direction $\operatorname{I}_{p,q}\Delta$. Then $\gamma(t)$ must satisfy the following conditions:

\[
\gamma(0)=A,\quad\dot{\gamma}(0)=\operatorname{I}_{p,q}\Delta,\quad\dot{\gamma}(t)\in\operatorname{S}^{2}\mathbb{R}^{n},\quad\operatorname{I}_{p,q}\ddot{\gamma}(t)\in\bigwedge^{2}\mathbb{R}^{n}.
\]

Hence $\dot{\gamma}(t)=\begin{bmatrix}0&U(t)^{\mathsf{T}}\\ U(t)&0\end{bmatrix}$ for some $q\times p$ matrix-valued function $U(t)$. Therefore, we must have

\[
\gamma(t)=\int_{0}^{t}\begin{bmatrix}0&U(\tau)^{\mathsf{T}}\\ U(\tau)&0\end{bmatrix}d\tau+A\in\mathbb{S}^{n}_{++}.
\]

Since $A=\frac{A}{t}\int_{0}^{t}d\tau$, we see that $\gamma(t)\succ 0$ if and only if

\[
t\begin{bmatrix}0&U(t)^{\mathsf{T}}\\ U(t)&0\end{bmatrix}+A\succ 0.
\]

B.2 Semi-Riemannian Geometry of Matrix Lie Groups

As discussed in Section 4.1, matrix Lie groups provide another rich class of examples of manifolds admitting semi-Riemannian structures. In this section we illustrate how to apply the semi-Riemannian manifold optimization framework developed in Section 3 to some common matrix Lie groups, despite the degeneracy of their inherited semi-Riemannian structures, by means of projection to semi-normal bundles. We begin with a general discussion of the semi-normal bundle of matrix Lie groups, then specialize to several examples.

Let $G\subseteq\mathrm{GL}(n,\mathbb{R})$ be a matrix Lie group. We denote by $\mathfrak{g}$ the Lie algebra of $G$. Then the tangent space of $G$ at a point $A\in G$ is simply $A\mathfrak{g}$. For each fixed integer $0\leq p\leq n$, there is a left-invariant semi-Riemannian metric of signature $(p,q)=(p,n-p)$ on $\mathrm{GL}(n,\mathbb{R})$ defined by

\[
\langle U,V\rangle_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}\operatorname{I}_{p,q}(A^{-1}V)),\quad U,V\in\operatorname{T}_{A}G,\quad A\in G.
\]

Since the bilinear form $\langle\cdot,\cdot\rangle$ is left-invariant, we have the following

Lemma B.4.

For each $A\in G$, we have

\[
\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))=\{AX:X\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))\}.
\]

Let $(\cdot,\cdot)$ be the Riemannian metric on $\mathrm{GL}(n,\mathbb{R})$ defined by

\[
(U,V)_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}A^{-1}V),\quad A\in\mathrm{GL}(n,\mathbb{R}),\quad U,V\in\operatorname{T}_{A}\mathrm{GL}(n,\mathbb{R}).
\]

We denote by $\operatorname{N}(G,\mathrm{GL}(n,\mathbb{R}))$ the normal bundle of $G$ in $\mathrm{GL}(n,\mathbb{R})$. We can relate the normal bundle and the semi-normal bundle of $G$ in $\mathrm{GL}(n,\mathbb{R})$ by the following:

Proposition B.5.

For each $A\in G$, we have

\[
\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))=\operatorname{I}_{p,q}\operatorname{N}_{A}(G,\mathrm{GL}(n,\mathbb{R})).
\]

In particular, $\operatorname{SN}(G,\mathrm{GL}(n,\mathbb{R}))$ is a vector bundle on $G$ of rank $n^{2}-\dim G$.

Proof B.6.

Un×nU\in\mathbb{R}^{n\times n} lies in SNA(G,GL(n,))\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R})) if and only if tr((A1U)𝖳Ip,qA1V)=0\operatorname{tr}((A^{-1}U)^{\mathsf{T}}\operatorname{I}_{p,q}A^{-1}V)=0 for all V𝔤V\in\mathfrak{g}. This implies that USNA(G,GL(n,))U\in\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R})) if and only if Ip,qUNA(G,GL(n,))\operatorname{I}_{p,q}U\in\operatorname{N}_{A}(G,\mathrm{GL}(n,\mathbb{R})) and this completes the proof.

Let $\gamma(t)$ be a geodesic passing through $A\in G$ with tangent direction $AU$. Then by definition, $\gamma(t)$ is characterized by the following relations:

\[
\gamma(t)\in G,\quad\dot{\gamma}(t)\in\gamma(t)\mathfrak{g},\quad\ddot{\gamma}(t)\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R}))
\]

and the initial conditions $\gamma(0)=A$, $\dot{\gamma}(0)=AU$. To compute geodesics explicitly for matrix Lie groups, we need the following simple but handy observations.

Lemma B.7.

If $\gamma(t)$ is a given geodesic, then there exists a unique curve $U(t)$ in $\mathfrak{g}$ such that $\dot{\gamma}(t)=\gamma(t)U(t)$ and $U(t)^{2}+\dot{U}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$.

Proof B.8.

Since $\dot{\gamma}(t)\in\gamma(t)\mathfrak{g}$, we can write $\dot{\gamma}(t)=\gamma(t)U(t)$ for a curve $U(t)$ in $\mathfrak{g}$. By differentiating $\dot{\gamma}(t)=\gamma(t)U(t)$, we obtain

\[
\ddot{\gamma}(t)=\gamma(t)(U(t)^{2}+\dot{U}(t))\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R})).
\]

By Lemma B.4, this implies that $U(t)^{2}+\dot{U}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$.
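The differentiation step in the proof of Lemma B.7 can be written out as follows; it uses only the product rule and the relation $\dot{\gamma}(t)=\gamma(t)U(t)$, with no additional assumptions.

```latex
\begin{align*}
\ddot{\gamma}(t)
  &= \dot{\gamma}(t)\,U(t)+\gamma(t)\,\dot{U}(t) \\
  &= \gamma(t)\,U(t)^{2}+\gamma(t)\,\dot{U}(t) \\
  &= \gamma(t)\bigl(U(t)^{2}+\dot{U}(t)\bigr).
\end{align*}
% By Lemma B.4, SN_{\gamma(t)} = \gamma(t) SN_{I_n}, and \gamma(t) is invertible, so
\[
\ddot{\gamma}(t)\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R}))
\iff
U(t)^{2}+\dot{U}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R})).
\]
```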

Proposition B.9.

If $\gamma(t)$ is a geodesic passing through $A$ with tangent direction $AU$, then $\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right)$, where $U(t)$ is the curve in $\mathfrak{g}$ determined in Lemma B.7.

Corollary B.10.

The geodesic curve $\gamma(t)$ passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}U$ in Eq. 40 is of constant speed.

Proof B.11.

We calculate $\widetilde{g}_{\gamma(t)}(\dot{\gamma}(t),\dot{\gamma}(t))$. By Lemma B.17, we have

\[
\widetilde{g}_{\gamma(t)}(\dot{\gamma}(t),\dot{\gamma}(t))=-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}),
\]

which does not depend on $t$.

Corollary B.12.

The energy $\operatorname{E}(t)$ of the geodesic $\gamma(t)$ in Eq. 40 is

\[
\operatorname{E}(t)=(-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}))t.
\]

Proof B.13.

By definition and Corollary B.10, we have

\[
\operatorname{E}(t)=\int_{0}^{t}\widetilde{g}_{\gamma(\tau)}(\dot{\gamma}(\tau),\dot{\gamma}(\tau))d\tau=(-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}))t.
\]

The following characterization of the parallel transport of a tangent vector along a geodesic on $G$ can be obtained by unravelling Definition 4.13.

Lemma B.14.

Let $\Delta(t)$ be a parallel transport of $\Delta\in\operatorname{T}_{A}G$ along a geodesic curve $\gamma(t)$ passing through $A\in G$ with tangent direction $AU\in\operatorname{T}_{A}G$. Then

\[
\Delta(t)=\gamma(t)V(t),
\]

where $V(t)$ is a curve in $\mathfrak{g}$ such that $U(t)V(t)+\dot{V}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$ and $V(0)=A^{-1}\Delta$, so that $\Delta(0)=\Delta$. Here $U(t)$ is the curve in $\mathfrak{g}$ determined in Lemma B.7.
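The condition on $V(t)$ in Lemma B.14 can be derived in the same way as Lemma B.7. The sketch below assumes that Definition 4.13 (not reproduced in this appendix) requires the derivative $\dot{\Delta}(t)$ of the transported field to lie in the semi-normal space $\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R}))$; this reading is our assumption.

```latex
\begin{align*}
\dot{\Delta}(t)
  &= \dot{\gamma}(t)\,V(t)+\gamma(t)\,\dot{V}(t) \\
  &= \gamma(t)\,U(t)\,V(t)+\gamma(t)\,\dot{V}(t) \\
  &= \gamma(t)\bigl(U(t)V(t)+\dot{V}(t)\bigr).
\end{align*}
% By Lemma B.4, SN_{\gamma(t)} = \gamma(t) SN_{I_n}, hence
\[
\dot{\Delta}(t)\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R}))
\iff
U(t)V(t)+\dot{V}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R})).
\]
```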

B.2.1 Indefinite Orthogonal Groups

The definition of the indefinite orthogonal groups $\operatorname{O}(p,q)$ can be found in Example 4.5. In this subsection we derive explicit formulae for the geodesics in $\operatorname{O}(p,q)$ following Definition 4.12.

Proposition B.15.

For each $A\in\operatorname{O}(p,q)$, we have

\[
\operatorname{SN}_{A}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\{AS:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}.
\]

Proof B.16.

By Lemma B.4, it is sufficient to prove

\[
\operatorname{SN}_{\operatorname{I}_{p+q}}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\{S:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}
\]

as $\langle\cdot,\cdot\rangle$ is left-invariant. To this end, we notice that by Proposition B.5, with $n=p+q$,

\[
\operatorname{SN}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R}))=\operatorname{I}_{p,q}\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R})).
\]

Since $\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R}))$ consists of all matrices of the form $\operatorname{I}_{p,q}S$ where $S$ is symmetric, and $\operatorname{I}_{p,q}^{2}=\operatorname{I}_{n}$, we conclude that

\[
\operatorname{SN}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R}))=\{S:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}.
\]

Let $\gamma(t)$ be a geodesic curve passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}\Delta$ for some skew-symmetric matrix $\Delta$. By Proposition B.9 and Proposition B.15, the curve $\gamma(t)$ can be written as

\[
\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),\tag{37}
\]

where $U(t)$ is a curve in $\mathfrak{o}(p,q)$ such that

\[
U(0)=\operatorname{I}_{p,q}\Delta,\quad U(t)^{2}+\dot{U}(t)\in\operatorname{S}^{2}\mathbb{R}^{p+q}.\tag{38}
\]

We partition a $(p+q)\times(p+q)$ skew-symmetric matrix $Y$ as $\begin{bmatrix}Y_{1}&-Y_{2}^{\mathsf{T}}\\ Y_{2}&Y_{3}\end{bmatrix}$ where $Y_{1}\in\bigwedge^{2}\mathbb{R}^{p}$, $Y_{3}\in\bigwedge^{2}\mathbb{R}^{q}$ and $Y_{2}\in\mathbb{R}^{q\times p}$.

Lemma B.17.

Let $t\mapsto\gamma(t)$ be the geodesic passing through $A$ with direction $A\operatorname{I}_{p,q}\Delta$. Then the curve $U(t)$ in $\mathfrak{o}(p,q)$ satisfying (38) is

\[
U(t)=\begin{bmatrix}-\Delta_{1}&\exp(-\Delta_{1}t)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}t)\\ \exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t)&\Delta_{3}\end{bmatrix}.\tag{39}
\]

Proof B.18.

We parametrize $U(t)$ as $U(t)=\operatorname{I}_{p,q}\Delta(t)$ where $\Delta(t)\in\bigwedge^{2}\mathbb{R}^{p+q}$. Since $U(t)^{2}+\dot{U}(t)$ is symmetric, we have

\[
\operatorname{Skew}(U(t)^{2}+\dot{U}(t))=0.
\]

This implies that

\[
\dot{\Delta}_{1}(t)=0,\quad\dot{\Delta}_{3}(t)=0,\quad\dot{\Delta}_{2}(t)=\Delta_{2}(t)\Delta_{1}(t)-\Delta_{3}(t)\Delta_{2}(t),
\]

from which we obtain $\Delta_{1}(t)=\Delta_{1}$, $\Delta_{3}(t)=\Delta_{3}$ and

\[
\Delta_{2}(t)=\exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t).
\]

Therefore, we obtain

\[
U(t)=\begin{bmatrix}-\Delta_{1}&\exp(-\Delta_{1}t)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}t)\\ \exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t)&\Delta_{3}\end{bmatrix}.
\]
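Taking the differential equation for $\Delta_{2}(t)$ in the proof above as given, one can check directly that the stated closed form solves it and that its transpose gives the upper-right block of $U(t)$. The only facts used are the product rule and the skew-symmetry of $\Delta_{1}$ and $\Delta_{3}$.

```latex
\begin{align*}
\frac{d}{dt}\Bigl[\exp(-\Delta_{3}t)\,\Delta_{2}\,\exp(\Delta_{1}t)\Bigr]
  &= -\Delta_{3}\exp(-\Delta_{3}t)\,\Delta_{2}\,\exp(\Delta_{1}t)
     +\exp(-\Delta_{3}t)\,\Delta_{2}\,\exp(\Delta_{1}t)\,\Delta_{1} \\
  &= \Delta_{2}(t)\,\Delta_{1}-\Delta_{3}\,\Delta_{2}(t),
  \qquad \Delta_{2}(0)=\Delta_{2}, \\[4pt]
\Delta_{2}(t)^{\mathsf{T}}
  &= \exp(\Delta_{1}t)^{\mathsf{T}}\,\Delta_{2}^{\mathsf{T}}\,\exp(-\Delta_{3}t)^{\mathsf{T}}
   = \exp(-\Delta_{1}t)\,\Delta_{2}^{\mathsf{T}}\,\exp(\Delta_{3}t).
\end{align*}
```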

By Proposition B.9, we obtain the following:

Proposition B.19.

The geodesic curve $\gamma(t)$ passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}\Delta$ is unique and is

\[
\gamma(t)=A\exp\left(\begin{bmatrix}-\Delta_{1}t&\int_{0}^{t}\exp(-\Delta_{1}\tau)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}\tau)d\tau\\ \int_{0}^{t}\exp(-\Delta_{3}\tau)\Delta_{2}\exp(\Delta_{1}\tau)d\tau&\Delta_{3}t\end{bmatrix}\right).\tag{40}
\]

Corollary B.20.

The curve $\gamma(t)=A\exp(t\operatorname{I}_{p,q}\Delta)$ is a geodesic curve passing through $A$ with direction $A\operatorname{I}_{p,q}\Delta$ if and only if $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$.

Proof B.21.

If $\gamma(t)=A\exp(t\operatorname{I}_{p,q}\Delta)$ is a geodesic curve, then from Proposition B.19, it is straightforward to verify that $\Delta_{3}\Delta_{2}-\Delta_{2}\Delta_{1}=0$. Conversely, if $\Delta_{3}\Delta_{2}-\Delta_{2}\Delta_{1}=0$, then $\exp(-\Delta_{3}\tau)\Delta_{2}=\Delta_{2}\exp(-\Delta_{1}\tau)$, so the integrands in Eq. 40 are constant and Eq. 40 reduces to

\[
\gamma(t)=A\exp\left(\begin{bmatrix}-\Delta_{1}t&\Delta_{2}^{\mathsf{T}}t\\ \Delta_{2}t&\Delta_{3}t\end{bmatrix}\right)=A\exp(t\operatorname{I}_{p,q}\Delta).
\]

In particular, if $p=0$ (resp. $q=0$), then $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$ always holds and Corollary B.20 shows that geodesics in $\operatorname{O}(q)$ (resp. $\operatorname{O}(p)$) are of the form $A\exp(t\Delta)$ for some skew-symmetric matrix $\Delta$. Moreover, Corollary B.20 already shows a substantial difference between the Riemannian geometry and the non-Riemannian geometry of $\operatorname{O}(p,q)$ (cf. [5, Theorem 2.14]).
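The smallest genuinely indefinite case, $p=q=1$, makes Corollary B.20 concrete: the blocks $\Delta_{1}$ and $\Delta_{3}$ are $1\times 1$ skew-symmetric, hence zero, so the condition $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$ holds automatically and the geodesic through the identity is a hyperbolic rotation. The following Python sketch assumes the convention $\operatorname{I}_{1,1}=\operatorname{diag}(-1,1)$; the function names `boost_geodesic` and `in_O11` are hypothetical.

```python
import math


def boost_geodesic(t, d):
    """Closed form of exp(t * [[0, d], [d, 0]]), i.e. the curve
    exp(t * I_{1,1} * Delta) for Delta = [[0, -d], [d, 0]],
    under the assumed convention I_{1,1} = diag(-1, 1)."""
    c, s = math.cosh(t * d), math.sinh(t * d)
    return [[c, s], [s, c]]


def in_O11(M, tol=1e-9):
    """Check the defining relation M^T I_{1,1} M = I_{1,1} entrywise."""
    (m11, m12), (m21, m22) = M
    g11 = -m11 * m11 + m21 * m21
    g12 = -m11 * m12 + m21 * m22
    g22 = -m12 * m12 + m22 * m22
    return abs(g11 + 1) <= tol and abs(g12) <= tol and abs(g22 - 1) <= tol


print(in_O11(boost_geodesic(1.5, 0.7)))   # True: the curve stays in O(1,1)
```

The check relies only on the identity $\cosh^{2}-\sinh^{2}=1$; for large $|td|$ floating-point cancellation would require a looser tolerance.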

B.2.2 Orthogonal Groups

By Proposition B.19, the geodesic passing through a point $A\in\operatorname{O}(p,q)$ with tangent direction $U\in\operatorname{T}_{A}\operatorname{O}(p,q)$ exists. We will see that this is not always the case for other matrix Lie groups. To that end, we consider $\operatorname{O}(n)$ as an example. First, according to Lemma B.4 and Proposition B.5, we have the following

Proposition B.22.

For each $A\in\operatorname{O}(n)$, the semi-normal space of $\operatorname{O}(n)$ at $A$ is

\[
\operatorname{SN}_{A}(\operatorname{O}(n),\mathrm{GL}(n,\mathbb{R}))=\{A\operatorname{I}_{p,q}S:S\in\operatorname{S}^{2}\mathbb{R}^{n}\}.
\]

Now if $\gamma(t)$ is a geodesic curve passing through $A$ with tangent direction $A\Delta$ where $\Delta=\begin{bmatrix}\Delta_{1}&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}\in\mathfrak{o}(n)$, then

\[
\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),
\]

where $U(t)=\begin{bmatrix}U_{1}(t)&-U_{2}(t)^{\mathsf{T}}\\ U_{2}(t)&U_{3}(t)\end{bmatrix}$ is skew-symmetric and satisfies

\[
U(0)=\Delta,\quad\operatorname{I}_{p,q}(U(t)^{2}+\dot{U}(t))\in\operatorname{S}^{2}\mathbb{R}^{n}.
\]

Hence we have

\[
\operatorname{Skew}(\operatorname{I}_{p,q}(U(t)^{2}+\dot{U}(t)))=0,
\]

which implies

\[
\dot{U}_{1}(t)=0,\quad\dot{U}_{3}(t)=0,\quad U_{2}(t)U_{1}(t)+U_{3}(t)U_{2}(t)=0.
\]

Therefore, we may conclude that

\[
U_{1}(t)=\Delta_{1},\quad U_{3}(t)=\Delta_{3},\quad U_{2}(t)\Delta_{1}+\Delta_{3}U_{2}(t)=0.
\]

Proposition B.23.

On $\operatorname{O}(n)$, a geodesic passing through $A$ with the tangent direction $A\Delta$ exists if and only if the skew-symmetric matrix $\Delta=\begin{bmatrix}\Delta_{1}&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}$ satisfies the relation

\[
\Delta_{2}\Delta_{1}+\Delta_{3}\Delta_{2}=0.
\]

If such a $\gamma$ exists, then $\gamma$ is of the form

\[
\gamma(t)=A\exp\left(\begin{bmatrix}\Delta_{1}t&-\int_{0}^{t}U_{2}(\tau)^{\mathsf{T}}d\tau\\ \int_{0}^{t}U_{2}(\tau)d\tau&\Delta_{3}t\end{bmatrix}\right)
\]

where $U_{2}$ is a curve in the linear subspace $\{X\in\mathbb{R}^{q\times p}:X\Delta_{1}+\Delta_{3}X=0\}$ such that $U_{2}(0)=\Delta_{2}$. The square of the speed of $\gamma(t)$ is $\operatorname{tr}(\Delta^{2}_{1})-\operatorname{tr}(\Delta^{2}_{3})$ and the energy of $\gamma$ is $E(t)=t(\operatorname{tr}(\Delta^{2}_{1})-\operatorname{tr}(\Delta^{2}_{3}))$.

In particular, if $\Delta_{2}\Delta_{1}+\Delta_{3}\Delta_{2}=0$ and we take $U_{2}(t)=\Delta_{2}$, then by Proposition B.23 $\gamma$ becomes

\[
\gamma(t)=A\exp(\Delta t),
\]

which is exactly the geodesic curve on $\operatorname{O}(n)$ with the usual Riemannian metric

\[
\left(U,V\right)_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}A^{-1}V)
\]

where $A\in\operatorname{O}(n)$ and $U,V\in\operatorname{T}_{A}\operatorname{O}(n)$.
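The existence criterion and the speed formula of Proposition B.23 are easy to evaluate for a concrete direction. The sketch below is a minimal pure-Python illustration, assuming $1\leq p\leq n-1$ and that the input is already skew-symmetric; the helper names `blocks`, `geodesic_exists` and `squared_speed` are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def blocks(Delta, p):
    """Split a skew-symmetric n x n matrix into Delta1 (p x p),
    Delta2 (q x p) and Delta3 (q x q), assuming 1 <= p <= n - 1."""
    D1 = [row[:p] for row in Delta[:p]]
    D2 = [row[:p] for row in Delta[p:]]
    D3 = [row[p:] for row in Delta[p:]]
    return D1, D2, D3


def geodesic_exists(Delta, p, tol=1e-12):
    """Test the relation Delta2 Delta1 + Delta3 Delta2 = 0."""
    D1, D2, D3 = blocks(Delta, p)
    L, R = matmul(D2, D1), matmul(D3, D2)
    return all(abs(L[i][j] + R[i][j]) <= tol
               for i in range(len(L)) for j in range(len(L[0])))


def squared_speed(Delta, p):
    """Evaluate tr(Delta1^2) - tr(Delta3^2)."""
    D1, _, D3 = blocks(Delta, p)
    tr = lambda M: sum(M[i][i] for i in range(len(M)))
    return tr(matmul(D1, D1)) - tr(matmul(D3, D3))


# n = 3, p = 2, q = 1: Delta1 is a rotation generator, Delta3 = 0.
bad = [[0, -1, -1], [1, 0, 0], [1, 0, 0]]    # Delta2 = [1, 0] is nonzero
good = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]    # Delta2 = 0
print(geodesic_exists(bad, 2))    # False
print(geodesic_exists(good, 2))   # True
print(squared_speed(good, 2))     # -2
```

In this example, with $q=1$ and $\Delta_{1}\neq 0$ invertible, the relation forces $\Delta_{2}=0$, so most tangent directions admit no geodesic.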

B.2.3 Special Linear Groups with pqp\neq q

We notice that the Lie algebra of $\operatorname{SL}(n,\mathbb{R})$ consists of all traceless $n\times n$ matrices, which implies that

\[
\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R}))=\{\lambda\operatorname{I}_{n}:\lambda\in\mathbb{R}\}.
\]

Therefore, from Lemma B.4 and Proposition B.5, we have the following

Proposition B.24.

For each $A\in\operatorname{SL}(n,\mathbb{R})$, the semi-normal space of $\operatorname{SL}(n,\mathbb{R})$ at $A$ is

\[
\operatorname{SN}_{A}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R}))=\{\lambda A\operatorname{I}_{p,q}:\lambda\in\mathbb{R}\}.
\]

In particular,

\[
\operatorname{SL}(n,\mathbb{R})^{\perp}=\begin{cases}\operatorname{SL}(n,\mathbb{R})\times\{0\},&\text{if}~p\neq q\\ \operatorname{SN}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R})),&\text{otherwise}.\end{cases}
\]

Hence $\operatorname{SL}(n,\mathbb{R})$ is a non-degenerate semi-Riemannian sub-manifold of $\mathrm{GL}(n,\mathbb{R})$ if and only if $p\neq q$.

By Proposition B.24 and [35, Corollary 10], we obtain the following

Corollary B.25.

If $p\neq q$, then the geodesic passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$ exists and is unique.

If $\gamma(t)$ is a geodesic curve passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$, then

\[
\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),
\]

where $U(t)$ is a curve of traceless matrices, i.e., $\operatorname{tr}(U(t))=0$, satisfying

\[
U(0)=U,\quad U(t)^{2}+\dot{U}(t)=\lambda(t)\operatorname{I}_{p,q}\tag{41}
\]

for some real-valued function $\lambda(t)$. Moreover, from (41), we also have

\[
U(0)^{2}+\dot{U}(0)=\lambda(0)\operatorname{I}_{p,q}.
\]

Together with the fact that $\operatorname{tr}(\dot{U}(0))=0$, which holds because $\operatorname{tr}(U(t))=0$ for all $t$, we may conclude that

\[
\lambda(0)(q-p)=\operatorname{tr}(U(0)^{2})=\operatorname{tr}(U^{2}).\tag{42}
\]

Proposition B.26.

If $p=q$ and $\gamma$ is a geodesic curve passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $\dot{\gamma}(0)=AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$, then $\operatorname{tr}(U^{2})=0$.
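The trace computation behind (42) and Proposition B.26 can be spelled out in one line. The sketch assumes $\operatorname{tr}(\operatorname{I}_{p,q})=q-p$, which is the convention consistent with (42).

```latex
\begin{align*}
\operatorname{tr}\bigl(U(0)^{2}\bigr)+\operatorname{tr}\bigl(\dot{U}(0)\bigr)
  &= \lambda(0)\operatorname{tr}(\operatorname{I}_{p,q})
   = \lambda(0)(q-p), \\
\operatorname{tr}\bigl(\dot{U}(0)\bigr)
  &= \frac{d}{dt}\operatorname{tr}\bigl(U(t)\bigr)\Big|_{t=0}=0
  \quad\Longrightarrow\quad
  \operatorname{tr}(U^{2})=\lambda(0)(q-p).
\end{align*}
% For p = q the right-hand side vanishes, which gives tr(U^2) = 0.
```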

Next we consider the simplest case, where $n=2$ and $p=q=1$. We may write $U(t)=\begin{bmatrix}a(t)&b(t)\\ c(t)&-a(t)\end{bmatrix}$ and $U=\begin{bmatrix}a&b\\ c&-a\end{bmatrix}$ so that the ODE in (41) becomes

\[
\begin{bmatrix}a(t)^{2}+b(t)c(t)+\dot{a}(t)&\dot{b}(t)\\ \dot{c}(t)&a(t)^{2}+b(t)c(t)-\dot{a}(t)\end{bmatrix}=\lambda(t)\begin{bmatrix}-1&0\\ 0&1\end{bmatrix}.
\]

This implies that $b(t)=b$, $c(t)=c$ and $a(t)^{2}=-bc$. In particular, $a(t)$ is constant by continuity, so $a(t)=a$ and $\lambda(t)=-\dot{a}(t)=0$. Hence we may conclude that

Proposition B.27.

An embedded geodesic $\gamma$ on $\operatorname{SL}(2,\mathbb{R})$ passing through $A$ with tangent direction $U=\begin{bmatrix}a&b\\ c&-a\end{bmatrix}$ exists if and only if $a^{2}+bc=0$. Moreover, if such $\gamma$ exists, then it is unique and

\[
\gamma(t)=A\exp\left(\begin{bmatrix}a&b\\ c&-a\end{bmatrix}t\right).
\]

The square of the speed of $\gamma(t)$ is $2a^{2}+b^{2}+c^{2}$ and the energy $E(t)$ is $t(2a^{2}+b^{2}+c^{2})$.
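For a traceless $2\times 2$ matrix $U$ one has $U^{2}=(a^{2}+bc)\operatorname{I}_{2}$, so under the condition $a^{2}+bc=0$ of Proposition B.27 the matrix $U$ is nilpotent and $\exp(tU)=\operatorname{I}_{2}+tU$ exactly. The following Python sketch uses this to evaluate the geodesic without a general matrix exponential; the function name `sl2_geodesic` and the tolerance `tol` are hypothetical.

```python
def sl2_geodesic(A, U, t, tol=1e-12):
    """Evaluate gamma(t) = A exp(tU) on SL(2, R) for p = q = 1.

    Raises ValueError if U is not traceless or if a^2 + bc != 0,
    in which case Proposition B.27 says no geodesic exists."""
    (a, b), (c, d) = U
    if abs(a + d) > tol:
        raise ValueError("U must be traceless")
    if abs(a * a + b * c) > tol:
        raise ValueError("a^2 + bc != 0: no geodesic in this direction")
    # U^2 = (a^2 + bc) I = 0, hence exp(tU) = I + tU.
    E = [[1 + t * a, t * b], [t * c, 1 - t * a]]
    return [[sum(A[i][k] * E[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


identity = [[1, 0], [0, 1]]
G = sl2_geodesic(identity, [[1, 1], [-1, -1]], 2)
print(G)                                      # [[3, 2], [-2, -1]]
print(G[0][0] * G[1][1] - G[0][1] * G[1][0])  # 1, so G lies in SL(2, R)
```

The curve is therefore a straight line $A(\operatorname{I}_{2}+tU)$ in the ambient matrix space that happens to stay on $\operatorname{SL}(2,\mathbb{R})$.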

B.2.4 Symplectic Group

We recall that the symplectic group $\operatorname{Sp}(2n,\mathbb{R})$ is the group of $(2n)\times(2n)$ matrices $A$ satisfying

\[
A^{\mathsf{T}}\operatorname{J}_{n}A=\operatorname{J}_{n}
\]

where $\operatorname{J}_{n}=\begin{bmatrix}0&\operatorname{I}_{n}\\ -\operatorname{I}_{n}&0\end{bmatrix}$. The Lie algebra of $\operatorname{Sp}(2n,\mathbb{R})$ is

\[
\mathfrak{sp}(n)=\left\{\operatorname{J}_{n}S:S\in\operatorname{S}^{2}\mathbb{R}^{2n}\right\}
\]

and the normal space is

\[
\operatorname{N}_{\operatorname{I}_{2n}}(\operatorname{Sp}(2n,\mathbb{R}),\mathrm{GL}(2n,\mathbb{R}))=\left\{\operatorname{J}_{n}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{2n}\right\}.
\]
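The description of the Lie algebra can be sanity-checked numerically: a matrix $X$ belongs to the Lie algebra of $\operatorname{Sp}(2n,\mathbb{R})$ exactly when $X^{\mathsf{T}}\operatorname{J}_{n}+\operatorname{J}_{n}X=0$, and every $X=\operatorname{J}_{n}S$ with $S$ symmetric satisfies this relation. The pure-Python sketch below illustrates the check; the helper names `J` and `in_sp_algebra` are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(X):
    return [list(row) for row in zip(*X)]


def J(n):
    """The 2n x 2n matrix [[0, I_n], [-I_n, 0]]."""
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        M[i][n + i] = 1
        M[n + i][i] = -1
    return M


def in_sp_algebra(X, tol=1e-12):
    """Test the infinitesimal symplectic condition X^T J + J X = 0."""
    Jn = J(len(X) // 2)
    L, R = matmul(transpose(X), Jn), matmul(Jn, X)
    return all(abs(L[i][j] + R[i][j]) <= tol
               for i in range(len(X)) for j in range(len(X)))


S = [[1, 2], [2, 3]]                      # symmetric
print(in_sp_algebra(matmul(J(1), S)))     # True
T = [[1, 2], [0, 3]]                      # not symmetric
print(in_sp_algebra(matmul(J(1), T)))     # False
```

The check works because $\operatorname{J}_{n}^{\mathsf{T}}\operatorname{J}_{n}=\operatorname{I}_{2n}$ and $\operatorname{J}_{n}^{2}=-\operatorname{I}_{2n}$, so $X^{\mathsf{T}}\operatorname{J}_{n}+\operatorname{J}_{n}X=S^{\mathsf{T}}-S$ for $X=\operatorname{J}_{n}S$.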

Without loss of generality, we may assume that $1\leq p\leq n$.

Proposition B.28.

For each $A\in\operatorname{Sp}(2n,\mathbb{R})$, we have

\[
\operatorname{SN}_{A}(\operatorname{Sp}(2n,\mathbb{R}),\mathrm{GL}(2n,\mathbb{R}))=\left\{A\operatorname{I}_{p,q}\operatorname{J}_{n}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{2n}\right\}
\]

and

\[
\operatorname{Sp}(2n,\mathbb{R})_{A}^{\perp}=\left\{A\begin{bmatrix}X&Y&0&Z^{\mathsf{T}}\\ 0&0&Z&0\\ 0&0&X^{\mathsf{T}}&0\\ 0&0&Y^{\mathsf{T}}&0\end{bmatrix}:X\in\mathbb{R}^{p\times p},Y\in\mathbb{R}^{p\times(n-p)},Z\in\mathbb{R}^{(n-p)\times p}\right\}.
\]

In particular, $\operatorname{Sp}(2n,\mathbb{R})$ is a degenerate semi-Riemannian sub-manifold of $\mathrm{GL}(2n,\mathbb{R})$.

Proof B.29.

The description of the semi-normal space follows from Lemma B.4 and Proposition B.5, and the description of $\operatorname{Sp}(2n,\mathbb{R})_{A}^{\perp}$ is obtained by a straightforward calculation.

Next we describe geodesics on $\operatorname{Sp}(2n,\mathbb{R})$.

Proposition B.30.

Let $\gamma(t)$ be a geodesic passing through $A\in\operatorname{Sp}(2n,\mathbb{R})$ with the tangent direction $A\operatorname{J}_{n}X$. Write $X$ as $X=\begin{bmatrix}S&P^{\mathsf{T}}\\ P&Q\end{bmatrix}$ where $S,Q$ are symmetric $n\times n$ matrices and $P$ is an $n\times n$ matrix, and partition $S,P,Q$ as

\[
S=\begin{bmatrix}S_{11}&S_{12}\\ S_{12}^{\mathsf{T}}&S_{22}\end{bmatrix},\quad P=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix},\quad Q=\begin{bmatrix}Q_{11}&Q_{12}\\ Q_{12}^{\mathsf{T}}&Q_{22}\end{bmatrix}
\]

where $S_{11},P_{11},Q_{11}\in\mathbb{R}^{p\times p}$ and $S_{22},P_{22},Q_{22}\in\mathbb{R}^{(n-p)\times(n-p)}$. Then

\[
\gamma(t)=A\exp\left(\int_{0}^{t}\begin{bmatrix}P(\tau)&Q(\tau)\\ -S&-P(\tau)^{\mathsf{T}}\end{bmatrix}d\tau\right),
\]

where $P(t)=\begin{bmatrix}P_{11}(t)&P_{12}(t)\\ P_{21}&P_{22}\end{bmatrix}$, $Q(t)=\begin{bmatrix}Q_{11}&Q_{12}(t)\\ Q_{12}(t)^{\mathsf{T}}&Q_{22}\end{bmatrix}$ and

\[
\begin{cases}Q_{11}S_{11}+Q_{12}(t)S_{12}^{\mathsf{T}}=P_{11}(t)^{2}+P_{12}(t)P_{21},\\ Q_{11}S_{12}+Q_{12}(t)S_{22}=P_{11}(t)P_{12}(t)+P_{12}(t)P_{22},\\ (P_{21}Q_{11}+P_{22}P_{12}(t)^{\mathsf{T}})-(Q_{12}(t)^{\mathsf{T}}P_{11}(t)^{\mathsf{T}}+Q_{22}P_{12}(t)^{\mathsf{T}})+\dot{Q}_{12}(t)^{\mathsf{T}}=0.\end{cases}
\]