
Semi-Riemannian Manifold Optimization

Tingran Gao, Department of Statistics and Committee on Computational and Applied Mathematics (CCAM), The University of Chicago, Chicago, IL (tingrangao@galton.uchicago.edu)

Lek-Heng Lim, Department of Statistics and Committee on Computational and Applied Mathematics (CCAM), The University of Chicago, Chicago, IL (lekheng@galton.uchicago.edu)

Ke Ye, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China (keyk@amss.ac.cn)

Abstract

We introduce in this paper a manifold optimization framework that utilizes semi-Riemannian structures on the underlying smooth manifolds. Unlike in Riemannian geometry, where each tangent space is equipped with a positive definite inner product, a semi-Riemannian manifold allows the metric tensor to be indefinite on each tangent space, i.e., possessing both positive and negative definite subspaces; differential geometric objects such as geodesics and parallel-transport can be defined on non-degenerate semi-Riemannian manifolds as well, and can be carefully leveraged to adapt Riemannian optimization algorithms to the semi-Riemannian setting. In particular, we discuss the metric independence of manifold optimization algorithms, and illustrate that the weaker but more general semi-Riemannian geometry often suffices for the purpose of optimizing smooth functions on smooth manifolds in practice.

Keywords

manifold optimization, semi-Riemannian geometry, degenerate submanifolds, Lorentzian geometry, steepest descent, conjugate gradient, Newton’s method, trust region method

AMS subject classifications

90C30, 53C50, 53B30, 49M05, 49M15

1 Introduction

Manifold optimization [12, 2] is a class of techniques for solving optimization problems of the form

(1)

\min_{x\in\mathcal{M}}f\left(x\right)

where $\mathcal{M}$ is a (typically nonlinear and nonconvex) manifold and $f:\mathcal{M}\rightarrow\mathbb{R}$ is a smooth function over $\mathcal{M}$ . These techniques generally begin by endowing the manifold $\mathcal{M}$ with a Riemannian structure, which amounts to specifying a smooth family of inner products on the tangent spaces of $\mathcal{M}$ , with which analogues of differential quantities such as the gradient and Hessian can be defined on $\mathcal{M}$ in parallel with their well-known counterparts on Euclidean spaces. This geometric perspective enables us to tackle the constrained optimization problem Eq. 1 using methodologies of unconstrained optimization, which becomes particularly beneficial when the constraints (expressed in $\mathcal{M}$ ) appear highly nonlinear and nonconvex.

The optimization problem Eq. 1 is certainly independent of the choice of Riemannian structure on $\mathcal{M}$ ; in fact, all critical points of $f$ on $\mathcal{M}$ are metric independent. From a differential geometric perspective, equipping the manifold with a Riemannian structure and studying the critical points of a generic smooth function is highly reminiscent of classical Morse theory [27, 33], for which the main interest is to understand the topology of the underlying manifold; the topological information needs to be extracted using tools from differential geometry, but is certainly independent of the choice of Riemannian structure. It is thus natural to inquire into the influence of different choices of Riemannian metrics on manifold optimization algorithms, which to our knowledge has never been explored in the existing literature. This paper stems from our attempts at understanding the dependence of manifold optimization on the Riemannian structure. It turns out that most technical tools for optimization on Riemannian manifolds can be extended to a larger class of metric structures on manifolds, namely, semi-Riemannian structures. Just as a Riemannian metric is a smooth assignment of inner products to tangent spaces, a semi-Riemannian metric smoothly assigns to each tangent space a scalar product, which is a non-degenerate symmetric bilinear form without the constraint of positive definiteness; our major technical contribution in this paper is an optimization framework built upon the rich differential geometry in such weaker but more general metric structures, of which standard unconstrained optimization on Euclidean spaces and Riemannian manifold optimization are special cases. Though semi-Riemannian geometry has attracted generations of mathematical physicists for its effectiveness in providing space-time models in general relativity [35, 9], to the best of our knowledge, the link with manifold optimization has never been explored.

A different yet strong motivation for investigating optimization problems on semi-Riemannian manifolds arises from the Riemannian geometric interpretation of interior point methods [31, 41]. For a twice differentiable and strongly convex function $f$ defined over an open convex domain $Q$ in a Euclidean space, denote by $\nabla f$ and $\nabla^{2}f$ the gradient and Hessian of $f$ , respectively. The strong convexity of $f$ ensures $\nabla^{2}f\left(x\right)\succ 0$ , which defines a local inner product $g_{x}\left(\cdot,\cdot\right):T_{x}Q\times T_{x}Q\rightarrow\mathbb{R}$ by

g_{x}\left(v,w\right):=v^{\top}\left[\nabla^{2}f\left(x\right)\right]w,\quad\forall v,w\in T_{x}Q.

With respect to this class of new local inner products, which can be interpreted as turning $Q$ into a Riemannian manifold $\left(Q,g\right)$ , the gradient of $f$ takes the form

\tilde{\nabla}f\left(x\right)=\left[\nabla^{2}f\left(x\right)\right]^{-1}\nabla f\left(x\right).

The negative manifold gradient $-\tilde{\nabla}f\left(x\right)=-\left[\nabla^{2}f\left(x\right)\right]^{-1}\nabla f\left(x\right)$ coincides with the descent direction $\eta_{x}$ satisfying Newton’s equation

(2)

\left[\nabla^{2}f\left(x\right)\right]\eta_{x}=-\nabla f\left(x\right)

at $x\in Q$ . In other words, Newton’s method, which is second order, can be interpreted as a first order method in the Riemannian setting. Such an equivalence between first and second order methods under a change of metric is also known in other contexts, such as natural gradient descent in information geometry; see [40] and the references therein. Extending this geometric picture beyond the relatively well-understood case of strongly convex functions requires understanding optimization on semi-Riemannian manifolds as a first step; we expect the theoretical foundation laid out in this paper to yield deeper geometric insights into the convergence of non-convex optimization algorithms.
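The identification of the Newton direction with the negative gradient under the Hessian metric can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical strongly convex test function $f\left(x\right)=\frac{1}{2}x^{\top}Ax-b^{\top}x+\sum_{i}e^{x_{i}}$ that does not appear in the paper; the names `hessian_metric_gradient` and `newton_direction` are illustrative only.

```python
import numpy as np

# Hypothetical strongly convex test function (an assumption, not from the paper):
#   f(x) = 0.5 * x^T A x - b^T x + sum(exp(x_i)), with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    """Euclidean gradient of f."""
    return A @ x - b + np.exp(x)

def hess(x):
    """Euclidean Hessian of f; positive definite for every x."""
    return A + np.diag(np.exp(x))

def hessian_metric_gradient(x):
    """Gradient of f with respect to g_x(v, w) = v^T hess(x) w."""
    return np.linalg.solve(hess(x), grad(x))

def newton_direction(x):
    """Solution eta of Newton's equation hess(x) eta = -grad(x)."""
    return np.linalg.solve(hess(x), -grad(x))

x = np.array([0.3, -0.7])
v = np.array([1.0, 2.0])
# Defining property of the gradient under g: g_x(grad_g f, v) equals the
# directional derivative grad(x)^T v.
lhs = hessian_metric_gradient(x) @ hess(x) @ v
rhs = grad(x) @ v
```

Here `lhs` and `rhs` agree up to rounding, and `newton_direction(x)` equals `-hessian_metric_gradient(x)`, which is the coincidence described above.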

The rest of this paper is organized as follows. In Section 2 we provide a brief but self-contained introduction to Riemannian optimization and semi-Riemannian geometry. Section 3 details the algorithmic framework of semi-Riemannian optimization, and proposes semi-Riemannian analogues of the Riemannian steepest descent and conjugate gradient algorithms; the metric independence of some second-order algorithms is also investigated. We specialize the general geometric framework to submanifolds in Section 4, in which we characterize the phenomenon (which does not exist in Riemannian geometry) of degeneracy for induced semi-Riemannian structures, and identify several (nearly) non-degenerate examples to which our general algorithmic framework applies. We illustrate the utility of the proposed framework with several examples in Section 5 and conclude with Section 6. More examples and some omitted proofs are deferred to the Supplementary Materials.

2 Preliminaries

2.1 Notations

We denote a smooth manifold by $M$ or $\mathcal{M}$ . Lower case letters such as $a,b,c$ or $x,y,z$ will be used to denote vectors or points on a manifold, depending on the context. We write $T\!M$ and $T^{*}M$ for the tangent and cotangent bundles of $M$ , respectively. For a fibre bundle $E$ , $\Gamma\left(E\right)$ will be used to denote the smooth sections of this bundle. Unless otherwise specified, we use $\left\langle\cdot,\cdot\right\rangle$ or $g\in\Gamma\left(T^{*}\!M\otimes T^{*}\!M\right)$ to denote a semi-Riemannian metric. For a smooth function $f$ , the notations $Df$ and $D^{2}f$ stand for semi-Riemannian gradients and Hessians, respectively, when they exist; $\nabla f$ and $\nabla^{2}f$ will be reserved for Riemannian gradients and Hessians, respectively. More generally, $D$ will be used to denote the Levi-Civita connection on a semi-Riemannian manifold, while $\nabla$ denotes the Levi-Civita connection on a Riemannian manifold. We denote anti-symmetric (i.e. skew-symmetric) matrices and symmetric matrices of size $n$ -by- $n$ by $\mathrm{Skew}\left(\mathbb{R}^{n\times n}\right)$ and $\mathrm{Sym}\left(\mathbb{R}^{n\times n}\right)$ , respectively. For a vector space $V$ , $\bigwedge^{k}V$ and $S^{k}V$ stand for the alternating and symmetric $k$ -fold tensor powers of $V$ , respectively.

2.2 Riemannian Manifold Optimization

As stated at the beginning of this paper, manifold optimization concerns nonlinear optimization problems of the form Eq. 1. The methodology of Riemannian optimization is to equip the smooth manifold $\mathcal{M}$ with a Riemannian metric structure, i.e. positive definite bilinear forms $\left\langle\cdot,\cdot\right\rangle$ on the tangent spaces of $\mathcal{M}$ that vary smoothly on the manifold [28, 10, 38]. The differentiable structure on $\mathcal{M}$ facilitates generalizing the concept of differentiable functions from Euclidean spaces to these nonlinear objects; in particular, notions such as gradient and Hessian are available on Riemannian manifolds and play the same role as their Euclidean space counterparts.

The algorithmic framework of Riemannian manifold optimization has been established and investigated in a sequence of works [13, 44, 12, 2]. These algorithms typically build upon the concepts of gradient, the first-order differential operator $\nabla:C^{1}\left(\mathcal{M}\right)\rightarrow\Gamma\left(TM\right)$ defined by

\left\langle\nabla f\left(x\right),X\right\rangle=Xf\left(x\right)\quad\forall X\in T_{x}M,

and Hessian, the covariant derivative of the gradient operator defined by

\nabla^{2}f\left(X,Y\right)=XYf-\left(\nabla_{X}Y\right)f\quad\forall X,Y\in\Gamma\left(TM\right)

as well as a retraction $\mathrm{Retr}_{x}:T_{x}\mathcal{M}\rightarrow\mathcal{M}$ from each tangent space $T_{x}\mathcal{M}$ to the manifold $\mathcal{M}$ such that (1) $\mathrm{Retr}_{x}\left(0\right)=x$ for all $x\in\mathcal{M}$ , and (2) the differential of $\mathrm{Retr}_{x}$ is the identity at $0\in T_{x}\mathcal{M}$ . On Riemannian manifolds it is natural to use the exponential mapping as the retraction, but any smooth map from tangent spaces to the Riemannian manifold satisfying (1) and (2) suffices; in fact, the only requirement implied by conditions (1) and (2) is that the retraction map coincides with the exponential map up to first order.

The optimality conditions for unconstrained optimization on Euclidean spaces in terms of gradients and Hessians can be naturally translated into the Riemannian manifold setting:

Proposition 2.1 ([8], Proposition 1.1).

A local optimum $x\in\mathcal{M}$ of Problem Eq. 1 satisfies the following necessary conditions:

(i)

$\nabla f\left(x\right)=0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is first-order differentiable;
(ii)

$\nabla f\left(x\right)=0$ and $\nabla^{2}f\left(x\right)\succeq 0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is second-order differentiable.

Following [8], we call $x\in\mathcal{M}$ satisfying condition (i) in Proposition 2.1 a (first-order) critical point or stationary point, and a point satisfying condition (ii) in Proposition 2.1 a second-order critical point.

The heart of Riemannian manifold optimization is to transform the nonlinear constrained optimization problem Eq. 1 into an unconstrained problem on the manifold $\mathcal{M}$ . Following this methodology, classical unconstrained optimization algorithms such as gradient descent, conjugate gradients, Newton’s method, and trust region methods have been generalized to Riemannian manifolds; see [2, Chapter 8]. For instance, the gradient descent algorithm on Riemannian manifolds generates iterates $x_{0},x_{1},\cdots,x_{k},\cdots$ by essentially replacing the descent step $x_{k+1}=x_{k}-\nabla f\left(x_{k}\right)$ with its Riemannian counterpart $x_{k+1}=\mathrm{Retr}_{x_{k}}\left(-\nabla f\left(x_{k}\right)\right)$ . Other differential geometric objects such as parallel-transport, Hessian, and curvature arise naturally en route to adapting other unconstrained optimization algorithms to the manifold setting. We refer interested readers to [2] for more details.
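The retraction-based descent step can be illustrated on the unit sphere. The following is a minimal sketch under several assumptions that are not taken from the paper: the sphere carries the metric induced from the Euclidean one, the retraction is renormalization, a fixed step size `step` is inserted in front of the gradient, and the objective is the hypothetical quadratic $f\left(x\right)=x^{\top}Ax$ .

```python
import numpy as np

def riemannian_gradient_descent(A, x0, step=0.1, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere with a fixed step size."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                  # Euclidean gradient of f
        rgrad = egrad - (x @ egrad) * x      # orthogonal projection onto T_x(sphere)
        y = x - step * rgrad                 # step along the negative Riemannian gradient
        x = y / np.linalg.norm(y)            # retraction: renormalize onto the sphere
    return x

# Hypothetical test problem: the minimum of f on the sphere is the smallest
# eigenvalue of A, which is 1 here.
A = np.diag([1.0, 2.0, 3.0])
x_star = riemannian_gradient_descent(A, np.ones(3))
f_star = x_star @ A @ x_star
```

The renormalization map satisfies conditions (1) and (2) of a retraction, so it agrees with the exponential map on the sphere up to first order while being cheaper to evaluate.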

2.3 Semi-Riemannian Geometry

Semi-Riemannian geometry differs from Riemannian geometry in that the bilinear form equipped on each tangent space can be indefinite. Classical examples include Lorentzian spaces and De Sitter spaces in general relativity; see e.g. [35, 9]. Although one may think of Riemannian geometry as a special case of semi-Riemannian geometry as all Riemannian metric tensors are automatically semi-Riemannian, the existence of a semi-Riemannian metric with nontrivial index (see definition below) actually imposes additional constraints on the tangent bundle of the manifold and is thus often more restrictive—the tangent bundle should admit a non-trivial splitting into the direct sum of “positive definite” and “negative definite” sub-bundles. Nevertheless, such metric structures have found vast applications in and beyond understanding the geometry of spacetime, for instance, in the study of the regularity of optimal transport maps [21, 20, 3].

Definition 2.2.

A symmetric bilinear form $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ on a vector space $V$ is non-degenerate if

\left\langle v,w\right\rangle=0\,\,\textrm{for all}\,\,w\in V\quad\Leftrightarrow\quad v=0.

The index $\nu\in\mathbb{Z}_{\geq 0}$ of a symmetric bilinear form on $V$ is the dimension of a maximal negative definite subspace of $V$ ; similarly, we write $\pi\in\mathbb{Z}_{\geq 0}$ for the dimension of a maximal positive definite subspace of $V$ . A scalar product on a vector space $V$ is a non-degenerate symmetric bilinear form on $V$ . The signature of a scalar product on $V$ with index $\nu$ is a vector of length $\mathrm{dim}\left(V\right)$ with the first $\nu$ entries equaling $-1$ and the remaining entries equaling $1$ . A subspace $W\subset V$ is said to be non-degenerate if the restriction of the scalar product to $W$ is non-degenerate.

The main difference between a scalar product and an inner product is that the former need not be positive definite. The main issue with this lack of positivity is the consequent lack of a meaningful notion of “orthogonality”: a vector subspace may well be its own orthogonal complement. Consider for example the subspace spanned by $\left(1,1\right)$ in $\mathbb{R}^{2}$ equipped with a scalar product with signature $\left(-,+\right)$ . The same example illustrates that non-degeneracy is not always inherited by subspaces. Nonetheless, the following is true:

Lemma 2.3 (Chapter 2, Lemma 23, [35]).

A subspace $W$ of a vector space $V$ is non-degenerate if and only if $V=W\oplus W^{\perp}$ .
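The notions in Definition 2.2 and the degenerate subspace discussed above can be checked numerically. The following is a minimal sketch, assuming NumPy and that the scalar product is represented by a symmetric Gram matrix `G`; by Sylvester's law of inertia, the index equals the number of negative eigenvalues of `G`. The helper names are hypothetical.

```python
import numpy as np

def index_and_positive_dim(G, tol=1e-12):
    """Return (nu, pi): the numbers of negative and positive eigenvalues of G."""
    eig = np.linalg.eigvalsh(G)
    return int(np.sum(eig < -tol)), int(np.sum(eig > tol))

def restricted_gram(G, W):
    """Gram matrix of <u, v> = u^T G v restricted to the column span of W."""
    return W.T @ G @ W

G = np.diag([-1.0, 1.0])             # scalar product of signature (-, +) on R^2
W_null = np.array([[1.0], [1.0]])    # subspace spanned by (1, 1)
W_pos = np.array([[0.0], [1.0]])     # subspace spanned by (0, 1)

nu, pi = index_and_positive_dim(G)
gram_null = restricted_gram(G, W_null)   # zero matrix: the restriction is degenerate
gram_pos = restricted_gram(G, W_pos)     # [[1]]: the restriction is non-degenerate
```

The span of $\left(1,1\right)$ has a vanishing restricted Gram matrix, so it is degenerate and, consistent with Lemma 2.3, does not split $\mathbb{R}^{2}$ as a direct sum with its orthogonal complement.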

Definition 2.4 (Semi-Riemannian Manifolds).

A metric tensor $g$ on a smooth manifold $M$ is a symmetric non-degenerate $\left(0,2\right)$ tensor field on $M$ of constant index. A semi-Riemannian manifold is a smooth manifold $M$ equipped with a metric tensor.

Example 2.5 (Minkowski Spaces $\mathbb{R}^{p,q}$ ).

Consider the Euclidean space $\mathbb{R}^{n}$ and denote by $\operatorname{I}_{p,q}$ the $n$ -by- $n$ diagonal matrix with the first $p$ diagonal entries equaling $-1$ and the remaining $q=n-p$ entries equaling $1$ , where $0\leq p\leq n$ and $n\geq 1$ . For arbitrary $u,w\in\mathbb{R}^{n}$ , define the bilinear form

\left\langle u,w\right\rangle:=u^{\top}\operatorname{I}_{p,q}w.

It is straightforward to verify that this bilinear form is non-degenerate on $\mathbb{R}^{n}$ , and that $\left(\mathbb{R}^{n},\left\langle\cdot,\cdot\right\rangle\right)$ so defined is a semi-Riemannian manifold. This space is known as the Minkowski space of signature $\left(p,q\right)$ .

Example 2.6.

Consider the vector space of matrices $\mathbb{R}^{n\times n}$ , where $n\in\mathbb{N}$ and $n=p+q$ , $p,q\in\mathbb{N}$ . Define a bilinear form on $\mathbb{R}^{n\times n}$ by

\left\langle A,B\right\rangle:=\mathrm{Tr}\left(A^{\top}\operatorname{I}_{p,q}B\right),\quad\forall A,B\in\mathbb{R}^{n\times n}.

This bilinear form is non-degenerate on $\mathbb{R}^{n\times n}$ , because for any $A,B\in\mathbb{R}^{n\times n}$ we have

\mathrm{Tr}\left(A^{\top}\operatorname{I}_{p,q}B\right)=\mathrm{vec}\left(A\right)^{\top}\left(I_{n}\otimes I_{p,q}\right)\mathrm{vec}\left(B\right)

where $I_{n}$ is the identity matrix of size $n$ -by- $n$ , $\otimes$ denotes the Kronecker product, and $\mathrm{vec}:\mathbb{R}^{n\times n}\rightarrow\mathbb{R}^{n^{2}}$ is the vectorization operator that vertically stacks the columns of a matrix in $\mathbb{R}^{n\times n}$ . The non-degeneracy then follows from Example 2.5. This example gives rise to a semi-Riemannian structure for matrices in $\mathbb{R}^{n\times n}$ .
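The vectorization identity used in Example 2.6 can be verified numerically. The following is a minimal sketch, assuming NumPy; the helpers `I_pq` and `vec` are hypothetical names, and `vec` stacks columns (column-major order) to match the convention stated above.

```python
import numpy as np

def I_pq(p, q):
    """Diagonal matrix with p entries equal to -1 followed by q entries equal to +1."""
    return np.diag(np.concatenate([-np.ones(p), np.ones(q)]))

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.flatten(order="F")

p, q = 1, 2
n = p + q
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

trace_form = np.trace(A.T @ I_pq(p, q) @ B)
K = np.kron(np.eye(n), I_pq(p, q))      # Gram matrix of the form on R^{n^2}
kron_form = vec(A) @ K @ vec(B)
eigs = np.linalg.eigvalsh(K)            # all equal to -1 or +1, so K is invertible
```

Since `K` is diagonal with entries $\pm 1$ , the bilinear form on $\mathbb{R}^{n\times n}$ is a Minkowski scalar product on $\mathbb{R}^{n^{2}}$ up to a permutation of coordinates, with $np$ negative directions.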

The non-degeneracy of the semi-Riemannian metric tensor ensures that most classical constructions on Riemannian manifolds have their analogues on a semi-Riemannian manifold. Most fundamentally, the “miracle of Riemannian geometry”, namely the existence and uniqueness of a canonical connection, holds on semi-Riemannian manifolds as well. Quoting [35, Theorem 11], on a semi-Riemannian manifold $M$ there is a unique connection $D:\Gamma\left(M,TM\right)\rightarrow\Gamma\left(M,T^{\otimes 2}M\right)$ such that

(3)

\left[V,W\right]=D_{V}W-D_{W}V

and

(4)

X\left\langle V,W\right\rangle=\left\langle D_{X}V,W\right\rangle+\left\langle V,D_{X}W\right\rangle

for all $X,V,W\in\Gamma\left(M,TM\right)$ . This connection is called the Levi-Civita connection of $M$ and is characterized by the Koszul formula

(5)

\begin{aligned}2\left\langle D_{V}W,X\right\rangle={}&V\left\langle W,X\right\rangle+W\left\langle X,V\right\rangle-X\left\langle V,W\right\rangle\\&-\left\langle V,\left[W,X\right]\right\rangle+\left\langle W,\left[X,V\right]\right\rangle+\left\langle X,\left[V,W\right]\right\rangle\quad\forall X,V,W\in\Gamma\left(M,TM\right).\end{aligned}

Geodesics, parallel-transport, and curvature of $M$ can be defined via the Levi-Civita connection on $M$ in an entirely analogous manner as on Riemannian manifolds.

Differential operators can be defined on semi-Riemannian manifolds much the same way as on Riemannian manifolds. For any $f\in C^{1}\left(M\right)$ , where $M$ is a semi-Riemannian manifold, the gradient of $f$ , denoted as $Df\in\Gamma\left(M,TM\right)$ , is defined by the equality (c.f. [35, Definition 47])

(6)

\left\langle Df,X\right\rangle=Xf,\quad\forall X\in\Gamma\left(M,TM\right).

The Hessian of $f\in C^{2}\left(M\right)$ can be similarly defined, also similar to the Riemannian case ([35, Definition 48, Lemma 49]), by $D^{2}f=D\left(Df\right)\in\Gamma\left(M,T^{*}M\otimes T^{*}M\right)$ , or equivalently

(7)

D^{2}f\left(X,Y\right)=XYf-\left(D_{X}Y\right)f,\quad\forall X,Y\in\Gamma\left(M,TM\right).

Since the Levi-Civita connection on $M$ is torsion-free, $D^{2}f$ is a symmetric $\left(0,2\right)$ tensor field on $M$ , i.e.,

D^{2}f\left(X,Y\right)=D^{2}f\left(Y,X\right),\quad\forall X,Y\in\Gamma\left(M,TM\right).

One way to compare the semi-Riemannian and Riemannian gradients and Hessians, when both metric structures exist on the same smooth manifold, is through their local coordinate expressions. In fact, the local coordinate expressions for the two types (Riemannian/semi-Riemannian) of differential operators can be unified as follows. Let $\left\{x^{1},\cdots,x^{n}\right\}$ be a local coordinate system around an arbitrary point $x\in\mathcal{M}$ , and denote $g_{ij}$ and $h_{ij}$ for the components of the Riemannian and semi-Riemannian metric tensors, respectively; the Christoffel symbols will be denoted as $\phantom{}{}^{g}\Gamma_{ij}^{k}$ and $\phantom{}{}^{h}\Gamma_{ij}^{k}$ , respectively. Direct computation reveals

(8)

\begin{aligned}\nabla f&=g^{ij}\partial_{j}f\,\partial_{i},&\qquad\nabla^{2}f&=\left(\partial_{ij}^{2}f-{}^{g}\Gamma_{ij}^{k}\partial_{k}f\right)\mathrm{d}x^{i}\otimes\mathrm{d}x^{j},\\Df&=h^{ij}\partial_{j}f\,\partial_{i},&\qquad D^{2}f&=\left(\partial_{ij}^{2}f-{}^{h}\Gamma_{ij}^{k}\partial_{k}f\right)\mathrm{d}x^{i}\otimes\mathrm{d}x^{j}.\end{aligned}
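For reference, the Christoffel symbols appearing in Eq. 8 are given by the standard coordinate formula, which is obtained by applying the Koszul formula Eq. 5 to the coordinate vector fields $\partial_{i},\partial_{j},\partial_{\ell}$ , whose Lie brackets vanish. The expression is stated here for the semi-Riemannian metric $h$ ; the same formula holds with $g$ in place of $h$ , since its derivation only uses non-degeneracy of the metric.

```latex
{}^{h}\Gamma_{ij}^{k}=\frac{1}{2}h^{k\ell}\left(\partial_{i}h_{j\ell}+\partial_{j}h_{i\ell}-\partial_{\ell}h_{ij}\right)
```

In particular, the symbols vanish identically whenever the metric components are constant in the chosen coordinates, in which case the Hessian reduces to the matrix of second partial derivatives.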

Using the musical isomorphism induced from the (Riemannian or semi-Riemannian) metric, the Hessians can be cast in the form of $\left(2,0\right)$ -tensors in $\Gamma\left(T\!M\otimes T\!M\right)$ as

\begin{aligned}\left(\nabla^{2}f\right)^{\sharp}&=g^{i\ell}g^{jm}\left(\partial_{ij}^{2}f-{}^{g}\Gamma_{ij}^{k}\partial_{k}f\right)\partial_{\ell}\otimes\partial_{m},\\\left(D^{2}f\right)^{\sharp}&=h^{i\ell}h^{jm}\left(\partial_{ij}^{2}f-{}^{h}\Gamma_{ij}^{k}\partial_{k}f\right)\partial_{\ell}\otimes\partial_{m}.\end{aligned}

Remark 2.7.

Notably, for any $x\in\mathcal{M}$ , if we compute the Hessians $D^{2}f\left(x\right)$ and $\nabla^{2}f\left(x\right)$ in the corresponding geodesic normal coordinates centered at $x$ , Eq. 8 implies that the two Hessians take the same coordinate form $\left(\partial_{ij}^{2}f\right)_{1\leq i,j\leq n}$ since both ${}^{g}\Gamma_{ij}^{k}$ and ${}^{h}\Gamma_{ij}^{k}$ vanish at $x$ . For instance, $\mathbb{R}^{n}$ has the same geodesics (straight lines) under the Euclidean or Lorentzian metric, and the standard coordinate system serves as a geodesic normal coordinate system for both metrics; see Example 2.10. In particular, the notion of geodesic convexity [39, 46] is equivalent for the two different metrics; this equivalence does not follow trivially from the well-known first- and second-order characterizations (see e.g. [46, Theorem 5.1] and [46, Theorem 6.1]), since geodesics need not be the same under different metrics.

Proposition 2.8.

On a smooth manifold $\mathcal{M}$ admitting two different Riemannian or semi-Riemannian structures, an optimization problem is geodesically convex with respect to one metric if and only if it is also geodesically convex with respect to the other.

Proof 2.9.

Denote the two metric tensors on $\mathcal{M}$ by $g$ and $h$ , respectively. Each of $g$ and $h$ can be Riemannian or semi-Riemannian. For any $x\in\mathcal{M}$ , let $x^{1},\cdots,x^{n}$ and $y^{1},\cdots,y^{n}$ be the geodesic normal coordinates around $x$ with respect to $g$ and $h$ , respectively. Denote by $J=\left(\partial y_{j}/\partial x_{i}\right)_{1\leq i,j\leq n}$ the Jacobian of the coordinate transformation between the two normal coordinate systems. The coordinate expressions of a tangent vector $v\in T_{x}\mathcal{M}$ in the two normal coordinate systems are linked by (Einstein summation convention adopted)

v=v^{i}\partial/\partial x_{i}=\tilde{v}^{j}\partial/\partial y_{j}\quad\Leftrightarrow\quad v^{i}=\tilde{v}^{j}\partial x_{i}/\partial y_{j}.

Therefore

\begin{aligned}&\left[\nabla^{2}f\left(x\right)\right]\left(v,v\right)\geq 0\quad\forall v\in T_{x}\mathcal{M}\\\Leftrightarrow\quad&v^{i}v^{j}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}\left(x\right)\geq 0\quad\forall v^{1},\cdots,v^{n}\in\mathbb{R}\\\Leftrightarrow\quad&\tilde{v}^{\ell}\frac{\partial x_{i}}{\partial y_{\ell}}\tilde{v}^{m}\frac{\partial x_{j}}{\partial y_{m}}\frac{\partial^{2}f}{\partial y_{i}\partial y_{j}}\geq 0\quad\forall\tilde{v}^{1},\cdots,\tilde{v}^{n}\in\mathbb{R}\\\Leftrightarrow\quad&\left[D^{2}f\left(x\right)\right]\left(v,v\right)\geq 0\quad\forall v\in T_{x}\mathcal{M},\end{aligned}

which establishes the desired equivalence.

Example 2.10 (Gradient and Hessian in Minkowski Spaces).

Consider the Euclidean space $\mathbb{R}^{n}$ . Denote by $\operatorname{I}_{p,q}\in\mathbb{R}^{n\times n}$ the $n$ -by- $n$ diagonal matrix with the first $p$ diagonal entries equaling $-1$ and the remaining $q=n-p$ diagonal entries equaling $1$ . We compute and compare in this example the gradients and Hessians of differentiable functions on $\mathbb{R}^{n}$ . We take the Riemannian metric to be the standard Euclidean metric, and the semi-Riemannian metric to be the one given by $\operatorname{I}_{p,q}$ . For any $f\in C^{2}\left(\mathbb{R}^{n}\right)$ , the gradient of $f$ is determined by

\begin{aligned}&\left(Df\right)^{\top}\operatorname{I}_{p,q}X=Xf=\left(\nabla f\right)^{\top}X,\quad\forall X\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)\\\Leftrightarrow\quad&Df=\operatorname{I}_{p,q}\nabla f\qquad\textrm{where $\nabla f=\left(\partial_{1}f,\cdots,\partial_{n}f\right)\in\mathbb{R}^{n}$.}\end{aligned}

Furthermore, since in this case the semi-Riemannian metric tensor is constant on $\mathbb{R}^{n}$ , the Christoffel symbols vanish (cf. [35, Chap 3. Proposition 13 and Lemma 14]), and thus $D_{X}Df=\operatorname{I}_{p,q}\nabla_{X}\nabla f=\operatorname{I}_{p,q}\left(\nabla^{2}f\right)X$ for all $X\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)$ , where

\nabla^{2}f=\left(\partial_{i}\partial_{j}f\right)_{1\leq i,j\leq n}\in\mathbb{R}^{n\times n}.

By the definition of Hessian, for all $X,Y\in\Gamma\left(\mathbb{R}^{n},\mathbb{R}^{n}\right)$ we have

D^{2}f\left(X,Y\right)=\left\langle D_{X}Df,Y\right\rangle=Y^{\top}\operatorname{I}_{p,q}\cdot\operatorname{I}_{p,q}\left(\nabla^{2}f\right)X=Y^{\top}\left(\nabla^{2}f\right)X

from which we deduce the equality $D^{2}f=\nabla^{2}f$ . In fact, the equivalence of the two Hessians also follows directly from Remark 2.7, since the geodesics under the Riemannian and semi-Riemannian metrics coincide in this example (see e.g. [35, Chapter 3 Example 25]). In particular, the equivalence between the two types of geodesics and Hessians implies the equivalence of geodesic convexity for the two metrics.
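The relations $Df=\operatorname{I}_{p,q}\nabla f$ and $D^{2}f=\nabla^{2}f$ of Example 2.10 can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical quadratic test function $f\left(x\right)=\frac{1}{2}x^{\top}Qx+c^{\top}x$ with symmetric $Q$ , which is not taken from the paper; the function names are illustrative.

```python
import numpy as np

p, q = 1, 2
I_pq = np.diag(np.concatenate([-np.ones(p), np.ones(q)]))

# Hypothetical test function f(x) = 0.5 x^T Q x + c^T x, with Q symmetric.
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 3.0]])
c = np.array([1.0, -2.0, 0.5])

def euclidean_grad(x):
    """Euclidean gradient: the vector of partial derivatives of f."""
    return Q @ x + c

def minkowski_grad(x):
    """Semi-Riemannian gradient Df = I_pq * (Euclidean gradient)."""
    return I_pq @ euclidean_grad(x)

def minkowski_hessian(X, Y):
    """D^2 f(X, Y) = <D_X Df, Y>, with D_X Df = I_pq Q X for the constant metric."""
    DX_Df = I_pq @ (Q @ X)
    return Y @ I_pq @ DX_Df

x = np.array([0.2, -1.0, 0.7])
X = np.array([1.0, 0.0, -1.0])
Y = np.array([0.5, 2.0, 1.0])
```

With these definitions, `minkowski_grad(x) @ I_pq @ X` reproduces the directional derivative `euclidean_grad(x) @ X`, and `minkowski_hessian(X, Y)` agrees with `Y @ Q @ X` because $\operatorname{I}_{p,q}\operatorname{I}_{p,q}$ is the identity.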

3 Semi-Riemannian Optimization Framework

This section introduces the algorithmic framework of semi-Riemannian optimization. To begin with, we point out that the first- and second-order necessary conditions for optimality in unconstrained optimization and Riemannian optimization can be directly generalized to semi-Riemannian manifolds. We then generalize several Riemannian manifold optimization algorithms to their semi-Riemannian counterparts, and illustrate the difference with a few numerical examples. We end this section by showing global and local convergence results for semi-Riemannian optimization.

3.1 Optimality Conditions

The following Proposition 3.1 should be considered as the semi-Riemannian analogue of the optimality conditions in Proposition 2.1.

Proposition 3.1 (Semi-Riemannian First- and Second-Order Necessary Conditions for Optimality).

Let $\mathcal{M}$ be a semi-Riemannian manifold. A local optimum $x\in\mathcal{M}$ of Problem Eq. 1 satisfies the following necessary conditions:

(i)

$Df\left(x\right)=0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is first-order differentiable;
(ii)

$Df\left(x\right)=0$ and $D^{2}f\left(x\right)\succeq 0$ if $f:\mathcal{M}\rightarrow\mathbb{R}$ is second-order differentiable.

Proof 3.2.

(i)

If $x\in\mathcal{M}$ is a local optimum of Eq. 1, then for any $X\in\Gamma\left(TM\right)$ we have $Xf\left(x\right)=0$ , which, by definition Eq. 6 and the non-degeneracy of the semi-Riemannian metric, implies that $Df\left(x\right)=0$ .

(ii)

If $x\in\mathcal{M}$ is a local optimum of Eq. 1, then there exists a local neighborhood $U\subset\mathcal{M}$ of $x$ such that $f\left(y\right)\geq f\left(x\right)$ for all $y\in U$ . Without loss of generality we can assume that $U$ is sufficiently small so as to be geodesically convex (see e.g. [10, §3.4]). Denote $\gamma:\left[-1,1\right]\rightarrow U$ for a constant-speed geodesic segment connecting $\gamma\left(0\right)=x$ to $\gamma\left(1\right)=y$ that lies entirely in $U$ . The one-variable function $t\mapsto f\circ\gamma\left(t\right)$ admits Taylor expansion

\begin{aligned}f\left(y\right)&=f\circ\gamma\left(1\right)=f\circ\gamma\left(0\right)+\left(f\circ\gamma\right)^{\prime}\left(0\right)+\frac{1}{2}\left(f\circ\gamma\right)^{\prime\prime}\left(\xi\right)\\&=f\left(x\right)+\left\langle Df\left(x\right),\gamma^{\prime}\left(0\right)\right\rangle+\frac{1}{2}D_{\gamma^{\prime}\left(\xi\right)}\left\langle Df\left(\gamma\left(\xi\right)\right),\gamma^{\prime}\left(\xi\right)\right\rangle\\&=f\left(x\right)+\frac{1}{2}\left[D^{2}f\left(\gamma\left(\xi\right)\right)\right]\left(\gamma^{\prime}\left(\xi\right),\gamma^{\prime}\left(\xi\right)\right)\end{aligned}

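for some $\xi\in\left(0,1\right)$ , where the last equality used $Df\left(x\right)=0$ , the compatibility condition Eq. 4, and the geodesic equation $D_{\gamma^{\prime}}\gamma^{\prime}=0$ . Letting $y\rightarrow x$ on $\mathcal{M}$ , the continuity of $D^{2}f$ ensures that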

D^{2}f\left(x\right)\left[V,V\right]\geq 0\qquad\forall V\in T_{x}\mathcal{M}

which establishes $D^{2}f\left(x\right)\succeq 0$ .

The formal similarity between Proposition 3.1 and Proposition 2.1 is not entirely surprising. As can be seen from the proofs, both optimality conditions are based on geometric interpretations of the same Taylor expansion; the metrics affect the specific forms of the gradient and Hessian, but the optimality conditions are essentially derived from the Taylor expansions only. Completely parallel to the Riemannian setting, we can also translate the second-order sufficient conditions [26, §7.3] into the semi-Riemannian setting without much difficulty. The proof essentially follows [26, §7.3 Proposition 3], with the Taylor expansion replaced with the expansion along geodesics in the proof of Proposition 3.1 (ii); we omit the proof since it is straightforward, but document the result in Proposition 3.3 below for future reference. Recall from [26, §7.1] that $x\in\mathcal{M}$ is a strict relative minimum point of $f$ on $\mathcal{M}$ if there is a local neighborhood $U$ of $x$ on $\mathcal{M}$ such that $f\left(y\right)>f\left(x\right)$ for all $y\in U\backslash\left\{x\right\}$ .

Proposition 3.3 (Semi-Riemannian Second-Order Sufficient Conditions).

Let $f$ be a twice differentiable function on a semi-Riemannian manifold $\mathcal{M}$ , and let $x\in\mathcal{M}$ be an interior point. If $Df\left(x\right)=0$ and $D^{2}f\left(x\right)\succ 0$ , then $x$ is a strict relative minimum point of $f$ .

The formal similarity between the Riemannian and semi-Riemannian optimality conditions indicates that it might be possible to transfer many techniques in manifold optimization from the Riemannian to the semi-Riemannian setting. For instance, the equivalence of the first-order necessary condition implies that, in order to search for a first-order stationary point on a semi-Riemannian manifold, we should look for points at which the semi-Riemannian gradient $Df$ vanishes, just as in the Riemannian realm we look for points at which the Riemannian gradient $\nabla f$ vanishes. However, extra care has to be taken regarding the influence different metric structures have on the induced topology of the underlying manifold. For Riemannian manifolds, it is straightforward to check that the induced topology coincides with the original topology of the underlying manifold (see e.g. [10, Chap 7 Proposition 2.6]), whereas the “topology” induced by a semi-Riemannian structure is generally quite pathological: for instance, two distinct points connected by a light-like geodesic (a geodesic along which all tangent vectors are null vectors; cf. Definition 4.1) have zero distance. An exemplary consequence is that, in search of a first-order stationary point, we should not look for points at which $\left\|Df\right\|^{2}$ vanishes, since this does not imply $Df=0$ .
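The last point can be made concrete in the Minkowski plane. The following is a minimal sketch, assuming NumPy, the metric $\operatorname{I}_{1,1}$ on $\mathbb{R}^{2}$ , and the hypothetical linear function $f\left(x\right)=x_{1}+x_{2}$ , which has no stationary point; none of these choices come from the paper. It uses the relation $Df=\operatorname{I}_{p,q}\nabla f$ from Example 2.10.

```python
import numpy as np

I_11 = np.diag([-1.0, 1.0])          # Minkowski metric of signature (-, +) on R^2

def minkowski_grad(egrad):
    """Semi-Riemannian gradient Df = I_11 * (Euclidean gradient)."""
    return I_11 @ egrad

def scalar_square(v):
    """<v, v> with respect to the Minkowski scalar product."""
    return v @ I_11 @ v

egrad = np.array([1.0, 1.0])         # Euclidean gradient of f(x) = x_1 + x_2
Df = minkowski_grad(egrad)           # equals (-1, 1), a nonzero null vector
sq = scalar_square(Df)               # equals 0 although Df is nonzero
```

A stopping criterion based on `sq` would therefore terminate at every point of the plane, even though `Df` never vanishes; testing `Df` against zero componentwise, or in an auxiliary positive definite norm, avoids this failure.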

3.2 Determining the “Steepest Descent Direction”

As long as gradients, Hessians, retractions, and parallel-transports can be properly defined, one might think there exists no essential difficulty in generalizing any Riemannian optimization algorithm to the semi-Riemannian setup, with the Riemannian geometric quantities replaced with their semi-Riemannian counterparts, mutatis mutandis. It is tempting to apply this methodology to all standard manifold optimization algorithms, including but not limited to first-order methods such as steepest descent, conjugate gradient descent, and quasi-Newton methods, or second-order methods such as Newton’s method and trust region methods. We discuss in this subsection how to determine a proper descent direction for steepest-descent-type algorithms on a semi-Riemannian manifold. Some exemplary first- and second-order methods will be discussed in the subsequent subsections.

As one of the prototypical first-order optimization algorithms, gradient descent is known for its simplicity and its surprisingly powerful theoretical guarantees under mild technical assumptions. A plausible “Semi-Riemannian Gradient Descent” algorithm that naïvely follows the paradigm of Riemannian gradient descent could be designed by simply replacing the Riemannian gradient $\nabla f$ with the semi-Riemannian gradient $Df$ defined in Eq. 6, as listed in Algorithm 1. Of course, a key step in Algorithm 1 is to determine the descent direction $\eta_{k}$ in each iteration. However, while the negative gradient is an obvious choice in Riemannian manifold optimization, the “steepest descent direction” is a slightly more subtle notion in semi-Riemannian geometry, as will be demonstrated shortly in this section.

A first difficulty with replacing $-\nabla f\left(x\right)$ by $-Df\left(x\right)$ is that $-Df\left(x\right)$ need not be a descent direction at all. Consider, for instance, optimization in the Minkowski space (Euclidean space equipped with the standard semi-Riemannian metric): the first order Taylor expansion at $x$ gives, for any small $t>0$ ,

(9)

f\left(x-tDf\left(x\right)\right)\approx f\left(x\right)-t\left\langle Df\left(x\right),Df\left(x\right)\right\rangle

but in the semi-Riemannian setting the scalar product term $\left\langle Df\left(x\right),Df\left(x\right)\right\rangle$ may well be negative, unlike the Riemannian case. In order for the value of the objective function to decrease (at least to first order), we have to pick the descent direction $\eta\in\left\{Df\left(x\right),-Df\left(x\right)\right\}$ so that $\left\langle Df\left(x\right),\eta\right\rangle<0$ , i.e. $\eta=-Df\left(x\right)$ if $\left\langle Df\left(x\right),Df\left(x\right)\right\rangle>0$ and $\eta=Df\left(x\right)$ if $\left\langle Df\left(x\right),Df\left(x\right)\right\rangle<0$ .

Though the quick fix of replacing $Df\left(x\right)$ with $\pm Df\left(x\right)$ would work generically in many problems of practical interest, a second, and more serious, issue with choosing $\pm Df\left(x\right)$ as the descent direction lies inherently in the indefiniteness of the metric tensor. For standard gradient descent algorithms (e.g. on Euclidean spaces with standard metric, or more generally on Riemannian manifolds), the algorithm terminates once $\left\|\nabla f\right\|$ becomes smaller than a predefined threshold; for norms induced from positive definite metric tensors, $\left\|\nabla f\right\|\approx 0$ is equivalent to $\nabla f\approx 0$ , implying that the sequence $\left\{x_{k}\mid k=0,1,\cdots\right\}$ is truly approaching a first order stationary point. This intuition breaks down for indefinite metric tensors, as $\left\|Df\right\|\approx 0$ no longer implies the proximity between $Df$ and $0$ . Even though one can fix this ill-defined termination condition by introducing an auxiliary Riemannian metric (which always exists on a smooth manifold), when $Df$ is a null vector (i.e. $\left\|Df\right\|=0$ , see Definition 4.1), the gradient algorithm loses the first order decrease in the objective function value (see Eq. 9); the validity of the algorithm then relies upon second-order information, with which we lose the benefits of first-order methods. As a concrete example, consider the unconstrained optimization problem on the Minkowski space $\mathbb{R}^{2}$ equipped with a metric of signature $\left(-1,1\right)$ :

\min_{x,y\in\mathbb{R}}f\left(x,y\right)=\frac{1}{2}\left(x-y\right)^{2}.

Recall from Example 2.10 that

Df\left(x,y\right)=\operatorname{I}_{1,1}\nabla f\left(x,y\right)=-\left(x-y\right)\cdot\left(1,1\right)^{\top}

which is a direction parallel to the isolines of the objective function $f$ . Thus the semi-Riemannian gradient descent will never decrease the objective function value.
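The failure in this example can be checked numerically. The following is a minimal sketch, assuming the signature $\left(-1,1\right)$ convention above so that $Df=\operatorname{I}_{1,1}\nabla f$ with $\operatorname{I}_{1,1}=\mathrm{diag}\left(-1,1\right)$ ; the function names and the sample point $\left(2,0.5\right)$ are illustrative choices, not part of the original text.

```python
# Sketch: on R^2 with signature (-1, 1), f(x, y) = (x - y)^2 / 2 has a null
# semi-Riemannian gradient, and moving along -Df leaves f unchanged.

def f(x, y):
    return 0.5 * (x - y) ** 2

def euclid_grad(x, y):
    d = x - y
    return (d, -d)

def semi_riemannian_grad(x, y):
    # Df = diag(-1, 1) * grad f for a metric of signature (-1, 1)
    gx, gy = euclid_grad(x, y)
    return (-gx, gy)

def scalar_product(u, v):
    # Indefinite scalar product <u, v> = -u0*v0 + u1*v1
    return -u[0] * v[0] + u[1] * v[1]

x0, y0 = 2.0, 0.5                      # hypothetical starting point
D = semi_riemannian_grad(x0, y0)       # equals -(x0 - y0) * (1, 1)
null_norm = scalar_product(D, D)       # <Df, Df>, zero for a null vector

# Objective values along the ray x - t * Df for several step sizes t
steps = (0.0, 0.1, 1.0, 10.0)
values = [f(x0 - t * D[0], y0 - t * D[1]) for t in steps]
print(D, null_norm, values)
```

Every entry of `values` agrees with $f\left(x_{0},y_{0}\right)$ up to rounding, since $x-y$ is constant along the direction $\left(1,1\right)^{\top}$ .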

Algorithm 1 Semi-Riemannian Steepest Descent

1: Input: Manifold $M$ , semi-Riemannian metric $\left\langle\cdot,\cdot\right\rangle$ , objective function $f$ , retraction $\mathrm{Retr}:T\!M\rightarrow M$ , initial value $x_{0}\in M$ , parameters for LineSearch, gradient $Df$
2: $x_{0}\leftarrow$ Initiate
3: $k\leftarrow 0$
4: while not converged do
5:   $\eta_{k}\leftarrow$ FindDescentDirection $\left(x_{k},M,Df\left(x_{k}\right)\right)$   $\triangleright$ cf. Algorithm 4
6:   $0<t_{k}\leftarrow$ LineSearch $\left(f,x_{k},\eta_{k}\right)$   $\triangleright$ $t_{k}$ is the Armijo step size
7:   Choose $x_{k+1}$ such that $f\left(x_{k}\right)-f\left(x_{k+1}\right)>c\left[f\left(x_{k}\right)-f\left(\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)\right)\right]$   $\triangleright$ $c\in\left(0,1\right)$ is a parameter
8:   $k\leftarrow k+1$
9: end while
10: return Sequence of iterates $\left\{x_{k}\right\}$

To rectify these issues, it is necessary to revisit the motivating, geometric interpretation of the negative gradient direction as the direction of “steepest descent,” i.e. for any Riemannian manifold $\left(M,g\right)$ and function $f$ on $M$ differentiable at $x\in M$ , we know from vector arithmetic that

(10)

-\frac{\nabla f\left(x\right)}{\sqrt{g\left(\nabla f\left(x\right),\nabla f\left(x\right)\right)}}=\operatorname*{argmin}_{V\in T_{x}M\atop g\left(V,V\right)=1}g\left(V,\nabla f\left(x\right)\right)=\operatorname*{argmin}_{V\in T_{x}M\atop g\left(V,V\right)=1}Vf\left(x\right).

In the semi-Riemannian setting, assuming $M$ is equipped with a semi-Riemannian metric $h$ , we can likewise try to define the descent direction as the one leading to the steepest decrease of the objective function value over the unit sphere of $h$ . It is not hard to see that in general

(11)

\pm\frac{Df\left(x\right)}{\sqrt{\left|h\left(Df\left(x\right),Df\left(x\right)\right)\right|}}\neq\operatorname*{argmin}_{V\in T_{x}M\atop\left|h\left(V,V\right)\right|=1}h\left(V,Df\left(x\right)\right)=\operatorname*{argmin}_{V\in T_{x}M\atop\left|h\left(V,V\right)\right|=1}Vf\left(x\right).

In both versions the search for the “steepest descent direction” is guided by making the directional derivative $Vf\left(x\right)$ as negative as possible, but constrained to different unit spheres. The precise relation between the two steepest descent directions is not readily visible, for the two unit spheres could differ drastically in geometry. In fact, whenever the unit sphere $\left\{v\in T_{x}M\mid\left|h\left(v,v\right)\right|=1\right\}$ is noncompact, the “steepest descent direction” so defined may not even exist.

Example 3.4.

Consider the optimization problem over the Minkowski space $\mathbb{R}^{1,1}$ equipped with a metric of signature $\left(-,+\right)$

\min_{x,y\in\mathbb{R}}f\left(x,y\right)=\frac{1}{2}\left[x^{2}+\left(y+1\right)^{2}\right].

At $\left(x,y\right)=\left(0,0\right)$ , recall from Example 2.10 that $\nabla f\left(0,0\right)=\left(0,1\right)^{\top}=Df\left(0,0\right)$ . Over the unit sphere $\left\{\left(u,v\right)^{\top}\in\mathbb{R}^{2}\mid u^{2}-v^{2}=\pm 1\right\}\subset T_{\left(0,0\right)}\mathbb{R}^{2}$ under this Lorentzian metric, the scalar product $\left\langle Df\left(0,0\right),\left(u,v\right)^{\top}\right\rangle=v\rightarrow-\infty$ as $\left(u,v\right)\rightarrow\left(\infty,-\infty\right)$ . Even worse, since the scalar product is unbounded below, it is not possible to find a descent direction $\eta$ with $\left\langle Df\left(0,0\right),\eta\right\rangle\leq\gamma\inf_{V\in T_{\left(0,0\right)}\mathbb{R}^{2},\,\left|\left\langle V,V\right\rangle\right|=1}Vf\left(0,0\right)$ for some pre-set threshold $\gamma>0$ .
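The unboundedness can be made concrete by following one branch of the unit hyperbola. The sketch below assumes the parametrization $\left(u,v\right)=\left(\cosh s,-\sinh s\right)$ , which is our illustrative choice and satisfies $u^{2}-v^{2}=1$ for every $s$ ; the helper names are hypothetical.

```python
import math

# Sketch: the directional derivative of f at (0, 0) is unbounded below over
# the Lorentzian unit hyperbola u^2 - v^2 = 1 in the signature (-, +) metric.

def scalar_product(a, b):
    # <a, b> = -a0*b0 + a1*b1
    return -a[0] * b[0] + a[1] * b[1]

Df = (0.0, 1.0)  # semi-Riemannian gradient of f at (0, 0), from Example 3.4

def unit_vector(s):
    # Point on the branch (cosh s, -sinh s) of the unit hyperbola
    return (math.cosh(s), -math.sinh(s))

params = (0.0, 1.0, 5.0, 10.0)
norms = [scalar_product(unit_vector(s), unit_vector(s)) for s in params]
derivs = [scalar_product(Df, unit_vector(s)) for s in params]

for s, q, d in zip(params, norms, derivs):
    print(f"s = {s:5.1f}   <V, V> = {q:+.6f}   Vf(0, 0) = {d:+.4e}")
```

Each vector keeps $\left|\left\langle V,V\right\rangle\right|=1$ up to rounding, while $Vf\left(0,0\right)=-\sinh s$ decreases without bound as $s$ grows.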

One way to fix this non-compactness issue is to restrict the candidate tangent vectors $V$ in the minimization of $Vf\left(x\right)$ to lie in a compact subset of the tangent space $T_{x}M$ . For instance, one can consider the unit sphere in $T_{x}M$ under a Riemannian metric. Comparing the right hand sides of Eq. 10 and Eq. 11, descent directions determined in this manner will be the negative gradient directions under the Riemannian metric, and thus in general have nothing to do with the semi-Riemannian metric; moreover, if a Riemannian metric has to be defined laboriously in addition to the semi-Riemannian one, in principle we can already employ well-established, fully-functioning Riemannian optimization techniques, thus bypassing the semi-Riemannian setup entirely. While this argument might well render first-order semi-Riemannian optimization futile, we emphasize here that one can define steepest descent directions with the aid of “Riemannian structures” that arise naturally from the semi-Riemannian structure, and thus there is no need to specify a separate Riemannian structure in parallel to the semi-Riemannian one, though this affiliated “Riemannian structure” is highly local.

The key observation here is that one does not need to consistently specify a Riemannian structure over the entire manifold if the only goal is to find one steepest descent direction in a given tangent space. In other words, when we search for the steepest descent direction in the tangent space $T_{x}M$ of a semi-Riemannian manifold $M$ , it suffices to specify a Riemannian structure locally around $x$ , or more extremely, only on the tangent space $T_{x}M$ , in order for the “steepest descent direction” to be well-defined over a compact subset of $T_{x}M$ . These local inner products do not have to “patch together” to give rise to a globally defined Riemannian structure. A very handy way to find local inner products is through geodesic normal coordinates, which reduce the local calculation to that on Minkowski spaces. For any $x\in M$ , there is a normal neighborhood $U\subset M$ containing $x$ such that the exponential map $\mathrm{exp}_{x}$ is a diffeomorphism from a neighborhood of the origin in $T_{x}M$ onto $U$ , and one can pick a basis of $T_{x}M$ that is orthonormal with respect to the semi-Riemannian metric on $M$ , denoted as $\left\{e_{1},\cdots,e_{n}\right\}$ , such that $\left\langle e_{i},e_{j}\right\rangle_{x}=\delta_{ij}\epsilon_{j}$ , where $1\leq i,j\leq n$ , $n=\mathrm{dim}\left(M\right)$ , $\delta_{ij}$ are the Kronecker deltas, and $\epsilon_{j}\in\left\{\pm 1\right\}$ . Without loss of generality, assume $M$ is a semi-Riemannian manifold of order $p$ , where $0\leq p\leq n$ , and that $\epsilon_{1}=\cdots=\epsilon_{p}=-1$ , $\epsilon_{p+1}=\cdots=\epsilon_{n}=1$ . The normal coordinates of any $y\in U$ are determined by the coefficients of $\exp_{x}^{-1}y\in T_{x}M$ with respect to the orthonormal basis $\left\{e_{1},\cdots,e_{n}\right\}$ . It is straightforward (see [35, Proposition 33]) to verify that

g_{ij}\left(x\right)=\delta_{ij}\epsilon_{j},\quad\Gamma_{ij}^{k}\left(x\right)=0\qquad\forall 1\leq i,j,k\leq n

where $\left\{g_{ij}\right\}$ denotes the semi-Riemannian metric tensor components and $\left\{\Gamma_{ij}^{k}\right\}$ stands for the Christoffel symbols. Under this coordinate system, it is straightforward to verify that the scalar product between tangent vectors $u,v\in T_{x}M$ can be written as

\left\langle u,v\right\rangle=\sum_{i=1}^{n}\epsilon_{i}u^{i}v^{i}

where $u=u^{i}e_{i}$ and $v=v^{j}e_{j}$ (Einstein’s summation convention implicitly invoked). The local Riemannian structure can thus be defined as

(12)

g\left(u,v\right)=\sum_{i=1}^{n}u^{i}v^{i}.

Essentially, such a local inner product is defined by imposing orthogonality between positive and negative definite subspaces of $T_{x}M$ and “reversing the sign” of the negative definite component of the scalar product. Making such a modification consistently and smoothly over the entire manifold is certainly subject to topological obstructions; nevertheless, locally (in fact, pointwise) defined Riemannian structures suffice for our purposes, and in practical applications we can simplify the workflow by choosing an arbitrary orthonormal basis in the tangent space in place of the geodesic frame. The orthonormalization process, of course, is adapted for the semi-Riemannian setting; see [35, Chapter 2, Lemma 24 and Lemma 25] or Algorithm 2. The output set of vectors $\left\{e_{1},\cdots,e_{n}\right\}$ satisfies

\left\langle e_{i},e_{j}\right\rangle=\delta_{ij}\epsilon_{i}

where $\delta_{ij}$ are the Kronecker symbols, and $\epsilon_{i}=\left\langle e_{i},e_{i}\right\rangle=\pm 1$ . A generic approach which works with high probability is to pick a random linearly independent set of vectors and apply a (pivoted) Gram-Schmidt orthogonalization process with respect to the indefinite scalar product; see Algorithm 3.

Algorithm 2 Finding an Orthonormal Basis with respect to a Nondegenerate Indefinite Scalar Product

1: Input: Vector space $V$ of finite dimension $n\in\mathbb{N}$ , scalar product $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ of type $\left(p,q\right)$ with $p+q=n$
2: function FindONBasis( $V$ )
3:   Find $v\in V$ with $\left\langle v,v\right\rangle\neq 0$   $\triangleright$ $v$ exists by nondegeneracy
4:   $e_{1}\leftarrow v/\sqrt{\left|\left\langle v,v\right\rangle\right|}$
5:   for $k=2,\cdots,n$ do
6:     $V_{k}\leftarrow\mathrm{span}\left\{e_{1},\cdots,e_{k-1}\right\}$
7:     $W_{k}\leftarrow V_{k}^{\perp}$   $\triangleright$ $V=V_{k}\oplus V_{k}^{\perp}$ by [35, Lemma 3.19 and 3.23]
8:     Find $w_{k}\in W_{k}$ with $\left\langle w_{k},w_{k}\right\rangle\neq 0$   $\triangleright$ $w_{k}$ exists by nondegeneracy of $W_{k}$
9:     $e_{k}\leftarrow w_{k}/\sqrt{\left|\left\langle w_{k},w_{k}\right\rangle\right|}$
10:   end for
11:   return $\left\{e_{1},\cdots,e_{n}\right\}$
12: end function

Algorithm 3 Gram-Schmidt for an Indefinite Scalar Product

1: Input: Vector space $V$ of finite dimension $n\in\mathbb{N}$ , scalar product $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ of type $\left(p,q\right)$ with $p+q=n$ , input linearly independent vectors $\left\{v_{1},\cdots,v_{n}\right\}$
2: function IndefGramSchmidt( $\left\{v_{1},\cdots,v_{n}\right\}$ )
3:   $e_{1}\leftarrow v_{1}/\sqrt{\left|\left\langle v_{1},v_{1}\right\rangle\right|}$   $\triangleright$ w.l.o.g. assume $\left\langle v_{1},v_{1}\right\rangle\neq 0$
4:   for $k=2,\cdots,n$ do
5:     $\displaystyle w_{k}\leftarrow v_{k}-\sum_{\ell=1}^{k-1}\epsilon_{\ell}\left\langle v_{k},e_{\ell}\right\rangle e_{\ell}$ , where $\epsilon_{\ell}=\left\langle e_{\ell},e_{\ell}\right\rangle$   $\triangleright$ w.l.o.g. assume $\left\langle w_{k},w_{k}\right\rangle\neq 0$ after pivoting
6:     $e_{k}\leftarrow w_{k}/\sqrt{\left|\left\langle w_{k},w_{k}\right\rangle\right|}$
7:   end for
8:   return $\left\{e_{1},\cdots,e_{n}\right\}$
9: end function
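A pivoted Gram-Schmidt process for an indefinite scalar product can be sketched as follows. This is an illustrative sketch rather than a reference implementation: it assumes a diagonal scalar product specified by a tuple of signs, uses the projection $w\mapsto w-\epsilon_{\ell}\left\langle w,e_{\ell}\right\rangle e_{\ell}$ from [35, Lemma 3.25], and pivots by picking the residual with the largest $\left|\left\langle w,w\right\rangle\right|$ . The names `make_scalar_product` and `indef_gram_schmidt`, the tolerance, and the sample vectors are our own assumptions.

```python
import math

def make_scalar_product(signs):
    """Return <u, v> = sum_i signs[i] * u[i] * v[i] (nondegenerate, indefinite)."""
    def sp(u, v):
        return sum(s * a * b for s, a, b in zip(signs, u, v))
    return sp

def indef_gram_schmidt(vectors, sp, tol=1e-12):
    """Pivoted Gram-Schmidt; returns (basis, eps) with <e_i, e_j> = eps[i] * delta_ij."""
    basis, eps = [], []
    remaining = [list(v) for v in vectors]
    while remaining:
        # Remove from each remaining vector its components along the current basis.
        residuals = []
        for v in remaining:
            w = list(v)
            for e, s in zip(basis, eps):
                c = s * sp(w, e)
                w = [wi - c * ei for wi, ei in zip(w, e)]
            residuals.append(w)
        # Pivot: choose the residual that is farthest from being a null vector.
        idx = max(range(len(residuals)), key=lambda i: abs(sp(residuals[i], residuals[i])))
        w = residuals[idx]
        q = sp(w, w)
        if abs(q) <= tol:
            # All residuals are (numerically) null: pivoting cannot help here.
            raise ValueError("all remaining residuals are null; try other input vectors")
        basis.append([wi / math.sqrt(abs(q)) for wi in w])
        eps.append(1.0 if q > 0 else -1.0)
        remaining.pop(idx)
    return basis, eps

# Hypothetical example on R^{1,2} with signature (-, +, +)
sp = make_scalar_product((-1.0, 1.0, 1.0))
vectors = [(1.0, 2.0, 0.0), (0.0, 1.0, 1.0), (1.0, 0.0, 3.0)]
basis, eps = indef_gram_schmidt(vectors, sp)
print(eps)
```

The sketch raises an error when every residual is null, which can happen even for linearly independent inputs (for instance $\left(1,1\right)$ and $\left(1,-1\right)$ in $\mathbb{R}^{1,1}$ ); this is the situation that random inputs avoid with high probability.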

In geodesic normal coordinates, the gradient $Df$ takes the form

Df\left(x\right)=\sum_{i=1}^{n}\epsilon_{i}\partial_{i}f\left(x\right)\partial_{i}\big{|}_{x}

and choosing the steepest descent direction reduces to the problem

\max_{v^{1},\cdots,v^{n}\in\mathbb{R}\atop\left(v^{1}\right)^{2}+\cdots+\left(v^{n}\right)^{2}=1}\sum_{i=1}^{n}\epsilon_{i}v^{i}\partial_{i}f\left(x\right)

of which the optimum is obviously attained at

\left(v^{1},\cdots,v^{n}\right)=\frac{1}{\sqrt{\displaystyle\sum_{i=1}^{n}\left(\partial_{i}f\left(x\right)\right)^{2}}}\left(\epsilon_{1}\partial_{1}f\left(x\right),\cdots,\epsilon_{n}\partial_{n}f\left(x\right)\right).

For the simplicity of statement, we introduce the notation

\left[X\right]^{+}:=\sum_{i=1}^{n}\left\langle X,e_{i}\right\rangle e_{i}

for $X\in T_{x}M$ , where $\left\{e_{1},\cdots,e_{n}\right\}$ is an orthonormal basis for the semi-Riemannian metric tensor $\left\langle\cdot,\cdot\right\rangle$ on $T_{x}M$ . Using this notation, the descent direction we will choose can be written as

(13)

-\left[Df\left(x\right)\right]^{+}=-\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}.

Note that, by [35, Lemma 3.25], with respect to an orthonormal basis $\left\{e_{1},\cdots,e_{n}\right\}$ we have in general

Df\left(x\right)=\sum_{i=1}^{n}\epsilon_{i}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}\neq\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}=\left[Df\left(x\right)\right]^{+}

which is consistent with our previous discussion that the steepest descent direction in the semi-Riemannian setting is not $-Df\left(x\right)$ in general. Intuitively, the “steepest descent direction” is obtained by reversing the signs of the components of the gradient that correspond to the negative definite subspace, and then rescaling according to the induced Riemannian metric. This leads to the routine Algorithm 4 for finding descent directions.

Remark 3.5.

The definition of $\left[X\right]^{+}$ certainly depends on the choice of the orthonormal basis with respect to the semi-Riemannian metric tensor. In other words, if we choose a different orthonormal basis with respect to the same semi-Riemannian metric on $T_{x}M$ , the resulting descent direction will also be different. In practical computations, we could pre-compute an orthonormal basis at every point of the manifold, but that will complicate the proofs of convergence, since the amounts of descent will not be comparable across different tangent spaces. A compromise is to cover the entire semi-Riemannian manifold with an atlas consisting of geodesic normal neighborhoods, and to extend the definition Eq. 13 from a single point to the geodesic normal neighborhood around each point, with the orthonormal basis given by geodesic normal frame fields [35, pp.84-85] defined over each normal neighborhood. Under suitable compactness assumptions, this construction essentially defines a Riemannian structure on the semi-Riemannian manifold by means of partition of unity and

(14)

g\left(X,Y\right):=\left\langle X,\left[Y\right]^{+}\right\rangle=\sum_{i=1}^{n}\left\langle X,e_{i}\right\rangle\left\langle Y,e_{i}\right\rangle.

The arbitrariness of the choice of geodesic normal frame fields makes this Riemannian structure non-canonical, but the bilinear form $g\left(\cdot,\cdot\right)$ is symmetric and coercive, and can thus be used for performing steepest descent in the semi-Riemannian setting.

Algorithm 4 Finding Semi-Riemannian Descent Direction

1: function FindDescentDirection( $x,M,Df\left(x\right)$ )
2:   $\left\{e_{1},\cdots,e_{n}\right\}\leftarrow$ FindONBasis $\left(T_{x}M,\left\langle\cdot,\cdot\right\rangle\right)$
3:   $\displaystyle\eta\leftarrow-\left[Df\left(x\right)\right]^{+}=-\sum_{i=1}^{n}\left\langle Df\left(x\right),e_{i}\right\rangle e_{i}$
4:   return $\eta$
5: end function

Remark 3.6.

For Minkowski spaces equipped with the standard orthonormal basis, it is easy to check that the descent direction output from Algorithm 4 coincides with $-\nabla f\left(x\right)$ exactly (see Example 3.8). In this sense Algorithm 1 can be viewed as a generalization of the Riemannian steepest descent algorithm. In fact, the pointwise construction of positive-definite scalar products in each tangent space Eq. 12 indicates that the methodology of Riemannian manifold optimization can be carried over to settings with weaker geometric assumptions, namely, when the inner product structure on the tangent spaces need not vary smoothly from point to point. From this perspective, we can also view semi-Riemannian optimization as a type of manifold optimization with weaker geometric assumptions.

Remark 3.7.

Algorithm 1 can indeed be viewed as an instance of a more general paradigm of line-search based optimization on manifolds [42, §3]. Our choice of the descent direction in Algorithm 4 ensures that the objective function value indeed decreases, at least for sufficiently small step size, which further facilitates convergence.

Example 3.8 (Semi-Riemannian Gradient Descent for Minkowski Spaces).

Recall from Example 2.10 that the semi-Riemannian gradient of a differentiable function on Minkowski space $\mathbb{R}^{p,q}$ is $Df\left(x\right)=\operatorname{I}_{p,q}\nabla f\left(x\right)$ . If we choose the standard basis for $\mathbb{R}^{p,q}$ , the vector $\left[Df\left(x\right)\right]^{+}$ , whose negative is the descent direction produced by Algorithm 4 and needed for Algorithm 1, is

\left[Df\left(x\right)\right]^{+}=\operatorname{I}_{n}\cdot\operatorname{I}_{p,q}\cdot\operatorname{I}_{n}\cdot\operatorname{I}_{p,q}\nabla f\left(x\right)=\nabla f\left(x\right)

and thus the semi-Riemannian gradient descent coincides with the standard gradient descent algorithm on the Euclidean space if the standard orthonormal basis is used at every point of $\mathbb{R}^{p,q}$ . Of course, if we use a randomly generated orthonormal basis (under the semi-Riemannian metric) at each point, the semi-Riemannian gradient descent will be drastically different from standard gradient descent on Euclidean spaces; see Section 5.1 for an illustration.
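To illustrate the dependence on the basis, the sketch below runs steepest descent with Armijo backtracking on $\mathbb{R}^{1,1}$ for the objective of Example 3.4, with straight lines as the retraction. The Lorentz-boosted basis $\left\{\left(\cosh\theta,\sinh\theta\right),\left(\sinh\theta,\cosh\theta\right)\right\}$ , the boost parameter $\theta=0.5$ , the starting point, and all function names are assumptions made for illustration only.

```python
import math

SIGNS = (-1.0, 1.0)  # signature (-, +) on R^{1,1}

def sp(u, v):
    """Indefinite scalar product <u, v> = -u0*v0 + u1*v1."""
    return sum(s * a * b for s, a, b in zip(SIGNS, u, v))

def f(p):
    return 0.5 * (p[0] ** 2 + (p[1] + 1.0) ** 2)

def euclid_grad(p):
    return (p[0], p[1] + 1.0)

def semi_grad(p):
    """Semi-Riemannian gradient Df = I_{1,1} * grad f."""
    return tuple(s * g for s, g in zip(SIGNS, euclid_grad(p)))

def boosted_basis(theta):
    """Orthonormal basis for <.,.>: a Lorentz boost of the standard basis."""
    c, s = math.cosh(theta), math.sinh(theta)
    return [(c, s), (s, c)]

def descent_direction(p, basis):
    """-[Df]^+ = -sum_i <Df, e_i> e_i, as in Algorithm 4."""
    D = semi_grad(p)
    coeffs = [sp(D, e) for e in basis]
    return tuple(-sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(2))

def steepest_descent(p, theta, iters=500, sigma=1e-4):
    """Algorithm 1 on R^{1,1} with Armijo backtracking and the same boost at every point."""
    for _ in range(iters):
        eta = descent_direction(p, boosted_basis(theta))
        g = euclid_grad(p)
        slope = g[0] * eta[0] + g[1] * eta[1]   # directional derivative of f along eta
        if slope > -1e-24:
            break
        t = 1.0
        while f((p[0] + t * eta[0], p[1] + t * eta[1])) > f(p) + sigma * t * slope:
            t *= 0.5
        p = (p[0] + t * eta[0], p[1] + t * eta[1])
    return p

start = (2.0, 1.5)
eta_std = descent_direction(start, boosted_basis(0.0))    # standard basis
eta_boost = descent_direction(start, boosted_basis(0.5))  # boosted basis
g0 = euclid_grad(start)
slope_boost = g0[0] * eta_boost[0] + g0[1] * eta_boost[1]
final = steepest_descent(start, 0.5)
print(eta_std, eta_boost, slope_boost, final)
```

With $\theta=0$ the direction equals $-\nabla f$ , as computed in this example; with $\theta\neq 0$ the direction differs but still has a negative directional derivative, and the iterates approach the minimizer $\left(0,-1\right)$ .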

When studying self-concordant barrier functions for interior point methods, a useful guiding principle is to consider the Riemannian geometry defined by the Hessian of a strictly convex self-concordant barrier function [31, 11, 41, 32]; in this setting, descent directions produced from Newton’s method can be equivalently viewed as gradients with respect to the Riemannian structure. When the barrier function is non-convex, however, the Hessians are no longer positive definite, and the Riemannian geometry is replaced with semi-Riemannian geometry. It is well known that the direction computed from Newton’s equation Eq. 2 may not always be a descent direction if $\nabla^{2}f$ is not positive definite [48, §3.3], which is consistent with our observation in this subsection that semi-Riemannian gradients need not be descent directions in general. In this particular case, our modification Eq. 13 can also be interpreted as a novel variant of the Hessian modification strategy [48, §3.4], as follows. Denote the function under consideration as $f:Q\rightarrow\mathbb{R}$ , where $Q\subset\mathbb{R}^{n}$ is a connected, closed convex subset with non-empty interior that contains no straight lines. Assume $\nabla^{2}f$ is non-degenerate on $Q$ , which necessarily implies that $\nabla^{2}f$ is of constant signature on $Q$ . At any $x\in Q$ , the negative gradient of $f$ with respect to the semi-Riemannian metric defined by the Hessian of $f$ is $-Df\left(x\right)=-\left[\nabla^{2}f\left(x\right)\right]^{-1}\nabla f\left(x\right)$ , where $\nabla f$ and $\nabla^{2}f$ stand for the gradient and Hessian of $f$ with respect to the Euclidean geometry of $Q$ . Our proposed modification first finds a matrix $U\in\mathbb{R}^{n\times n}$ satisfying

U^{\top}\left[\nabla^{2}f\left(x\right)\right]U=\operatorname{I}_{p,q}

where $\left(p,q\right)$ is the constant signature of $\nabla^{2}f$ on $Q$ , and then set

(15)

-\left[Df\left(x\right)\right]^{+}=-UU^{\top}\left[\nabla^{2}f\left(x\right)\right]Df\left(x\right)=-UU^{\top}\nabla f\left(x\right)

which is guaranteed to be a descent direction since

-\left[\nabla f\left(x\right)\right]^{\top}\left[Df\left(x\right)\right]^{+}=-\left[\nabla f\left(x\right)\right]^{\top}UU^{\top}\nabla f\left(x\right)=-\left\|U^{\top}\nabla f\left(x\right)\right\|^{2}\leq 0.

From Eq. 15 it is evident that the semi-Riemannian descent direction $-\left[Df\left(x\right)\right]^{+}$ is obtained from $-Df\left(x\right)$ by replacing the inverse Hessian with $UU^{\top}$ . This is close to Hessian modification in spirit, but also drastically different from common Hessian modification techniques that add a correction matrix to the true Hessian $\nabla^{2}f\left(x\right)$ ; see [48, §3.4] for a more detailed explanation.
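A small numerical instance may help compare the two directions. The sketch below assumes a hypothetical indefinite Hessian $H=\left(\begin{smallmatrix}2&1\\1&-1\end{smallmatrix}\right)$ and Euclidean gradient $g=\left(1,2\right)^{\top}$ at some point, builds the columns of $U$ by Gram-Schmidt with respect to $\left\langle u,v\right\rangle=u^{\top}Hv$ (so that $U^{\top}HU$ is diagonal with entries $\pm 1$ , possibly in a different order than $\operatorname{I}_{p,q}$ ), and compares $-UU^{\top}g$ with the Newton direction $-H^{-1}g$ .

```python
import math

# Hypothetical data: an indefinite, non-degenerate Hessian and a gradient.
H = [[2.0, 1.0], [1.0, -1.0]]
g = [1.0, 2.0]

def h_product(u, v):
    """Scalar product <u, v> = u^T H v defined by the Hessian."""
    return sum(u[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

def h_orthonormal_basis():
    """Gram-Schmidt on the standard basis w.r.t. h_product (no pivoting needed here)."""
    basis = []
    for v in ([1.0, 0.0], [0.0, 1.0]):
        w = list(v)
        for e in basis:
            eps = h_product(e, e)          # +1 or -1 up to rounding
            c = eps * h_product(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        q = h_product(w, w)
        basis.append([wi / math.sqrt(abs(q)) for wi in w])
    return basis

basis = h_orthonormal_basis()              # columns of U
signs = [h_product(e, e) for e in basis]   # diagonal of U^T H U

# Modified direction -U U^T g = -sum_i (e_i . g) e_i
coeffs = [sum(ei * gi for ei, gi in zip(e, g)) for e in basis]   # U^T g
modified = [-sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(2)]

# Newton direction -H^{-1} g via the explicit 2x2 inverse
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
newton = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
          -(-H[1][0] * g[0] + H[0][0] * g[1]) / det]

slope_modified = sum(gi * di for gi, di in zip(g, modified))
slope_newton = sum(gi * di for gi, di in zip(g, newton))
print(signs, modified, slope_modified, newton, slope_newton)
```

For this data the Newton direction has a positive directional derivative, whereas the modified direction has directional derivative $-\left\|U^{\top}g\right\|^{2}<0$ .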

3.3 Semi-Riemannian Conjugate Gradient

Using the same steepest descent directions and line search strategy, we can also adapt conjugate gradient methods to the semi-Riemannian setting. See Algorithm 5 for the algorithm description. Note that in Algorithm 5 we used the Polak-Ribière formula to determine $\beta_{k}$ , but alternatives such as Hestenes-Stiefel or Fletcher-Reeves methods (see e.g. [12, §2.6] or [42]) can be easily adapted to the semi-Riemannian setting as well, since none of the major steps in the Riemannian conjugate gradient algorithm relies essentially on the positive-definiteness of the metric tensor, except that the (steepest) descent direction needs to be modified according to Eq. 13. We noticed in practice that the Polak-Ribière and Hestenes-Stiefel formulae tend to be more robust and efficient than the Fletcher-Reeves formula for the choice of $\beta_{k}$ , which is consistent with general observations of nonlinear conjugate gradient methods [48, §5.2].

Algorithm 5 Semi-Riemannian Conjugate Gradient (Polak-Ribière)

1: Input: Manifold $M$ , objective function $f$ , retraction $\mathrm{Retr}$ , parallel transport $P$ , initial value $x_{0}\in M$ , parameters for LineSearch, gradient $Df$ and Hessian $D^{2}f$
2: $k\leftarrow 0$
3: $x_{0}\leftarrow$ Initiate
4: $\eta_{0}\leftarrow$ FindDescentDirection $\left(x_{0},M,Df\left(x_{0}\right)\right)$   $\triangleright$ cf. Algorithm 4
5: while not converged do
6:   $0<t_{k}\leftarrow$ LineSearch $\left(f,x_{k},\eta_{k}\right)$   $\triangleright$ $t_{k}$ is the Armijo step size
7:   $x_{k+1}\leftarrow\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)$
8:   $\xi_{k+1}\leftarrow$ FindDescentDirection $\left(x_{k+1},M,Df\left(x_{k+1}\right)\right)$
9:   $\eta_{k+1}=\xi_{k+1}+\beta_{k}P\eta_{k}$ , where   $\triangleright$ $P:T_{x_{k}}\!M\rightarrow T_{x_{k+1}}\!M$
     $\displaystyle\beta_{k}:=\max\left\{0,\frac{\left\langle Df\left(x_{k+1}\right)-P\left[Df\left(x_{k}\right)\right],\left[Df\left(x_{k+1}\right)\right]^{+}\right\rangle}{\left\langle Df\left(x_{k}\right),\left[Df\left(x_{k}\right)\right]^{+}\right\rangle}\right\}$
10:   $k\leftarrow k+1$
11: end while
12: return Sequence of iterates $\left\{x_{k}\right\}$

Remark 3.9.

For Minkowski spaces (including Lorentzian spaces) with the standard orthonormal basis, both steepest descent and conjugate gradient methods coincide with their counterparts on standard Euclidean spaces, since they share identical descent directions, parallel-transports, and Hessians of the objective function.

Remark 3.10.

Algorithm 5 can also be applied to self-concordant barrier functions for interior point methods, when the objective function is not necessarily strictly convex but has non-degenerate Hessians. In this context, where the semi-Riemannian metric tensor is given by the Hessian of the objective function, Algorithm 5 can be viewed as a hybrid of Newton and conjugate gradient methods, in the sense that the “steepest descent directions” are determined by the Newton equations but the actual descent directions are combined using the methodology of conjugate gradient methods. To the best of our knowledge, such a hybrid algorithm has not been investigated in existing literature.

3.4 Metric Independence of Second Order Methods

In this subsection we consider two prototypical second-order optimization methods on semi-Riemannian manifolds, namely, Newton’s method and trust region method. Surprisingly, both methods turn out to produce descent directions that are independent of the choice of scalar products on tangent spaces. We give a geometric interpretation of this independence from the perspective of jets in Section 3.4.2.

3.4.1 Semi-Riemannian Newton’s Method

As an archetypal second-order method, Newton’s method on Riemannian manifolds has already been developed in detail in the early literature of Riemannian optimization [2, Chap 6]. The rationale behind Newton’s method is that the first order stationary points of a differentiable function $f:M\rightarrow\mathbb{R}$ are exactly the zeros, and hence the minimizers, of $\left\|\nabla f\right\|^{2}=\left\langle\nabla f,\nabla f\right\rangle$ when the metric is positive-definite (i.e., when $M$ is a Riemannian manifold). Thus by choosing the direction $V$ to satisfy the Newton equation $\nabla_{V}\nabla f=-\nabla f$ we ensure that $V$ is a descent direction for $\left\|\nabla f\right\|^{2}$ , since

V\left\langle\nabla f,\nabla f\right\rangle=2\left\langle\nabla_{V}\nabla f,\nabla f\right\rangle=-2\left\langle\nabla f,\nabla f\right\rangle=-2\left\|\nabla f\right\|^{2}

and the right hand side is strictly negative as long as $\nabla f\neq 0$ . The main difficulty in generalizing this procedure to the semi-Riemannian setting is similar to the difficulty we faced in Section 3.2: when the metric is indefinite, $\left\|Df\right\|=0$ does not imply $Df=0$ , and thus one can no longer find the stationary points of $f$ by minimizing $\left\|Df\right\|^{2}$ . The approach we will adopt to fix this issue is also similar to that in Section 3.2: instead of minimizing $\left\langle Df\left(x\right),Df\left(x\right)\right\rangle$ , we will focus on the coercive bilinear form $\left\langle Df\left(x\right),\left[Df\left(x\right)\right]^{+}\right\rangle$ .

Let $E_{1},\cdots,E_{n}$ be a local geodesic normal coordinate frame centered at $x\in M$ , i.e. for any $1\leq i,j\leq n$

\left\langle E_{i}\left(x\right),E_{j}\left(x\right)\right\rangle=\epsilon_{i}\delta_{ij},\quad\nabla_{E_{i}}E_{j}\left(x\right)=0.

Then we have

(16)

\left\langle Df\left(x\right),\left[Df\left(x\right)\right]^{+}\right\rangle=\sum_{i=1}^{n}\left|\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\right|^{2}

and thus for any tangent vector $V\in T_{x}M$ we have

\begin{aligned}V\left\langle Df\left(x\right),\left[Df\left(x\right)\right]^{+}\right\rangle&=2\sum_{i=1}^{n}\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle V\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\\&=2\sum_{i=1}^{n}\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\left[\left\langle D_{V}Df\left(x\right),E_{i}\left(x\right)\right\rangle+\left\langle Df\left(x\right),D_{V}E_{i}\left(x\right)\right\rangle\right]\\&=2\sum_{i=1}^{n}\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\left\langle D_{V}Df\left(x\right),E_{i}\left(x\right)\right\rangle\end{aligned}

where in the last equality we used the fact that $\nabla_{E_{i}}E_{j}\left(x\right)=0$ for all $1\leq i,j\leq n$ . Therefore, as long as we pick $V$ to satisfy Newton’s equation

(17)

\left[D^{2}f\left(x\right)\right]\left(V\right)=D_{V}Df\left(x\right)=-Df\left(x\right)

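we can ensure decrease in the value of Eq. 16: substituting $D_{V}Df\left(x\right)=-Df\left(x\right)$ into the last line of the computation above gives $V\left\langle Df\left(x\right),\left[Df\left(x\right)\right]^{+}\right\rangle=-2\sum_{i=1}^{n}\left|\left\langle Df\left(x\right),E_{i}\left(x\right)\right\rangle\right|^{2}$ , which is strictly negative unless $Df\left(x\right)=0$ . In other words, we can obtain a descent direction for semi-Riemannian optimization using the same Newton’s equation as for Riemannian optimization, with the only difference that the Riemannian gradient and Hessian get replaced with their semi-Riemannian counterparts.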

Algorithm 6 Semi-Riemannian Newton’s Method

1: Input: Manifold $M$ , objective function $f$ , retraction $\mathrm{Retr}$ , initial value $x_{0}\in M$ , parameters for LineSearch, gradient $Df$ and Hessian $D^{2}f$
2: while not converged do
3:   Obtain the descent direction by solving the Newton equation $\left[D^{2}f\left(x_{k}\right)\right]\left(\eta_{k}\right)=-Df\left(x_{k}\right)$
4:   $0<t_{k}\leftarrow$ LineSearch $\left(f,x_{k},\eta_{k}\right)$   $\triangleright$ $t_{k}$ is the Armijo step size
5:   $x_{k+1}\leftarrow\mathrm{Retr}_{x_{k}}\left(t_{k}\eta_{k}\right)$
6:   $k\leftarrow k+1$
7: end while
8: return Sequence of iterates $\left\{x_{k}\right\}$

Given that our semi-Riemannian Newton’s method builds upon the “Riemannian surrogate” Eq. 16, it is not surprising that the semi-Riemannian Newton’s method reduces to the ordinary Newton’s method on Minkowski spaces, where the geodesics and parallel-transports stay the same as their Riemannian counterparts (i.e. when the scalar product is positive definite). This is best illustrated in the following calculation.

Example 3.11 (Semi-Riemannian Newton’s Method for Minkowski Spaces).

Recalling the definitions of semi-Riemannian gradient and Hessians from Example 2.10, the descent direction $\eta_{k}$ needed in Algorithm 6 is determined by

\operatorname{I}_{p,q}\nabla^{2}f\left(x_{k}\right)\eta_{k}=-\operatorname{I}_{p,q}\nabla f\left(x_{k}\right)\quad\Leftrightarrow\quad\eta_{k}=-\left[\nabla^{2}f\left(x_{k}\right)\right]^{-1}\nabla f\left(x_{k}\right)

for all $k=0,1,2,\cdots$ . This calculation makes it clear that the semi-Riemannian Newton’s method coincides with the standard Newton’s method.
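The equivalence in Example 3.11 can be checked numerically. The sketch below solves $\operatorname{I}_{p,q}\nabla^{2}f\,\eta=-\operatorname{I}_{p,q}\nabla f$ for every signature $\left(p,q\right)$ with $p+q=n$ and compares the result with the Euclidean Newton direction; the random Hessian and gradient are placeholder data, and the helper name `newton_direction` is hypothetical.

```python
import numpy as np


def newton_direction(hess, grad, p, q):
    """Solve I_{p,q} H eta = -I_{p,q} g for the signature (p, q)."""
    I_pq = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
    return np.linalg.solve(I_pq @ hess, -(I_pq @ grad))


rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n))
H = B @ B.T + np.eye(n)          # placeholder symmetric positive definite Hessian
g = rng.standard_normal(n)       # placeholder gradient

euclidean_direction = np.linalg.solve(H, -g)
directions = [newton_direction(H, g, p, n - p) for p in range(n + 1)]
print(all(np.allclose(d, euclidean_direction) for d in directions))
```

The agreement only uses the invertibility of $\operatorname{I}_{p,q}$ , which is the non-degeneracy of the scalar product.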

The metric independence demonstrated in Example 3.11 reflects a more general phenomenon of metric independence in Newton’s method as formulated in [41, §1.6]. Though the discussion in [41, §1.6] is restricted to the Riemannian case (scalar product required to be positive definite), it is straightforward to see that the metric independence persists under non-degenerate change of semi-Riemannian structures. In fact, if we write $J\left(x_{k}\right)$ for the Jacobian matrix of a non-degenerate coordinate transformation at $x_{k}$ , it is straightforward to check from the coordinate expressions of semi-Riemannian gradient and Hessian Eq. 8 that the Newton equation Eq. 17 in the new coordinate system takes the form $J\left(x_{k}\right)\left[D^{2}f\left(x_{k}\right)\right]\left(V\right)=-J\left(x_{k}\right)\nabla f\left(x_{k}\right)$ , which yields the same descent direction as Eq. 17. In the Riemannian regime, this metric independence is often attributed to the fact that second-order approximation is independent of inner products (see e.g. [41, §1.6]); we provide a general and unified differential geometric interpretation of this independence in terms of jets in Section 3.4.2.

3.4.2 Jets and the Metric Independence of Trust Region Method

It is well known that first-order and Newton’s methods suffer from various drawbacks from a numerical optimization perspective, such as slow local convergence and/or prohibitive computational cost in determining the descent direction. It is thus argued (cf. [2], [1]) that it could be more efficient to consider successive optimization of local models of the cost function on the domain of the problem. Trust region methods, which consider quadratic local models through approximate Taylor expansions of the cost function, fall into this category (see e.g. [48] and the references therein). This methodology has also been generalized to Riemannian manifolds for manifold optimization [1, 2, 19, 18]. In a nutshell, at each point $x\in M$ the Riemannian trust-region method strives to find the descent direction by solving locally the quadratic optimization problem on the tangent space $T_{x}M$ :

(18)

\min_{\eta\in T_{x}M\atop\left\|\eta\right\|\leq\Delta_{0}}m_{x}\left(\eta\right)=f\left(x\right)+\left\langle\nabla f\left(x\right),\eta\right\rangle+\frac{1}{2}\left[\nabla^{2}f\left(x\right)\right]\left(\eta,\eta\right)

where $\left\langle\cdot,\cdot\right\rangle$ is the inner product specified by the Riemannian metric tensor, $\left\|\cdot\right\|$ is the induced norm, and $\Delta_{0}$ is the radius of the trust region which is updated through the iterations according to certain technical criteria (e.g. the geometry of the manifold, the approximation quality of the local model, etc.).

When generalizing trust region methods to semi-Riemannian optimization, we again face the difficulties encountered for the other methods discussed previously, such as the non-compactness of the “metric ball” of bounded radius $\Delta_{0}>0$ . This can be resolved by introducing a positive definite inner product accompanying the indefinite metric tensor as in Section 3.2 and Section 3.4.1, and then restricting the search for the descent direction to a bounded domain defined by the norm induced from the inner product. Denoting $\left\|\cdot\right\|_{+}$ for the induced norm on $T_{x}M$ , the local quadratic optimization problem in the semi-Riemannian setting can be written as

(19)

\min_{\eta\in T_{x}M\atop\left\|\eta\right\|_{+}\leq\Delta_{0}}m^{\textrm{semi}}_{x}\left(\eta\right)=f\left(x\right)+\left\langle Df\left(x\right),\eta\right\rangle+\frac{1}{2}\left[D^{2}f\left(x\right)\right]\left(\eta,\eta\right).

We argue that this local quadratic model coincides with the Riemannian model Eq. 18 with the (frame-field-dependent) Riemannian structure Eq. 14. In fact, the verification is straightforward by picking geodesic normal coordinate systems under the Riemannian and semi-Riemannian metric (which ensures the Christoffel symbols vanish at $x$ ) and a change-of-coordinate argument as in the proof of Proposition 2.8, together with the coordinate expressions Eq. 8. This implies that a trust region method based on Eq. 19 for the semi-Riemannian manifold $M$ can be interpreted and analyzed using more or less the same techniques as in the existing literature on Riemannian trust region methods. The only subtlety here is that the Riemannian structures accompanying the semi-Riemannian metric are local and frame-dependent; nevertheless, this technicality can be resolved by noticing the direct dependence of the local Riemannian structure on the smooth semi-Riemannian structure.
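On a Minkowski space $\mathbb{R}^{p,q}$ the coincidence of the two local models can be verified directly. The sketch below assumes $Df=\operatorname{I}_{p,q}\nabla f$ , $D^{2}f=\operatorname{I}_{p,q}\nabla^{2}f$ , reads $\left[D^{2}f\right]\left(\eta,\eta\right)$ as $\left\langle D^{2}f\left(\eta\right),\eta\right\rangle$ , and takes $\left\|\cdot\right\|_{+}$ to be the Euclidean norm; the Cauchy point is the standard textbook step and is included only to show a feasible point of Eq. 19. All function names are hypothetical.

```python
import numpy as np


def scalar_product(u, v, I_pq):
    """Indefinite scalar product <u, v> = u^T I_{p,q} v."""
    return u @ I_pq @ v


def semi_model(eta, fx, g, H, I_pq):
    """Model of Eq. 19 on R^{p,q}, built from Df = I g and D^2 f = I H."""
    Df = I_pq @ g
    D2f_eta = I_pq @ (H @ eta)
    return (fx + scalar_product(Df, eta, I_pq)
            + 0.5 * scalar_product(D2f_eta, eta, I_pq))


def euclidean_model(eta, fx, g, H):
    """Model of Eq. 18 for the Euclidean inner product."""
    return fx + g @ eta + 0.5 * eta @ H @ eta


def cauchy_point(g, H, radius):
    """Minimizer of the quadratic model along -g inside the ball of given radius."""
    gnorm = np.linalg.norm(g)
    curvature = g @ H @ g
    tau = 1.0 if curvature <= 0 else min(1.0, gnorm ** 3 / (radius * curvature))
    return -tau * radius * g / gnorm


rng = np.random.default_rng(1)
p, q = 2, 2
I_pq = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
B = rng.standard_normal((p + q, p + q))
H = B + B.T                       # placeholder symmetric (indefinite) Hessian
g = rng.standard_normal(p + q)    # placeholder gradient
eta = rng.standard_normal(p + q)
fx, radius = 0.7, 0.5

eta_c = cauchy_point(g, H, radius)
print(np.isclose(semi_model(eta, fx, g, H, I_pq), euclidean_model(eta, fx, g, H)))
print(np.linalg.norm(eta_c) <= radius, euclidean_model(eta_c, fx, g, H) <= fx)
```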

The argument we gave in this section can be carried out to establish the “metric independence” of trust region methods on manifolds. While it is certainly desirable to pick a metric on the manifold so as to enable numerical implementations of the optimization algorithms, at the end of the day the metric influences the trust region methods only through the choice of the size $\Delta_{0}$ of the trust region, which eventually does not matter after the region radius update rules are carried out (which ultimately depend on the value distribution of the cost function only). One geometric explanation for this phenomenon is through the notion of jets (see e.g. [43, 47, 36]), which characterizes the manifold analogue of “polynomial approximation” for smooth functions. Though the formal invariance under change of coordinates breaks down for derivatives of order greater than or equal to two, it turns out that one can define equivalence classes of “Taylor polynomial expansion modulo higher order terms” by the matching of a fixed number of lower order derivatives at a fixed point. More concretely, consider an arbitrary point $q\in M$ and denote $\left(U,\left(x^{1},\cdots,x^{d}\right)\right)$ for a coordinate system around $q$ , and assume without loss of generality that $x^{j}\left(q\right)=0$ for all $j=1,\cdots,d$ . By a direct calculation, one can verify that the second order Taylor expansion

(20)

f\left(x\right)=f\left(0\right)+x^{i}\partial_{i}f\left(0\right)+\frac{1}{2}x^{j}x^{k}\partial^{2}_{jk}f\left(0\right)+O\left(\left\|x\right\|^{3}\right),\quad x\in U

is formally preserved under change of coordinates up to cubic polynomials. This indicates that, as long as we interpret the big- $O$ notation in Eq. 20 as containing not only “metrically” $O\left(\left\|x\right\|^{3}\right)$ terms (characterized by the local smooth structure or the metric tensor thereof) but also polynomials of degree $\geq 3$ in the components of $x\in\mathbb{R}^{d}$ , the expansion Eq. 20 makes sense geometrically as an element in the polynomial ring modulo ideals generated by cubic polynomials. (In fact, for fixed $k\in\mathbb{N}$ , the union of $k$ -jets over all points on the manifold forms a fibre bundle often referred to as a jet bundle.) For the purpose of trust region methods this equivalence relation suffices for specifying local models, as equivalent polynomials (as the same jet) give rise to local models of the same order (see e.g. [2, Proposition 7.1.3]). It then follows that, for distinct Riemannian or semi-Riemannian metrics on the same smooth manifold and under geodesic normal coordinates chosen respectively with respect to the metric structures, the local models Eqs. 18 and 19 correspond to the same jet and will metrically differ from each other in terms of cubic geodesic distances only, whenever the metrics involved are all Riemannian. When at least one of the metric tensors involved is semi-Riemannian, the metric comparison has to be carried out with extra caution (e.g. with respect to the metric structure induced by another Riemannian structure) since coordinate polynomials are no longer bounded by “semi-Riemannian norms” of the same order, again due to the indefiniteness of the semi-Riemannian metric tensor.
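The statement that the second order Taylor expansion is preserved under a change of coordinates up to cubic terms can be illustrated on a small example. In the sketch below, the function `f` and the coordinate change `phi` are hypothetical choices, their derivatives at the origin are computed by hand, and the quadratic model of $f\circ\phi$ is obtained from the chain rule. If the two models represent the same 2-jet, their gap should shrink by a factor of about $8$ when the step is halved.

```python
import numpy as np


def phi(u):
    """A sample nonlinear change of coordinates y = phi(u) with phi(0) = 0."""
    return np.array([u[0] + u[1] ** 2, u[1] + u[0] * u[1]])


def quadratic_model(value, grad, hess, v):
    """Second order Taylor polynomial at the origin, evaluated at v."""
    return value + grad @ v + 0.5 * v @ hess @ v


# Derivatives at y = 0 of f(y) = exp(y0) + y0 * y1 + y1**2, computed by hand.
f0 = 1.0
grad_f = np.array([1.0, 0.0])
hess_f = np.array([[1.0, 1.0], [1.0, 2.0]])

# Derivatives of phi at u = 0: Jacobian and the Hessian of each component.
jac_phi = np.eye(2)
hess_phi = np.array([[[0.0, 0.0], [0.0, 2.0]],
                     [[0.0, 1.0], [1.0, 0.0]]])

# Chain rule for g = f o phi at u = 0; the last term is the non-tensorial part.
grad_g = jac_phi.T @ grad_f
hess_g = jac_phi.T @ hess_f @ jac_phi + np.tensordot(grad_f, hess_phi, axes=1)


def model_gap(u):
    """Difference between the two quadratic models, both expressed in u."""
    return abs(quadratic_model(f0, grad_f, hess_f, phi(u))
               - quadratic_model(f0, grad_g, hess_g, u))


u = np.array([0.3, 0.2])
ratio = model_gap(0.1 * u) / model_gap(0.05 * u)
print(ratio)  # close to 8 if the gap is cubic in the step
```

The extra term `np.tensordot(grad_f, hess_phi, axes=1)` is exactly where the Hessian fails to transform as a tensor, while the two models still agree as 2-jets.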

4 Semi-Riemannian Optimization on Submanifolds

Submanifolds of Euclidean spaces are most often encountered in practical applications of manifold optimization. A key difference between Riemannian and semi-Riemannian geometry is that the non-degeneracy of the metric tensor cannot be inherited by submanifolds as easily from semi-Riemannian ambient manifolds: for a submanifold $X$ of $M$ , any Riemannian metric $g$ on $M$ induces a Riemannian metric on $X$ since $g$ is positive definite at every point $x\in X$ , but a semi-Riemannian metric on $M$ could become degenerate when restricted to $X$ ; this degeneracy is the main obstruction to finding a well-defined “orthogonal projection” which is essential for (i) relating gradients on the manifold with gradients in the ambient space, and (ii) defining geodesics on submanifolds. Semi-Riemannian manifolds with degeneracy are of interest to the theory of general relativity and mathematical physics; see [22, 23, 24, 25, 45] and the references therein. This section provides some characterization of degenerate semi-Riemannian manifolds (see Definition 4.3) in terms of their degenerate bundles (see Definition 4.4). The goal is to identify non-degenerate semi-Riemannian submanifolds of Minkowski spaces for which our algorithmic framework in Section 3 applies. As demonstrated in the computation in this section and Appendix B, unfortunately, semi-Riemannian structures inherited from the ambient Minkowski space are degenerate for most matrix Lie groups. Nonetheless, many interesting hypersurfaces (co-dimension one submanifolds) of Minkowski spaces admit non-degenerate induced semi-Riemannian structures, or degenerate ones but with degeneracy contained in a set of measure zero; the semi-Riemannian optimization framework introduced in Section 3 applies seamlessly to these examples, some of which we illustrate in Section 5.

4.1 Degeneracy of Semi-Riemannian Submanifolds

Theories of Riemannian and semi-Riemannian geometry build upon the non-degeneracy of metric tensors. However, physical models of spacetime lend themselves naturally to the occurrence of singularities, as pointed out in general relativity [37, 15, 17, 16]. A lot of work in semi-Riemannian geometry is thus devoted to the development of singular semi-Riemannian geometry, that is, the geometry of semi-Riemannian manifolds with degeneracy in their metric tensors, either with constant signature [22, 23, 24] or more generally, with possibly variable signature [25, 45]. In special cases such as null hypersurfaces of Lorentzian manifolds, specific techniques such as rigging [14, 6] have been developed, but generalizing these special constructions to other degenerate semi-Riemannian submanifolds is much less straightforward, if possible at all. For simplicity of exposition, we will confine our discussion to the constant signature scenario regardless of whether singularities occur.

Definition 4.1.

A symmetric bilinear form $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ on a vector space $V$ is said to have signature $\left(\kappa,\nu,\pi\right)$ if the maximum positive definite subspace is of dimension $\pi\in\mathbb{Z}_{\geq 0}$ , the maximum negative definite subspace is of dimension $\nu\in\mathbb{Z}_{\geq 0}$ , and the degenerate subspace with respect to this bilinear form

V^{\perp}:=\left\{v\in V\mid\left\langle v,u\right\rangle=0\quad\forall u\in V\right\}

is of dimension $\kappa\in\mathbb{Z}_{\geq 0}$ . A vector $0\neq v\in V$ is said to be (1) degenerate if $v\in V^{\perp}$ ; (2) null if $\left\langle v,v\right\rangle=0$ but $v\notin V^{\perp}$ ; (3) timelike if $\left\langle v,v\right\rangle<0$ ; (4) spacelike if $\left\langle v,v\right\rangle>0$ .

Definition 4.2.

Let $W$ be a subspace of a vector space $V$ equipped with a bilinear form $\left\langle\cdot,\cdot\right\rangle:V\times V\rightarrow\mathbb{R}$ . Denote $\left(\kappa,\nu,\pi\right)$ for the type of the bilinear form on $W$ obtained from restricting $\left\langle\cdot,\cdot\right\rangle$ to $W$ . We say that $W$ is (1) degenerate if $\kappa\geq 1$ ; (2) nondegenerate if $\kappa=0$ ; (3) timelike if $\kappa=0$ and $\nu\geq 1$ ; (4) spacelike if $\kappa=0$ , $\nu=0$ , and $\pi\geq 1$ .
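Definitions 4.1 and 4.2 translate directly into a numerical test. The sketch below computes the signature $\left(\kappa,\nu,\pi\right)$ of a symmetric bilinear form from the eigenvalues of its Gram matrix, and the type of a subspace from the Gram matrix restricted to a basis of that subspace. The function names and the tolerance `tol` are assumptions made for illustration, and the columns of `W` are assumed to be linearly independent.

```python
import numpy as np


def signature(G, tol=1e-10):
    """Return (kappa, nu, pi) for the bilinear form with symmetric Gram matrix G."""
    eigvals = np.linalg.eigvalsh((G + G.T) / 2)
    kappa = int(np.sum(np.abs(eigvals) <= tol))   # dimension of the degenerate subspace
    nu = int(np.sum(eigvals < -tol))              # maximal negative definite dimension
    pi = int(np.sum(eigvals > tol))               # maximal positive definite dimension
    return kappa, nu, pi


def restricted_signature(G, W, tol=1e-10):
    """Signature of the form restricted to the column span of W."""
    return signature(W.T @ G @ W, tol)


def subspace_type(G, W, tol=1e-10):
    """Classify the column span of W following Definition 4.2."""
    kappa, nu, pi = restricted_signature(G, W, tol)
    if kappa >= 1:
        return "degenerate"
    return "timelike" if nu >= 1 else "spacelike"


# The Minkowski space R^{2,1} and three of its subspaces.
G = np.diag([1.0, 1.0, -1.0])
null_line = np.array([[1.0], [0.0], [1.0]])
time_axis = np.array([[0.0], [0.0], [1.0]])
space_plane = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
for W in (null_line, time_axis, space_plane):
    print(restricted_signature(G, W), subspace_type(G, W))
```

By Sylvester's law of inertia the result does not depend on the basis chosen for the subspace.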

Definition 4.3 (Degenerate Semi-Riemannian Manifolds).

A degenerate semi-Riemannian manifold is a smooth manifold equipped with a possibly degenerate $\left(0,2\right)$ tensor field. This tensor field will be referred to as the degenerate metric tensor of the degenerate semi-Riemannian manifold; the signature of the degenerate metric tensor will also be referred to as the signature of the manifold when no confusion exists. Unless otherwise specified, the degenerate metric tensor is of constant signature in the rest of this paper.

When the context is clear, we will occasionally omit the adjective “degenerate” when referring to degenerate semi-Riemannian manifolds and the degenerate metric tensors on them, since non-degenerate semi-Riemannian manifolds are special cases of degenerate ones with $\kappa=0$ .

Definition 4.4 (Degenerate Bundle, [24] Definition 3.1).

The degenerate bundle of a (possibly degenerate) semi-Riemannian manifold $\left(M,\left\langle\cdot,\cdot\right\rangle\right)$ is defined as the distribution

(21)

M^{\perp}:=\bigcup_{x\in M}\left\{u\in T_{x}M\mid\left\langle u,v\right\rangle=0\quad\forall v\in T_{x}M\right\}.

We say $M$ is integrable if the distribution $M^{\perp}$ is integrable. We denote by $M^{\perp}_{x}$ the linear space $\{u\in\operatorname{T}_{x}M:\langle u,v\rangle=0,~\forall v\in\operatorname{T}_{x}M\}$ and we call the set of points $x\in M$ such that $M_{x}^{\perp}\neq\{0\}$ the degenerate locus of $M$ .

As in the setup of Riemannian manifold optimization, in practice it is of primary interest to understand the differential geometry of submanifolds of an ambient manifold for which most differential geometric quantities can be characterized explicitly. In the context of semi-Riemannian geometry, a first technical subtlety with the notion of “semi-Riemannian submanifolds” is that the induced semi-Riemannian metric tensor may well suffer from certain degeneracy even when the ambient semi-Riemannian geometry is non-degenerate. The main difficulty lies in the non-existence of a canonical “orthogonal projection” from the ambient to the submanifold tangent spaces; this complicates the definitions of normal bundles, second fundamental forms, as well as extrinsic characterizations of intrinsic geometric concepts such as affine connections, geodesics, and parallel-translates. For instance, it is well-known that covariant derivatives on a semi-Riemannian submanifold can be obtained from calculating the covariant derivatives on the ambient semi-Riemannian manifold and then projecting the result to the tangent spaces of the submanifold (see e.g. [35, Chapter 4, Lemma 3]), but this characterization breaks down if the projection operator cannot be properly defined. In fact, on a degenerate semi-Riemannian manifold there does not exist in general a semi-Riemannian analogue of the Levi-Civita (metric-compatible and torsion-free) connection, even for a degenerate semi-Riemannian submanifold of a non-degenerate semi-Riemannian manifold. Such an analogue, if it exists, is called a Koszul derivative of the degenerate semi-Riemannian manifold; a semi-Riemannian manifold admitting a Koszul derivative is called a singular semi-Riemannian manifold in [22, 23, 24].
In general, a singular semi-Riemannian manifold $M$ admits more than one Koszul derivative, and any two Koszul derivatives on $M$ differ from each other by a map from $\Gamma\left(TM\right)\times\Gamma\left(TM\right)$ to the degenerate bundle $M^{\perp}$ ; see e.g. [22, Proposition 3.5]. Note that though it is tempting to define a connection on a degenerate semi-Riemannian manifold through the Koszul formula Eq. 5, the formula defines a Koszul derivative if and only if the metric tensor is Lie parallel along all sections of the degenerate bundle ([24, Theorem 3.4]). Another useful (necessary but insufficient) criterion for the existence of a Koszul derivative on a degenerate semi-Riemannian manifold is the integrability of the degenerate bundle: as shown in [24, Corollary 3.6], if a semi-Riemannian manifold $M$ admits a Koszul derivative, then $M^{\perp}$ is integrable.

A large class of examples of semi-Riemannian manifolds commonly encountered in scientific computation are matrix Lie groups. They admit semi-Riemannian structures of arbitrary signature since tangent bundles of Lie groups are trivial. For instance, it is straightforward to verify that the semi-Riemannian structure on $\mathbb{R}^{n\times n}$ specified in Example 2.6 induces a non-degenerate semi-Riemannian structure on the general linear group $\mathrm{GL}\left(n,\mathbb{R}\right)$ , though degeneracy becomes evident for almost all interesting matrix subgroups of $\mathrm{GL}\left(n,\mathbb{R}\right)$ . We demonstrate the ubiquity of such degeneracy in the following two examples; more examples of matrix Lie groups are deferred to Section B.2.

Example 4.5 (Indefinite Orthogonal Group).

Let $\mathbb{N}\ni n=p+q$ , $0\leq p\leq n$ , and $p,q\in\mathbb{N}$ . Define the indefinite orthogonal group of signature $\left(p,q\right)$ as

(22)

O\left(p,q\right):=\left\{A\in\mathbb{R}^{n\times n}\mid A^{\top}\operatorname{I}_{p,q}A=\operatorname{I}_{p,q}\right\}

where $\operatorname{I}_{p,q}$ is defined in Example 2.6. The Lie algebra of this Lie group can be easily verified as

(23)

\mathfrak{o}(p,q)\coloneqq\left\{X\in\mathbb{R}^{n\times n}:X^{\top}\operatorname{I}_{p,q}+\operatorname{I}_{p,q}X=0\right\}.

The tangent space at an arbitrary $A\in O\left(p,q\right)$ is thus

(24)

\begin{aligned}T_{A}O\left(p,q\right)&=\left\{AX\mid X\in\mathbb{R}^{n\times n},X^{\top}\operatorname{I}_{p,q}+\operatorname{I}_{p,q}X=0\right\}\\ &=\left\{A\operatorname{I}_{p,q}Y\mid Y\in\mathbb{R}^{n\times n},Y^{\top}+Y=0\right\}.\end{aligned}

Equipping $\mathfrak{o}\left(p,q\right)$ with bilinear form specified in Example 2.6, the Lie group structure on $O\left(p,q\right)$ induces a left-invariant semi-Riemannian metric on $O\left(p,q\right)$ by

(25)

\begin{aligned}\left\langle X,Y\right\rangle_{A}&=\mathrm{Tr}\left(\left(A^{-1}X\right)^{\top}\operatorname{I}_{p,q}A^{-1}Y\right)\quad\forall X,Y\in T_{A}O\left(p,q\right)\\ &=\mathrm{Tr}\left(X^{\top}\operatorname{I}_{p,q}Y\right)\qquad\textrm{since $A^{\top}\operatorname{I}_{p,q}A=\operatorname{I}_{p,q}$.}\end{aligned}

For ease of notation, we shall drop the subscript $A$ unless there is a potential risk of confusion. This semi-Riemannian metric will be referred to as the natural semi-Riemannian metric on $O\left(p,q\right)$ . The degenerate bundle of this semi-Riemannian structure can be easily determined as follows. Let $\Delta\in\mathbb{R}^{n\times n}$ be a skew-symmetric matrix such that $A\operatorname{I}_{p,q}\Delta\in T_{A}O\left(p,q\right)$ lies in the degenerate bundle, for an arbitrary $A\in O\left(p,q\right)$ . Setting

0=\mathrm{Tr}\left(X^{\top}A^{-\top}\operatorname{I}_{p,q}A^{-1}\Delta\right)=\mathrm{Tr}\left(X^{\top}\operatorname{I}_{p,q}\Delta\right)\quad\forall X\in\mathbb{R}^{n\times n},X^{\top}+X=0

we have that $\operatorname{I}_{p,q}\Delta$ must be symmetric, i.e.

(26)

\operatorname{I}_{p,q}\Delta=\Delta^{\top}\operatorname{I}_{p,q}(=-\Delta\operatorname{I}_{p,q})

Writing $\Delta$ in the partitioned form

\Delta=\begin{bmatrix}\Delta_{1}&\Delta_{2}\\ -\Delta_{2}^{\top}&\Delta_{3}\end{bmatrix}

where

\Delta_{1}\in\mathbb{R}^{p\times p},\quad\Delta_{2}\in\mathbb{R}^{p\times q},\quad\Delta_{3}\in\mathbb{R}^{q\times q}

satisfying

\Delta_{1}+\Delta_{1}^{\top}=0,\,\,\Delta_{3}+\Delta_{3}^{\top}=0.

Plugging this partitioned form into Eq. 26 gives

\Delta_{1}=0,\quad\Delta_{3}=0

from which it follows that the degenerate bundle of $O\left(p,q\right)$ takes the form

\begin{aligned}O\left(p,q\right)^{\perp}&=\bigcup_{A\in O\left(p,q\right)}\left\{A\operatorname{I}_{p,q}\begin{bmatrix}&\Delta_{2}\\ -\Delta_{2}^{\top}&\end{bmatrix}\,\Bigg{|}\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}\\ &=\bigcup_{A\in O\left(p,q\right)}\left\{A\begin{bmatrix}&\Delta_{2}\\ \Delta_{2}^{\top}&\end{bmatrix}\,\Bigg{|}\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}.\end{aligned}

In particular, this indicates that the natural semi-Riemannian structure on $O\left(p,q\right)$ is degenerate. By checking at the identity it is clear that $\left[O\left(p,q\right)^{\perp},O\left(p,q\right)^{\perp}\right]\nsubseteq O\left(p,q\right)^{\perp}$ , hence the degenerate bundle $O\left(p,q\right)^{\perp}$ is not integrable. It then follows from [24, Corollary 3.6] that $O\left(p,q\right)$ equipped with the natural semi-Riemannian metric does not admit a Koszul derivative.
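The computation in Example 4.5 can be cross-checked numerically at the identity. The sketch below builds the basis $\left\{\operatorname{I}_{p,q}Y\right\}$ of $\mathfrak{o}\left(p,q\right)$ from elementary skew-symmetric matrices, forms the Gram matrix of the natural metric $\mathrm{Tr}\left(X^{\top}\operatorname{I}_{p,q}Y\right)$ , and reads off the degenerate fibre as its null space. All function names and the tolerance are hypothetical.

```python
import numpy as np
from itertools import combinations


def minkowski_metric(p, q):
    return np.diag(np.concatenate([np.ones(p), -np.ones(q)]))


def lie_algebra_basis(p, q):
    """Basis {I_{p,q} Y : Y = E_ij - E_ji, i < j} of o(p,q), cf. Eq. 24 at A = I."""
    n = p + q
    I_pq = minkowski_metric(p, q)
    basis = []
    for i, j in combinations(range(n), 2):
        Y = np.zeros((n, n))
        Y[i, j], Y[j, i] = 1.0, -1.0
        basis.append(I_pq @ Y)
    return basis


def natural_metric(X, Y, I_pq):
    """Tr(X^T I_{p,q} Y), the natural semi-Riemannian metric of Eq. 25."""
    return np.trace(X.T @ I_pq @ Y)


def degenerate_fibre(p, q, tol=1e-10):
    """Basis of the fibre of O(p,q)^perp at the identity (Gram matrix null space)."""
    I_pq = minkowski_metric(p, q)
    basis = lie_algebra_basis(p, q)
    gram = np.array([[natural_metric(X, Y, I_pq) for Y in basis] for X in basis])
    eigvals, eigvecs = np.linalg.eigh(gram)
    null_coeffs = eigvecs[:, np.abs(eigvals) <= tol].T
    return [sum(c * X for c, X in zip(coeffs, basis)) for coeffs in null_coeffs]


def is_degenerate(Z, p, q, tol=1e-10):
    """True if Z in o(p,q) pairs to zero with every element of o(p,q)."""
    I_pq = minkowski_metric(p, q)
    return all(abs(natural_metric(X, Z, I_pq)) <= tol
               for X in lie_algebra_basis(p, q))


p, q = 2, 1
D1, D2 = degenerate_fibre(p, q)
bracket = D1 @ D2 - D2 @ D1
print(len(degenerate_fibre(p, q)) == p * q)
print(is_degenerate(D1, p, q), is_degenerate(bracket, p, q))
```

For $\left(p,q\right)=\left(2,1\right)$ the fibre has dimension $pq=2$ , and the Lie bracket of its two basis elements is no longer degenerate, in line with the non-integrability claimed above.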

Example 4.6 (Orthogonal Group).

Let $\mathbb{N}\ni n=p+q$ , $0\leq p\leq n$ , and $p,q\in\mathbb{N}$ . The manifold structure on the orthogonal group $O\left(n\right)$ is well-known:

	$\displaystyle O\left(n\right)$	$\displaystyle=\left\{A\in\mathbb{R}^{n\times n}\mid AA^{\top}=A^{\top}A=I_{n}\right\}$
	$\displaystyle\mathfrak{o}\left(n\right)$	$\displaystyle=\left\{X\in\mathbb{R}^{n\times n}\mid X+X^{\top}=0\right\}$
	$\displaystyle T_{A}O\left(n\right)$	$\displaystyle=\left\{AX\mid X\in\mathbb{R}^{n\times n},X+X^{\top}=0\right\}=A\mathfrak{o}\left(n\right),\quad\forall A\in O\left(n\right).$

Equip $O\left(n\right)$ with the same left-invariant semi-Riemannian metric as in Example 4.5:

\left\langle U,V\right\rangle_{A}:=\mathrm{Tr}\left(\left(A^{-1}U\right)^{\top}\operatorname{I}_{p,q}A^{-1}V\right)\quad\forall U,V\in T_{A}O\left(n\right).

In this example, again the semi-Riemannian metric is degenerate. In fact, by a similar argument as in Example 4.5 one has

(27)

O\left(n\right)^{\perp}=\bigcup_{A\in O\left(n\right)}\left\{A\begin{bmatrix}&\Delta_{2}\\ -\Delta_{2}^{\top}&\end{bmatrix}\,\Bigg{|}\,\Delta_{2}\in\mathbb{R}^{p\times q}\right\}

and again, $O\left(n\right)^{\perp}$ is not integrable. In fact, one can also easily verify that the Lie bracket $\left[O\left(n\right)^{\perp},O\left(n\right)^{\perp}\right]$ is orthogonal to $O\left(n\right)^{\perp}$ with respect to the natural Riemannian (not semi-Riemannian!) metric on $O\left(n\right)$ . Again, from [24, Corollary 3.6] we know that $O\left(n\right)$ equipped with the semi-Riemannian metric defined above does not admit a Koszul derivative.
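A quick numerical check of Example 4.6 at the identity is sketched below. It builds random elements of the fibre in Eq. 27, confirms that they pair to zero with a random skew-symmetric matrix under $\mathrm{Tr}\left(S^{\top}\operatorname{I}_{p,q}X\right)$ , and confirms that the bracket of two such elements is block diagonal and hence orthogonal to the fibre in the Frobenius inner product. The helper `off_diagonal` and the random test data are hypothetical.

```python
import numpy as np


def off_diagonal(B):
    """Embed B (p x q) as [[0, B], [-B^T, 0]], an element of the fibre in Eq. 27 at A = I."""
    p, q = B.shape
    X = np.zeros((p + q, p + q))
    X[:p, p:] = B
    X[p:, :p] = -B.T
    return X


rng = np.random.default_rng(0)
p, q = 2, 2
I_pq = np.diag(np.concatenate([np.ones(p), -np.ones(q)]))
X1, X2, X3 = (off_diagonal(rng.standard_normal((p, q))) for _ in range(3))

# X1 pairs to zero with an arbitrary skew-symmetric S under Tr(S^T I_{p,q} X1).
M = rng.standard_normal((p + q, p + q))
S = M - M.T
degenerate_pairing = np.trace(S.T @ I_pq @ X1)

# The bracket of two fibre elements is block diagonal, so it leaves the fibre
# and is Frobenius-orthogonal to every fibre element such as X3.
bracket = X1 @ X2 - X2 @ X1
frobenius_pairing = np.trace(bracket.T @ X3)
print(abs(degenerate_pairing) < 1e-12, abs(frobenius_pairing) < 1e-12)
```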

Remark 4.7.

Example 4.6 is a special case of a more general practice: one can equip $O\left(p,q\right)$ with a semi-Riemannian structure with $\operatorname{I}_{p,q}$ replaced with $\operatorname{I}_{p^{\prime},q^{\prime}}$ in Example 4.5, where $p+q=p^{\prime}+q^{\prime}$ but $p\neq p^{\prime}$ and $q\neq q^{\prime}$ . Again this is due to the triviality of the tangent bundle of the Lie group $O\left(p,q\right)$ for any integers $p$ and $q$ .

Remark 4.8.

It is natural to ask at this point whether a given manifold of interest, such as $\operatorname{O}\left(p,q\right)$ or $\operatorname{O}\left(n\right)$ , admits a semi-Riemannian structure of a particular type for which a Koszul derivative exists. We are not aware of general results of this sort. Some related works (e.g. [34, 7, 4]) have been devoted to the existence of left-invariant Lorentz metrics satisfying certain curvature sign conditions, following the seminal work of Milnor [29]. Bi-invariant semi-Riemannian metrics on Lie groups have also been widely explored since the 1910s; see [30, §1.4] for a brief survey.

Remark 4.9.

Though the notion of orthogonality breaks down for degenerate semi-Riemannian submanifolds, the tangent bundle of any semi-Riemannian manifold of type $\left(\kappa,\nu,\pi\right)$ admits a direct sum decomposition $T\!M=M^{\perp}\oplus H$ , where $H$ is a sub-bundle of $T\!M$ with rank $\nu+\pi$ . In this case, the restriction of the semi-Riemannian metric to $H$ gives rise to a non-degenerate semi-Riemannian metric of type $\left(0,\nu,\pi\right)$ . We will fully leverage this partial non-degeneracy in the semi-Riemannian optimization algorithm presented in this paper.

4.1.1 Gradient and Hessian of Submanifolds of Minkowski Spaces

When $M$ is a non-degenerate semi-Riemannian submanifold of a Minkowski space $\mathbb{R}^{p+q}$ , the gradient and Hessian of a twice differentiable function $f$ on $M$ can be computed explicitly from the gradient and Hessian of $f$ on the ambient Minkowski space $\mathbb{R}^{p+q}$ , thanks to the non-degeneracy which ensures for any $x\in M$ that the tangent space $T_{x}M$ has an orthogonal complement in $\mathbb{R}^{p+q}$ , and thus $Df$ and $D^{2}f$ on $\mathbb{R}^{p+q}$ can be orthogonally projected onto $T_{x}M$ . Specifically, the same argument as in [2, §3.6.1] indicates that the semi-Riemannian gradient of $f$ on $M$ is exactly the orthogonal projection to the tangent space of $M$ of the semi-Riemannian gradient of $f$ as a function defined on the Minkowski space $\mathbb{R}^{p+q}$ ; a similar argument shows that the Hessian of $f$ on $M$ is the Hessian of $f$ on the Minkowski space $\mathbb{R}^{p+q}$ composed with the orthogonal projection from $\mathbb{R}^{p+q}$ to the tangent space of $M$ .

Example 4.10 (Euclidean Sphere in Minkowski Spaces).

Consider the standard Euclidean sphere

\mathbb{S}^{p+q-1}=\left\{x\in\mathbb{R}^{p,q}\mid x_{1}^{2}+\cdots+x_{p+q}^{2}=1\right\}

as a submanifold of $\mathbb{R}^{p,q}$ , the Minkowski space equipped with the scalar product defined by $\operatorname{I}_{p,q}\in\mathbb{R}^{\left(p+q\right)\times\left(p+q\right)}$ as in Example 2.5. For any $x\in\mathbb{S}^{p+q-1}$ , the tangent space $T_{x}\mathbb{S}^{p+q-1}$ can be specified as

T_{x}\mathbb{S}^{p+q-1}=\left\{v\in\mathbb{R}^{p,q}\mid v^{\top}x=\left\langle v,\operatorname{I}_{p,q}x\right\rangle=0\right\}

and thus the projection from $\mathbb{R}^{p,q}$ to $T_{x}\mathbb{S}^{p+q-1}$ is

P_{x}\left(v\right):=v-\frac{\left\langle v,\operatorname{I}_{p,q}x\right\rangle}{\left\langle\operatorname{I}_{p,q}x,\operatorname{I}_{p,q}x\right\rangle}\operatorname{I}_{p,q}x=v-\frac{v^{\top}x}{x^{\top}\operatorname{I}_{p,q}x}\operatorname{I}_{p,q}x,\quad\forall x^{\top}\operatorname{I}_{p,q}x\neq 0.

For $x\in\mathbb{S}^{p+q-1}$ with $x^{\top}\operatorname{I}_{p,q}x=0$ , the projection operator $P_{x}$ is not defined since $\operatorname{I}_{p,q}x$ is a null vector. Nevertheless, this occurs only on a set of measure zero in $\mathbb{S}^{p+q-1}$ , so such points are almost never encountered in practice. In our numerical experiments on $\mathbb{S}^{p+q-1}$ (see Section 5.2), we simply perturb the point $x$ randomly so that the optimization trajectory stays away from the degenerate locus. This works well as long as the optimum is not on the degenerate locus. If unavoidable, we can also temporarily resort to the Riemannian orthogonal projection for $x\in\mathbb{S}^{p+q-1}$ with $x^{\top}\operatorname{I}_{p,q}x=0$ . For a twice differentiable function $f:\mathbb{S}^{p+q-1}\rightarrow\mathbb{R}$ , if we denote $Df$ and $D^{2}f$ for the semi-Riemannian gradient and Hessian of $f$ on the ambient Minkowski space (following Example 2.10), then the semi-Riemannian gradient and Hessian of $f$ on $\mathbb{S}^{p+q-1}$ are $P_{x}\left(Df\left(x\right)\right)$ and $P_{x}\left(D^{2}f\left(x\right)\right)$ , respectively.
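The projection $P_{x}$ and the resulting gradient formula are short enough to sketch in code. The snippet below assumes $Df=\operatorname{I}_{p,q}\nabla f$ on the ambient space; the function names, the tolerance guarding the degenerate locus, and the test vectors are illustrative assumptions rather than the implementation used in Section 5.2.

```python
import numpy as np


def minkowski_metric(p, q):
    return np.diag(np.concatenate([np.ones(p), -np.ones(q)]))


def project_to_tangent(x, v, I_pq, tol=1e-12):
    """P_x(v) = v - (v^T x) / (x^T I_{p,q} x) * I_{p,q} x on the Euclidean sphere."""
    denom = x @ I_pq @ x
    if abs(denom) < tol:
        raise ValueError("x is numerically on the degenerate locus; P_x is undefined")
    return v - (v @ x) / denom * (I_pq @ x)


def sphere_gradient(x, egrad, I_pq):
    """Semi-Riemannian gradient on the sphere: P_x applied to I_{p,q} times egrad."""
    return project_to_tangent(x, I_pq @ egrad, I_pq)


p, q = 2, 1
I_pq = minkowski_metric(p, q)
x = np.array([0.6, 0.0, 0.8])            # unit vector with x^T I x = -0.28
v = np.array([0.3, -1.2, 0.5])           # arbitrary ambient vector
egrad = np.array([1.0, 2.0, -0.5])       # placeholder Euclidean gradient at x

Pv = project_to_tangent(x, v, I_pq)
grad = sphere_gradient(x, egrad, I_pq)

# A tangent vector w (Euclidean-orthogonal to x) for testing the defining
# property <grad, w> = df_x(w) of the semi-Riemannian gradient.
w = np.array([0.4, 1.0, -0.3])
w = w - (w @ x) * x
print(abs(Pv @ x) < 1e-12, np.isclose(grad @ I_pq @ w, egrad @ w))
```

The guard on `denom` is where one would insert the random perturbation, or the fallback to the Riemannian projection, mentioned above.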

4.1.2 Geodesics and Parallel-Transports

Regardless of whether the semi-Riemannian submanifold under consideration is degenerate, we can define analogues of geodesics and parallel-transports on it by means of its semi-normal bundle. To this end, for a semi-Riemannian manifold $M$ and its submanifold $X$ we denote by $TM$ and $TX$ the tangent bundles of $M$ and $X$ , respectively. For $x\in X$ , we define the semi-normal space of $X$ in $M$ at $x$ to be

\operatorname{SN}_{x}(X,M)\coloneqq\{u\in T_{x}M:\widetilde{g}_{x}(u,v)=0,~\forall v\in T_{x}X\}.

We also define the semi-normal distribution of $X$ in $M$ to be

\operatorname{SN}(X,M)\coloneqq\bigsqcup_{x\in X}\operatorname{SN}_{x}(X,M).

Consider the linear map

\operatorname{SN}_{x}(X,M)\hookrightarrow\operatorname{T}_{x}M\to\operatorname{T}_{x}M/\operatorname{T}_{x}X

where the first map is the inclusion and the second map is the quotient map. The following observation is straightforward by definition.

Lemma 4.11.

Fibres of the degenerate bundle of $X$ (cf. Definition 4.4) at $x\in X$ can be written as

X_{x}^{\perp}=\operatorname{SN}_{x}(X,M)\cap\operatorname{T}_{x}X.

In particular, if $X$ is an open submanifold of $M$ , then $X$ is a non-degenerate semi-Riemannian submanifold of $M$ .

Hence we have an injective map

\operatorname{SN}_{x}(X,M)/X_{x}^{\perp}\hookrightarrow\operatorname{T}_{x}M/\operatorname{T}_{x}X

and thus $\operatorname{SN}(X,M)/X^{\perp}$ is a sub-distribution of the normal bundle $\operatorname{N}(X,M)\coloneqq\operatorname{T}M|_{X}/\operatorname{T}X$ . If $\dim\operatorname{SN}_{x}(X,M)-\dim X_{x}^{\perp}$ is constant with respect to $x\in X$ , then $\operatorname{SN}(X,M)$ is a sub-bundle of $\operatorname{N}(X,M)$ and will be referred to as the semi-normal bundle of $X$ with respect to $M$ . We define the analogue of geodesics on (possibly degenerate) semi-Riemannian submanifolds as curves with accelerations in the semi-normal bundle; when the semi-Riemannian submanifold is non-degenerate these geodesics reduce to standard semi-Riemannian geodesics.

Definition 4.12.

For a given $x\in X$ and $\Delta\in\operatorname{T}_{x}X$ , if a smooth curve $\gamma:[-\epsilon,\epsilon]\to X$ satisfies $\gamma(0)=x,\dot{\gamma}(0)=\Delta$ and

\frac{D}{dt}(\dot{\gamma}(t))\in\operatorname{SN}_{\gamma(t)}(X,M)

for all $t\in[-\epsilon,\epsilon]$ , then $\gamma$ is called an embedded geodesic curve on $X$ passing through $x$ with the tangent direction $\Delta$ . Here $\frac{D}{dt}(\dot{\gamma}(t))$ is the covariant derivative of $\dot{\gamma}(t)$ along $\gamma(t)$ on the ambient semi-Riemannian manifold $(M,g)$ .

Definition 4.13.

Let $\gamma(t)$ be a curve passing through $x=\gamma(0)$ on $X$ and let $\Delta\in\operatorname{T}_{x}X$ be a given tangent vector. A parallel transportation of $\Delta$ along the curve $\gamma(t)$ is a vector field $\Delta(t)$ such that $\Delta(0)=\Delta$ and

\frac{D}{dt}\left(\Delta(t)\right)\in\operatorname{SN}_{\gamma(t)}(X,M).

We remark that on a (semi-)Riemannian manifold $(Z,g)$ , a geodesic $\tau$ passing through $z\in Z$ with the tangent direction $U\in\operatorname{T}_{z}Z$ is traditionally defined by the second order ODE with initial condition:

(28)

\begin{cases}\nabla_{\dot{\tau}(t)}\dot{\tau}(t)=0,\\ \tau(0)=z,\,\,\dot{\tau}(0)=U\end{cases}

where $\nabla$ is the covariant derivative uniquely determined by the metric $g$ . Meanwhile, a well-known fact (cf. [35, Corollary 10]) is that if $(Z,g)$ is isometrically embedded in a (semi-)Riemannian manifold $(\overline{Z},\overline{g})$ then Eq. 28 is equivalent to the condition that $\overline{D}/dt(\dot{\tau}(t))$ is always perpendicular to $Z$ , i.e.

\overline{g}\left(\frac{\overline{D}}{dt}\left(\dot{\tau}(t)\right),V\right)=0

for all $V\in\operatorname{T}_{\tau(t)}Z$ . Here $\overline{D}$ is the covariant derivative on $\overline{Z}$ along the curve $\tau(t)$ . From this second perspective, Definition 4.12 and Definition 4.13 are natural generalizations of geodesics and parallel-transports from nondegenerate to degenerate semi-Riemannian geometry. Of course, it is in general not possible to obtain closed-form expressions for the embedded geodesic curves and parallel-transports; see Section B.2 for some examples. These definitions apply to the particular case when the semi-Riemannian structure under consideration is actually Riemannian, and thus the optimization methods are also applicable to degenerate Riemannian manifolds.

4.2 Semi-Riemannian Hypersurfaces of Minkowski Spaces

In this subsection we describe the semi-Riemannian geometry of submanifolds of codimension one in the Minkowski space $\mathbb{R}^{p,q}$ (see Example 2.5), which are prototypical examples of semi-Riemannian manifolds. Throughout this subsection $X$ denotes a submanifold of $\mathbb{R}^{p,q}$ . Unraveling the definition of semi-normal and normal bundles yields:

Proposition 4.14.

For each $x\in X$ , we have

\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})

where

\operatorname{N}_{x}(X,\mathbb{R}^{p+q})=\left\{v\in T_{x}\mathbb{R}^{p+q}\mid v_{1}w_{1}+\cdots+v_{p+q}w_{p+q}=0\textrm{ for all }w\in T_{x}X\right\}

and

\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})=\left\{\operatorname{I}_{p,q}v\mid v\in N_{x}(X,\mathbb{R}^{p+q})\right\}.

In particular, $\operatorname{SN}(X,\mathbb{R}^{p,q})$ is a vector bundle on $X$ of rank $(p+q-\dim X)$ .

Corollary 4.15.

Let $X\subseteq\mathbb{R}^{p,q}$ be a hypersurface (a submanifold of codimension one), and $x\in X$ . Then either $X_{x}^{\perp}=\{0\}$ or $X_{x}^{\perp}=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})$ .

Proof 4.16.

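By Lemma 4.11 we have $X_{x}^{\perp}=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})\cap T_{x}X$ , and by Proposition 4.14 we know that $\operatorname{SN}_{x}(X,\mathbb{R}^{p,q})=\operatorname{I}_{p,q}\operatorname{N}_{x}(X,\mathbb{R}^{p+q})$ is one-dimensional, since $X$ has codimension one. A linear subspace of a line is either $\{0\}$ or the whole line, which gives the two cases.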

Example 4.17 (Euclidean Spheres in Minkowski Spaces).

Consider as in Example 4.10 the hypersurface

\mathbb{S}^{p+q-1}=\left\{x\in\mathbb{R}^{p,q}\mid x_{1}^{2}+\cdots+x_{p+q}^{2}=1\right\}\subseteq\mathbb{R}^{p,q}.

Direct calculation yields

	$\displaystyle\operatorname{N}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p+q})$	$\displaystyle=\left\{\lambda x:\lambda\in\mathbb{R}\right\},$
	$\displaystyle\operatorname{SN}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p,q})$	$\displaystyle=\operatorname{I}_{p,q}\operatorname{N}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p+q})=\left\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\right\}$
	$\displaystyle T_{x}\mathbb{S}^{p+q-1}$	$\displaystyle=\left\{v\in\mathbb{R}^{p,q}\mid v_{1}x_{1}+\cdots+v_{p+q}x_{p+q}=0\right\}$

and thus

	$\displaystyle(\mathbb{S}^{p+q-1})_{x}^{\perp}$	$\displaystyle=\operatorname{SN}_{x}(\mathbb{S}^{p+q-1},\mathbb{R}^{p,q})\cap T_{x}\mathbb{S}^{p+q-1}$
		$\displaystyle=\begin{cases}\left\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\right\},&\text{if}~x_{1}^{2}+\cdots+x_{p}^{2}=x_{p+1}^{2}+\cdots+x_{p+q}^{2},\\ \{0\},&\text{otherwise}.\end{cases}$
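The dichotomy in Example 4.17 can be checked numerically. The following is a minimal sketch, not part of the original computations: it assumes the sign convention $\operatorname{I}_{p,q}=\operatorname{diag}(-1,\dots,-1,+1,\dots,+1)$ suggested by Section 4.2.1, and the helper names `signature_matrix` and `degenerate_fibre_dim` are hypothetical. It tests whether $\operatorname{I}_{p,q}x$ is Euclidean-orthogonal to $x$, which is exactly the condition for the semi-normal line to lie in the tangent space.

```python
import numpy as np

def signature_matrix(p, q):
    """Assumed convention: I_{p,q} = diag(-1,...,-1, +1,...,+1)."""
    return np.diag(np.concatenate([-np.ones(p), np.ones(q)]))

def degenerate_fibre_dim(x, p, q, tol=1e-12):
    """Dimension (0 or 1) of SN_x intersected with T_x S^{p+q-1} at a unit vector x."""
    I_pq = signature_matrix(p, q)
    # I_{p,q} x spans SN_x; it is tangent iff its Euclidean product with x vanishes.
    return 1 if abs(x @ (I_pq @ x)) < tol else 0

p, q = 1, 2
x_generic = np.array([1.0, 2.0, 2.0]) / 3.0        # x_1^2 differs from x_2^2 + x_3^2
x_null = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)  # x_1^2 equals x_2^2 + x_3^2
print(degenerate_fibre_dim(x_generic, p, q), degenerate_fibre_dim(x_null, p, q))
```

The first point lies off the null cone and has a trivial fibre, while the second lies on it and has a one-dimensional fibre.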

Hypersurfaces, and in particular the linear ones (known as hyperplanes), play an important role in semi-Riemannian geometry because they provide rich yet elementary examples of non-degenerate semi-Riemannian sub-manifolds. In fact, a generic hyperplane inherits a non-degenerate semi-Riemannian structure from the ambient Minkowski space; we defer a simple proof to the supplementary materials (Appendix A). The proof makes use of a handy criterion for the non-degeneracy of semi-Riemannian structures on hypersurfaces, which we establish as follows. First of all, we point out that Proposition 4.14 can be equivalently interpreted in terms of Gauss maps: for closed sub-manifolds $X\subset\mathbb{R}^{p,q}$ with $\operatorname{dim}X<p+q$ , denote $m:=p+q-\operatorname{dim}\left(X\right)$ and define the Gauss map

\operatorname{N}:X\to\operatorname{Gr}(m,p+q),\quad\operatorname{N}(x)=\operatorname{N}_{x}(X,\mathbb{R}^{p+q}),

and the semi-Gauss map

\operatorname{SN}:X\to\operatorname{Gr}(m,p+q),\quad\operatorname{SN}(x)=\operatorname{SN}_{x}(X,\mathbb{R}^{p,q}).

Proposition 4.14 states essentially the commutativity of the following diagram:

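(29)

\begin{array}{ccc}X&\xrightarrow{\;\operatorname{N}\;}&\operatorname{Gr}(m,p+q)\\ &{\scriptstyle\operatorname{SN}}\searrow&\big\downarrow{\scriptstyle\operatorname{I}_{p,q}}\\ &&\operatorname{Gr}(m,p+q)\end{array}

Here the vertical arrow sends a subspace $W$ to $\operatorname{I}_{p,q}W$ , so the diagram reads $\operatorname{SN}=\operatorname{I}_{p,q}\circ\operatorname{N}$ .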

Denote by $\mathcal{V}$ the quadratic hypersurface in $\mathbb{P}\mathbb{R}^{p+q}$ defined by

\mathcal{V}:=\left\{\left(x_{1},\cdots,x_{p},y_{1},\cdots,y_{q}\right)\in\mathbb{P}\mathbb{R}^{p+q}\,\Bigg{|}\,\sum_{j=1}^{p}x_{j}^{2}-\sum_{j=1}^{q}y_{j}^{2}=0\right\}.

The degeneracy of semi-Riemannian structures on hypersurfaces is totally determined by the intersections of semi-normal bundles with $\mathcal{V}$ . More concretely, it follows directly from the definitions that

Proposition 4.18.

If $\dim X=p+q-1$ , then $\operatorname{N}^{-1}(\mathcal{V})$ is the degenerate locus of $X$ . In particular, $X$ is non-degenerate if and only if $\mathcal{V}\cap\operatorname{N}(X)=\emptyset$ , where $\operatorname{N}(X)$ is the image of the Gauss map of $X$ .

In the remainder of this section we provide two classes of hypersurfaces, namely, pseudo-spheres and pseudo-hyperbolic spaces, in the Minkowski space $\mathbb{R}^{p,q}$ that are different from hyperplanes. For both examples we obtain closed form expressions for embedded geodesic curves and parallel-transports (see Definition 4.12 and Definition 4.13) needed for implementing the algorithmic framework proposed in Section 3. Numerical experiments demonstrating the efficacy of the semi-Riemannian optimization framework on these hypersurfaces can be found in Section 5.

4.2.1 Pseudo-spheres

Let $\mathbb{S}^{p,q}$ be the hypersurface in $\mathbb{R}^{p,q}$ defined by the equation

-\sum_{j=1}^{p}x_{j}^{2}+\sum_{j=1}^{q}y_{j}^{2}=1.

Here we write $x\in\mathbb{R}^{p,q}$ as $x=(x_{1},\dots,x_{p},y_{1},\dots,y_{q})$ . In the literature, $\mathbb{S}^{p,q}$ is called the unit pseudo-sphere in $\mathbb{R}^{p,q}$ , and $\mathbb{S}^{1,q}$ (resp. $\mathbb{S}^{p,1}$ ) is known as the de Sitter (resp. Anti-de Sitter) space. The tangent space $\operatorname{T}_{x}\mathbb{S}^{p,q}$ is characterized by

\operatorname{T}_{x}\mathbb{S}^{p,q}=\left\{(u,v)\in\mathbb{R}^{p,q}:-\sum_{j=1}^{p}x_{j}u_{j}+\sum_{j=1}^{q}y_{j}v_{j}=0\right\}

for each $x=(x_{1},\dots,x_{p},y_{1},\dots,y_{q})\in\mathbb{S}^{p,q}$ . Hence we also have

\operatorname{N}_{x}(\mathbb{S}^{p,q},\mathbb{R}^{p+q})=\left\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\right\},\quad\operatorname{SN}_{x}(\mathbb{S}^{p,q},\mathbb{R}^{p,q})=\left\{\lambda x:\lambda\in\mathbb{R}\right\}.

Since $\langle x,x\rangle=1\neq 0$ for every $x\in\mathbb{S}^{p,q}$ , the normal line spanned by $\operatorname{I}_{p,q}x$ never lies on $\mathcal{V}$ . Together with Proposition 4.18, this implies the following.

Lemma 4.19.

For any positive integers $p$ and $q$ , $\mathbb{S}^{p,q}$ is a non-degenerate semi-Riemannian sub-manifold of $\mathbb{R}^{p,q}$ .

We now turn to investigating the embedded geodesics on $\mathbb{S}^{p,q}$ .

Proposition 4.20.

The embedded geodesic passing through $x\in\mathbb{S}^{p,q}$ with tangent direction $X\in T_{x}\mathbb{S}^{p,q}$ is

(30)

\gamma(t)=\begin{cases}\displaystyle x\cos(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sin(t\left\|X\right\|),&\text{if}~\langle X,X\rangle>0,\\ \displaystyle x\cosh(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sinh(t\left\|X\right\|),&\text{if}~\langle X,X\rangle<0,\\ \displaystyle x+tX,&\text{otherwise}\end{cases}

where $\left\|X\right\|=\sqrt{\left|\langle X,X\rangle_{x}\right|}$ .

Proof 4.21.

First, it is straightforward to verify that $\gamma(0)=x$ and $\dot{\gamma}(0)=X$ . Next we notice that

\gamma(t)^{\mathsf{T}}\operatorname{I}_{p,q}\gamma(t)=1

since $x^{\mathsf{T}}\operatorname{I}_{p,q}X=0$ . This implies that $\gamma(t)$ is indeed a curve on $\mathbb{S}^{p,q}$ . Lastly, by taking second derivative, we have

\ddot{\gamma}(t)=\begin{cases}-\langle X,X\rangle\gamma(t),~\text{if}~\langle X,X\rangle\neq 0,\\ 0,~\text{otherwise}\\ \end{cases}

and hence $\ddot{\gamma}(t)\in\operatorname{SN}_{\gamma(t)}(\mathbb{S}^{p,q},\mathbb{R}^{p,q})$ . Therefore, $\gamma(t)$ is the geodesic curve passing through $x$ with tangent direction $X$ .
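The three branches of Eq. 30 translate directly into code. The following is a minimal sketch under the assumption that the indefinite product carries minus signs on the first $p$ coordinates, as in the defining equation of $\mathbb{S}^{p,q}$; the function names and the tolerance used to detect null directions are illustrative choices, not taken from the paper.

```python
import numpy as np

def minkowski(u, v, p):
    """Indefinite product <u, v> = -sum_{j<=p} u_j v_j + sum_{j>p} u_j v_j."""
    return -(u[:p] @ v[:p]) + u[p:] @ v[p:]

def pseudo_sphere_geodesic(x, X, t, p, tol=1e-14):
    """Evaluate Eq. 30 at time t, for x on S^{p,q} and X tangent at x."""
    s = minkowski(X, X, p)
    n = np.sqrt(abs(s))                      # ||X|| = sqrt(|<X, X>|)
    if s > tol:                              # trigonometric branch
        return x * np.cos(t * n) + (X / n) * np.sin(t * n)
    if s < -tol:                             # hyperbolic branch
        return x * np.cosh(t * n) + (X / n) * np.sinh(t * n)
    return x + t * X                         # null direction: straight line

p = 1                                        # S^{1,2}: -x_1^2 + y_1^2 + y_2^2 = 1
x = np.array([0.0, 1.0, 0.0])
directions = {
    "positive": np.array([0.0, 0.0, 2.0]),   # <X, X> = 4
    "negative": np.array([3.0, 0.0, 0.0]),   # <X, X> = -9
    "null": np.array([1.0, 0.0, 1.0]),       # <X, X> = 0
}
for name, X in directions.items():
    g = pseudo_sphere_geodesic(x, X, 0.7, p)
    print(name, minkowski(g, g, p))          # each value should be close to 1
```

In all three cases the curve stays on the pseudo-sphere up to round-off, mirroring the identity $\gamma(t)^{\mathsf{T}}\operatorname{I}_{p,q}\gamma(t)=1$ used in the proof.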

We now compute the parallel translation on $\mathbb{S}^{p,q}$ . Let $x$ be a point on $\mathbb{S}^{p,q}$ and let $\Delta$ be a tangent vector on $\mathbb{S}^{p,q}$ at $x$ . We denote by $\gamma(t)$ the geodesic curve passing through $x$ with tangent direction $X$ . Let $\Delta(t)$ be the parallel transportation of $\Delta$ along $\gamma$ . By definition, we must have that $\Delta(t)\in\operatorname{T}_{\gamma(t)}\mathbb{S}^{p,q}$ and $\dot{\Delta}(t)\in\operatorname{SN}_{\gamma(t)}(\mathbb{S}^{p,q},\mathbb{R}^{p,q})$ . This implies

(31)		$\displaystyle\langle\Delta(t),\gamma(t)\rangle$	$\displaystyle=0,$
(32)		$\displaystyle\langle\dot{\Delta}(t),\gamma(t)\rangle\gamma(t)$	$\displaystyle=\dot{\Delta}(t).$

Differentiating Eq. 31, we obtain $\langle\dot{\Delta}(t),\gamma(t)\rangle=-\langle\Delta(t),\dot{\gamma}(t)\rangle$ and hence, by Eq. 32,

\dot{\Delta}(t)=-\langle\Delta(t),\dot{\gamma}(t)\rangle\gamma(t).

Since parallel translation preserves inner products and $\dot{\gamma}(t)$ is itself parallel along the geodesic $\gamma$ , we see that $\langle\Delta(t),\dot{\gamma}(t)\rangle=\langle\Delta,X\rangle$ and

(33)

\dot{\Delta}(t)=-\langle\Delta,X\rangle\gamma(t).

Integrating Eq. 33 and using the initial condition $\Delta(0)=\Delta$ , we obtain the following.

Proposition 4.22.

Let $\gamma(t)$ be the geodesic passing through $x\in\mathbb{S}^{p,q}$ with tangent direction $X\in\operatorname{T}_{x}\mathbb{S}^{p,q}$ . The parallel transport of $\Delta\in\operatorname{T}_{x}\mathbb{S}^{p,q}$ along $\gamma(t)$ is

\Delta(t)=-\langle\Delta,X\rangle\int_{0}^{t}\gamma(\tau)d\tau+\Delta.

More precisely, we have

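\Delta(t)=\Delta-\langle\Delta,X\rangle\cdot\begin{cases}\displaystyle x\,\frac{\sin(t\left\|X\right\|)}{\left\|X\right\|}+X\,\frac{1-\cos(t\left\|X\right\|)}{\left\|X\right\|^{2}},&\text{if}~\langle X,X\rangle>0,\\ \displaystyle x\,\frac{\sinh(t\left\|X\right\|)}{\left\|X\right\|}+X\,\frac{\cosh(t\left\|X\right\|)-1}{\left\|X\right\|^{2}},&\text{if}~\langle X,X\rangle<0,\\ \displaystyle tx+\frac{t^{2}}{2}X,&\text{otherwise}.\end{cases}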
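The integral formula of Proposition 4.22 can be evaluated in closed form by integrating each branch of Eq. 30 termwise. The following is a minimal sketch of that computation; the helper names are hypothetical, and the sign convention (minus signs on the first $p$ coordinates) is the one suggested by the defining equation of $\mathbb{S}^{p,q}$. The script checks two properties that a parallel transport should have: the transported vector remains tangent along $\gamma$, and its indefinite squared length is unchanged.

```python
import numpy as np

def minkowski(u, v, p):
    """Indefinite product with minus signs on the first p coordinates."""
    return -(u[:p] @ v[:p]) + u[p:] @ v[p:]

def geodesic_and_integral(x, X, t, p, tol=1e-14):
    """Return gamma(t) from Eq. 30 and the integral of gamma over [0, t]."""
    s = minkowski(X, X, p)
    n = np.sqrt(abs(s))
    if s > tol:
        gamma = x * np.cos(t * n) + (X / n) * np.sin(t * n)
        integral = x * np.sin(t * n) / n + X * (1.0 - np.cos(t * n)) / n**2
    elif s < -tol:
        gamma = x * np.cosh(t * n) + (X / n) * np.sinh(t * n)
        integral = x * np.sinh(t * n) / n + X * (np.cosh(t * n) - 1.0) / n**2
    else:
        gamma = x + t * X
        integral = t * x + 0.5 * t**2 * X
    return gamma, integral

def parallel_transport(x, X, Delta, t, p):
    """Delta(t) = Delta - <Delta, X> * (integral of gamma over [0, t])."""
    _, integral = geodesic_and_integral(x, X, t, p)
    return Delta - minkowski(Delta, X, p) * integral

p = 1
x = np.array([0.0, 1.0, 0.0])          # point on S^{1,2}
X = np.array([1.0, 0.0, 2.0])          # tangent at x, <X, X> = 3
Delta = np.array([1.0, 0.0, 3.0])      # tangent at x, <Delta, Delta> = 8
t = 0.9
gamma_t, _ = geodesic_and_integral(x, X, t, p)
Delta_t = parallel_transport(x, X, Delta, t, p)
print(minkowski(Delta_t, gamma_t, p))  # tangency residual, close to 0
print(minkowski(Delta_t, Delta_t, p))  # close to <Delta, Delta> = 8
```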

4.2.2 Pseudo-hyperbolic Spaces

The unit pseudo-hyperbolic space $\mathbb{H}^{p,q}$ in $\mathbb{R}^{p,q}$ is defined by the equation

-\sum_{j=1}^{p}x_{j}^{2}+\sum_{j=1}^{q}y_{j}^{2}=-1.

The tangent space of $\mathbb{H}^{p,q}$ at a point $x=(x_{1},\dots,x_{p},y_{1},\dots,y_{q})$ is

\operatorname{T}_{x}\mathbb{H}^{p,q}=\left\{(u,v)\in\mathbb{R}^{p,q}:-\sum_{j=1}^{p}x_{j}u_{j}+\sum_{j=1}^{q}y_{j}v_{j}=0\right\}.

Let $\sigma_{p,q}:\mathbb{R}^{p,q}\to\mathbb{R}^{q,p}$ be the map defined by

\sigma_{p,q}(x_{1},\dots,x_{p},y_{1},\dots,y_{q})=(y_{1},\dots,y_{q},x_{1},\dots,x_{p}).

It is straightforward ([35, Lemma 24]) to verify that $\sigma_{p,q}$ is an anti-isometry between $\mathbb{H}^{p,q}$ and $\mathbb{S}^{q,p}$ , whose inverse is $\sigma_{q,p}$ . Therefore we have

\operatorname{N}_{x}(\mathbb{H}^{p,q},\mathbb{R}^{p+q})=\{\lambda\operatorname{I}_{p,q}x:\lambda\in\mathbb{R}\},\quad\operatorname{SN}_{x}(\mathbb{H}^{p,q},\mathbb{R}^{p,q})=\{\lambda x:\lambda\in\mathbb{R}\}.

Corollary 4.23.

For any positive integers $p,q$ , the pseudo-hyperbolic space $\mathbb{H}^{p,q}$ is a non-degenerate semi-Riemannian sub-manifold of $\mathbb{R}^{p,q}$ .

Moreover, geodesics and parallel transports on $\mathbb{H}^{p,q}$ can be easily obtained from those on $\mathbb{S}^{q,p}$ via the anti-isometry $\sigma_{p,q}$ .

Corollary 4.24.

Let $x$ be a point on $\mathbb{H}^{p,q}$ and let $X$ be a tangent direction of $\mathbb{H}^{p,q}$ at $x$ . The geodesic curve $\gamma(t)$ passing through $x$ with tangent direction $X$ is

\gamma(t)=\begin{cases}x\cosh(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sinh(t\left\|X\right\|),&\text{if}~\langle X,X\rangle>0,\\ x\cos(t\left\|X\right\|)+\left(X/\left\|X\right\|\right)\sin(t\left\|X\right\|),&\text{if}~\langle X,X\rangle<0,\\ x+tX,&\text{otherwise}.\end{cases}
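One way to see why the trigonometric and hyperbolic branches swap roles on $\mathbb{H}^{p,q}$ is to compute its geodesics through the anti-isometry $\sigma_{p,q}$. The following is a minimal sketch under the same sign-convention assumption as before (minus signs on the first $p$ coordinates); the function names are hypothetical. It maps the data to $\mathbb{S}^{q,p}$, follows Eq. 30 there, and maps back with $\sigma_{q,p}$.

```python
import numpy as np

def minkowski(u, v, p):
    """Indefinite product with minus signs on the first p coordinates."""
    return -(u[:p] @ v[:p]) + u[p:] @ v[p:]

def sigma(z, p):
    """sigma_{p,q}: move the first p coordinates behind the last q."""
    return np.concatenate([z[p:], z[:p]])

def pseudo_sphere_geodesic(x, X, t, p, tol=1e-14):
    """Eq. 30 on the pseudo-sphere whose first p coordinates carry minus signs."""
    s = minkowski(X, X, p)
    n = np.sqrt(abs(s))
    if s > tol:
        return x * np.cos(t * n) + (X / n) * np.sin(t * n)
    if s < -tol:
        return x * np.cosh(t * n) + (X / n) * np.sinh(t * n)
    return x + t * X

def pseudo_hyperbolic_geodesic(x, X, t, p):
    """Geodesic on H^{p,q}: push to S^{q,p}, follow Eq. 30, pull back."""
    q = len(x) - p
    y = pseudo_sphere_geodesic(sigma(x, p), sigma(X, p), t, q)
    return sigma(y, q)                   # sigma_{q,p} inverts sigma_{p,q}

p = 2                                    # H^{2,1}: -x_1^2 - x_2^2 + y_1^2 = -1
x = np.array([1.0, 0.0, 0.0])
X_pos = np.array([0.0, 0.0, 2.0])        # <X, X> = 4, hyperbolic branch on H^{2,1}
X_neg = np.array([0.0, 3.0, 0.0])        # <X, X> = -9, trigonometric branch on H^{2,1}
t = 0.4
print(pseudo_hyperbolic_geodesic(x, X_pos, t, p))
print(pseudo_hyperbolic_geodesic(x, X_neg, t, p))
```

Because $\sigma_{p,q}$ flips the sign of the indefinite product, a direction with $\langle X,X\rangle>0$ on $\mathbb{H}^{p,q}$ becomes a direction of negative squared length on $\mathbb{S}^{q,p}$, which is why the hyperbolic branch is selected.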

Corollary 4.25.

Let $\gamma(t)$ be the geodesic on $\mathbb{H}^{p,q}$ passing through $x\in\mathbb{H}^{p,q}$ with tangent direction $X\in\operatorname{T}_{x}\mathbb{H}^{p,q}$ . The parallel transport of $\Delta\in\operatorname{T}_{x}\mathbb{H}^{p,q}$ along $\gamma(t)$ is

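\Delta(t)=\Delta+\langle\Delta,X\rangle\cdot\begin{cases}\displaystyle x\,\frac{\sinh(t\left\|X\right\|)}{\left\|X\right\|}+X\,\frac{\cosh(t\left\|X\right\|)-1}{\left\|X\right\|^{2}},&\text{if}~\langle X,X\rangle>0,\\ \displaystyle x\,\frac{\sin(t\left\|X\right\|)}{\left\|X\right\|}+X\,\frac{1-\cos(t\left\|X\right\|)}{\left\|X\right\|^{2}},&\text{if}~\langle X,X\rangle<0,\\ \displaystyle tx+\frac{t^{2}}{2}X,&\text{otherwise}.\end{cases}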

5 Numerical Experiments

We demonstrate in this section the feasibility of the proposed semi-Riemannian optimization framework through various conceptual or numerical experiments.

5.1 Minkowski Spaces

Although we know from Example 3.8 that the semi-Riemannian descent direction coincides with the negative Riemannian gradient when the standard orthonormal basis is chosen and fixed at every point of $\mathbb{R}^{1,1}$ , the two types of gradients nevertheless differ from each other if we follow the random orthonormal basis construction Algorithm 2. To illustrate the difference between Riemannian and semi-Riemannian optimization on Minkowski spaces, we solve a simple quadratic convex optimization problem

(34)

\min_{x\in\mathbb{R}^{2}}x^{\top}Ax

on $\mathbb{R}^{1,1}$ equipped with the standard semi-Riemannian metric of signature $\left(-,+\right)$ . Here $A\in\mathbb{R}^{2\times 2}$ is a randomly generated symmetric positive definite matrix, and we apply both steepest descent Algorithm 1 and conjugate gradient Algorithm 5, using random orthonormal bases in subroutine Algorithm 4 for finding descent directions and Armijo’s rule for line search. The semi-Riemannian optimization trajectories vary from instance to instance due to the randomness in basis construction, but global convergence to the global minimum $x=\left(0,0\right)^{\top}$ is empirically observed. We illustrate in Fig. 1 the comparison among trajectories of Riemannian/semi-Riemannian steepest descent and conjugate gradient algorithms for one random instance.

Figure 1: Riemannian and semi-Riemannian steepest descent Algorithm 1 and conjugate gradient Algorithm 5 optimization on the Minkowski Space $\mathbb{R}^{1,1}$ for an instance of the quadratic convex problem Eq. 34, where $A=[0.3649,-0.1065;-0.1065,1.7427]$ and the initial point is chosen as $x_{0}=\left(-0.7285,0.0230\right)^{\top}$ .

5.2 Euclidean Spheres in Minkowski Spaces

The calculations in Example 4.17 imply that the unit Euclidean sphere $\mathbb{S}^{p+q-1}$ is nondegenerate as a semi-Riemannian submanifold in $\mathbb{R}^{p,q}$ except for a degenerate locus of measure zero. Let $f:\mathbb{S}^{p+q-1}\rightarrow\mathbb{R}$ be a differentiable function on $\mathbb{S}^{p+q-1}$ , and denote by $\nabla f$ the gradient of $f$ in the ambient Euclidean space. As shown in Example 2.10, the semi-Riemannian gradient of $f$ in the Minkowski space can be written as $Df=I_{p,q}\nabla f$ , and the descent directions in the ambient space take the form $-\left[Df\right]^{+}=-\nabla f$ . Recall from Example 4.10 and Example 4.17 that, unless $x$ is a null vector (such points form a set of measure zero), the fibre of the degenerate bundle $\left(\mathbb{S}^{p+q-1}\right)^{\perp}$ vanishes at $x$ and thus we can project $Df$ to a unique tangent vector in $T_{x}\mathbb{S}^{p+q-1}$ by Lemma 2.3. This indicates that the optimization trajectory falls outside of the degenerate locus with probability $1$ , as long as the optimum is not inside the degenerate locus.

To illustrate the feasibility of our proposed semi-Riemannian optimization framework, we solve the problem

(35)

\max_{x_{1}^{2}+\cdots+x_{p+q}^{2}=1}x^{\top}Ax

using semi-Riemannian steepest descent and conjugate gradient methods, where $A\in\mathbb{R}^{\left(p+q\right)\times\left(p+q\right)}$ is a randomly generated symmetric (but not necessarily positive definite) matrix. The maximum of Eq. 35 is attained at the unit eigenvector associated with the maximum eigenvalue of the matrix $A$ , and hence we can visualize and compare the convergence dynamics of Riemannian and semi-Riemannian optimization schemes by keeping track of the $L^{2}$ -discrepancy between the solution obtained at each iteration and the true maximizer. As there do not seem to exist explicit expressions for the semi-Riemannian geodesic and parallel-transport on $\mathbb{S}^{p+q-1}$ (see Table 1), we use the Riemannian geodesic and parallel transport on $\mathbb{S}^{p+q-1}$ ; generically, these choices do not essentially affect the convergence of manifold optimization algorithms, which allow for arbitrary retractions [2, 8] and general parallel-transports [42, 19]. Apart from the random basis generation inherent to the local semi-Riemannian Gram-Schmidt orthonormalization Algorithm 2, for $p+q>2$ there also exist multiple semi-Riemannian structures on $\mathbb{R}^{p+q}$ which induce distinct semi-Riemannian structures on $\mathbb{S}^{p+q-1}$ ; our experimental results in Fig. 2 suggest that all semi-Riemannian structures ensure convergence, though the convergence rates may differ. A deeper investigation of the dependence of the convergence rate on the choice of semi-Riemannian structure appears highly intriguing but is beyond the scope of this paper; we defer such exploration to future work.
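For orientation, the Riemannian baseline for Eq. 35 can be sketched in a few lines. This is only an illustration of the reference method against which the semi-Riemannian schemes are compared: it does not implement the semi-Riemannian descent directions of Algorithm 2 or Algorithm 4, and the test matrix, step-size constants, and function names are assumptions chosen for the example rather than the settings used in the experiments.

```python
import numpy as np

def sphere_geodesic(x, v, t):
    """Riemannian geodesic on the unit sphere through x with velocity v."""
    n = np.linalg.norm(v)
    if n == 0.0:
        return x
    return x * np.cos(t * n) + (v / n) * np.sin(t * n)

def rayleigh_ascent(A, x0, iters=1000, beta=0.5, c=0.1):
    """Riemannian steepest ascent for x^T A x with Armijo backtracking."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        f = x @ A @ x
        g = 2.0 * (A @ x - f * x)            # ambient gradient 2Ax projected onto T_x
        t = 1.0
        while t > 1e-12:
            y = sphere_geodesic(x, g, t)
            if y @ A @ y >= f + c * t * (g @ g):
                x = y / np.linalg.norm(y)    # renormalise against round-off
                break
            t *= beta
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([3.0, 1.0, 0.5, -1.0, -2.0]) @ Q.T   # symmetric, indefinite
A = 0.5 * (A + A.T)                                   # symmetrise against round-off
x_hat = rayleigh_ascent(A, rng.standard_normal(5))
print(x_hat @ A @ x_hat)                     # should approach the largest eigenvalue, 3
```

The geodesic serves as the retraction here, in line with the remark that Riemannian geodesics are used on $\mathbb{S}^{p+q-1}$ in this experiment.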

5.3 Pseudo-spheres in Minkowski Spaces

Since the pseudo-spheres (Section 4.2.1) and pseudo-hyperbolic spaces (Section 4.2.2) differ from each other only by an anti-isometry [35, Lemma 24], we will only consider pseudo-spheres in this numerical experiment. Note that, given an arbitrary point $x\in\mathbb{S}^{p,q}$ and a tangent direction $X\in\operatorname{T}_{x}\mathbb{S}^{p,q}$ , it is generally difficult to calculate Riemannian geodesics on pseudo-spheres explicitly (except for some particular cases where e.g. Clairaut’s relation holds, see [10, Chapter 3 Ex. 1]), but semi-Riemannian geodesics admit the closed-form expression Eq. 30 and thus can be used as retractions for semi-Riemannian optimization algorithms. We consider the problem

(36)

\min_{x\in\mathbb{S}^{p,q}}\left\|x-\xi\right\|_{2}^{2}

where $\left\|x-\xi\right\|_{2}$ is the Euclidean distance between $x\in\mathbb{S}^{p,q}$ and an arbitrarily chosen point $\xi\in\mathbb{R}^{p+q}$ that does not lie on $\mathbb{S}^{p,q}$ . An illustration of the convergence of semi-Riemannian steepest descent and conjugate gradient methods (in semi-log scale) for a random instance of Eq. 36 with $p=3$ and $q=12$ can be found in Fig. 3, where the vertical axis marks the squared Euclidean distance between $x_{k}$ and the ground truth solution $x_{\textrm{true}}$ computed using the constrained optimization routine fmincon provided in the Matlab optimization toolbox. This numerical experiment indicates that the convergence rates of both semi-Riemannian first-order methods are linear for Eq. 36.
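To illustrate how the closed-form geodesic Eq. 30 can serve as a retraction for Eq. 36, the following minimal sketch runs a backtracking descent on $\mathbb{S}^{1,2}$. It is not the paper's Algorithm 1: as a simplifying assumption, the search direction is the Euclidean projection of $-\nabla f$ onto the tangent space (whose Euclidean normal is $\operatorname{I}_{p,q}x$), and the point $\xi$, the starting point, and all function names are chosen only for illustration.

```python
import numpy as np

def minkowski(u, v, p):
    """Indefinite product with minus signs on the first p coordinates."""
    return -(u[:p] @ v[:p]) + u[p:] @ v[p:]

def geodesic(x, X, t, p, tol=1e-14):
    """Closed-form geodesic of Eq. 30 on S^{p,q}."""
    s = minkowski(X, X, p)
    n = np.sqrt(abs(s))
    if s > tol:
        return x * np.cos(t * n) + (X / n) * np.sin(t * n)
    if s < -tol:
        return x * np.cosh(t * n) + (X / n) * np.sinh(t * n)
    return x + t * X

def nearest_point_on_pseudo_sphere(xi, x0, p, iters=300, beta=0.5, c=1e-4):
    """Backtracking descent for ||x - xi||^2, using Eq. 30 as the retraction."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grad = 2.0 * (x - xi)
        normal = np.concatenate([-x[:p], x[p:]])   # I_{p,q} x, Euclidean normal to T_x
        d = -(grad - (grad @ normal) / (normal @ normal) * normal)
        f, slope, t = np.sum((x - xi) ** 2), grad @ d, 1.0
        while t > 1e-12:
            y = geodesic(x, d, t, p)
            if np.sum((y - xi) ** 2) <= f + c * t * slope:   # Armijo condition
                x = y
                break
            t *= beta
    return x

p = 1
xi = np.array([0.5, 2.0, 1.0])                     # target point off S^{1,2}
x0 = np.array([0.0, 1.0, 0.0])                     # starting point on S^{1,2}
x_star = nearest_point_on_pseudo_sphere(xi, x0, p)
print(minkowski(x_star, x_star, p))                # constraint value, close to 1
print(np.sum((x0 - xi) ** 2), np.sum((x_star - xi) ** 2))
```

Every iterate is produced by Eq. 30, so feasibility is maintained without any projection back onto the constraint set.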

6 Conclusion

Motivated by the metric independence of Riemannian optimization algorithms and the Riemannian geometry of self-concordant barrier functions, we developed an algorithmic framework for optimization on semi-Riemannian manifolds in this paper, which includes Riemannian manifold optimization and standard unconstrained optimization in Euclidean spaces as special cases. We proposed a modification to the semi-Riemannian gradients for obtaining descent directions, and used this methodology to devise steepest descent and conjugate gradient algorithms for semi-Riemannian manifold optimization. We also showed that second-order methods such as Newton’s method and trust region methods are invariant with respect to different choices of semi-Riemannian (including Riemannian) metrics. We provided numerical experiments to demonstrate the feasibility of the proposed algorithmic framework on non-degenerate semi-Riemannian submanifolds of Minkowski spaces. We defer more rigorous theoretical analysis of semi-Riemannian manifold optimization, as well as a broader range of applications, to future work.

Software

MATLAB code for the surface registration algorithm is publicly available at https://github.com/trgao10/SemiRiem.

Acknowledgments

The authors would like to thank Lin Lin (UC Berkeley) for inspirational discussions.

References

[1] P.-A. Absil, C. G. Baker, and K. A. Gallivan, Trust-Region Methods on Riemannian Manifolds, Foundations of Computational Mathematics, 7 (2007), p. 303–330.
[2] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2009.
[3] N. Ahmad, H. K. Kim, and R. J. McCann, Optimal Transportation, Topology and Uniqueness, Bulletin of Mathematical Sciences, 1 (2011), p. 13–32.
[4] R. Albuquerque, On Lie Groups with Left Invariant Semi-Riemannian Metric, in Proceedings 1st International Meeting on Geometry and Topology, 1998, p. 1–13.
[5] E. Andruchow, G. Larotonda, L. Recht, and A. Varela, The Left Invariant Metric in the General Linear Group, Journal of Geometry and Physics, 86 (2014), p. 241–257.
[6] C. Atindogbe, M. Gutiérrez, and R. Hounnonkpe, New Properties on Normalized Null Hypersurfaces, Mediterranean Journal of Mathematics, 15 (2018), p. 166, https://doi.org/10.1007/s00009-018-1210-0.
[7] F. Barnet et al., On Lie Groups That Admit Left-Invariant Lorentz Metrics of Constant Sectional Curvature, Illinois Journal of Mathematics, 33 (1989), p. 631–642.
[8] N. Boumal, P.-A. Absil, and C. Cartis, Global Rates of Convergence for Nonconvex Optimization on Manifolds, IMA Journal of Numerical Analysis, (2016).
[9] S. M. Carroll, Spacetime and Geometry: An Introduction to General Relativity, Addison-Wesley, San Francisco, 2004.
[10] M. P. Do Carmo, Riemannian Geometry, Springer, 1992.
[11] J. J. Duistermaat, On Hessian Riemannian Structures, Asian Journal of Mathematics, 5 (2001), pp. 79–91.
[12] A. Edelman, T. A. Arias, and S. T. Smith, The Geometry of Algorithms with Orthogonality Constraints, SIAM Journal on Matrix Analysis and Applications, 20 (1998), p. 303–353.
[13] D. Gabay, Minimizing a Differentiable Function over a Differential Manifold, Journal of Optimization Theory and Applications, 37 (1982), p. 177–219.
[14] M. Gutiérrez and B. Olea, Induced Riemannian Structures on Null Hypersurfaces, Mathematische Nachrichten, 289 (2016), pp. 1219–1236.
[15] S. W. Hawking, The Occurrence of Singularities in Cosmology, Proc. R. Soc. Lond. A, 294 (1966), p. 511–521.
[16] S. W. Hawking and G. F. R. Ellis, The Large Scale Structure of Space-Time, vol. 1, Cambridge University Press, 1973.
[17] S. W. Hawking and R. Penrose, The Singularities of Gravitational Collapse and Cosmology, Proc. R. Soc. Lond. A, 314 (1970), p. 529–548.
[18] G. Heidel and V. Schulz, A Riemannian Trust-Region Method for Low-Rank Tensor Completion, Numerical Linear Algebra with Applications, (2017), p. e2175.
[19] W. Huang, P.-A. Absil, and K. A. Gallivan, A Riemannian Symmetric Rank-One Trust-Region Method, Mathematical Programming, 150 (2015), p. 179–216.
[20] Y. Kim and R. McCann, Continuity, Curvature, and the General Covariance of Optimal Transportation, Journal of the European Mathematical Society, 12 (2010), p. 1009–1040.
[21] Y.-H. Kim, R. J. McCann, and M. Warren, Pseudo-Riemannian Geometry Calibrates Optimal Transportation, Mathematical Research Letters, 17 (2010), p. 1183–1197, https://doi.org/10.4310/MRL.2010.v17.n6.a16.
[22] D. N. Kupeli, Degenerate Manifolds, Geometriae Dedicata, 23 (1987), p. 259–290.
[23] D. N. Kupeli, Degenerate Submanifolds in Semi-Riemannian Geometry, Geometriae Dedicata, 24 (1987), p. 337–361.
[24] D. N. Kupeli, On Null Submanifolds in Spacetimes, Geometriae Dedicata, 23 (1987), p. 33–51.
[25] J. C. Larsen, Singular Semi-Riemannian Geometry, Journal of Geometry and Physics, 9 (1992), p. 3–23.
[26] D. G. Luenberger and Y. Ye, Linear and Nonlinear Programming, vol. 2, Springer, 1984.
[27] J. Milnor, Morse Theory, vol. 51, Princeton University Press, 1963.
[28] J. Milnor, Morse Theory, vol. 51 of Annals of Mathematics Studies, Princeton University Press, 1973.
[29] J. Milnor, Curvatures of Left Invariant Metrics on Lie Groups, Advances in Mathematics, 21 (1976), p. 293–329, https://doi.org/10.1016/S0001-8708(76)80002-3.
[30] N. Miolane and X. Pennec, Computing Bi-Invariant Pseudo-Metrics on Lie Groups for Consistent Statistics, Entropy, 17 (2015), p. 1850–1881.
[31] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Convex Programming, vol. 13, SIAM, 1994.
[32] Y. E. Nesterov and M. J. Todd, On the Riemannian Geometry Defined by Self-Concordant Barriers and Interior-Point Methods, Foundations of Computational Mathematics, 2 (2002), pp. 333–361.
[33] L. Nicolaescu, An Invitation to Morse Theory, Springer Science & Business Media, 2011.
[34] K. Nomizu, Left-Invariant Lorentz Metrics on Lie Groups, Osaka Journal of Mathematics, 16 (1979), p. 143–150.
[35] B. O’Neill, Semi-Riemannian Geometry with Applications to Relativity, vol. 103 of Pure and Applied Mathematics, Academic Press, 1983.
[36] R. S. Palais, Seminar on Atiyah-Singer Index Theorem.(AM-57), vol. 57, Princeton University Press, 2016.
[37] R. Penrose, Gravitational Collapse and Space-Time Singularities, Physical Review Letters, 14 (1965), p. 57.
[38] P. Petersen, Riemannian Geometry, vol. 171 of Graduate Texts in Mathematics, Springer New York, 2006.
[39] T. Rapcsak, Geodesic Convexity in Nonlinear Optimization, Journal of Optimization Theory and Applications, 69 (1991), p. 169–183.
[40] G. Raskutti and S. Mukherjee, The Information Geometry of Mirror Descent, IEEE Transactions on Information Theory, 61 (2015), pp. 1451–1457.
[41] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, vol. 3, SIAM, 2001.
[42] W. Ring and B. Wirth, Optimization Methods on Riemannian Manifolds and Their Application to Shape Space, SIAM Journal on Optimization, 22 (2012), p. 596–627, https://doi.org/10.1137/11082885X.
[43] D. J. Saunders, The Geometry of Jet Bundles, vol. 142, Cambridge University Press, 1989.
[44] S. T. Smith, Optimization Techniques on Riemannian Manifolds, Fields Institute Communications, 3 (1994), p. 113–135.
[45] O. C. Stoica, On Singular Semi-Riemannian Manifolds, International Journal of Geometric Methods in Modern Physics, 11 (2014), p. 1450041.
[46] C. Udriste, Convex Functions and Optimization Methods on Riemannian Manifolds, vol. 297, Springer Science & Business Media, 1994.
[47] R. Vakil, A Beginner’s Guide to Jet Bundles from the Point of View of Algebraic Geometry, Notes, (1998).
[48] S. Wright and J. Nocedal, Numerical Optimization, Springer Science, 35 (1999), p. 7.

Appendix A Generic Non-degeneracy of Semi-Riemannian Structures on Hyperplanes of Minkowski Spaces

We begin with a brief discussion for the Gauss map defined in Section 4.2. Consider $Z=\{(x,W)\in X\times\operatorname{Gr}(m,p+q):W=\operatorname{N}_{x}(X,\mathbb{R}^{p+q})\}$ . Since $W=\operatorname{N}_{x}(X,\mathbb{R}^{p+q})$ if and only if $W$ is perpendicular to $\operatorname{T}_{x}X$ with respect to the Euclidean metric on $\mathbb{R}^{p+q}$ , $Z$ is a closed subset of $X\times\operatorname{Gr}(m,p+q)$ . More precisely, we have

Proposition A.1.

Let $\pi:X\times\operatorname{Gr}(m,p+q)\to\operatorname{Gr}(m,p+q)$ be the canonical projection onto the second factor. The following facts hold:

(1)

$\pi(Z)=\operatorname{N}(X)$ ;
(2)

If $X$ is compact, then $\pi$ is a closed map (i.e., it maps closed sets to closed sets). In particular, $\operatorname{N}(X)\subseteq\operatorname{Gr}(m,p+q)$ is a closed subset if $X$ is compact.

The non-degeneracy of a generic hyperplane can be easily obtained as a corollary of Proposition 4.18. Recall that hyperplanes $H\subset\mathbb{R}^{p+q}$ can be characterized as

H:=\left\{x=\left(x_{1},\cdots,x_{p},y_{1},\cdots,y_{q}\right)\in\mathbb{R}^{p+q}\,\Bigg{|}\,\sum_{j=1}^{p}a_{j}x_{j}+\sum_{j=1}^{q}b_{j}y_{j}=0\right\},

and we denote

\mathbf{n}\coloneqq(a_{1},\dots,a_{p},b_{1},\dots,b_{q})\in\mathbb{R}^{p+q}

for the normal vector of $H$ . The vector $\operatorname{I}_{p,q}\mathbf{n}$ lies in $H$ if and only if $\mathbf{n}$ satisfies the equation $\langle\mathbf{n},\mathbf{n}\rangle=0$ , or equivalently, the equation $\sum_{j=1}^{p}a_{j}^{2}=\sum_{j=1}^{q}b_{j}^{2}$ .

Corollary A.2.

A generic hyperplane in $\mathbb{R}^{p,q}$ is non-degenerate.

Proof A.3.

If $H$ is a hyperplane, then its Gauss map is constant, and its image is the single point of $\mathbb{P}\mathbb{R}^{p+q}$ given by the line spanned by the normal vector $\mathbf{n}$ of $H$ . Since $\mathcal{V}$ is a hypersurface in $\mathbb{P}\mathbb{R}^{p+q}$ , a generic point of $\mathbb{P}\mathbb{R}^{p+q}$ does not lie on $\mathcal{V}$ . Hence the normal line of a generic hyperplane avoids $\mathcal{V}$ , and by Proposition 4.18 a generic hyperplane is non-degenerate.
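The genericity statement can be illustrated by sampling normal vectors at random. The following is a minimal sketch: the function name is hypothetical, the Gaussian sampling of normals is an assumption made only for the demonstration, and the relative tolerance is an arbitrary numerical threshold for deciding whether $\langle\mathbf{n},\mathbf{n}\rangle=0$.

```python
import numpy as np

def hyperplane_is_degenerate(normal, p, tol=1e-12):
    """H = {x : normal . x = 0} is degenerate iff <n, n> = 0 in R^{p,q}."""
    n = np.asarray(normal, dtype=float)
    null_form = -(n[:p] @ n[:p]) + n[p:] @ n[p:]
    return abs(null_form) < tol * (n @ n)

rng = np.random.default_rng(0)
samples = rng.standard_normal((10_000, 4))        # random normals in R^{2,2}
count = sum(hyperplane_is_degenerate(n, p=2) for n in samples)
print(count)                                      # number of degenerate samples found
print(hyperplane_is_degenerate([1.0, 0.0, 1.0, 0.0], p=2))  # a null normal
```

Randomly drawn normals essentially never land on the null cone, whereas a normal chosen on the cone, such as the last example, gives a degenerate hyperplane.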

Appendix B Additional Examples

The following Table 1 summarizes the examples computed in this paper.

Table 1: Explicit Examples Calculated

Manifold	Non-degenerate	Closed-form Geodesics	Closed-form Parallel Transport
(generic) hyperplane	yes	✓	✓
sphere	no	✗	✗
pseudo-sphere	yes	✓	✓
pseudo-hyperbolic space	yes	✓	✓
indefinite orthogonal group	no	✓	✗
orthogonal group	no	✓	✗
special linear group	yes ( $p\neq q$ )	✓	✗
symplectic group	no	✓	✗
SPD matrices	no	✓	✗

B.1 Semi-Riemannian Geometry of Symmetric Positive Definite Matrices

Let $\mathbb{S}^{n}_{++}$ be the manifold consisting of all $n\times n$ symmetric positive definite matrices. A matrix $A\in\mathbb{R}^{n\times n}$ is symmetric positive definite if and only if there exists some $M\in\mathrm{GL}(n,\mathbb{R})$ such that $A=M^{\mathsf{T}}M$ . The tangent space of $\mathbb{S}^{n}_{++}$ at $A$ is $\operatorname{S}^{2}\mathbb{R}^{n}$ . Hence if we regard $\mathbb{S}^{n}_{++}$ as a semi-Riemannian sub-manifold of $\mathrm{GL}(n,\mathbb{R})$ with respect to the semi-Riemannian metric $\langle\cdot,\cdot\rangle$ with signature $(p,q)=(p,n-p)$ , then the semi-normal space of $\mathbb{S}^{n}_{++}$ at $A$ is

\operatorname{SN}_{A}(\mathbb{S}^{n}_{++},\mathrm{GL}(n,\mathbb{R}))=\left\{\operatorname{I}_{p,q}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{n}\right\}.

It is straightforward to compute the intersection of $\operatorname{T}_{A}\mathbb{S}^{n}_{++}$ and $\operatorname{SN}_{A}(\mathbb{S}^{n}_{++},\mathrm{GL}(n,\mathbb{R}))$ . Hence we obtain the following:

Proposition B.1.

The degenerate bundle of $\mathbb{S}^{n}_{++}$ in $\mathrm{GL}(n,\mathbb{R})$ is

(\mathbb{S}^{n}_{++})_{A}^{\perp}=\left\{\begin{bmatrix}0&X^{\mathsf{T}}\\ X&0\end{bmatrix}:X\in\mathbb{R}^{q\times p}\right\}.

In particular, $\mathbb{S}^{n}_{++}$ is a degenerate semi-Riemannian sub-manifold of $\mathrm{GL}(n,\mathbb{R})$ .
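The intersection in Proposition B.1 is a small linear-algebra computation that can be checked directly. The following is a minimal sketch that takes the stated descriptions at face value: the tangent space consists of symmetric matrices, and the semi-normal space consists of matrices $\operatorname{I}_{p,q}\Delta$ with $\Delta$ skew-symmetric. The helper names and the sign convention for $\operatorname{I}_{p,q}$ are assumptions.

```python
import numpy as np

def signature_matrix(p, q):
    """Assumed convention: I_{p,q} = diag(-1,...,-1, +1,...,+1)."""
    return np.diag(np.concatenate([-np.ones(p), np.ones(q)]))

def in_degenerate_fibre(M, p, q, tol=1e-12):
    """True iff M is symmetric and I_{p,q} M is skew-symmetric."""
    W = signature_matrix(p, q) @ M        # I_{p,q} is its own inverse
    return bool(np.allclose(M, M.T, atol=tol) and np.allclose(W, -W.T, atol=tol))

p, q = 2, 1
X = np.array([[1.0, -2.0]])               # an arbitrary q x p block
M = np.block([[np.zeros((p, p)), X.T], [X, np.zeros((q, q))]])
S = np.eye(p + q)                         # symmetric, but with diagonal blocks
print(in_degenerate_fibre(M, p, q), in_degenerate_fibre(S, p, q))
```

The off-diagonal block matrix passes both conditions, while a symmetric matrix with nonzero diagonal blocks fails the skew-symmetry condition.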

Proposition B.2.

If a geodesic passing through $A\in\mathbb{S}^{n}_{++}$ with the tangent direction $\operatorname{I}_{p,q}\Delta$ exists, then $\Delta=\begin{bmatrix}0&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&0\end{bmatrix}$ and $\gamma(t)$ can be written as

\gamma(t)=\int_{0}^{t}\begin{bmatrix}0&U(\tau)^{\mathsf{T}}\\ U(\tau)&0\end{bmatrix}d\tau+A,

for some suitable $q\times p$ matrix-valued function $U(t)$ such that $U(0)=\Delta_{2}$ and

\int_{0}^{t}\begin{bmatrix}0&U(\tau)^{\mathsf{T}}\\ U(\tau)&0\end{bmatrix}d\tau+A\succ 0

for each $t$ .

Proof B.3.

Let $\gamma(t)$ be a geodesic curve passing through $A$ with the tangent direction $\operatorname{I}_{p,q}\Delta$ . Then $\gamma(t)$ must satisfy the following conditions:

\gamma(0)=A,\quad\dot{\gamma}(0)=\operatorname{I}_{p,q}\Delta,\quad\dot{\gamma}(t)\in\operatorname{S}^{2}\mathbb{R}^{n},\quad\operatorname{I}_{p,q}\ddot{\gamma}(t)\in\bigwedge^{2}\mathbb{R}^{n}.

Hence $\dot{\gamma}(t)=\begin{bmatrix}0&U(t)^{\mathsf{T}}\\ U(t)&0\end{bmatrix}$ for some $q\times p$ matrix-valued function $U(t)$ . Therefore, we must have

\gamma(t)=\int_{0}^{t}\begin{bmatrix}0&U(\tau)^{\mathsf{T}}\\ U(\tau)&0\end{bmatrix}d\tau+A\in\mathbb{S}^{n}_{++}.

Since $A=\frac{A}{t}\int_{0}^{t}d\tau$ , we see that $\gamma(t)\succ 0$ if and only if

t\begin{bmatrix}0&U(t)^{\mathsf{T}}\\ U(t)&0\end{bmatrix}+A\succ 0.

B.2 Semi-Riemannian Geometry of Matrix Lie Groups

As discussed in Section 4.1, matrix Lie groups provide another class of rich examples for manifolds admitting semi-Riemannian structures. In this section we illustrate how to apply the semi-Riemannian manifold optimization framework developed in Section 3 to some common matrix Lie groups, despite the degeneracy of their inherited semi-Riemannian structures, by means of projection to semi-normal bundles. We begin with a general discussion on the semi-normal bundle of matrix Lie groups, then specialize to several examples.

Let $G\subseteq\mathrm{GL}(n,\mathbb{R})$ be a matrix Lie group. We denote by $\mathfrak{g}$ the Lie algebra of $G$ . Then the tangent space of $G$ at a point $A\in G$ is simply $A\mathfrak{g}$ . For each fixed integer $0\leq p\leq n$ , there is a left-invariant semi-Riemannian metric of signature $(p,q)=(p,n-p)$ on $\mathrm{GL}(n,\mathbb{R})$ defined by

\langle U,V\rangle_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}\operatorname{I}_{p,q}(A^{-1}V)),\quad U,V\in\operatorname{T}_{A}G,\quad A\in G.

Since the bilinear form $\langle\cdot,\cdot\rangle$ is left-invariant, we have the following

Lemma B.4.

For each $A\in G$ , we have

\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))=\{AX:X\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))\}.

Let $(\cdot,\cdot)$ be the Riemannian metric on $\mathrm{GL}(n,\mathbb{R})$ defined by

(U,V)_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}A^{-1}V),\quad A\in\mathrm{GL}(n,\mathbb{R}),\quad U,V\in\operatorname{T}_{A}\mathrm{GL}(n,\mathbb{R}).

We denote by $\operatorname{N}(G,\mathrm{GL}(n,\mathbb{R}))$ the normal bundle of $G$ in $\mathrm{GL}(n,\mathbb{R})$ . We can relate the normal bundle and the semi-normal bundle of $G$ in $\mathrm{GL}(n,\mathbb{R})$ by the following:

Proposition B.5.

For each $A\in G$ , we have

\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))=\operatorname{I}_{p,q}\operatorname{N}_{A}(G,\mathrm{GL}(n,\mathbb{R})).

In particular, $\operatorname{SN}(G,\mathrm{GL}(n,\mathbb{R}))$ is a vector bundle on $G$ of rank $n^{2}-\dim G$ .

Proof B.6.

A matrix $U\in\mathbb{R}^{n\times n}$ lies in $\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))$ if and only if $\operatorname{tr}((A^{-1}U)^{\mathsf{T}}\operatorname{I}_{p,q}A^{-1}V)=0$ for all $V\in A\mathfrak{g}$ . This implies that $U\in\operatorname{SN}_{A}(G,\mathrm{GL}(n,\mathbb{R}))$ if and only if $\operatorname{I}_{p,q}U\in\operatorname{N}_{A}(G,\mathrm{GL}(n,\mathbb{R}))$ , which completes the proof.

Let $\gamma(t)$ be a geodesic passing through $A\in G$ with tangent direction $AU$ . Then by definition, $\gamma(t)$ is characterized by the following relations:

\gamma(t)\in G,\quad\dot{\gamma}(t)\in\gamma(t)\mathfrak{g},\quad\ddot{\gamma}(t)\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R}))

and the initial conditions $\gamma(0)=A$ , $\dot{\gamma}(0)=AU$ . To compute geodesics explicitly for matrix Lie groups, we need the following simple but handy observations.

Lemma B.7.

If $\gamma(t)$ is a given geodesic, then there exists a unique curve $U(t)$ in $\mathfrak{g}$ such that $\dot{\gamma}(t)=\gamma(t)U(t)$ and $U(t)^{2}+\dot{U}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$ .

Proof B.8.

Since $\dot{\gamma}(t)\in\gamma(t)\mathfrak{g}$ , we can write $\dot{\gamma}(t)=\gamma(t)U(t)$ for a curve $U(t)$ in $\mathfrak{g}$ . By differentiating $\dot{\gamma}(t)=\gamma(t)U(t)$ , we obtain

\ddot{\gamma}(t)=\gamma(t)(U(t)^{2}+\dot{U}(t))\in\operatorname{SN}_{\gamma(t)}(G,\mathrm{GL}(n,\mathbb{R})).

By Lemma B.4, this implies that $U(t)^{2}+\dot{U}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$ .

Proposition B.9.

If $\gamma(t)$ is a geodesic passing through $A$ with tangent direction $AU$ , then $\gamma(t)=A\exp(\int_{0}^{t}U(\tau)d\tau)$ , where $U(t)$ is the curve in $\mathfrak{g}$ determined in Lemma B.7.

Corollary B.10.

The geodesic curve $\gamma(t)$ passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}U$ in Eq. 40 is of constant speed.

Proof B.11.

We calculate $\widetilde{g}_{\gamma(t)}(\dot{\gamma}(t),\dot{\gamma}(t))$ . By Lemma B.17, we have

\widetilde{g}_{\gamma(t)}(\dot{\gamma}(t),\dot{\gamma}(t))=-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}).

Corollary B.12.

The energy $\operatorname{E}(t)$ of the geodesic $\gamma(t)$ in Eq. 40 is

E\left(t\right)=(-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}))t.

Proof B.13.

By definition and Corollary B.10, we have

\operatorname{E}(t)=\int_{0}^{t}\widetilde{g}_{\gamma(\tau)}(\dot{\gamma}(\tau),\dot{\gamma}(\tau))d\tau=(-\operatorname{tr}(U_{1}^{2})+\operatorname{tr}(U_{3}^{2}))t.

The following characterization of the parallel transport of a tangent vector along a geodesic on $G$ can be easily obtained by unravelling Definition 4.13.

Lemma B.14.

Let $\Delta(t)$ be the parallel transport of $\Delta\in\operatorname{T}_{A}G$ along a geodesic curve $\gamma(t)$ passing through $A\in G$ with tangent direction $AU\in\operatorname{T}_{A}G$ . Then

\Delta(t)=\gamma(t)V(t),

where $V(t)$ is a curve in $\mathfrak{g}$ such that $U(t)V(t)+\dot{V}(t)\in\operatorname{SN}_{\operatorname{I}_{n}}(G,\mathrm{GL}(n,\mathbb{R}))$ and $V(0)=A^{-1}\Delta$ . Here $U(t)$ is the curve in $\mathfrak{g}$ determined in Lemma B.7.

B.2.1 Indefinite Orthogonal Groups

The definition of the indefinite orthogonal groups $\operatorname{O}(p,q)$ can be found in Example 4.5. In this subsection we derive explicit formulae for the geodesics in $\operatorname{O}(p,q)$ following Definition 4.12.

Proposition B.15.

For each $A\in\operatorname{O}(p,q)$ , we have

\operatorname{SN}_{A}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\{AS:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}.

Proof B.16.

By Lemma B.4, it is sufficient to prove

\operatorname{SN}_{\operatorname{I}_{p+q}}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\{S:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}

as $\langle\cdot,\cdot\rangle$ is left-invariant. To this end, we notice that by Proposition B.5

\operatorname{SN}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\operatorname{I}_{p,q}\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R})).

Since $\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(n,\mathbb{R}))$ consists of all matrices of the form $\operatorname{I}_{p,q}S$ where $S$ is symmetric, and $\operatorname{I}_{p,q}^{2}=\operatorname{I}_{n}$ , we may conclude that

\operatorname{SN}_{\operatorname{I}_{n}}(\operatorname{O}(p,q),\mathrm{GL}(p+q,\mathbb{R}))=\{S:S\in\operatorname{S}^{2}\mathbb{R}^{p+q}\}.
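The description of the semi-normal space at the identity can be spot-checked on a basis. The sketch below is a hypothetical illustration for $p=1$, $q=2$, again assuming $\operatorname{I}_{p,q}=\operatorname{diag}(-\operatorname{I}_{p},\operatorname{I}_{q})$. It verifies that every symmetric matrix pairs to zero with every element $\operatorname{I}_{p,q}\Delta$ of $\mathfrak{o}(p,q)$, and that the dimensions add up to $n^{2}$.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

p, q = 1, 2
n = p + q
# Assumed convention: I_{p,q} = diag(-I_p, I_q).
I_PQ = [[(-1 if i < p else 1) if i == j else 0 for j in range(n)]
        for i in range(n)]

def unit(i, j):
    """Matrix unit E_ij."""
    E = [[0] * n for _ in range(n)]
    E[i][j] = 1
    return E

def combine(X, Y, sign):
    return [[X[i][j] + sign * Y[i][j] for j in range(n)] for i in range(n)]

skew_basis = [combine(unit(i, j), unit(j, i), -1)
              for i in range(n) for j in range(i + 1, n)]
sym_basis = [combine(unit(i, j), unit(j, i), 1)
             for i in range(n) for j in range(i, n)]

def pairing(U, V):
    """<U, V> at the identity: tr(U^T I_{p,q} V)."""
    return trace(matmul(transpose(U), matmul(I_PQ, V)))

lie_algebra = [matmul(I_PQ, D) for D in skew_basis]   # basis of o(p, q)
values = [pairing(S, X) for S in sym_basis for X in lie_algebra]

print(all(v == 0 for v in values))                  # True
print(len(sym_basis) == n * n - len(lie_algebra))   # True
```

The vanishing follows from $\operatorname{tr}(S\operatorname{I}_{p,q}\operatorname{I}_{p,q}\Delta)=\operatorname{tr}(S\Delta)=0$ for symmetric $S$ and skew-symmetric $\Delta$.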

Let $\gamma(t)$ be a geodesic curve passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}\Delta$ for some skew-symmetric matrix $\Delta$ . By Proposition B.9 and Proposition B.15, the curve $\gamma(t)$ can be written as

(37)

\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),

where $U(t)$ is a curve in $\mathfrak{o}(p,q)$ such that

(38)

U(0)=\operatorname{I}_{p,q}\Delta,\quad U(t)^{2}+\dot{U}(t)\in\operatorname{S}^{2}\mathbb{R}^{p+q}.

We partition a $(p+q)\times(p+q)$ skew-symmetric matrix $Y$ as $\begin{bmatrix}Y_{1}&-Y_{2}^{\mathsf{T}}\\ Y_{2}&Y_{3}\end{bmatrix}$ where $Y_{1}\in\bigwedge^{2}\mathbb{R}^{p}$ , $Y_{3}\in\bigwedge^{2}\mathbb{R}^{q}$ and $Y_{2}\in\mathbb{R}^{q\times p}$ .

Lemma B.17.

Let $t\mapsto\gamma(t)$ be the geodesic passing through $A$ with direction $A\operatorname{I}_{p,q}\Delta$ . Then the curve $U(t)$ in $\mathfrak{o}(p,q)$ satisfying (38) is

(39)

U(t)=\begin{bmatrix}-\Delta_{1}&\exp(-\Delta_{1}t)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}t)\\ \exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t)&\Delta_{3}\end{bmatrix}.

Proof B.18.

We parametrize $U(t)$ as $U(t)=\operatorname{I}_{p,q}\Delta(t)$ where $\Delta(t)\in\bigwedge^{2}\mathbb{R}^{p+q}$ . Since $U(t)^{2}+\dot{U}(t)$ is symmetric, we have

\operatorname{Skew}(U(t)^{2}+\dot{U}(t))=0.

This implies that

\dot{\Delta_{1}}(t)=0,\quad\dot{\Delta_{3}}(t)=0,\quad\dot{\Delta_{2}}(t)=\Delta_{2}(t)\Delta_{1}(t)-\Delta_{3}(t)\Delta_{2}(t),

from which we obtain $\Delta_{1}(t)=\Delta_{1}$ , $\Delta_{3}(t)=\Delta_{3}$ and

\Delta_{2}(t)=\exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t).

Therefore, we obtain

U(t)=\begin{bmatrix}-\Delta_{1}&\exp(-\Delta_{1}t)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}t)\\ \exp(-\Delta_{3}t)\Delta_{2}\exp(\Delta_{1}t)&\Delta_{3}\end{bmatrix}.

By Proposition B.9, we obtain the following:

Proposition B.19.

The geodesic curve $\gamma(t)$ passing through $A\in\operatorname{O}(p,q)$ with direction $A\operatorname{I}_{p,q}\Delta$ is unique and is given by

(40)

\gamma(t)=A\exp\left(\begin{bmatrix}-\Delta_{1}t&\int_{0}^{t}\exp(-\Delta_{1}\tau)\Delta_{2}^{\mathsf{T}}\exp(\Delta_{3}\tau)d\tau\\ \int_{0}^{t}\exp(-\Delta_{3}\tau)\Delta_{2}\exp(\Delta_{1}\tau)d\tau&\Delta_{3}t\end{bmatrix}\right).

Corollary B.20.

The curve $\gamma(t)=A\exp(t\operatorname{I}_{p,q}\Delta)$ is a geodesic curve passing through $A$ with direction $A\operatorname{I}_{p,q}\Delta$ if and only if $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$ .

Proof B.21.

If $\gamma(t)=A\exp(t\operatorname{I}_{p,q}\Delta)$ is a geodesic curve, then from Proposition B.19, it is straightforward to verify that $\Delta_{3}\Delta_{2}-\Delta_{2}\Delta_{1}=0$ . Conversely, if $\Delta_{3}\Delta_{2}-\Delta_{2}\Delta_{1}=0$ , then Eq. 40 is reduced to

\gamma(t)=A\exp\left(\begin{bmatrix}-\Delta_{1}t&\Delta_{2}^{\mathsf{T}}t\\ \Delta_{2}t&\Delta_{3}t\end{bmatrix}\right)=A\exp(t\operatorname{I}_{p,q}\Delta).

In particular, if $p=0$ (resp. $q=0$ ), then $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$ always holds and Corollary B.20 shows that geodesics in $\operatorname{O}(q)$ (resp. $\operatorname{O}(p)$ ) are of the form $A\exp(t\Delta)$ for some skew-symmetric matrix $\Delta$ . Moreover, Corollary B.20 already shows a substantial difference between the Riemannian geometry and the non-Riemannian geometry of $\operatorname{O}(p,q)$ (cf. [5, Theorem 2.14]).
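The reduction used in the proof of Corollary B.20 rests on the fact that $\Delta_{3}\Delta_{2}=\Delta_{2}\Delta_{1}$ implies $\exp(\Delta_{3}\tau)\Delta_{2}=\Delta_{2}\exp(\Delta_{1}\tau)$, so the off-diagonal integrand in Eq. 40 is constant. A minimal numerical sketch for $p=q=2$ follows; the choice $\Delta_{1}=\Delta_{3}=J$ with $J=\begin{bmatrix}0&1\\ -1&0\end{bmatrix}$, the sample values of $\Delta_{2}$ and the helper names are assumptions made for illustration.

```python
import math

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rot(theta):
    """exp(theta * J) for J = [[0, 1], [-1, 0]], since J^2 = -I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def integrand(d2, tau, a=1.0):
    """exp(-Delta_3 tau) Delta_2 exp(Delta_1 tau) with Delta_1 = Delta_3 = a * J."""
    return matmul(rot(-a * tau), matmul(d2, rot(a * tau)))

def deviation(d2, taus=(0.0, 0.5, 1.0, 2.0)):
    """Largest entrywise distance between the integrand and Delta_2."""
    return max(abs(integrand(d2, tau)[i][j] - d2[i][j])
               for tau in taus for i in range(2) for j in range(2))

commuting = [[1.0, 2.0], [-2.0, 1.0]]   # J d2 = d2 J, so the relation holds
generic = [[1.0, 0.0], [0.0, -1.0]]     # J d2 = -d2 J, so the relation fails

print(deviation(commuting))  # zero up to rounding
print(deviation(generic))    # clearly nonzero
```

When the relation holds, the integral in Eq. 40 is therefore $\Delta_{2}t$, and the exponent becomes $t\operatorname{I}_{p,q}\Delta$.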

B.2.2 Orthogonal Groups

By Proposition B.19, the geodesic passing through a point $A\in\operatorname{O}(p,q)$ with tangent direction $U\in\operatorname{T}_{A}\operatorname{O}(p,q)$ always exists. This is not true for every matrix Lie group, as the example of $\operatorname{O}(n)$ shows. First, according to Lemma B.4 and Proposition B.5, we have the following

Proposition B.22.

For each $A\in\operatorname{O}(n)$ , the semi-normal space of $\operatorname{O}(n)$ at $A$ is

\operatorname{SN}_{A}(\operatorname{O}(n),\mathrm{GL}(n,\mathbb{R}))=\{A\operatorname{I}_{p,q}S:S\in\operatorname{S}^{2}\mathbb{R}^{n}\}.

Now if $\gamma(t)$ is a geodesic curve passing through $A$ with tangent direction $A\Delta$ where $\Delta=\begin{bmatrix}\Delta_{1}&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}\in\mathfrak{o}(n)$ , then

\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),

where $U(t)=\begin{bmatrix}U_{1}(t)&-U_{2}(t)^{\mathsf{T}}\\ U_{2}(t)&U_{3}(t)\end{bmatrix}$ is skew-symmetric satisfying

U(0)=\Delta,\quad\operatorname{I}_{p,q}(U(t)^{2}+\dot{U}(t))\in\operatorname{S}^{2}\mathbb{R}^{n}.

Hence we have

\operatorname{Skew}(\operatorname{I}_{p,q}(U(t)^{2}+\dot{U}(t)))=0,

which implies

\dot{U_{1}}(t)=0,\quad\dot{U_{3}}(t)=0,\quad U_{2}(t)U_{1}(t)+U_{3}(t)U_{2}(t)=0.

Therefore, we may conclude that

U_{1}(t)=\Delta_{1},\quad U_{3}(t)=\Delta_{3},\quad U_{2}(t)\Delta_{1}+\Delta_{3}U_{2}(t)=0.

Proposition B.23.

On $\operatorname{O}(n)$ , a geodesic passing through $A$ with the tangent direction $A\Delta$ exists if and only if the skew-symmetric matrix $\Delta=\begin{bmatrix}\Delta_{1}&-\Delta_{2}^{\mathsf{T}}\\ \Delta_{2}&\Delta_{3}\end{bmatrix}$ satisfies the relation

\Delta_{2}\Delta_{1}+\Delta_{3}\Delta_{2}=0.

If such a $\gamma$ exists, then $\gamma$ is of the form

\gamma(t)=A\exp\left(\begin{bmatrix}\Delta_{1}t&-\int_{0}^{t}U_{2}(\tau)^{\mathsf{T}}d\tau\\ \int_{0}^{t}U_{2}(\tau)d\tau&\Delta_{3}t\end{bmatrix}\right)

where $U_{2}$ is a curve in the linear subspace $\{X\in\mathbb{R}^{q\times p}:X\Delta_{1}+\Delta_{3}X=0\}$ such that $U_{2}(0)=\Delta_{2}$ . The square of the speed of $\gamma(t)$ is $\operatorname{tr}(\Delta^{2}_{1})-\operatorname{tr}(\Delta^{2}_{3})$ and the energy of $\gamma$ is $E(t)=t(\operatorname{tr}(\Delta^{2}_{1})-\operatorname{tr}(\Delta^{2}_{3}))$ .

In particular, if $\Delta_{2}\Delta_{1}+\Delta_{3}\Delta_{2}=0$ and we take $U_{2}(t)=\Delta_{2}$ , then by Proposition B.23 $\gamma$ becomes

\gamma(t)=A\exp(\Delta t),

which is exactly the geodesic curve on $\operatorname{O}(n)$ with the usual Riemannian metric

\left(U,V\right)_{A}=\operatorname{tr}((A^{-1}U)^{\mathsf{T}}A^{-1}V)

where $A\in\operatorname{O}(n)$ and $U,V\in\operatorname{T}_{A}\operatorname{O}(n)$ .
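The existence criterion and the speed formula of Proposition B.23 can be illustrated in exact integer arithmetic. The sketch below takes $p=q=2$ and a constant curve $U(t)=\Delta$, and assumes $\operatorname{I}_{p,q}=\operatorname{diag}(-\operatorname{I}_{p},\operatorname{I}_{q})$. It tests whether $\operatorname{I}_{p,q}\Delta^{2}$ is symmetric and compares $\operatorname{tr}(\Delta^{\mathsf{T}}\operatorname{I}_{p,q}\Delta)$ with $\operatorname{tr}(\Delta_{1}^{2})-\operatorname{tr}(\Delta_{3}^{2})$. The sample blocks and function names are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def block(D1, D2, D3):
    """Assemble the skew-symmetric matrix [[D1, -D2^T], [D2, D3]]."""
    D2t = transpose(D2)
    top = [r1 + [-x for x in r2] for r1, r2 in zip(D1, D2t)]
    bottom = [r2 + r3 for r2, r3 in zip(D2, D3)]
    return top + bottom

p = q = 2
n = p + q
# Assumed convention: I_{p,q} = diag(-I_p, I_q).
I_PQ = [[(-1 if i < p else 1) if i == j else 0 for j in range(n)]
        for i in range(n)]

def admissible(D):
    """With U(t) = D constant, test whether I_{p,q} D^2 is symmetric."""
    M = matmul(I_PQ, matmul(D, D))
    return M == transpose(M)

def speed_squared(D):
    """tr(D^T I_{p,q} D): the metric evaluated on the direction A D."""
    return trace(matmul(transpose(D), matmul(I_PQ, D)))

def speed_formula(D1, D3):
    """tr(D1^2) - tr(D3^2), as stated in Proposition B.23."""
    return trace(matmul(D1, D1)) - trace(matmul(D3, D3))

J = [[0, 1], [-1, 0]]
J2 = [[0, 2], [-2, 0]]
good = block(J, [[1, 2], [2, -1]], J)   # Delta_2 Delta_1 + Delta_3 Delta_2 = 0
bad = block(J, [[1, 2], [3, 4]], J2)    # the relation fails

print(admissible(good), admissible(bad))              # True False
print(speed_squared(bad), speed_formula(J, J2))       # 6 6
print(speed_squared(good), speed_formula(J, J))       # 0 0
```

The admissible direction in this example happens to be null, which illustrates that the speed of a geodesic need not be positive in the semi-Riemannian setting.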

B.2.3 Special Linear Groups with $p\neq q$

We notice that the Lie algebra of $\operatorname{SL}(n,\mathbb{R})$ consists of all traceless $n\times n$ matrices, which implies that

\operatorname{N}_{\operatorname{I}_{n}}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R}))=\{\lambda\operatorname{I}_{n}:\lambda\in\mathbb{R}\}.

Therefore, from Lemma B.4 and Proposition B.5, we have the following

Proposition B.24.

For each $A\in\operatorname{SL}(n,\mathbb{R})$ , the semi-normal space of $\operatorname{SL}(n,\mathbb{R})$ at $A$ is

\operatorname{SN}_{A}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R}))=\{\lambda A\operatorname{I}_{p,q}:\lambda\in\mathbb{R}\}.

In particular,

\operatorname{SL}(n,\mathbb{R})^{\perp}=\begin{cases}\operatorname{SL}(n,\mathbb{R})\times\{0\},&\text{if}~p\neq q\\ \operatorname{SN}(\operatorname{SL}(n,\mathbb{R}),\mathrm{GL}(n,\mathbb{R})),&\text{otherwise}.\end{cases}

Hence $\operatorname{SL}(n,\mathbb{R})$ is a non-degenerate semi-Riemannian submanifold of $\mathrm{GL}(n,\mathbb{R})$ if and only if $p\neq q$ .

By Proposition B.24 and [35, Corollary 10], we obtain the following

Corollary B.25.

If $p\neq q$ , then the geodesic passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$ exists and it is unique.

If $\gamma(t)$ is a geodesic curve passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$ , then

\gamma(t)=A\exp\left(\int_{0}^{t}U(\tau)d\tau\right),

where $U(t)$ is a curve of traceless matrices, i.e., $\operatorname{tr}(U(t))=0$ , satisfying

(41)

U(0)=U,\quad U(t)^{2}+\dot{U}(t)=\lambda(t)\operatorname{I}_{p,q}

for some real-valued function $\lambda(t)$ . Moreover, evaluating (41) at $t=0$ , we also have

U(0)^{2}+\dot{U}(0)=\lambda(0)\operatorname{I}_{p,q}.

Together with the fact that $\operatorname{tr}(\dot{U}(0))=0$ as $\operatorname{tr}(U(t))=0$ , we may conclude that

(42)

\lambda(0)(q-p)=\operatorname{tr}(U(0)^{2})=\operatorname{tr}(U^{2}).

Proposition B.26.

If $p=q$ and $\gamma$ is a geodesic curve passing through $A\in\operatorname{SL}(n,\mathbb{R})$ with the tangent direction $\dot{\gamma}(0)=AU\in\operatorname{T}_{A}\operatorname{SL}(n,\mathbb{R})$ , then $\operatorname{tr}(U^{2})=0$ .

Next we consider the simplest case where $n=2$ and $p=q=1$ . We may write $U(t)=\begin{bmatrix}a(t)&b(t)\\ c(t)&-a(t)\end{bmatrix}$ and $U=\begin{bmatrix}a&b\\ c&-a\end{bmatrix}$ so that the ODE in (41) becomes

\begin{bmatrix}a(t)^{2}+b(t)c(t)+\dot{a}(t)&\dot{b}(t)\\ \dot{c}(t)&a(t)^{2}+b(t)c(t)-\dot{a}(t)\end{bmatrix}=\lambda(t)\begin{bmatrix}-1&0\\ 0&1\end{bmatrix}.

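Comparing the off-diagonal entries gives $b(t)=b$ and $c(t)=c$ , while adding the two diagonal entries gives $a(t)^{2}=-bc$ , so that $a(t)=a$ by continuity. Hence we may conclude the following.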

Proposition B.27.

An embedded geodesic $\gamma$ on $\operatorname{SL}(2,\mathbb{R})$ passing through $A$ with tangent direction $AU$ , where $U=\begin{bmatrix}a&b\\ c&-a\end{bmatrix}$ , exists if and only if $a^{2}+bc=0$ . Moreover, if such $\gamma$ exists, then it is unique and

\gamma(t)=A\exp\left(\begin{bmatrix}a&b\\ c&-a\end{bmatrix}t\right).

The square of the speed of $\gamma(t)$ is $2a^{2}+b^{2}+c^{2}$ and the energy $E(t)$ is $t(2a^{2}+b^{2}+c^{2})$ .
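For a traceless $2\times 2$ matrix one has $U^{2}=(a^{2}+bc)\operatorname{I}_{2}$ by the Cayley-Hamilton theorem, so the condition $a^{2}+bc=0$ makes $U$ nilpotent and the exponential in Proposition B.27 collapses to $\exp(tU)=\operatorname{I}_{2}+tU$. This observation is not stated in the text above. The following sketch checks it in exact arithmetic for one hypothetical choice of $a,b,c$ and base point $A$.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

a, b, c = 1, 1, -1            # satisfies a^2 + b*c = 0
U = [[a, b], [c, -a]]
U2 = matmul(U, U)             # equals (a^2 + b*c) * I, here the zero matrix

def geodesic(A, t):
    """A exp(tU) = A (I + tU), valid because U^2 = 0."""
    E = [[1 + t * U[0][0], t * U[0][1]],
         [t * U[1][0], 1 + t * U[1][1]]]
    return matmul(A, E)

A = [[2, 1], [1, 1]]          # det A = 1, so A lies in SL(2, R)
dets = [det2(geodesic(A, Fraction(k, 3))) for k in range(-3, 4)]

print(U2)                          # [[0, 0], [0, 0]]
print(all(d == 1 for d in dets))   # True: the curve stays in SL(2, R)
```

With $U(t)=U$ constant and $U^{2}=0$, the ODE in (41) holds with $\lambda(t)=0$, consistent with Eq. 42 in the case $p=q$.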

B.2.4 Symplectic Group

We recall that the symplectic group $\operatorname{Sp}(2n,\mathbb{R})$ is the group of $(2n)\times(2n)$ matrices $A$ satisfying

A^{\mathsf{T}}\operatorname{J}_{n}A=\operatorname{J}_{n}

where $\operatorname{J}_{n}=\begin{bmatrix}0&\operatorname{I}_{n}\\ -\operatorname{I}_{n}&0\end{bmatrix}$ . The Lie algebra of $\operatorname{Sp}(2n,\mathbb{R})$ is

\mathfrak{sp}(n)=\left\{\operatorname{J}_{n}S:S\in\operatorname{S}^{2}\mathbb{R}^{2n}\right\}

and the normal space is

\operatorname{N}_{\operatorname{I}_{2n}}(\operatorname{Sp}(2n,\mathbb{R}),\mathrm{GL}(2n,\mathbb{R}))=\left\{\operatorname{J}_{n}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{2n}\right\}.

Without loss of generality, we may assume that $1\leq p\leq n$ .

Proposition B.28.

For each $A\in\operatorname{Sp}(2n,\mathbb{R})$ , we have

\operatorname{SN}_{A}(\operatorname{Sp}(2n,\mathbb{R}),\mathrm{GL}(2n,\mathbb{R}))=\left\{A\operatorname{I}_{p,q}\operatorname{J}_{n}\Delta:\Delta\in\bigwedge^{2}\mathbb{R}^{2n}\right\}

and

\operatorname{Sp}(2n,\mathbb{R})_{A}^{\perp}=\left\{A\begin{bmatrix}X&Y&0&Z^{\mathsf{T}}\\ 0&0&Z&0\\ 0&0&X^{\mathsf{T}}&0\\ 0&0&Y^{\mathsf{T}}&0\end{bmatrix}:X\in\mathbb{R}^{p\times p},Y\in\mathbb{R}^{p\times(n-p)},Z\in\mathbb{R}^{(n-p)\times p}\right\}.

In particular, $\operatorname{Sp}(2n,\mathbb{R})$ is a degenerate semi-Riemannian submanifold of $\mathrm{GL}(2n,\mathbb{R})$ .

Proof B.29.

The description of the semi-normal space follows from Lemma B.4 and Proposition B.5 and the description of $\operatorname{Sp}(2n,\mathbb{R})_{A}^{\perp}$ is obtained by a straightforward calculation.
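The ingredients of this proof can be spot-checked at the identity. The sketch below is a hypothetical illustration with $n=2$ and $p=1$, assuming $\operatorname{I}_{p,q}=\operatorname{diag}(-\operatorname{I}_{p},\operatorname{I}_{q})$ with $q=2n-p$. For one symmetric $S$ and one skew-symmetric $\Delta$ it verifies that $\operatorname{J}_{n}S$ satisfies the Lie algebra condition $X^{\mathsf{T}}\operatorname{J}_{n}+\operatorname{J}_{n}X=0$, that $\operatorname{J}_{n}\Delta$ is Frobenius-orthogonal to it, and that $\operatorname{I}_{p,q}\operatorname{J}_{n}\Delta$ pairs to zero with it under the indefinite form.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

n, p = 2, 1
N = 2 * n

# Standard symplectic form J_n = [[0, I_n], [-I_n, 0]].
J = [[0] * N for _ in range(N)]
for i in range(n):
    J[i][n + i] = 1
    J[n + i][i] = -1

# Assumed convention: I_{p,q} = diag(-I_p, I_q) with q = 2n - p.
I_PQ = [[(-1 if i < p else 1) if i == j else 0 for j in range(N)]
        for i in range(N)]

S = [[1, 2, 3, 4], [2, 5, 6, 7], [3, 6, 8, 9], [4, 7, 9, 0]]        # symmetric
D = [[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]]  # skew

X = matmul(J, S)          # element of the Lie algebra of Sp(2n, R)
Y = matmul(J, D)          # element of the normal space at the identity
Z = matmul(I_PQ, Y)       # candidate element of the semi-normal space

XtJ, JX = matmul(transpose(X), J), matmul(J, X)
lie_defect = [[XtJ[i][j] + JX[i][j] for j in range(N)] for i in range(N)]
frobenius = trace(matmul(transpose(X), Y))
semi_pairing = trace(matmul(transpose(X), matmul(I_PQ, Z)))

print(all(x == 0 for row in lie_defect for x in row))  # True
print(frobenius, semi_pairing)                         # 0 0
```

Both pairings reduce to $\operatorname{tr}(S\Delta)$, which vanishes for symmetric $S$ and skew-symmetric $\Delta$.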

Next we describe geodesics on $\operatorname{Sp}(2n,\mathbb{R})$ .

Proposition B.30.

Let $\gamma(t)$ be a geodesic passing through $A\in\operatorname{Sp}(2n,\mathbb{R})$ with the tangent direction $A\operatorname{J}_{n}X$ . Write $X$ as $X=\begin{bmatrix}S&P^{\mathsf{T}}\\ P&Q\end{bmatrix}$ , where $S,Q$ are symmetric $n\times n$ matrices and $P$ is an $n\times n$ matrix, and partition $S,P,Q$ as

S=\begin{bmatrix}S_{11}&S_{12}\\ S_{12}^{\mathsf{T}}&S_{22}\end{bmatrix},\quad P=\begin{bmatrix}P_{11}&P_{12}\\ P_{21}&P_{22}\end{bmatrix},\quad Q=\begin{bmatrix}Q_{11}&Q_{12}\\ Q_{12}^{\mathsf{T}}&Q_{22}\end{bmatrix}

where $S_{11},P_{11},Q_{11}\in\mathbb{R}^{p\times p}$ and $S_{22},P_{22},Q_{22}\in\mathbb{R}^{(n-p)\times(n-p)}$ . Then

\gamma(t)=A\exp\left(\int_{0}^{t}\begin{bmatrix}P(\tau)&Q(\tau)\\ -S&-P(\tau)^{\mathsf{T}}\end{bmatrix}d\tau\right),

where $P(t)=\begin{bmatrix}P_{11}(t)&P_{12}(t)\\ P_{21}&P_{22}\end{bmatrix}$ , $Q(t)=\begin{bmatrix}Q_{11}&Q_{12}(t)\\ Q_{12}(t)^{\mathsf{T}}&Q_{22}\end{bmatrix}$ and

\begin{cases}Q_{11}S_{11}+Q_{12}(t)S_{12}^{\mathsf{T}}=P_{11}(t)^{2}+P_{12}(t)P_{21},\\ Q_{11}S_{12}+Q_{12}(t)S_{22}=P_{11}(t)P_{12}(t)+P_{12}(t)P_{22},\\ (P_{21}Q_{11}+P_{22}P_{12}(t)^{\mathsf{T}})-(Q_{12}(t)^{\mathsf{T}}P_{11}(t)^{\mathsf{T}}+Q_{22}P_{12}(t)^{\mathsf{T}})+\dot{Q}_{12}(t)^{\mathsf{T}}=0.\end{cases}
