
H. Oviedo: Mathematics Research Center, CIMAT A.C., Guanajuato, Mexico. Email: harry.oviedo@cimat.mx
H. Lara: Universidade Federal de Santa Catarina, Campus Blumenau, Blumenau, Brazil. Email: hugo.lara.urdaneta@ufsc.br

Spectral Residual Method for Nonlinear Equations on Riemannian Manifolds

Harry Oviedo    Hugo Lara
(Received: date / Accepted: date)
Abstract

In this paper, the spectral algorithm for nonlinear equations (SANE) is adapted to the problem of finding a zero of a given tangent vector field on a Riemannian manifold. The generalized version of SANE uses, in a systematic way, the tangent vector field as a search direction and a continuous real–valued function that adapts this direction and ensures that it verifies a descent condition for an associated merit function. In order to speed up the convergence of the proposed method, we incorporate a Riemannian adaptive spectral parameter in combination with a non–monotone globalization technique. The global convergence of the proposed procedure is established under some standard assumptions. Numerical results indicate that our algorithm is very effective and efficient at solving tangent vector field equations on different Riemannian manifolds, and that it competes favorably with a recently published Polak–Ribière–Polyak method and with other methods from the literature.

Keywords:
Tangent vector field; Riemannian manifold; Nonlinear system of equations; Spectral residual method; Non–monotone line search.
MSC:
65K05, 90C30, 90C56, 53C21.

1 Introduction

In this work, we consider the problem of finding a zero of a tangent vector field F(\cdot) over a Riemannian manifold \mathcal{M}, with an associated Riemannian metric \langle\cdot,\cdot\rangle (a local inner product which induces a corresponding local metric). The problem can be mathematically formulated as the solution of the following nonlinear equation

F(X) = 0_X, (1)

where F:\mathcal{M}\to\mathcal{TM} is a continuously differentiable tangent vector field, and \mathcal{TM} := \cup_{X\in\mathcal{M}} T_X\mathcal{M} denotes the tangent bundle of \mathcal{M}, i.e., \mathcal{TM} is the union of all tangent spaces at points in the manifold. Here, 0_X denotes the zero vector of the tangent space T_X\mathcal{M}. This kind of problem appears frequently in several applications, for example: statistical principal component analysis oja1989neural , where Oja's flow induces the associated vector field; total energy minimization in electronic structure calculations martin2020electronic ; saad2010numerical ; zhang2014gradient ; linear eigenvalue problems manton2002optimization ; wen2016trace ; dimension reduction techniques in pattern recognition kokiopoulou2011trace ; turaga2008statistical ; zhang2018robust ; and Riemannian optimization problems, where the Riemannian gradient flow leads to the associated tangent vector field absil2009optimization ; edelman1998geometry ; ring2012optimization , among others.

Problem (1) is closely related to the problem of minimizing a differentiable function over the manifold \mathcal{M},

\min \mathcal{F}(X) \quad \text{s.t.} \quad X\in\mathcal{M}, (2)

where \mathcal{F}:\mathcal{M}\to\mathbb{R} is a smooth function. Different iterative methods have been developed for solving (2). Some popular schemes are based on gradient methods cedeno2018projected ; iannazzo2018riemannian ; manton2002optimization ; oviedoimplicit ; oviedo2019scaled ; oviedo2020two ; oviedo2019non , conjugate gradient methods absil2009optimization ; edelman1998geometry ; zhu2017riemannian , Newton's method absil2009optimization ; sato2014riemannian , or quasi–Newton methods absil2009optimization . All these numerical methods can be used to find a zero of the following tangent vector field equation,

\nabla_{\mathcal{M}}\mathcal{F}(X) = 0_X, (3)

where \nabla_{\mathcal{M}}\mathcal{F}(\cdot) denotes the Riemannian gradient of \mathcal{F}(\cdot), which is a particular case of problem (1). The Riemannian line–search methods designed to solve the optimization problem (2) construct a sequence of points using the following recursive formula

X_{k+1} = R_{X_k}[\xi_{X_k}], (4)

where R_X: T_X\mathcal{M}\to\mathcal{M} is a retraction (see Definition 1), and \xi_{X_k}\in T_{X_k}\mathcal{M} is a descent direction, i.e., \xi_{X_k} verifies the inequality \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),\xi_{X_k}\rangle < 0 for all k\geq 0. Among the Riemannian line–search methods, the Riemannian gradient approach exhibits the lowest cost per iteration. This method uses the gradient vector field \nabla_{\mathcal{M}}\mathcal{F}(\cdot) to define the search direction by \xi_{X_k} = -\nabla_{\mathcal{M}}\mathcal{F}(X_k) at each iteration.

In the literature, there are some iterative algorithms addressing problem (1). Several Riemannian Newton methods for the solution of tangent vector fields on general Riemannian manifolds were developed in adler2002newton ; dedieu2003newton ; li2005convergence . A drawback of Newton's method is that it requires second–order information and geodesics (which involve the computation of the exponential mapping) in order to remain on the corresponding manifold, which increases the computational cost. In addition, the authors in breiding2018convergence proposed a Riemannian Gauss–Newton method to address the solution of (1) through the optimization problem (2). Recently, a Riemannian conjugate gradient method to deal with the numerical solution of (1) was introduced in yao2020riemannian ; it does not need derivative computations (it does not use the Jacobian of F(\cdot)), and it relies on retractions (see Definition 1), a class of mappings that generalizes the notion of geodesics and that was introduced by Absil in absil2009optimization to deal with optimization problems on matrix manifolds.

In the Euclidean case \mathcal{M} = \mathbb{R}^n of (1), i.e., for the solution of standard nonlinear systems of equations, the authors in cruz2003nonmonotone introduced a method called SANE, which uses the residual \pm F(X_k) as a search direction. The trial point, at each iteration, is then computed by X_{k+1} = X_k \pm \tau_k F(X_k), where \tau_k is a spectral coefficient based on the Barzilai–Borwein step–size barzilai1988two ; raydan1993barzilai . This iterative process uses precisely the functional F(\cdot) in order to define the search direction. Its derivative–free nature is highly attractive, since it lowers the storage requirements and the computational cost per iteration.

Motivated by the Riemannian gradient and SANE methods, in this paper we introduce RSANE, a generalization of SANE that tackles the numerical solution of nonlinear equations on Riemannian manifolds. In particular, we modify the update formula of SANE by incorporating a retraction, in order to guarantee that each point X_k belongs to the desired manifold. Following the ideas of the Riemannian Barzilai–Borwein method developed by Iannazzo et al. in iannazzo2018riemannian , we propose an extension of the spectral parameter \tau_k to the case of Riemannian manifolds, using mappings called scaled vector transports. In addition, we present the convergence analysis of the proposed method under the Zhang–Hager globalization strategy zhang2004nonmonotone . Finally, some numerical experiments are reported to illustrate the efficiency and effectiveness of our proposal.

The rest of this paper is organized as follows. To make this article self–contained, we briefly review, in Section 2, some concepts and tools from Riemannian geometry, which can be found in absil2009optimization . In Section 3, we present our proposed Riemannian spectral residual method (RSANE) for solving (1). Section 4 is devoted to the convergence analysis of our proposed method. In Section 5, numerical tests are carried out in order to illustrate the good performance of our approach on the computation of eigenspaces associated with given symmetric matrices, using both simulated and real data. Finally, conclusions and perspectives are provided in Section 7.

2 Preliminaries on Riemannian Geometry

In this section, we briefly review some notions and tools of Riemannian geometry that are crucial for understanding this paper, summarizing material from absil2009optimization .

Let \mathcal{M} be a Riemannian manifold with an associated Riemannian metric \langle\cdot,\cdot\rangle, and let T_X\mathcal{M} be its tangent vector space at a given point X\in\mathcal{M}. In addition, let \mathcal{F}:\mathcal{M}\to\mathbb{R} be a smooth scalar function defined on the Riemannian manifold \mathcal{M}. The Riemannian gradient of \mathcal{F}(\cdot) at X, denoted by \nabla_{\mathcal{M}}\mathcal{F}(X), is defined as the unique element of T_X\mathcal{M} that verifies

\langle\nabla_{\mathcal{M}}\mathcal{F}(X),\xi_X\rangle = \mathcal{DF}(X)[\xi_X], \quad \forall \xi_X\in T_X\mathcal{M},

where \mathcal{DF}(X)[\xi_X] denotes the directional derivative of \mathcal{F}(\cdot) in the direction \xi_X, evaluated at X\in\mathcal{M}. In the particular case that \mathcal{M} is a Riemannian submanifold of a Euclidean space \mathcal{E}, we have an explicit expression for the gradient: let \bar{\mathcal{F}}:\mathcal{E}\to\mathbb{R} be a smooth function defined on \mathcal{E} and let \mathcal{F}:\mathcal{M}\subset\mathcal{E}\to\mathbb{R} be its restriction to \mathcal{M}; then the Riemannian gradient of \mathcal{F}(\cdot) evaluated at X\in\mathcal{M} is equal to the orthogonal projection of the standard gradient of \bar{\mathcal{F}}(\cdot) onto T_X\mathcal{M}, that is,

\nabla_{\mathcal{M}}\mathcal{F}(X) = P_X[\nabla\bar{\mathcal{F}}(X)]. (5)

This result provides us with an important tool to compute the Riemannian gradient, which will be useful in the experiments section.
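For illustration, the following Matlab sketch evaluates (5) on the unit sphere S^{n-1} embedded in \mathbb{R}^n; the test function \bar{\mathcal{F}}(x) = x^\top A x with a random symmetric A and the projection P_x[v] = v - x(x^\top v) are illustrative choices, not part of the paper's code.

```matlab
% Sketch of (5) on the unit sphere S^{n-1} embedded in R^n.
% Assumptions: Fbar(x) = x'*A*x with A symmetric; P_x[v] = v - x*(x'*v).
n = 5;
A = randn(n);  A = (A + A')/2;       % symmetric test matrix
x = randn(n,1);  x = x/norm(x);      % a point on the sphere
egrad = 2*A*x;                       % Euclidean gradient of Fbar at x
rgrad = egrad - x*(x'*egrad);        % Riemannian gradient via projection (5)
fprintf('tangency check x''*rgrad = %.2e\n', x'*rgrad);   % should be ~0
```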

Another fundamental concept for this work is that of a retraction. This can be seen as a smooth mapping that pragmatically approximates the notion of geodesics edelman1998geometry . We now present its rigorous definition.

Definition 1 (absil2009optimization )

A retraction on a manifold \mathcal{M} is a smooth mapping R[\cdot] from the tangent bundle \mathcal{TM} onto \mathcal{M} with the following properties. Let R_X[\cdot] denote the restriction of R[\cdot] to T_X\mathcal{M}.

  1.

    R_X[0_X] = X, where 0_X denotes the zero element of T_X\mathcal{M}.

  2.

    With the canonical identification T_{0_X}T_X\mathcal{M} \simeq T_X\mathcal{M}, R_X[\cdot] satisfies

    \mathcal{D}R_X[0_X] = \textrm{id}_{T_X\mathcal{M}},

    where \textrm{id}_{T_X\mathcal{M}} denotes the identity mapping on T_X\mathcal{M}.

The second condition in Definition 1 is known as the local rigidity condition.
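As a concrete example, the following Matlab sketch checks the two properties of Definition 1 numerically for the sphere retraction R_x[\xi] = (x+\xi)/||x+\xi||_2 used later in Section 6.1; the finite–difference step h is an illustrative choice.

```matlab
% Sketch: the sphere retraction R_x[xi] = (x + xi)/||x + xi||_2 and the
% two properties of Definition 1, checked numerically.
n  = 4;
x  = randn(n,1);  x  = x/norm(x);             % point on S^{n-1}
xi = randn(n,1);  xi = xi - x*(x'*xi);        % tangent vector at x
R  = @(x,v) (x + v)/norm(x + v);              % retraction
% Property 1: R_x[0_x] = x
fprintf('||R_x[0] - x||      = %.2e\n', norm(R(x,zeros(n,1)) - x));
% Property 2 (local rigidity): DR_x[0][xi] = xi, via central differences
h  = 1e-7;
dR = (R(x,h*xi) - R(x,-h*xi))/(2*h);
fprintf('||DR_x[0][xi] - xi|| = %.2e\n', norm(dR - xi));
```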

The concept of vector transport, which appears in absil2009optimization , provides us with a tool to perform operations between two or more vectors that belong to different tangent spaces of \mathcal{M}, and can be seen as a relaxation of the purely Riemannian concept of parallel transport edelman1998geometry .

Definition 2 (absil2009optimization )

A vector transport \mathcal{T}[\cdot] on a manifold \mathcal{M} is a smooth mapping

\mathcal{T}: \mathcal{TM}\oplus\mathcal{TM} \rightarrow \mathcal{TM}: (\eta,\xi) \mapsto \mathcal{T}_{\eta}[\xi] \in \mathcal{TM},

satisfying the following properties for all X\in\mathcal{M}, where \oplus denotes the Whitney sum, that is,

\mathcal{TM}\oplus\mathcal{TM} = \{(\eta,\xi): \eta,\xi\in T_X\mathcal{M}, X\in\mathcal{M}\}.
  1.

    There exists a retraction R[\cdot], called the retraction associated with \mathcal{T}[\cdot], such that

    \pi(\mathcal{T}_{\eta}[\xi]) = R_X[\eta], \quad \eta,\xi\in T_X\mathcal{M},

    where \pi(\mathcal{T}_{\eta}[\xi]) denotes the foot of the tangent vector \mathcal{T}_{\eta}[\xi].

  2.

    \mathcal{T}_{0_X}[\xi] = \xi for all \xi\in T_X\mathcal{M}.

  3.

    \mathcal{T}_{\eta}[a\xi+b\zeta] = a\mathcal{T}_{\eta}[\xi] + b\mathcal{T}_{\eta}[\zeta], for all a,b\in\mathbb{R} and \eta,\xi,\zeta\in T_X\mathcal{M}.

Next, the concept of isometry zhu2017riemannian is established, which is a property satisfied by some vector transports.

Definition 3 (zhu2017riemannian )

A vector transport \mathcal{T}[\cdot] on a manifold \mathcal{M} is called isometric if it satisfies

\langle\mathcal{T}_{\eta}[\xi],\mathcal{T}_{\eta}[\xi]\rangle = \langle\xi,\xi\rangle, (6)

for all \eta,\xi\in T_X\mathcal{M}, where R_X[\cdot] is the retraction associated with \mathcal{T}[\cdot].

3 Spectral Approach for Tangent Vector Field on Riemannian Manifolds

In this section, we shall establish our proposal RSANE. An intuitive way to solve (1) is to promote the reduction of the residual ||F(\cdot)||, which we can achieve by solving the following auxiliary manifold–constrained optimization problem

\min_{X\in\mathcal{M}} \mathcal{F}(X) = \frac{1}{2}||F(X)||^2. (7)

We can deal with this optimization model using some Riemannian optimization method. Nevertheless, we are interested in directly solving the Riemannian nonlinear equation (1). For this purpose, we consider the following iterative method, based on the SANE method,

X_{k+1} = X_k - \tau_k F(X_k). (8)

This scheme has two drawbacks: firstly, the vector -F(X_k) is not necessarily a descent direction for the merit function \mathcal{F}(\cdot), and secondly, X_{k+1} does not necessarily belong to the manifold \mathcal{M}. We can overcome the first one by modifying this vector with the sign of \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle, following the same idea used in the SANE method, in order to force \pm F(X_k) to satisfy the descent condition. Observe that the Riemannian gradient of \mathcal{F}(\cdot) can be computed by

\nabla_{\mathcal{M}}\mathcal{F}(X) = (JF(X))^{*}[F(X)], \quad \forall X\in\mathcal{M}, (9)

that is, to compute the Riemannian gradient of \mathcal{F}(\cdot) at X, we need to calculate the adjoint of the Jacobian of F(\cdot) evaluated at X.

On the other hand, the second disadvantage can be easily remedied by incorporating a retraction, similarly to the scheme (4). Keeping in mind all these considerations, we now propose our Riemannian spectral residual method, which computes the iterates recursively by

X_{k+1} = R_{X_k}[\tau_k Z_k], (10)

where \tau_k > 0 represents the step–size, R_X[\cdot] is a retraction, and the search direction is determined by

Z_k = -s(\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle) F(X_k), (11)

where s(\cdot): \mathbb{R}\setminus\{0\}\to\{-1,1\} is defined by s(x) = \frac{x}{|x|}. Observe that s(\cdot) is a continuous function for all x\neq 0, a property that is crucial for our convergence analysis.

3.1 A Nonmonotone Line Search with a Riemannian Spectral Parameter

In the scenario of the solution of nonlinear systems of equations over \mathbb{R}^n, the SANE method uses a spectral parameter \tau_k inspired by the Barzilai–Borwein step–size, originally introduced in barzilai1988two to speed up the convergence of gradient–type methods in the context of optimization. SANE computes this spectral parameter as follows

\tau_{k+1}^{BB} = \textrm{sgn}_k \frac{S_k^{\top}S_k}{S_k^{\top}Y_k}, (12)

where S_k = X_{k+1}-X_k, Y_k = F(X_{k+1})-F(X_k), \textrm{sgn}_k = s(F(X_k)^{\top}J(X_k)F(X_k)), and J(\cdot) denotes the Jacobian matrix of the vector–valued function F:\mathbb{R}^n\to\mathbb{R}^n.

In the framework of Riemannian manifolds, the vectors F(X_{k+1})\in T_{X_{k+1}}\mathcal{M} and F(X_k)\in T_{X_k}\mathcal{M} lie in different tangent spaces, so the difference between these vectors may not be well defined (it is only well defined over linear manifolds). The same drawback occurs with the difference between the points X_{k+1} and X_k. Therefore, we cannot directly use the parameter (12) to address the numerical solution of (1). In iannazzo2018riemannian , Iannazzo et al. extended the Barzilai–Borwein step–sizes to the context of optimization on Riemannian manifolds through the use of a vector transport (see Definition 2). This strategy transports the calculated directions to the correct tangent space, providing a way to overcome the drawback. Guided by the descriptions contained in iannazzo2018riemannian , we propose the following generalization of the spectral parameter (12),

\tau_{k+1}^{RBB1} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle}, (13)

where \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle,

\hat{S}_k = \mathcal{T}_{\tau_k Z_k}[\tau_k Z_k] = -\tau_k s(\sigma_k)\mathcal{T}_{\tau_k Z_k}[F(X_k)] (14)

and

\hat{Y}_k = F(X_{k+1}) - \mathcal{T}_{\tau_k Z_k}[F(X_k)] = F(X_{k+1}) + \frac{1}{\tau_k s(\sigma_k)}\hat{S}_k, (15)

where \mathcal{T}[\cdot] is any vector transport satisfying the Ring–Wirth non–expansive condition,

||\mathcal{T}_{\eta_X}[\xi_X]|| \leq ||\xi_X||, \quad \forall \xi_X,\eta_X\in T_X\mathcal{M}. (16)

Another alternative for the spectral parameter is

\tau_{k+1}^{RBB2} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{Y}_k\rangle}{\langle\hat{Y}_k,\hat{Y}_k\rangle}. (17)

In order to take advantage of both spectral parameters \tau_{k+1}^{RBB1} and \tau_{k+1}^{RBB2}, we adopt the following adaptive strategy

\tau_{k+1}^{RBB} = \begin{cases} \tau_{k+1}^{RBB1} & \text{for even } k; \\ \tau_{k+1}^{RBB2} & \text{for odd } k. \end{cases} (18)
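A minimal Matlab sketch of the computation of (13), (17) and the alternation (18) is given below; the vectors Shat and Yhat stand for \hat{S}_k and \hat{Y}_k in (14)–(15) and are filled with random values only to make the fragment executable, while the final safeguard mirrors Step 15 of Algorithm 1 below.

```matlab
% Sketch of the adaptive Riemannian spectral step (13), (17), (18).
% Shat, Yhat stand for (14)-(15); random values are used only so that
% the fragment runs on its own.
n    = 6;  k = 3;                          % k: current iteration counter
sgn  = -1;                                 % s(sigma_k), either +1 or -1
Shat = randn(n,1);  Yhat = randn(n,1);     % placeholders for (14)-(15)
tauRBB1 = sgn*(Shat'*Shat)/(Shat'*Yhat);   % (13)
tauRBB2 = sgn*(Shat'*Yhat)/(Yhat'*Yhat);   % (17)
if mod(k,2) == 0                           % (18): alternate the two rules
    tauRBB = tauRBB1;
else
    tauRBB = tauRBB2;
end
tauRBB = max(min(tauRBB, 1e10), 1e-10);    % safeguard, cf. Step 15 of Algorithm 1
```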

Note that it is always possible to define a transporter (a function that sends vectors from one tangent space to another tangent space) that satisfies condition (16), by scaling,

\mathcal{T}_{\eta_X}^{\textrm{scaled}}[\xi_X] = \begin{cases} \mathcal{T}_{\eta_X}[\xi_X] & \text{if } ||\mathcal{T}_{\eta_X}[\xi_X]|| \leq ||\xi_X||; \\ \frac{||\xi_X||}{||\mathcal{T}_{\eta_X}[\xi_X]||}\mathcal{T}_{\eta_X}[\xi_X] & \text{otherwise}. \end{cases} (19)

This function, introduced by Sato and Iwai in sato2015new , is referred to as a scaled vector transport. Observe that (19) is not necessarily a vector transport. However, for the extension of the spectral parameter to the setting of Riemannian manifolds, a vector transport is not strictly mandatory; in fact, it is enough to have a non–expansive transporter available. Therefore, we will use the scaled vector transport (19) in the construction of the vectors \hat{S}_k and \hat{Y}_k in equations (14)–(15).
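The scaling in (19) can be implemented in a few lines; the Matlab sketch below assumes the projection–based transporter on the unit sphere used later in Section 6.1 (which already satisfies (16), so the scaling branch acts only as a safeguard), and all data are illustrative.

```matlab
% Sketch of the scaled vector transport (19) on the unit sphere S^{n-1}.
% Assumption: the underlying transporter is the orthogonal projection onto
% the tangent space at y = R_x[eta], as in Section 6.1.
n   = 5;
x   = randn(n,1);  x   = x/norm(x);
eta = randn(n,1);  eta = eta - x*(x'*eta);     % tangent vectors at x
xi  = randn(n,1);  xi  = xi  - x*(x'*xi);
y   = (x + eta)/norm(x + eta);                 % y = R_x[eta]
t   = xi - y*(y'*xi);                          % candidate transport T_eta[xi]
if norm(t) > norm(xi)                          % scale only when (16) fails
    t = (norm(xi)/norm(t))*t;
end
fprintf('||T_eta[xi]|| = %.3f <= ||xi|| = %.3f\n', norm(t), norm(xi));
```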

Since the spectral parameter \tau_k^{RBB} does not necessarily reduce the value of the merit function \mathcal{F}(\cdot) at each iteration, the convergence result could be invalid. We can overcome this drawback by incorporating a globalization strategy, which guarantees convergence by regulating the step–size \tau_k only occasionally (see cruz2003nonmonotone ; raydan1997barzilai ; zhang2004nonmonotone ). In the seminal paper cruz2003nonmonotone , the authors consider the globalization technique proposed by Grippo et al. in grippo1986nonmonotone in the definition of the SANE method.

We could define our Riemannian generalization of SANE by incorporating this non–monotone technique, and then analyze the convergence following the ideas described in cruz2003nonmonotone ; iannazzo2018riemannian . Instead, in this work, to define RSANE we adopt a more elegant globalization strategy proposed by Zhang and Hager in zhang2004nonmonotone . Specifically, we compute \tau_k = \delta^{h}\tau_k^{RBB}, where h\in\mathbb{N} is the smallest integer satisfying

\mathcal{F}(R_{X_k}[\tau_k Z_k]) \leq C_k + \rho_1\tau_k\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle, (20)

where each value C_{k+1} is given by a convex combination of \mathcal{F}(X_{k+1}) and the previous C_k as

C_{k+1} = \frac{\eta Q_k C_k + \mathcal{F}(X_{k+1})}{Q_{k+1}},

for Q_{k+1} = \eta Q_k + 1, starting at Q_0 = 1 and C_0 = \mathcal{F}(X_0). In the sequel, our generalization RSANE is described in detail.

0:  Let X_0\in\mathcal{M} be the initial guess, \tau>0, 0<\tau_m\leq\tau_M<\infty, \eta\in[0,1), \rho_1,\epsilon,\epsilon_1,\delta\in(0,1), Q_0=1, C_0=\mathcal{F}(X_0), k=0.
1:  while ||F(X_k)|| > \epsilon do
2:     \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle,
3:     if |\sigma_k| < \epsilon_1||F(X_k)||^2 then
4:        stop the process.
5:     end if
6:     Z_k = -s(\sigma_k)F(X_k),
7:     while \mathcal{F}(R_{X_k}[\tau Z_k]) > C_k - \rho_1\epsilon_1\tau||F(X_k)||^2 do
8:        \tau \leftarrow \delta\tau,
9:     end while
10:     \tau_k = \tau,
11:     X_{k+1} = R_{X_k}[\tau_k Z_k],
12:     Q_{k+1} = \eta Q_k + 1 and C_{k+1} = (\eta Q_k C_k + \mathcal{F}(X_{k+1}))/Q_{k+1},
13:     \hat{S}_k = -\tau_k s(\sigma_k)\mathcal{T}_{\tau_k Z_k}[F(X_k)] and \hat{Y}_k = F(X_{k+1}) + \frac{1}{\tau_k s(\sigma_k)}\hat{S}_k,
14:     \tau_{k+1}^{RBB} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle},
15:     \tau = \max(\min(\tau_{k+1}^{RBB},\tau_M),\tau_m).
16:     k \leftarrow k+1.
17:  end while
Algorithm 1 Spectral Residual Method for tangent vector fields (RSANE)
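For reference, a compact Matlab sketch of Algorithm 1 is given below; it assumes handles F (the tangent vector field), retr (a retraction R_X[\cdot]) and transp (a non–expansive transporter such as (19)), it approximates \sigma_k with the finite–difference formula of Remark 2 below, and it uses the default parameter values reported in Section 6. The function name rsane and the options structure are an illustrative interface, not the authors' code.

```matlab
function X = rsane(F, retr, transp, X, opts)
% Sketch of Algorithm 1 (RSANE) for a tangent vector field on an embedded
% manifold. Assumed inputs (illustrative interface):
%   F      : handle, F(X) returns the tangent vector field at X
%   retr   : handle, retr(X, V) evaluates the retraction R_X[V]
%   transp : handle, transp(X, eta, xi) is a non-expansive transporter
%   X      : initial point on the manifold;  opts.maxit : iteration limit
% The merit function is Fcal(X) = 0.5*||F(X)||_F^2 and sigma_k is
% approximated by the finite-difference formula of Remark 2.
merit = @(Y) 0.5*norm(F(Y),'fro')^2;
epsr = 1e-5;  eps1 = 1e-8;  rho1 = 1e-4;  delta = 0.2;  eta = 0.6;
tau  = 1e-3;  tau_m = 1e-10;  tau_M = 1e10;  h = 1e-8;
Q = 1;  C = merit(X);
for k = 0:opts.maxit
    FX = F(X);
    if norm(FX,'fro') <= epsr, break; end
    % sigma_k = <grad Fcal(X_k), F(X_k)>, approximated as in Remark 2
    sigma = (merit(retr(X, h*FX)) - merit(X))/h;
    if abs(sigma) < eps1*norm(FX,'fro')^2, break; end     % bad breakdown
    Z = -sign(sigma)*FX;                                  % direction (11)
    while merit(retr(X, tau*Z)) > C - rho1*eps1*tau*norm(FX,'fro')^2
        tau = delta*tau;                                  % backtracking (21)
    end
    Xnew = retr(X, tau*Z);
    Qnew = eta*Q + 1;  C = (eta*Q*C + merit(Xnew))/Qnew;  Q = Qnew;
    Shat = -tau*sign(sigma)*transp(X, tau*Z, FX);         % (14)
    Yhat = F(Xnew) + (1/(tau*sign(sigma)))*Shat;          % (15)
    tauRBB = sign(sigma)*(Shat(:)'*Shat(:))/(Shat(:)'*Yhat(:));  % Step 14
    tau = max(min(tauRBB, tau_M), tau_m);                 % Step 15
    X = Xnew;
end
end
```

As a hypothetical usage, on the unit sphere one may call this sketch with F = @(x) A*x - (x'*A*x)*x, retr = @(x,v) (x+v)/norm(x+v), the projection transporter of Section 6.1, and opts = struct('maxit',5000).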
Remark 1

In Algorithm 1, we replace the nonmonotone condition (20) by

\mathcal{F}(R_{X_k}[\tau Z_k]) \leq C_k - \rho_1\epsilon_1\tau||F(X_k)||^2. (21)

We remark that, with this relaxed condition, Algorithm 1 is well defined. In fact, if at iteration k the procedure does not stop at Step 4, then Z_k is a descent direction (see Lemma 2), and for all \rho_1\in(0,1) there exists t>0 such that the non–monotone Zhang–Hager condition (20) holds by continuity for \tau>0 sufficiently small (a proof of this fact appears in zhang2004nonmonotone ). In addition, it follows from Step 4, (20) and Lemma 2 that

\mathcal{F}(R_{X_k}[\tau Z_k]) \leq C_k + \rho_1\tau\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle \leq C_k - \rho_1\epsilon_1\tau||F(X_k)||^2,

which implies that the relaxed condition (21) is also verified for all \tau>0 that satisfy (20).

Remark 2

The bottleneck of Algorithm 1 appears in Step 2, since to calculate \sigma_k we must compute the Riemannian gradient of \mathcal{F}(\cdot), which implies evaluating the Jacobian of F(\cdot) (see equation (9)). However, given a retraction R_X[\cdot], \sigma_k can be approximated using finite differences as follows

\sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle = \mathcal{DF}(X_k)[F(X_k)] = \lim_{t\to 0}\frac{\mathcal{F}(R_{X_k}[tF(X_k)])-\mathcal{F}(X_k)}{t} \approx \frac{\mathcal{F}(R_{X_k}[hF(X_k)])-\mathcal{F}(X_k)}{h}, (22)

where 0 < h \ll 1 is a small real number. The fact that this approximation does not need explicit knowledge of the Jacobian operator is useful for large–scale problems.

The following lemma establishes that in most cases \tau_k^{RBB1} is positive.

Lemma 1

Let \tau_{k+1}^{RBB1} be computed as in (13). Then \tau_{k+1}^{RBB1} > 0 when one of the following cases holds:

  (a)

    \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle < 0;

  (b)

    \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle > 0 and ||F(X_{k+1})|| < ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||.

Proof

Since \tau_{k+1}^{RBB1} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle} = \frac{-1}{\tau_k}\frac{||\hat{S}_k||^2}{\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle}, we have \textrm{sign}(\tau_{k+1}^{RBB1}) = \textrm{sign}(-\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle). Suppose that (a) holds; then

\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle = \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle - ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2 < 0,

which implies that \tau_{k+1}^{RBB1} > 0.

On the other hand, if (b) holds, from the Cauchy–Schwarz inequality, we find that

0 < \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle \leq ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||\,||F(X_{k+1})|| < ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2,

and so

\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle = \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle - ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2 < 0,

and hence \tau_{k+1}^{RBB1} > 0, which proves the lemma.

The same theoretical result is verified for the spectral parameter \tau_{k+1}^{RBB2}, and therefore it is also valid for the adaptive parameter \tau_{k+1}^{RBB}.

Notice that if the transporter \mathcal{T}[\cdot] is isometric (see Definition 3), then the second condition in item (b) of Lemma 1 reduces to ||F(X_{k+1})|| < ||F(X_k)||. In addition, observe that if we set \eta = 0 in Algorithm 1, then we have \mathcal{F}(X_{k+1}) < \mathcal{F}(X_k) for all k\in\mathbb{N}. Therefore, this condition would always be verified under these choices of \eta and \mathcal{T}[\cdot].

Lemma 2

Assume that Algorithm 1 does not terminate. Let \{Z_k\} be the infinite sequence of tangent directions generated by Algorithm 1. Then Z_k is a descent direction for the merit function \mathcal{F}(\cdot) at X_k, for all k\geq 0.

Proof

From Step 6 in Algorithm 1 we have Z_k = -s(\sigma_k)F(X_k), where \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle. Since Algorithm 1 does not terminate, from Step 3 we obtain |\sigma_k| \geq \epsilon_1||F(X_k)||^2 > 0. Now, observe that

\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle = -s(\sigma_k)\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle = -s(\sigma_k)\sigma_k = -\frac{\sigma_k^2}{|\sigma_k|} = -|\sigma_k| < 0.

Therefore, we conclude that Z_k is a descent direction for \mathcal{F}(\cdot) at X_k, for all k\geq 0.

4 Convergence Analysis

In this section, we analyse the global convergence of Algorithm 1 under mild assumptions. Our analysis consists of a generalization of the global convergence theory of line–search methods for unconstrained optimization presented in zhang2004nonmonotone , and an adaptation of Theorem 4.3.1 in absil2009optimization .

The following lemma establishes that the sequence \{C_k\} is bounded below by the sequence \{\mathcal{F}(X_k)\}.

Lemma 3

Let \{X_k\} be an infinite sequence generated by Algorithm 1. Then

\mathcal{F}(X_k) \leq C_k, \quad \forall k. (23)
Proof

Firstly, we define \psi_k:\mathbb{R}\to\mathbb{R} by

\psi_k(\alpha) = \frac{\alpha C_{k-1} + \mathcal{F}(X_k)}{\alpha+1},

and observe that the derivative of \psi_k(\alpha) is

\dot{\psi}_k(\alpha) = \frac{C_{k-1}-\mathcal{F}(X_k)}{(\alpha+1)^2}.

It follows from the non–monotone condition (21) that

\mathcal{F}(X_k) \leq C_{k-1} - \rho_1\epsilon_1\tau_{k-1}||F(X_{k-1})||^2 < C_{k-1}, (24)

which implies that \dot{\psi}_k(\alpha) \geq 0 for all \alpha\geq 0. Hence, the function \psi_k(\cdot) is nondecreasing and \mathcal{F}(X_k) = \psi_k(0) \leq \psi_k(\alpha) for all \alpha\geq 0. Then, taking \bar{\alpha} = \eta Q_{k-1}, we obtain

\mathcal{F}(X_k) = \psi_k(0) \leq \psi_k(\bar{\alpha}) = C_k, (25)

which completes the proof.

In order to prove the global convergence of our proposed algorithm, we need the following asymptotic property.

Lemma 4

Any infinite sequence \{X_k\} generated by Algorithm 1 verifies the following property:

\lim_{k\to\infty}\tau_k||F(X_k)||^2 = 0. (26)
Proof

By the construction of Algorithm 1 and using Lemma 2, we have

C_{k+1} = \frac{\eta Q_k C_k + \mathcal{F}(X_{k+1})}{Q_{k+1}} < \frac{(\eta Q_k + 1)C_k}{Q_{k+1}} = C_k. (27)

Hence, \{C_k\} is monotonically decreasing and bounded below by zero; therefore it converges to some limit C^{*}\geq 0. It follows from Step 7 and Step 12 in Algorithm 1 that

\sum_{k=0}^{\infty}\frac{\rho_1\epsilon_1\tau_k||F(X_k)||^2}{Q_{k+1}} \leq \sum_{k=0}^{\infty}(C_k - C_{k+1}) = C_0 - C^{*} < \infty. (28)

Merging this result with the fact that Q_{k+1} = 1+\eta Q_k = 1+\eta+\eta^2 Q_{k-1} = \cdots = \sum_{i=0}^{k}\eta^{i} < (1-\eta)^{-1}, we have

\lim_{k\to\infty}\tau_k||F(X_k)||^2 = 0, (29)

which proves the lemma.

The theorem below establishes a global convergence property of Algorithm 1. The proof can be seen as a modification of those of Theorem 3.4 in yao2020riemannian and Theorem 4.1 in oviedoimplicit .

Theorem 4.1

Algorithm 1 either terminates at a finite iteration j\in\mathbb{N}, where |\langle\nabla_{\mathcal{M}}\mathcal{F}(X_j),F(X_j)\rangle| < \epsilon_1||F(X_j)||^2, or it generates an infinite sequence \{X_k\} such that

\liminf_{k\to\infty} ||F(X_k)|| = 0.
Proof

Let us assume that Algorithm 1 does not terminate, and let X_{*} be any accumulation point of the sequence \{X_k\}. We may assume that \lim_{k\to\infty}X_k = X_{*}, taking a subsequence if necessary. By contradiction, suppose that there exists \epsilon_0 > 0 such that

||F(X_k)||^2 > \epsilon_0, \quad \forall k\geq 0. (30)

In view of (30) and Lemma 4, we have

\liminf_{k\to\infty}\tau_k = 0. (31)

Firstly, we define the curve Y_k(\tau) = R_{X_k}[\tau Z_k] for all k\in\mathbb{N}, which is smooth due to the differentiability of the retraction R_X[\cdot]. Since the parameter \tau_k > 0 is chosen by carrying out a backtracking process, we have \tau_k = \delta^{m_k}\tau_k^{RBB} for all k greater than some \bar{k}, where m_k\in\mathbb{N} is the smallest integer such that the relaxed non–monotone condition (21) is fulfilled. Thus, the scalar \bar{\tau} = \frac{\tau_k}{\delta} violates condition (21), i.e., it holds that

-\rho_1\epsilon_1\bar{\tau}||F(X_k)||^2 < \mathcal{F}(Y_k(\bar{\tau})) - C_k \leq \mathcal{F}(Y_k(\bar{\tau})) - \mathcal{F}(X_k), \quad \forall k\geq\bar{k}, (32)

where the last inequality is obtained using Lemma 3.

Let us set \psi_k(\tau) := \mathcal{F}\circ Y_k(\tau); then (32) is equivalent to

-\frac{\psi_k(\bar{\tau})-\psi_k(0)}{\bar{\tau}-0} < \rho_1\epsilon_1||F(X_k)||^2, \quad \forall k\geq\bar{k}. (33)

It follows from the mean value theorem that there exists t\in(0,\bar{\tau}) such that -\dot{\psi}_k(t) < \rho_1\epsilon_1||F(X_k)||^2 for all k\geq\bar{k}, or equivalently

-\langle\nabla_{\mathcal{M}}\mathcal{F}(Y_k(t)),\dot{Y}_k(t)\rangle < \rho_1\epsilon_1||F(X_k)||^2, \quad \forall k\geq\bar{k}. (34)

In view of the continuity of the functions s(\cdot), \nabla_{\mathcal{M}}\mathcal{F}(\cdot), F(\cdot) and Y_k(\cdot), the smoothness and local rigidity condition of the retraction R_X[\cdot], and taking limits in (34), we arrive at

-\langle\nabla_{\mathcal{M}}\mathcal{F}(X_{*}),Z_{*}\rangle \leq \rho_1\epsilon_1||F(X_{*})||^2,

or equivalently,

|\sigma_{*}| \leq \rho_1\epsilon_1||F(X_{*})||^2, (35)

where |\sigma_{*}| = |\langle\nabla_{\mathcal{M}}\mathcal{F}(X_{*}),F(X_{*})\rangle|. Since Algorithm 1 does not terminate, from Step 3 we have

|\sigma_k| \geq \epsilon_1||F(X_k)||^2. (36)

Taking limits in (36), we find that |\sigma_{*}| \geq \epsilon_1||F(X_{*})||^2. Merging this last result with (35), we arrive at

||F(X_{*})||^2 \leq \rho_1||F(X_{*})||^2.

Since 0 < \rho_1 < 1, we have ||F(X_{*})|| = 0; this contradicts (30), which completes the proof.

By Theorem 4.1, we obtain the following theoretical consequence under compactness assumptions.

Corollary 1

Let \{X_k\} be an infinite sequence generated by Algorithm 1. Suppose that the level set \mathcal{L} = \{X\in\mathcal{M}: \mathcal{F}(X)\leq\mathcal{F}(X_0)\} is compact (which holds, in particular, when the Riemannian manifold \mathcal{M} is compact). Then

\lim_{k\to\infty}||F(X_k)|| = 0.
Proof

Let \{X_k\} be an infinite sequence generated by Algorithm 1. It follows from Lemma 3, the strict inequality (27) and the construction of Algorithm 1 that

\mathcal{F}(X_{k+1}) \leq C_{k+1} < C_k < \cdots < C_1 < C_0 = \mathcal{F}(X_0),

which implies that X_k\in\mathcal{L}, for all k\geq 0.

Now, by contradiction, let us suppose that there exist a subsequence \{X_k\}_{k\in\mathcal{K}} and \epsilon_0 > 0 such that

||F(X_k)|| \geq \epsilon_0, (37)

for any k\in\mathcal{K}. Since X_k\in\mathcal{L} for all k\geq 0 and \mathcal{L} is a compact set, the subsequence \{X_k\}_{k\in\mathcal{K}} must have an accumulation point X_{*}\in\mathcal{L}. Taking limits in (37) and using the continuity of ||F(\cdot)||, we obtain ||F(X_{*})|| \geq \epsilon_0, which contradicts Theorem 4.1.

Remark 3

The main drawback of Algorithm 1 is that it can terminate prematurely with a bad breakdown (|\sigma_k| < \epsilon_1||F(X_k)||^2). One way to remedy this problem is to use Z_k = -\nabla_{\mathcal{M}}\mathcal{F}(X_k) as the search direction if a bad breakdown occurs at the k–th iteration, and then, in the next iteration, to retry using the steps of Algorithm 1. We may even use any tangent direction Z_k\in T_{X_k}\mathcal{M} such that \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle < 0 whenever a bad breakdown happens, in order to overcome this difficulty.

5 Numerical Experiments

In order to give further insight into the RSANE method, we present the results of some numerical experiments. We test our algorithm on some randomly generated gradient tangent vector fields on three different Riemannian manifolds: the unit sphere, the Stiefel manifold and the oblique manifold. All experiments have been performed on an Intel(R) Core(TM) i7–4770 CPU at 3.40 GHz with 1TB HD and 16GB of RAM. The algorithms were coded in Matlab with double precision. The running times are always given in CPU seconds. For numerical comparisons, we consider the SANE method proposed in cruz2003nonmonotone , and the recently published Riemannian derivative–free conjugate gradient Polak–Ribière–Polyak method (CGPR) for the numerical solution of tangent vector fields yao2020riemannian . The Matlab codes of our RSANE, SANE and CGPR are available at: http://www.optimization-online.org/DB_HTML/2020/09/8028.html

6 Implementation details

In our implementation, in addition to monitoring the residual norm ||F(X_k)||_F, we also check the relative changes of two consecutive iterates and of their corresponding merit function values,

\textrm{rel}_k^{X} := \frac{||X_{k+1}-X_k||_F}{||X_k||_F} \quad\textrm{and}\quad \textrm{rel}_k^{F} := \frac{|\mathcal{F}(X_{k+1})-\mathcal{F}(X_k)|}{\mathcal{F}(X_k)+1}. (38)

Here ||M||_F denotes the Frobenius norm of the matrix M. In the case when M is a vector, this norm reduces to the standard norm on \mathbb{R}^n. Although the residual ||X_{k+1}-X_k||_F is meaningless in the Riemannian context, for our numerical experiments we only consider Riemannian manifolds embedded in the Euclidean space \mathbb{R}^{n\times p}, and the residual \textrm{rel}_k^{X} is well defined for these types of manifolds.

We let all the algorithms run up to K iterations and stop them at iteration k<K if ||F(X_k)||_F < \epsilon, or \textrm{rel}_k^{X} < \epsilon_X and \textrm{rel}_k^{F} < \epsilon_F, or

\textrm{mean}([\textrm{rel}_{k-\min\{k,T\}+1}^{X},\ldots,\textrm{rel}_k^{X}]) \leq 10\epsilon_X \quad\textrm{and}\quad \textrm{mean}([\textrm{rel}_{k-\min\{k,T\}+1}^{F},\ldots,\textrm{rel}_k^{F}]) \leq 10\epsilon_F.

Here, the default values of \epsilon, \epsilon_X, \epsilon_F and T are 1e-5, 1e-15, 1e-15 and 5, respectively. In addition, in Algorithm 1 we use \eta=0.6, \tau=1e-3 (the initial step–size \tau), \tau_m=1e-10, \tau_M=1e+10, \delta=0.2, \epsilon_1=1e-8 and \rho_1=1e-4 as default values.
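The combined stopping test can be written compactly as below; this Matlab fragment is only a sketch, with hypothetical histories relX and relF standing for (38) and nrmF standing for ||F(X_k)||_F.

```matlab
% Sketch of the stopping test; relX, relF, nrmF carry illustrative values.
epsF = 1e-5;  epsX = 1e-15;  epsFun = 1e-15;  T = 5;
relX = [1e-3 1e-6 1e-16 1e-16 1e-16 1e-16];   % hypothetical history of rel_k^X
relF = [1e-2 1e-5 1e-16 1e-16 1e-16 1e-16];   % hypothetical history of rel_k^F
nrmF = 3e-5;                                  % hypothetical ||F(X_k)||_F
k = numel(relX);  w = max(k-T+1,1):k;         % window of the last T iterations
stop = nrmF < epsF || (relX(k) < epsX && relF(k) < epsFun) || ...
       (mean(relX(w)) <= 10*epsX && mean(relF(w)) <= 10*epsFun);
fprintf('stop = %d\n', stop);
```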

6.1 Considered manifolds and their geometric tools

In this subsection, we present the three Riemannian manifolds that we will use for the numerical experiments in the remaining subsections, as well as some tools associated with each manifold that are necessary for the algorithms, such as vector transports and retractions.

Firstly, we consider the unit sphere given by

S^{n-1} = \{x\in\mathbb{R}^n: ||x||_2 = 1\}. (39)

It is well–known that the tangent space of the unit sphere at x\in S^{n-1} is given by T_x S^{n-1} = \{z\in\mathbb{R}^n: z^{\top}x = 0\}. Let S^{n-1} be endowed with the inner product inherited from the classical inner product on \mathbb{R}^n,

\langle\xi_x,\eta_x\rangle_x := \xi_x^{\top}\eta_x, \quad \forall \xi_x,\eta_x\in T_x S^{n-1}, \, x\in S^{n-1}.

Then, S^{n-1} with this inner product defines an (n-1)–dimensional Riemannian sub–manifold of \mathbb{R}^n. The retraction R_x[\cdot] on S^{n-1} is chosen as in absil2009optimization ,

R_x[\xi_x] = \frac{x+\xi_x}{||x+\xi_x||_2},

for all \xi_x\in T_x S^{n-1} and x\in S^{n-1}. In addition, for this particular manifold, we consider the vector transport based on the orthogonal projection

\mathcal{T}_{\eta_x}[\xi_x] = \xi_x - R_x[\eta_x]R_x[\eta_x]^{\top}\xi_x.

Notice that this vector transport verifies the Ring–Wirth non-expansive condition (16).

The second Riemannian manifold considered in this section is the Stiefel manifold, which is defined as

St(n,p) = \{X\in\mathbb{R}^{n\times p}: X^{\top}X = I_p\}, (40)

where I_p\in\mathbb{R}^{p\times p} denotes the identity matrix of size p–by–p. By differentiating both sides of X(t)^{\top}X(t) = I_p, we obtain the tangent space of St(n,p) at X, given by T_X St(n,p) = \{Z\in\mathbb{R}^{n\times p}: Z^{\top}X + X^{\top}Z = 0\}. Let St(n,p) be endowed with the Riemannian metric induced from \mathbb{R}^{n\times p}, i.e.,

\langle\xi_X,\eta_X\rangle_X := tr(\xi_X^{\top}\eta_X), \quad \forall \xi_X,\eta_X\in T_X St(n,p), \, X\in St(n,p). (41)

The pair (St(n,p),\langle\cdot,\cdot\rangle), where \langle\cdot,\cdot\rangle is the inner product given in (41), forms a Riemannian sub–manifold of the Euclidean space \mathbb{R}^{n\times p}, and its dimension is equal to np-\frac{1}{2}p(p+1), absil2009optimization . For our numerical experiments concerning the Stiefel manifold, we will use the retraction, introduced in absil2009optimization , given by

R_X[\xi_X] = \textrm{qf}(X+\xi_X), (42)

and the retraction based on the matrix polar decomposition

R_X[\xi_X] = (X+\xi_X)\left((X+\xi_X)^{\top}(X+\xi_X)\right)^{-1/2}, (43)

for all \xi_X\in T_X St(n,p) and X\in St(n,p). In equation (42), \textrm{qf}(W) denotes the orthogonal factor Q obtained from the QR–factorization of W, such that W = QR, where Q belongs to St(n,p) and R\in\mathbb{R}^{p\times p} is the upper triangular matrix with strictly positive diagonal elements. Additionally, we consider the following vector transport

\mathcal{T}_{\eta_X}[\xi_X] = \xi_X - R_X[\eta_X]\,\textrm{sym}(R_X[\eta_X]^{\top}\xi_X), (44)

where R_X[\cdot] is either of the two retractions defined in (42)–(43), and \textrm{sym}(W) = \frac{1}{2}(W^{\top}+W) is the function that assigns to the matrix W its symmetric part. This vector transport is inspired by the orthogonal projection onto the tangent space of the Stiefel manifold. Moreover, the mapping (44) verifies the Ring–Wirth non–expansive condition (16).
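A minimal Matlab sketch of these Stiefel tools is given below; the point X and the tangent vector xi are generated randomly for illustration, and the qf factor is obtained by enforcing a positive diagonal in the R factor of Matlab's qr.

```matlab
% Sketch of the Stiefel tools (42)-(44); all data are illustrative.
n = 8;  p = 3;
[X,~] = qr(randn(n,p),0);                      % a point on St(n,p)
W  = randn(n,p);
xi = W - X*(X'*W + W'*X)/2;                    % project W onto T_X St(n,p)
% qf retraction (42): Q factor with positive diagonal in R
[Q,R] = qr(X + xi, 0);
Rqf   = Q*diag(sign(diag(R)));
% polar retraction (43)
Rpolar = (X + xi)/sqrtm((X + xi)'*(X + xi));
% vector transport (44), here transporting xi along eta = xi
Y   = Rqf;                                     % Y = R_X[eta]
Txi = xi - Y*((Y'*xi + xi'*Y)/2);              % xi - Y*sym(Y'*xi)
fprintf('feasibility of retractions: %.2e  %.2e\n', ...
        norm(Rqf'*Rqf - eye(p),'fro'), norm(Rpolar'*Rpolar - eye(p),'fro'));
```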

Our third example, the oblique manifold, is defined as

\mathcal{OB}(n,p) = \{X\in\mathbb{R}^{n\times p}: \textrm{ddiag}(X^{\top}X) = I_p\}, (45)

where \textrm{ddiag}(W) denotes the matrix W with all its off–diagonal entries set to zero. The tangent space associated with \mathcal{OB}(n,p) at X is given by T_X\mathcal{OB}(n,p) = \{Z\in\mathbb{R}^{n\times p}: \textrm{ddiag}(X^{\top}Z) = 0\}. Again, if we endow \mathcal{OB}(n,p) with the inner product inherited from the standard inner product on \mathbb{R}^{n\times p}, given by (41), then \mathcal{OB}(n,p) becomes an embedded Riemannian manifold in \mathbb{R}^{n\times p}. For this particular manifold, we consider the retraction, which appears in absil2006joint , defined by

R_X[\xi_X] = (X+\xi_X)\,\textrm{ddiag}\left((X+\xi_X)^{\top}(X+\xi_X)\right)^{-1/2}, (46)

for all \xi_X\in T_X\mathcal{OB}(n,p) and X\in\mathcal{OB}(n,p). We will use another vector transport based on the orthogonal projection operator onto T_X\mathcal{OB}(n,p),

\mathcal{T}_{\eta_X}[\xi_X] = \xi_X - R_X[\eta_X]\,\textrm{ddiag}(R_X[\eta_X]^{\top}\xi_X), (47)

for all \xi_X,\eta_X\in T_X\mathcal{OB}(n,p) and X\in\mathcal{OB}(n,p).

6.2 Eigenvalues computation on the sphere

For the first test problem, we consider the standard eigenvalue problem

Ax = \lambda x, (48)

where A\in\mathbb{R}^{n\times n} is a given symmetric matrix. The values of \lambda that verify (48) are known as eigenvalues of A, and the corresponding vectors x\in\mathbb{R}^n are the eigenvectors. A simple way to compute extreme eigenvalues of A is by minimizing (or maximizing) the associated Rayleigh quotient

r(x) = \frac{x^{\top}Ax}{x^{\top}x}. (49)

This function is a continuously differentiable map r:\mathbb{R}^n\setminus\{0\}\to\mathbb{R}, whose gradient is given by \nabla r(x) = \frac{2}{||x||_2^2}(Ax - r(x)x). It is clear that any eigenvector x and its corresponding eigenvalue \lambda satisfy r(x) = \lambda, and thus, in that case, x is a critical point of r(\cdot), i.e., \nabla r(x) = 0. By noting that \nabla r(x) = 0 if and only if x is a solution of the following nonlinear system of equations

G(x) \equiv Ax - r(x)x = 0, (50)

we can directly address this eigenvalue problem by using the SANE method. On the other hand, by introducing the constraint ||x||_2 = 1, the nonlinear system G(x) = 0 can be cast as the following tangent vector field F:S^{n-1}\to\mathcal{T}S^{n-1} defined on a Riemannian manifold (the unit sphere),

F(x) \equiv Ax - (x^{\top}Ax)x = 0_x. (51)

Note that for all x\in S^{n-1}, we have x^{\top}F(x) = 0. Therefore, F(\cdot) effectively maps each point on the sphere to its corresponding tangent space, i.e., F(\cdot) defines a tangent vector field.
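The field (51) is straightforward to evaluate; the Matlab sketch below, with a random symmetric A as an illustrative example, checks the tangency x^{\top}F(x) = 0 and the fact that F vanishes at an eigenvector.

```matlab
% Sketch of the tangent vector field (51) on the unit sphere.
n = 50;
A = randn(n);  A = (A + A')/2;             % a symmetric test matrix
F = @(x) A*x - (x'*A*x)*x;                 % field (51)
x = ones(n,1)/sqrt(n);                     % starting vector used in Section 6.2
fprintf('x''*F(x) = %.2e\n', x'*F(x));     % ~0: F(x) lies in T_x S^{n-1}
[V,~] = eig(A);  v = V(:,1);
fprintf('||F(v)|| at an eigenvector = %.2e\n', norm(F(v)));
```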

For illustrative purposes, in this subsection we compare the numerical performance of the SANE method solving (50) versus its generalized Riemannian version RSANE solving (51). To do this, we consider 40 instances of symmetric and positive definite sparse matrices taken from the UF Sparse Matrix Collection davis2011university (the SuiteSparse Matrix Collection toolbox is available at https://sparse.tamu.edu/), which contains several matrices that arise in real applications. For this experiment, we stop both algorithms when a vector x_k is found that satisfies the inequality ||G(x_k)||_2 < TOL, where TOL = \max\{\epsilon,\epsilon||G(x_0)||_2\}, with \epsilon = 2e-5. Note that F(x_k) = G(x_k) for all x_k generated by RSANE, since our proposal preserves the constraint ||x||_2 = 1; hence this stopping criterion is well defined for our algorithm. The starting vector x_0\in\mathbb{R}^n was generated by x_0 = v/\sqrt{n}, where v = (1,1,\ldots,1)^{\top}.

The numerical results associated with these experiments are summarized in Table 1. In this table, we report the number of iterations (Nitr); the CPU time in seconds (Time); the residual value \min\left\{\frac{||G(x_{*})||_2}{||x_0||_2}, ||G(x_{*})||_2\right\} (NrmF), where x_{*} denotes the estimated solution obtained by each algorithm; and the number of function evaluations (Nfe). In the case of SANE, this is the number of times that SANE evaluates the map G(\cdot), while for our RSANE, Nfe denotes the number of times that it evaluates the functional F(\cdot).

We observe from Table 1 that the RSANE algorithm is a robust option for solving linear eigenvalue problems. The performance of both SANE and RSANE varied over the different matrices. We note that our RSANE is very competitive for large–scale problems. Indeed, the RSANE algorithm outperforms its Euclidean version (SANE) in general terms, since RSANE took on average 3644.1 iterations and 31.167 seconds to solve the 40 problems, while SANE took 3864.3 iterations and 34.079 seconds on average. It is worth noting that although RSANE failed on problem bcsstk13, it succeeds on bcsstk16, while both methods take the maximum number of iterations on problems bcsstk27, bcsstk28 and nasa4704.

In Figure 1, we plot the convergence history of the SANE and RSANE methods for the particular instances "1138_bus" and "s3rmt3m1", respectively. From this figure, we note that RSANE can converge faster than SANE, as illustrated in Figure 1(b). However, the opposite behavior is observed in Figure 1(a).

Table 1: Numerical results for eigenvalues computation.
SANE RSANE
Name n Nfe NrmF Nitr Time Nfe NrmF Nitr Time
1138_bus 1138 10621 9.70e-6 2791 0.284 14778 7.21e-6 3781 0.449
af_0_k101 503625 2269 1.95e-5 692 51.875 2074 1.99e-5 644 49.273
af_1_k101 503625 1798 2.00e-5 560 39.915 1702 1.99e-5 451 33.397
af_2_k101 503625 2683 2.00e-5 796 59.179 1421 1.93e-5 451 34.259
af_3_k101 503625 2207 1.93e-5 664 49.175 1757 1.92e-5 541 42.234
af_4_k101 503625 2281 1.98e-5 694 51.156 1758 1.85e-5 545 43.613
af_5_k101 503625 2651 1.76e-5 789 58.730 1975 1.79e-5 607 47.731
af_shell3 504855 305 1.99e-5 119 6.989 328 1.79e-5 125 8.026
af_shell7 504855 452 1.14e-5 165 10.204 363 1.53e-5 133 8.851
apache1 80800 15377 1.88e-5 3992 14.466 34224 1.54e-5 8434 34.905
apache2 715176 58685 1.93e-5 13970 801.886 52821 6.89e-6 12516 785.019
bcsstk10 1086 2450 1.30e-5 737 0.063 1954 1.87e-5 603 0.054
bcsstk11 1473 883 1.76e-5 302 0.034 5421 2.00e-5 1501 0.199
bcsstk13 2003 19254 1.98e-5 3293 1.372 88608 3.08e-4 15000 6.523
bcsstk14 1806 76 7.15e-6 35 0.006 4674 1.94e-5 1341 0.283
bcsstk16 4884 105001 7.49e+0 15000 24.274 1430 1.41e-5 458 0.386
bcsstk27 1224 64534 3.43e-4 15000 2.937 63935 2.11e-4 15000 3.064
bcsstk28 4410 62724 2.32e-4 15000 13.300 63527 1.83e-4 15000 14.293
bcsstk36 23052 54355 2.00e-5 12861 76.108 26406 1.99e-5 6561 38.742
bcsstm12 1473 11960 1.94e-5 3017 0.295 8632 1.99e-5 2268 0.240
bcsstm23 3134 14536 1.98e-5 3670 0.500 14820 2.00e-5 3735 0.652
bcsstm24 3562 8255 2.00e-5 2228 0.312 6285 1.85e-5 1721 0.267
cfd1 70656 2693 2.00e-5 801 6.265 2201 1.83e-5 659 5.238
ex15 6867 197 7.92e-6 76 0.027 9578 1.97e-5 2606 1.334
fv1 9604 371 1.81e-5 136 0.061 474 1.85e-5 170 0.0949
fv2 9801 369 1.54e-5 138 0.066 334 1.51e-5 127 0.0637
fv3 9801 506 1.82e-5 183 0.077 371 1.52e-5 136 0.0664
Kuu 7102 5050 3.28e-6 1479 1.533 2674 4.82e-6 819 0.9281
mhd4800b 4800 1959 1.98e-5 605 0.146 1784 1.96e-5 561 0.1363
msc23052 23052 43296 1.99e-5 10410 62.815 37779 1.99e-5 9217 56.9276
Muu 7102 29 1.78e-5 13 0.006 28 1.70e-5 12 0.0052
nasa4704 4704 63663 8.21e-5 15000 7.418 63440 5.79e-5 15000 8.264
s1rmq4m1 5489 9000 2.00e-5 2411 2.394 11052 1.99e-5 2933 3.2589
s1rmt3m1 5489 23019 1.98e-5 5895 4.566 13688 1.93e-5 3592 3.1067
s2rmq4m1 5489 9839 2.00e-5 2564 2.384 10292 1.96e-5 2683 2.774
s2rmt3m1 5489 17422 2.00e-5 4437 3.504 21669 1.93e-5 5385 4.9953
s3rmq4m1 5489 6063 1.85e-5 1710 1.500 5422 1.20e-5 1517 1.4757
s3rmt3m1 5489 12250 1.76e-5 3295 2.467 6331 1.52e-5 1764 1.3742
s3rmt3m3 5357 13090 1.89e-5 3461 2.569 9911 1.83e-5 2669 2.1663
sts4098 4098 22153 1.89e-5 5583 2.335 17664 1.93e-5 4499 2.0278
Figure 1: Convergence history of SANE and RSANE for the instances "1138_bus" (panel (a)) and "s3rmt3m1" (panel (b)). The y–axis is in logarithmic scale.

6.3 A nonlinear eigenvalue problem on the Stiefel manifold

As a second experiment, we investigate the performance of our RSANE method applied to the following nonlinear eigenvalue problem constrained to the Stiefel manifold

H(X)X - X\Lambda = 0, (52a)
X^{\top}X - I_p = 0, (52b)

where H(X) = L + \mu\,\textrm{ddiag}(L^{\dagger}\rho(X)), L^{\dagger}\in\mathbb{R}^{n\times n} is the pseudo–inverse of the discrete Laplacian operator L, \mu > 0 and \rho(X) = \textrm{diag}(XX^{\top}); here \textrm{diag}(M)\in\mathbb{R}^n denotes the vector containing the diagonal elements of the matrix M\in\mathbb{R}^{n\times n}. Observe that \Lambda\in\mathbb{R}^{p\times p} can be seen as a block of eigenvalues of the nonlinear matrix H(X). In fact, in the special case that the operator H(X) is constant and p=1, this problem reduces to the linear eigenvalue problem (48). This kind of nonlinear eigenvalue problem appears frequently in total energy minimization in electronic structure calculations cedeno2018projected ; saad2010numerical . Pre–multiplying both sides of (52a) by X^{\top} and using (52b), we obtain \Lambda = X^{\top}H(X)X; substituting this result in (52a), we obtain the following tangent vector field defined on the Stiefel manifold

F(X) \equiv H(X)X - XX^{\top}H(X)X = 0_X, (53)

where F:St(n,p)\to\mathcal{T}St(n,p).
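A Matlab sketch of the evaluation of (53) is given below; the choice of a one–dimensional discrete Laplacian for L is an illustrative assumption (the paper does not specify the operator), ddiag applied to the vector L^{\dagger}\rho(X) is read as the diagonal matrix formed from it, and the tangency X^{\top}F(X)+F(X)^{\top}X = 0 is checked numerically.

```matlab
% Sketch of the nonlinear eigenvalue field (53); L is an illustrative
% 1D discrete Laplacian and mu = 1 as in Section 6.3.
n = 100;  p = 10;  mu = 1;
e = ones(n,1);
L = full(spdiags([-e 2*e -e], -1:1, n, n));    % discrete Laplacian (illustrative)
Ldag = pinv(L);                                % pseudo-inverse of L
[X,~] = qr(randn(n,p),0);                      % a point on St(n,p)
rho = sum(X.^2, 2);                            % rho(X) = diag(X*X')
HX  = L + mu*diag(Ldag*rho);                   % H(X) = L + mu*ddiag(Ldag*rho(X))
FX  = HX*X - X*(X'*HX*X);                      % tangent field (53)
fprintf('||X''*FX + FX''*X||_F = %.2e\n', norm(X'*FX + FX'*X,'fro'));  % ~0
```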

In Table 2, we report the results obtained by running the RSANE and CGPR methods on the problem of finding a zero of (53), with six possible choices for the pair (n,p), obtained by varying n in \{100,500,1000\} and p in \{10,50\}. For comparison purposes, we repeat our experiments over 30 different randomly generated starting points X_0\in St(n,p) for each pair (n,p), and report the averaged number of iterations (Nitr), the averaged number of evaluations of the functional F(\cdot) (Nfe), the averaged total computing time in seconds (Time), and the averaged residual (NrmF) given by \frac{1}{30}\sum_{i=1}^{30}||F(x_{*}^{i})||_F, where x_{*}^{i} denotes the estimated solution obtained by the respective algorithm for the i–th starting point, for all i\in\{1,2,\ldots,30\}. In addition, for all the experiments we fix \mu=1 and \epsilon=1e-4 as the tolerance for the termination rule based on the residual norm ||F(X_k)||_F < \epsilon. In this table, RSANE-polar and RSANE-qr denote our Algorithm 1 using the retractions defined in (43) and (42), respectively. Similarly, CGPR-polar and CGPR-qr denote the Riemannian derivative–free conjugate gradient Polak–Ribière–Polyak method developed in yao2020riemannian using the retractions (43) and (42), respectively.

As shown in Table 2, RSANE is superior to the Riemannian conjugate gradient method at solving nonlinear eigenvalue problems for the different choices of (n,p). In particular, we note that for problems with p=50, our proposal basically converges in half of the iterations that CGPR takes to reach the desired precision.

Table 2: Numerical results for nonlinear eigenvalue problems.
Method Nitr Time NrmF Nfe Nitr Time NrmF Nfe
(n,p)=(100,10)    (n,p)=(100,50)
RSANE-polar 70.0 0.022 7.85e-6 167.2 632.9 0.898 2.10e-4 2170.1
RSANE-qr 67.8 0.016 6.87e-6 161.9 620.6 0.505 4.10e-4 2126.9
CGPR-polar 79.0 0.029 8.30e-5 231.3 1209.4 2.747 1.46e-3 6586.4
CGPR-qr 78.2 0.024 8.05e-5 229.3 996.8 1.296 3.30e-4 5451.3
(n,p)=(500,10)    (n,p)=(500,50)
RSANE-polar 65.0 0.200 8.10e-6 151.6 311.6 2.048 7.79e-6 955.4
RSANE-qr 68.0 0.211 7.43e-6 160.0 311.5 1.928 8.79e-6 954.6
CGPR-polar 74.3 0.284 8.14e-5 209.3 737.5 7.757 8.86e-5 3601.4
CGPR-qr 73.6 0.280 8.17e-5 210.2 767.1 7.583 9.12e-5 3731.1
(n,p)=(1000,10)    (n,p)=(1000,50)
RSANE-polar 68.9 0.628 7.55e-6 159.8 316.3 5.272 8.12e-6 968.9
RSANE-qr 66.8 0.604 6.86e-6 153.4 321.3 5.294 8.03e-6 987.6
CGPR-polar 73.5 0.820 7.57e-5 208.5 790.3 21.224 9.36e-5 3880.0
CGPR-qr 77.4 0.860 6.87e-5 219.6 881.2 23.505 9.50e-5 4347.7

6.4 Joint diagonalization on the oblique manifold

In this subsection, we analyze the numerical behaviour of our RSANE method solving the nonlinear equation based on the tangent vector field obtained from the Riemannian gradient of the scalar function \mathcal{F}:\mathcal{OB}(n,p)\to\mathbb{R} defined by

\mathcal{F}(X) = \sum_{i=1}^{N}||\textrm{off}(X^{\top}C_i X)||_F^2, (54)

where \textrm{off}(W) = W - \textrm{ddiag}(W) and the C_i\in\mathbb{R}^{n\times n} are given symmetric matrices. The minimization of this function on \mathcal{OB}(n,p) is frequently used to perform independent component analysis, see absil2006joint . It is well–known that the problem of minimizing (54) is closely related to the problem of finding a zero of the Riemannian gradient of the function \mathcal{F}(\cdot), which is given by

\nabla_{\mathcal{OB}(n,p)}\mathcal{F}(X) = \sum_{i=1}^{N}4C_i X\,\textrm{off}(X^{\top}C_i X), (55)

for details see Section 3 in absil2006joint . In order to study the numerical performance of our RSANE compared with the CGPR method, we consider the following tangent vector field

F(X) \equiv \sum_{i=1}^{N}4C_i X\,\textrm{off}(X^{\top}C_i X) = 0_X, (56)

which is obtained from the Riemannian gradient of \mathcal{F}(\cdot).
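The objective (54) and the field (56) can be evaluated as in the Matlab sketch below; the sizes and the randomly generated symmetric matrices C_i are illustrative only.

```matlab
% Sketch of the objective (54) and the field (56) on the oblique manifold.
n = 30;  p = 5;  N = 3;
off = @(W) W - diag(diag(W));                  % off(W) = W - ddiag(W)
C = cell(N,1);
for i = 1:N, B = randn(n); C{i} = B + B'; end  % symmetric C_i (illustrative)
X = randn(n,p);  X = X ./ vecnorm(X);          % unit-norm columns: X in OB(n,p)
val = 0;  FX = zeros(n,p);
for i = 1:N
    M   = X'*C{i}*X;
    val = val + norm(off(M),'fro')^2;          % objective (54)
    FX  = FX + 4*C{i}*X*off(M);                % gradient field (56)
end
fprintf('F(X) = %.3e,  ||F(X)||_F = %.3e\n', val, norm(FX,'fro'));
```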

Now, we present a computational comparison between the RSANE and CGPR yao2020riemannian methods on the solution of the tangent vector field equation (56). To do this, we set N=5, and study the numerical behavior of both methods for the pairs of values (n,p)=(500,100) and (n,p)=(1000,100). For each pair (n,p), we repeat 30 independent runs of RSANE and CGPR, generating the symmetric matrices C_{i}\in\mathbb{R}^{n\times n} as follows

C_{i}=D+B_{i}+B_{i}^{\top},

where D\in\mathbb{R}^{n\times n} is a diagonal matrix whose diagonal elements are given by d_{ii}=\sqrt{n+i}, for all i\in\{1,2,\ldots,n\}, and each B_{i}\in\mathbb{R}^{n\times n} is a randomly generated matrix whose entries follow a standard normal distribution. This particular structure of the C_{i} matrices was taken from zhu2017riemannian. The starting point X_{0}\in\mathcal{OB}(n,p) was randomly generated using the following Matlab commands

X_{0}=\verb"zeros"(n,p);\quad M=\verb"randn"(n,p)\quad\textrm{and}\quad X_{0}(:,i)=M(:,i)/\verb"norm"(M(:,i)),

for all i=1,\ldots,p. For this comparison, we use \epsilon=1e-5 as the tolerance for the stopping rule based on the residual norm ||F(X_{k})||_{F}<\epsilon.
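The data generation described above can be summarized by the following Matlab sketch; it is a minimal illustration of the construction of the matrices C_i (here n and N are fixed to one of the tested configurations), while X_0 is obtained exactly by the column–normalization commands given earlier:

n = 500; N = 5;
D  = diag(sqrt(n + (1:n)));          % diagonal matrix with d_ii = sqrt(n+i)
Cs = cell(N,1);
for i = 1:N
    B     = randn(n);                % B_i with standard normal entries
    Cs{i} = D + B + B';              % C_i = D + B_i + B_i' (symmetric)
end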

The mean number of iterations, number of evaluations of F(\cdot), total computational time in seconds, and the residual “NrmF”, defined as in the previous subsection, are reported in Table 3. In addition, in Figure 2 we plot the average residual curve ||F(X_{k})||_{F} throughout the iterations, for both methods and for each pair (n,p).

From Table 3 and Figure 2, we can see that both RSANE and CGPR find an approximation of the solution of problem (56) with the pre–established precision for the residual norm, in the two cases of (n,p) considered. In addition, we clearly observe that RSANE performs better than CGPR in terms of the mean number of iterations and CPU time.

Table 3: Numerical results for joint diagonalization Riemannian gradient vector field.
Method Nitr Time NrmF Nfe
(n,p)=(500,100)
RSANE 157.8 3.950 6.85e-6 390.0
CGPR 238.8 7.589 7.84e-6 741.8
(n,p)=(1000,100)
RSANE 68.1 4.894 6.50e-6 142.5
CGPR 153.4 14.342 7.59e-6 416.1
Figure 2: Averaged convergence history of RSANE and CGPR, from the same initial point, for the joint diagonalization Riemannian gradient vector field. Both panels plot iterations vs ||F(X_{k})||_{F}: on the left, (n,p,N)=(500,100,5), and on the right, (n,p,N)=(1000,100,5). The y–axis is in logarithmic scale.

7 Concluding Remarks

In this paper, a Riemannian residual approach for finding a zero of a tangent vector field is proposed. The new approach can be seen as an extended version of the SANE method developed in cruz2003nonmonotone for the solution of large–scale nonlinear systems of equations. Since the proposed method systematically uses the tangent vector field to build the search direction, RSANE is very easy to implement and has low storage requirements, which makes it suitable for solving large–scale problems. In addition, our proposal uses a modification of the Riemannian Barzilai–Borwein step–sizes introduced in iannazzo2018riemannian, combined with the Zhang–Hager globalization strategy zhang2004nonmonotone, in order to guarantee the convergence of the associated residual sequence.
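To make this summary concrete, the overall iteration has roughly the following shape. The Matlab sketch below is a schematic simplification and not the paper's Algorithm 1: the retraction is abstracted as a handle R, the Barzilai–Borwein–type step is computed in the embedding space (whereas the Riemannian version relies on vector transports), the direction sign is fixed, and the derivative–free sufficient–decrease test shown is one common choice in spectral residual methods rather than the exact condition used in the paper:

function X = rsane_sketch(Fh, R, X0, epsilon, maxit)
% Schematic non-monotone spectral residual iteration on a manifold.
% Fh : handle returning the tangent vector field F(X)
% R  : handle R(X,V) implementing a retraction at X along V
eta = 0.85; delta = 1e-4; alpha = 1;          % illustrative parameters
X = X0; Fx = Fh(X); h = norm(Fx,'fro')^2;     % merit function h = ||F(X)||_F^2
C = h; Q = 1;                                 % Zhang–Hager reference value
for k = 1:maxit
    if sqrt(h) < epsilon, break; end
    d = -Fx;                                  % residual direction (sign fixed here)
    t = alpha;
    while true                                % non-monotone backtracking
        Xt = R(X, t*d); Ft = Fh(Xt); ht = norm(Ft,'fro')^2;
        if ht <= C - delta*t^2*h || t < 1e-10, break; end
        t = 0.5*t;
    end
    s = Xt - X; y = Ft - Fx;                  % crude BB data (no transport)
    alpha = min(max(abs((s(:)'*s(:))/(s(:)'*y(:))), 1e-10), 1e10);
    Qn = eta*Q + 1; C = (eta*Q*C + ht)/Qn; Q = Qn;   % Zhang–Hager update
    X = Xt; Fx = Ft; h = ht;
end
end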

The preliminary numerical results show that RSANE performs efficiently on several tangent vector fields, considering both real and simulated data and different Riemannian manifolds. In particular, RSANE is competitive against its Euclidean version (SANE) when solving large–scale eigenvalue problems. Additionally, RSANE outperforms the derivative–free conjugate gradient algorithm recently published in yao2020riemannian on the two matrix manifolds considered.

References

  • [1] P-A Absil and Kyle A Gallivan. Joint diagonalization on the oblique manifold for independent component analysis. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, volume 5, pages V–V. IEEE, 2006.
  • [2] P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
  • [3] Roy L Adler, Jean-Pierre Dedieu, Joseph Y Margulies, Marco Martens, and Mike Shub. Newton's method on Riemannian manifolds and a geometric model for the human spine. IMA Journal of Numerical Analysis, 22(3):359–390, 2002.
  • [4] Jonathan Barzilai and Jonathan M Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988.
  • [5] Paul Breiding and Nick Vannieuwenhoven. Convergence analysis of Riemannian Gauss–Newton methods and its connection with the geometric condition number. Applied Mathematics Letters, 78:42–50, 2018.
  • [6] Oscar Susano Dalmau Cedeno and Harry Fernando Oviedo Leon. Projected nonmonotone search methods for optimization with orthogonality constraints. Computational and Applied Mathematics, 37(3):3118–3144, 2018.
  • [7] Timothy A Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1, 2011.
  • [8] Jean-Pierre Dedieu, Pierre Priouret, and Gregorio Malajovich. Newton's method on Riemannian manifolds: covariant alpha theory. IMA Journal of Numerical Analysis, 23(3):395–419, 2003.
  • [9] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
  • [10] Luigi Grippo, Francesco Lampariello, and Stefano Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23(4):707–716, 1986.
  • [11] Bruno Iannazzo and Margherita Porcelli. The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA Journal of Numerical Analysis, 38(1):495–517, 2018.
  • [12] Effrosyni Kokiopoulou, Jie Chen, and Yousef Saad. Trace optimization and eigenproblems in dimension reduction methods. Numerical Linear Algebra with Applications, 18(3):565–602, 2011.
  • [13] William La Cruz and Marcos Raydan. Nonmonotone spectral methods for large-scale nonlinear systems. Optimization Methods and Software, 18(5):583–599, 2003.
  • [14] Chong Li and Jinhua Wang. Convergence of the Newton method and uniqueness of zeros of vector fields on Riemannian manifolds. Science in China Series A: Mathematics, 48(11):1465–1478, 2005.
  • [15] Jonathan H Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50(3):635–650, 2002.
  • [16] Richard M Martin. Electronic structure: basic theory and practical methods. Cambridge University Press, 2020.
  • [17] Erkki Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1(01):61–68, 1989.
  • [18] Harry Oviedo. Implicit steepest descent algorithm for optimization with orthogonality constraints.
  • [19] Harry Oviedo and Oscar Dalmau. A scaled gradient projection method for minimization over the Stiefel manifold. In Mexican International Conference on Artificial Intelligence, pages 239–250. Springer, 2019.
  • [20] Harry Oviedo, Oscar Dalmau, and Hugo Lara. Two adaptive scaled gradient projection methods for Stiefel manifold constrained optimization. Numerical Algorithms, pages 1–21, 2020.
  • [21] Harry Oviedo, Hugo Lara, and Oscar Dalmau. A non-monotone linear search algorithm with mixed direction on Stiefel manifold. Optimization Methods and Software, 34(2):437–457, 2019.
  • [22] Marcos Raydan. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis, 13(3):321–326, 1993.
  • [23] Marcos Raydan. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7(1):26–33, 1997.
  • [24] Wolfgang Ring and Benedikt Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596–627, 2012.
  • [25] Yousef Saad, James R Chelikowsky, and Suzanne M Shontz. Numerical methods for electronic structure calculations of materials. SIAM Review, 52(1):3–54, 2010.
  • [26] Hiroyuki Sato. Riemannian Newton's method for joint diagonalization on the Stiefel manifold with application to ICA. arXiv preprint arXiv:1403.8064, 2014.
  • [27] Hiroyuki Sato and Toshihiro Iwai. A new, globally convergent Riemannian conjugate gradient method. Optimization, 64(4):1011–1031, 2015.
  • [28] Pavan Turaga, Ashok Veeraraghavan, and Rama Chellappa. Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.
  • [29] Zaiwen Wen, Chao Yang, Xin Liu, and Yin Zhang. Trace-penalty minimization for large-scale eigenspace computation. Journal of Scientific Computing, 66(3):1175–1203, 2016.
  • [30] Teng-Teng Yao, Zhi Zhao, Zheng-Jian Bai, and Xiao-Qing Jin. A Riemannian derivative-free Polak–Ribiére–Polyak method for tangent vector field. Numerical Algorithms, pages 1–31, 2020.
  • [31] Hongchao Zhang and William W Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004.
  • [32] Teng Zhang and Yi Yang. Robust PCA by manifold optimization. The Journal of Machine Learning Research, 19(1):3101–3139, 2018.
  • [33] Xin Zhang, Jinwei Zhu, Zaiwen Wen, and Aihui Zhou. Gradient type optimization methods for electronic structure calculations. SIAM Journal on Scientific Computing, 36(3):C265–C289, 2014.
  • [34] Xiaojing Zhu. A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Computational Optimization and Applications, 67(1):73–110, 2017.