
H. Oviedo: Mathematics Research Center, CIMAT A.C., Guanajuato, Mexico. Email: harry.oviedo@cimat.mx
H. Lara: Universidade Federal de Santa Catarina, Campus Blumenau, Blumenau, Brazil. Email: hugo.lara.urdaneta@ufsc.br

Spectral Residual Method for Nonlinear Equations on Riemannian Manifolds

Harry Oviedo    Hugo Lara
(Received: date / Accepted: date)
Abstract

In this paper, the spectral algorithm for nonlinear equations (SANE) is adapted to the problem of finding a zero of a given tangent vector field on a Riemannian manifold. The generalized version of SANE uses, in a systematic way, the tangent vector field as a search direction and a continuous real–valued function that adapts this direction and ensures that it verifies a descent condition for an associated merit function. In order to speed up the convergence of the proposed method, we incorporate a Riemannian adaptive spectral parameter in combination with a non–monotone globalization technique. The global convergence of the proposed procedure is established under some standard assumptions. Numerical results indicate that our algorithm is very effective and efficient at solving tangent vector field equations on different Riemannian manifolds, and that it competes favorably with a recently published Polak–Ribière–Polyak method and with other methods from the literature.

Keywords:
Tangent vector field; Riemannian manifold; Nonlinear system of equations; Spectral residual method; Non–monotone line search.
MSC:
65K05, 90C30, 90C56, 53C21.

1 Introduction

In this work, we consider the problem of finding a zero of a tangent vector field F(\cdot) over a Riemannian manifold \mathcal{M}, with an associated Riemannian metric \langle\cdot,\cdot\rangle (a local inner product which induces a corresponding local metric). The problem can be mathematically formulated as the solution of the following nonlinear equation

F(X) = 0_X, (1)

where F:\mathcal{M}\to\mathcal{TM} is a continuously differentiable tangent vector field, and \mathcal{TM} := \cup_{X\in\mathcal{M}} T_X\mathcal{M} denotes the tangent bundle of \mathcal{M}, i.e., \mathcal{TM} is the union of all tangent spaces at points in the manifold. Here, 0_X denotes the zero vector of the tangent space T_X\mathcal{M}. This kind of problem appears frequently in several applications, for example: statistical principal component analysis oja1989neural , where Oja's flow induces the associated vector field; total energy minimization in electronic structure calculations martin2020electronic ; saad2010numerical ; zhang2014gradient ; linear eigenvalue problems manton2002optimization ; wen2016trace ; dimension reduction techniques in pattern recognition kokiopoulou2011trace ; turaga2008statistical ; zhang2018robust ; and Riemannian optimization problems, where the Riemannian gradient flow leads to the associated tangent vector field absil2009optimization ; edelman1998geometry ; ring2012optimization , among others.

Problem (1) is closely related to the problem of minimizing a differentiable function over the manifold \mathcal{M},

\min \mathcal{F}(X) \quad \text{s.t.} \quad X\in\mathcal{M}, (2)

where \mathcal{F}:\mathcal{M}\to\mathbb{R} is a smooth function. Different iterative methods have been developed for solving (2). Some popular schemes are based on gradient methods cedeno2018projected ; iannazzo2018riemannian ; manton2002optimization ; oviedoimplicit ; oviedo2019scaled ; oviedo2020two ; oviedo2019non , conjugate gradient methods absil2009optimization ; edelman1998geometry ; zhu2017riemannian , Newton's method absil2009optimization ; sato2014riemannian , or quasi–Newton methods absil2009optimization . All these numerical methods can be used to find a zero of the following tangent vector field equation,

\nabla_{\mathcal{M}}\mathcal{F}(X) = 0_X, (3)

where \nabla_{\mathcal{M}}\mathcal{F}(\cdot) denotes the Riemannian gradient of \mathcal{F}(\cdot), which is a particular case of problem (1). The Riemannian line–search methods designed to solve the optimization problem (2) construct a sequence of points using the following recursive formula

X_{k+1} = R_{X_k}[\xi_{X_k}], (4)

where R_X: T_X\mathcal{M}\to\mathcal{M} is a retraction (see Definition 1), and \xi_{X_k}\in T_{X_k}\mathcal{M} is a descent direction, i.e., \xi_{X_k} verifies the inequality \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),\xi_{X_k}\rangle < 0 for all k\geq 0. Among the Riemannian line–search methods, the Riemannian gradient approach exhibits the lowest cost per iteration. This method uses the gradient vector field \nabla_{\mathcal{M}}\mathcal{F}(\cdot) to define the search direction by \xi_{X_k} = -\nabla_{\mathcal{M}}\mathcal{F}(X_k) at each iteration.

In the literature, there are some iterative algorithms addressing problem (1). Several Riemannian Newton methods for the solution of tangent vector fields on general Riemannian manifolds were developed in adler2002newton ; dedieu2003newton ; li2005convergence . A drawback of Newton's method is that it requires second–order information and geodesics (which involve the computation of the exponential mapping) in order to remain on the corresponding manifold, which increases the computational cost. In addition, the authors in breiding2018convergence proposed a Riemannian Gauss–Newton method to address the solution of (1) through the optimization problem (2). Recently, a Riemannian conjugate gradient method to deal with the numerical solution of (1) was introduced in yao2020riemannian ; it does not need derivative computations (it does not use the Jacobian of F(\cdot)), and it relies on retractions (see Definition 1), a class of mappings that generalizes the notion of geodesics and that was introduced by Absil in absil2009optimization to deal with optimization problems on matrix manifolds.

In the Euclidean case \mathcal{M} = \mathbb{R}^n of (1), i.e., for the solution of standard nonlinear systems of equations, the authors in cruz2003nonmonotone introduced a method called SANE, which uses the residual \pm F(X_k) as a search direction. The trial point, at each iteration, is then computed by X_{k+1} = X_k \pm \tau_k F(X_k), where \tau_k is a spectral coefficient based on the Barzilai–Borwein step–size barzilai1988two ; raydan1993barzilai . This iterative process uses precisely the functional F(\cdot) in order to define the search direction. Its derivative–free nature is highly attractive, since it lowers the storage requirements and the computational cost per iteration.

Motivated by the Riemannian gradient and SANE methods, in this paper we introduce RSANE, a generalization of SANE that tackles the numerical solution of nonlinear equations on Riemannian manifolds. In particular, we modify the update formula of SANE by incorporating a retraction, in order to guarantee that each point X_k belongs to the desired manifold. Following the ideas of the Riemannian Barzilai–Borwein method developed by Iannazzo et al. in iannazzo2018riemannian , we propose an extension of the spectral parameter \tau_k to the case of Riemannian manifolds, using mappings called scaled vector transports. In addition, we present the convergence analysis of the proposed method under the Zhang–Hager globalization strategy zhang2004nonmonotone . Finally, some numerical experiments are reported to illustrate the efficiency and effectiveness of our proposal.

The rest of this paper is organized as follows. To make this article self–contained, we briefly review, in Section 2, some concepts and tools from Riemannian geometry, which can be found in absil2009optimization . In Section 3, we present our proposed Riemannian spectral residual method (RSANE) for solving (1). Section 4 is devoted to the convergence analysis of our proposed method. In Section 5, numerical tests are carried out in order to illustrate the good performance of our approach on the computation of eigenspaces associated with given symmetric matrices, using both simulated and real data. Finally, conclusions and perspectives are provided in Section 7.

2 Preliminaries on Riemannian Geometry

In this section, we briefly review some notions and tools of Riemannian geometry that are crucial for understanding this paper, summarizing material from absil2009optimization .

Let \mathcal{M} be a Riemannian manifold with an associated Riemannian metric \langle\cdot,\cdot\rangle, and let T_X\mathcal{M} be its tangent vector space at a given point X\in\mathcal{M}. In addition, let \mathcal{F}:\mathcal{M}\to\mathbb{R} be a smooth scalar function defined on the Riemannian manifold \mathcal{M}. The Riemannian gradient of \mathcal{F}(\cdot) at X, denoted by \nabla_{\mathcal{M}}\mathcal{F}(X), is defined as the unique element of T_X\mathcal{M} that verifies

\langle\nabla_{\mathcal{M}}\mathcal{F}(X),\xi_X\rangle = \mathcal{DF}(X)[\xi_X], \quad \forall \xi_X\in T_X\mathcal{M},

where \mathcal{DF}(X)[\xi_X] denotes the directional derivative of \mathcal{F}(\cdot) in the direction \xi_X, evaluated at X\in\mathcal{M}. In the particular case that \mathcal{M} is a Riemannian submanifold of a Euclidean space \mathcal{E}, we have an explicit expression for the gradient: let \bar{\mathcal{F}}:\mathcal{E}\to\mathbb{R} be a smooth function defined on \mathcal{E} and let \mathcal{F}:\mathcal{M}\subset\mathcal{E}\to\mathbb{R} be its restriction to \mathcal{M}; then the Riemannian gradient of \mathcal{F}(\cdot) evaluated at X\in\mathcal{M} is equal to the orthogonal projection of the standard gradient of \bar{\mathcal{F}}(\cdot) onto T_X\mathcal{M}, that is,

\nabla_{\mathcal{M}}\mathcal{F}(X) = P_X[\nabla\bar{\mathcal{F}}(X)]. (5)

This result provides us with an important tool to compute the Riemannian gradient, which will be useful in the experiments section.
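For illustration, the following Matlab sketch evaluates (5) on the unit sphere S^{n-1} embedded in \mathbb{R}^n; the test function \bar{\mathcal{F}}(x) = x^\top A x with a random symmetric A and the projection P_x[v] = v - x(x^\top v) are illustrative choices, not part of the paper's code.

```matlab
% Sketch of (5) on the unit sphere S^{n-1} embedded in R^n.
% Assumptions: Fbar(x) = x'*A*x with A symmetric; P_x[v] = v - x*(x'*v).
n = 5;
A = randn(n);  A = (A + A')/2;       % symmetric test matrix
x = randn(n,1);  x = x/norm(x);      % a point on the sphere
egrad = 2*A*x;                       % Euclidean gradient of Fbar at x
rgrad = egrad - x*(x'*egrad);        % Riemannian gradient via projection (5)
fprintf('tangency check x''*rgrad = %.2e\n', x'*rgrad);   % should be ~0
```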

Another fundamental concept for this work is that of a retraction. This can be seen as a smooth mapping that pragmatically approximates the notion of geodesics edelman1998geometry . We now present its rigorous definition.

Definition 1 (absil2009optimization )

A retraction on a manifold \mathcal{M} is a smooth mapping R[\cdot] from the tangent bundle \mathcal{TM} onto \mathcal{M} with the following properties. Let R_X[\cdot] denote the restriction of R[\cdot] to T_X\mathcal{M}.

  1.

    R_X[0_X] = X, where 0_X denotes the zero element of T_X\mathcal{M}.

  2.

    With the canonical identification T_{0_X}T_X\mathcal{M} \simeq T_X\mathcal{M}, R_X[\cdot] satisfies

    \mathcal{D}R_X[0_X] = \textrm{id}_{T_X\mathcal{M}},

    where \textrm{id}_{T_X\mathcal{M}} denotes the identity mapping on T_X\mathcal{M}.

The second condition in Definition 1 is known as the local rigidity condition.
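As a concrete example, the following Matlab sketch checks the two properties of Definition 1 numerically for the sphere retraction R_x[\xi] = (x+\xi)/||x+\xi||_2 used later in Section 6.1; the finite–difference step h is an illustrative choice.

```matlab
% Sketch: the sphere retraction R_x[xi] = (x + xi)/||x + xi||_2 and the
% two properties of Definition 1, checked numerically.
n  = 4;
x  = randn(n,1);  x  = x/norm(x);             % point on S^{n-1}
xi = randn(n,1);  xi = xi - x*(x'*xi);        % tangent vector at x
R  = @(x,v) (x + v)/norm(x + v);              % retraction
% Property 1: R_x[0_x] = x
fprintf('||R_x[0] - x||      = %.2e\n', norm(R(x,zeros(n,1)) - x));
% Property 2 (local rigidity): DR_x[0][xi] = xi, via central differences
h  = 1e-7;
dR = (R(x,h*xi) - R(x,-h*xi))/(2*h);
fprintf('||DR_x[0][xi] - xi|| = %.2e\n', norm(dR - xi));
```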

The concept of vector transport, which appears in absil2009optimization , provides us with a tool to perform operations between two or more vectors that belong to different tangent spaces of \mathcal{M}, and can be seen as a relaxation of the purely Riemannian concept of parallel transport edelman1998geometry .

Definition 2 (absil2009optimization )

A vector transport \mathcal{T}[\cdot] on a manifold \mathcal{M} is a smooth mapping

\mathcal{T}: \mathcal{TM}\oplus\mathcal{TM} \rightarrow \mathcal{TM}: (\eta,\xi) \mapsto \mathcal{T}_{\eta}[\xi] \in \mathcal{TM},

satisfying the following properties for all X\in\mathcal{M}, where \oplus denotes the Whitney sum, that is,

\mathcal{TM}\oplus\mathcal{TM} = \{(\eta,\xi): \eta,\xi\in T_X\mathcal{M}, X\in\mathcal{M}\}.
  1.

    There exists a retraction R[\cdot], called the retraction associated with \mathcal{T}[\cdot], such that

    \pi(\mathcal{T}_{\eta}[\xi]) = R_X[\eta], \quad \eta,\xi\in T_X\mathcal{M},

    where \pi(\mathcal{T}_{\eta}[\xi]) denotes the foot of the tangent vector \mathcal{T}_{\eta}[\xi].

  2.

    \mathcal{T}_{0_X}[\xi] = \xi for all \xi\in T_X\mathcal{M}.

  3.

    \mathcal{T}_{\eta}[a\xi+b\zeta] = a\mathcal{T}_{\eta}[\xi] + b\mathcal{T}_{\eta}[\zeta], for all a,b\in\mathbb{R} and \eta,\xi,\zeta\in T_X\mathcal{M}.

Next, the concept of isometry zhu2017riemannian is established, which is a property satisfied by some vector transports.

Definition 3 (zhu2017riemannian )

A vector transport \mathcal{T}[\cdot] on a manifold \mathcal{M} is called isometric if it satisfies

\langle\mathcal{T}_{\eta}[\xi],\mathcal{T}_{\eta}[\xi]\rangle = \langle\xi,\xi\rangle, (6)

for all \eta,\xi\in T_X\mathcal{M}, where R_X[\cdot] is the retraction associated with \mathcal{T}[\cdot].

3 Spectral Approach for Tangent Vector Field on Riemannian Manifolds

In this section, we shall establish our proposal RSANE. An intuitive way to solve (1) is to promote the reduction of the residual ||F(\cdot)||, which we can achieve by solving the following auxiliary manifold–constrained optimization problem

\min_{X\in\mathcal{M}} \mathcal{F}(X) = \frac{1}{2}||F(X)||^2. (7)

We can deal with this optimization model using some Riemannian optimization method. Nevertheless, we are interested in directly solving the Riemannian nonlinear equation (1). For this purpose, we consider the following iterative method, based on the SANE method,

X_{k+1} = X_k - \tau_k F(X_k). (8)

This scheme has two drawbacks: firstly, the vector -F(X_k) is not necessarily a descent direction for the merit function \mathcal{F}(\cdot), and secondly, X_{k+1} does not necessarily belong to the manifold \mathcal{M}. We can overcome the first one by modifying this vector with the sign of \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle, following the same idea used in the SANE method, in order to force \pm F(X_k) to satisfy the descent condition. Observe that the Riemannian gradient of \mathcal{F}(\cdot) can be computed by

\nabla_{\mathcal{M}}\mathcal{F}(X) = (JF(X))^{*}[F(X)], \quad \forall X\in\mathcal{M}, (9)

that is, to compute the Riemannian gradient of \mathcal{F}(\cdot) at X, we need to calculate the adjoint of the Jacobian of F(\cdot) evaluated at X.

On the other hand, the second disadvantage can be easily remedied by incorporating a retraction, similarly to the scheme (4). Keeping in mind all these considerations, we now propose our Riemannian spectral residual method, which computes the iterates recursively by

X_{k+1} = R_{X_k}[\tau_k Z_k], (10)

where \tau_k > 0 represents the step–size, R_X[\cdot] is a retraction, and the search direction is determined by

Z_k = -s(\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle) F(X_k), (11)

where s(\cdot): \mathbb{R}\setminus\{0\}\to\{-1,1\} is defined by s(x) = \frac{x}{|x|}. Observe that s(\cdot) is a continuous function for all x\neq 0, a property that is crucial for our convergence analysis.

3.1 A Nonmonotone Line Search with a Riemannian Spectral Parameter

In the scenario of the solution of nonlinear systems of equations over \mathbb{R}^n, the SANE method uses a spectral parameter \tau_k inspired by the Barzilai–Borwein step–size, originally introduced in barzilai1988two to speed up the convergence of gradient–type methods in the context of optimization. SANE computes this spectral parameter as follows

\tau_{k+1}^{BB} = \textrm{sgn}_k \frac{S_k^{\top}S_k}{S_k^{\top}Y_k}, (12)

where S_k = X_{k+1}-X_k, Y_k = F(X_{k+1})-F(X_k), \textrm{sgn}_k = s(F(X_k)^{\top}J(X_k)F(X_k)), and J(\cdot) denotes the Jacobian matrix of the vector–valued function F:\mathbb{R}^n\to\mathbb{R}^n.

In the framework of Riemannian manifolds, the vectors F(X_{k+1})\in T_{X_{k+1}}\mathcal{M} and F(X_k)\in T_{X_k}\mathcal{M} lie in different tangent spaces, so the difference between these vectors may not be well defined (it is only well defined over linear manifolds). The same drawback occurs with the difference between the points X_{k+1} and X_k. Therefore, we cannot directly use the parameter (12) to address the numerical solution of (1). In iannazzo2018riemannian , Iannazzo et al. extended the Barzilai–Borwein step–sizes to the context of optimization on Riemannian manifolds through the use of a vector transport (see Definition 2). This strategy transports the calculated directions to the correct tangent space, providing a way to overcome the drawback. Guided by the descriptions contained in iannazzo2018riemannian , we propose the following generalization of the spectral parameter (12),

\tau_{k+1}^{RBB1} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle}, (13)

where \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle,

\hat{S}_k = \mathcal{T}_{\tau_k Z_k}[\tau_k Z_k] = -\tau_k s(\sigma_k)\mathcal{T}_{\tau_k Z_k}[F(X_k)] (14)

and

\hat{Y}_k = F(X_{k+1}) - \mathcal{T}_{\tau_k Z_k}[F(X_k)] = F(X_{k+1}) + \frac{1}{\tau_k s(\sigma_k)}\hat{S}_k, (15)

where \mathcal{T}[\cdot] is any vector transport satisfying the Ring–Wirth non–expansive condition,

||\mathcal{T}_{\eta_X}[\xi_X]|| \leq ||\xi_X||, \quad \forall \xi_X,\eta_X\in T_X\mathcal{M}. (16)

Another alternative for the spectral parameter is

\tau_{k+1}^{RBB2} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{Y}_k\rangle}{\langle\hat{Y}_k,\hat{Y}_k\rangle}. (17)

In order to take advantage of both spectral parameters \tau_{k+1}^{RBB1} and \tau_{k+1}^{RBB2}, we adopt the following adaptive strategy

\tau_{k+1}^{RBB} = \begin{cases} \tau_{k+1}^{RBB1} & \text{for even } k; \\ \tau_{k+1}^{RBB2} & \text{for odd } k. \end{cases} (18)
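A minimal Matlab sketch of the computation of (13), (17) and the alternation (18) is given below; the vectors Shat and Yhat stand for \hat{S}_k and \hat{Y}_k in (14)–(15) and are filled with random values only to make the fragment executable, while the final safeguard mirrors Step 15 of Algorithm 1 below.

```matlab
% Sketch of the adaptive Riemannian spectral step (13), (17), (18).
% Shat, Yhat stand for (14)-(15); random values are used only so that
% the fragment runs on its own.
n    = 6;  k = 3;                          % k: current iteration counter
sgn  = -1;                                 % s(sigma_k), either +1 or -1
Shat = randn(n,1);  Yhat = randn(n,1);     % placeholders for (14)-(15)
tauRBB1 = sgn*(Shat'*Shat)/(Shat'*Yhat);   % (13)
tauRBB2 = sgn*(Shat'*Yhat)/(Yhat'*Yhat);   % (17)
if mod(k,2) == 0                           % (18): alternate the two rules
    tauRBB = tauRBB1;
else
    tauRBB = tauRBB2;
end
tauRBB = max(min(tauRBB, 1e10), 1e-10);    % safeguard, cf. Step 15 of Algorithm 1
```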

Note that it is always possible to define a transporter (a function that sends vectors from one tangent space to another tangent space) that satisfies condition (16), by scaling,

\mathcal{T}_{\eta_X}^{\textrm{scaled}}[\xi_X] = \begin{cases} \mathcal{T}_{\eta_X}[\xi_X] & \text{if } ||\mathcal{T}_{\eta_X}[\xi_X]|| \leq ||\xi_X||; \\ \frac{||\xi_X||}{||\mathcal{T}_{\eta_X}[\xi_X]||}\mathcal{T}_{\eta_X}[\xi_X] & \text{otherwise}. \end{cases} (19)

This function, introduced by Sato and Iwai in sato2015new , is referred to as a scaled vector transport. Observe that (19) is not necessarily a vector transport. However, for the extension of the spectral parameter to the setting of Riemannian manifolds, a vector transport is not strictly mandatory; in fact, it is enough to have a non–expansive transporter available. Therefore, we will use the scaled vector transport (19) in the construction of the vectors \hat{S}_k and \hat{Y}_k in equations (14)–(15).
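The scaling in (19) can be implemented in a few lines; the Matlab sketch below assumes the projection–based transporter on the unit sphere used later in Section 6.1 (which already satisfies (16), so the scaling branch acts only as a safeguard), and all data are illustrative.

```matlab
% Sketch of the scaled vector transport (19) on the unit sphere S^{n-1}.
% Assumption: the underlying transporter is the orthogonal projection onto
% the tangent space at y = R_x[eta], as in Section 6.1.
n   = 5;
x   = randn(n,1);  x   = x/norm(x);
eta = randn(n,1);  eta = eta - x*(x'*eta);     % tangent vectors at x
xi  = randn(n,1);  xi  = xi  - x*(x'*xi);
y   = (x + eta)/norm(x + eta);                 % y = R_x[eta]
t   = xi - y*(y'*xi);                          % candidate transport T_eta[xi]
if norm(t) > norm(xi)                          % scale only when (16) fails
    t = (norm(xi)/norm(t))*t;
end
fprintf('||T_eta[xi]|| = %.3f <= ||xi|| = %.3f\n', norm(t), norm(xi));
```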

Since the spectral parameter \tau_k^{RBB} does not necessarily reduce the value of the merit function \mathcal{F}(\cdot) at each iteration, the convergence result could be invalid. We can overcome this drawback by incorporating a globalization strategy, which guarantees convergence by regulating the step–size \tau_k only occasionally (see cruz2003nonmonotone ; raydan1997barzilai ; zhang2004nonmonotone ). In the seminal paper cruz2003nonmonotone , the authors consider the globalization technique proposed by Grippo et al. in grippo1986nonmonotone in the definition of the SANE method.

We could define our Riemannian generalization of SANE by incorporating this non–monotone technique, and then analyze the convergence following the ideas described in cruz2003nonmonotone ; iannazzo2018riemannian . Instead, in this work, to define RSANE we adopt a more elegant globalization strategy proposed by Zhang and Hager in zhang2004nonmonotone . Specifically, we compute \tau_k = \delta^{h}\tau_k^{RBB}, where h\in\mathbb{N} is the smallest integer satisfying

\mathcal{F}(R_{X_k}[\tau_k Z_k]) \leq C_k + \rho_1\tau_k\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle, (20)

where each value C_{k+1} is given by a convex combination of \mathcal{F}(X_{k+1}) and the previous C_k as

C_{k+1} = \frac{\eta Q_k C_k + \mathcal{F}(X_{k+1})}{Q_{k+1}},

for Q_{k+1} = \eta Q_k + 1, starting at Q_0 = 1 and C_0 = \mathcal{F}(X_0). In the sequel, our generalization RSANE is described in detail.

0:  Let X_0\in\mathcal{M} be the initial guess, \tau>0, 0<\tau_m\leq\tau_M<\infty, \eta\in[0,1), \rho_1,\epsilon,\epsilon_1,\delta\in(0,1), Q_0=1, C_0=\mathcal{F}(X_0), k=0.
1:  while ||F(X_k)|| > \epsilon do
2:     \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle,
3:     if |\sigma_k| < \epsilon_1||F(X_k)||^2 then
4:        stop the process.
5:     end if
6:     Z_k = -s(\sigma_k)F(X_k),
7:     while \mathcal{F}(R_{X_k}[\tau Z_k]) > C_k - \rho_1\epsilon_1\tau||F(X_k)||^2 do
8:        \tau \leftarrow \delta\tau,
9:     end while
10:     \tau_k = \tau,
11:     X_{k+1} = R_{X_k}[\tau_k Z_k],
12:     Q_{k+1} = \eta Q_k + 1 and C_{k+1} = (\eta Q_k C_k + \mathcal{F}(X_{k+1}))/Q_{k+1},
13:     \hat{S}_k = -\tau_k s(\sigma_k)\mathcal{T}_{\tau_k Z_k}[F(X_k)] and \hat{Y}_k = F(X_{k+1}) + \frac{1}{\tau_k s(\sigma_k)}\hat{S}_k,
14:     \tau_{k+1}^{RBB} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle},
15:     \tau = \max(\min(\tau_{k+1}^{RBB},\tau_M),\tau_m).
16:     k \leftarrow k+1.
17:  end while
Algorithm 1 Spectral Residual Method for tangent vector fields (RSANE)
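For reference, a compact Matlab sketch of Algorithm 1 is given below; it assumes handles F (the tangent vector field), retr (a retraction R_X[\cdot]) and transp (a non–expansive transporter such as (19)), it approximates \sigma_k with the finite–difference formula of Remark 2 below, and it uses the default parameter values reported in Section 6. The function name rsane and the options structure are an illustrative interface, not the authors' code.

```matlab
function X = rsane(F, retr, transp, X, opts)
% Sketch of Algorithm 1 (RSANE) for a tangent vector field on an embedded
% manifold. Assumed inputs (illustrative interface):
%   F      : handle, F(X) returns the tangent vector field at X
%   retr   : handle, retr(X, V) evaluates the retraction R_X[V]
%   transp : handle, transp(X, eta, xi) is a non-expansive transporter
%   X      : initial point on the manifold;  opts.maxit : iteration limit
% The merit function is Fcal(X) = 0.5*||F(X)||_F^2 and sigma_k is
% approximated by the finite-difference formula of Remark 2.
merit = @(Y) 0.5*norm(F(Y),'fro')^2;
epsr = 1e-5;  eps1 = 1e-8;  rho1 = 1e-4;  delta = 0.2;  eta = 0.6;
tau  = 1e-3;  tau_m = 1e-10;  tau_M = 1e10;  h = 1e-8;
Q = 1;  C = merit(X);
for k = 0:opts.maxit
    FX = F(X);
    if norm(FX,'fro') <= epsr, break; end
    % sigma_k = <grad Fcal(X_k), F(X_k)>, approximated as in Remark 2
    sigma = (merit(retr(X, h*FX)) - merit(X))/h;
    if abs(sigma) < eps1*norm(FX,'fro')^2, break; end     % bad breakdown
    Z = -sign(sigma)*FX;                                  % direction (11)
    while merit(retr(X, tau*Z)) > C - rho1*eps1*tau*norm(FX,'fro')^2
        tau = delta*tau;                                  % backtracking (21)
    end
    Xnew = retr(X, tau*Z);
    Qnew = eta*Q + 1;  C = (eta*Q*C + merit(Xnew))/Qnew;  Q = Qnew;
    Shat = -tau*sign(sigma)*transp(X, tau*Z, FX);         % (14)
    Yhat = F(Xnew) + (1/(tau*sign(sigma)))*Shat;          % (15)
    tauRBB = sign(sigma)*(Shat(:)'*Shat(:))/(Shat(:)'*Yhat(:));  % Step 14
    tau = max(min(tauRBB, tau_M), tau_m);                 % Step 15
    X = Xnew;
end
end
```

As a hypothetical usage, on the unit sphere one may call this sketch with F = @(x) A*x - (x'*A*x)*x, retr = @(x,v) (x+v)/norm(x+v), the projection transporter of Section 6.1, and opts = struct('maxit',5000).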
Remark 1

In Algorithm 1, we replace the nonmonotone condition (20) by

\mathcal{F}(R_{X_k}[\tau Z_k]) \leq C_k - \rho_1\epsilon_1\tau||F(X_k)||^2. (21)

We remark that, with this relaxed condition, Algorithm 1 is well defined. In fact, if at iteration k the procedure does not stop at Step 4, then Z_k is a descent direction (see Lemma 2), and for all \rho_1\in(0,1) there exists t>0 such that the non–monotone Zhang–Hager condition (20) holds by continuity for \tau>0 sufficiently small (a proof of this fact appears in zhang2004nonmonotone ). In addition, it follows from Step 4, (20) and Lemma 2 that

\mathcal{F}(R_{X_k}[\tau Z_k]) \leq C_k + \rho_1\tau\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle \leq C_k - \rho_1\epsilon_1\tau||F(X_k)||^2,

which implies that the relaxed condition (21) is also verified for all \tau>0 that satisfy (20).

Remark 2

The bottleneck of Algorithm 1 appears in Step 2, since to calculate \sigma_k we must compute the Riemannian gradient of \mathcal{F}(\cdot), which implies evaluating the Jacobian of F(\cdot) (see equation (9)). However, given a retraction R_X[\cdot], \sigma_k can be approximated using finite differences as follows

\sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle = \mathcal{DF}(X_k)[F(X_k)] = \lim_{t\to 0}\frac{\mathcal{F}(R_{X_k}[tF(X_k)])-\mathcal{F}(X_k)}{t} \approx \frac{\mathcal{F}(R_{X_k}[hF(X_k)])-\mathcal{F}(X_k)}{h}, (22)

where 0 < h \ll 1 is a small real number. The fact that this approximation does not need explicit knowledge of the Jacobian operator is useful for large–scale problems.

The following lemma establishes that in most cases \tau_k^{RBB1} is positive.

Lemma 1

Let \tau_{k+1}^{RBB1} be computed as in (13). Then \tau_{k+1}^{RBB1} > 0 when one of the following cases holds:

  (a)

    \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle < 0;

  (b)

    \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle > 0 and ||F(X_{k+1})|| < ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||.

Proof

Since \tau_{k+1}^{RBB1} = s(\sigma_k)\frac{\langle\hat{S}_k,\hat{S}_k\rangle}{\langle\hat{S}_k,\hat{Y}_k\rangle} = \frac{-1}{\tau_k}\frac{||\hat{S}_k||^2}{\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle}, we have \textrm{sign}(\tau_{k+1}^{RBB1}) = \textrm{sign}(-\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle). Suppose that (a) holds; then

\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle = \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle - ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2 < 0,

which implies that \tau_{k+1}^{RBB1} > 0.

On the other hand, if (b) holds, from the Cauchy–Schwarz inequality, we find that

0 < \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle \leq ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||\,||F(X_{k+1})|| < ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2,

and so

\langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],\hat{Y}_k\rangle = \langle\mathcal{T}_{\tau_k Z_k}[F(X_k)],F(X_{k+1})\rangle - ||\mathcal{T}_{\tau_k Z_k}[F(X_k)]||^2 < 0,

and hence \tau_{k+1}^{RBB1} > 0, which proves the lemma.

The same theoretical result is verified for the spectral parameter \tau_{k+1}^{RBB2}, and therefore it is also valid for the adaptive parameter \tau_{k+1}^{RBB}.

Notice that if the transporter \mathcal{T}[\cdot] is isometric (see Definition 3), then the second condition in item (b) of Lemma 1 reduces to ||F(X_{k+1})|| < ||F(X_k)||. In addition, observe that if we set \eta = 0 in Algorithm 1, then we have \mathcal{F}(X_{k+1}) < \mathcal{F}(X_k) for all k\in\mathbb{N}. Therefore, this condition would always be verified under these choices of \eta and \mathcal{T}[\cdot].

Lemma 2

Assume that Algorithm 1 does not terminate. Let \{Z_k\} be the infinite sequence of tangent directions generated by Algorithm 1. Then Z_k is a descent direction for the merit function \mathcal{F}(\cdot) at X_k, for all k\geq 0.

Proof

From Step 6 in Algorithm 1 we have Z_k = -s(\sigma_k)F(X_k), where \sigma_k = \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle. Since Algorithm 1 does not terminate, from Step 3 we obtain |\sigma_k| \geq \epsilon_1||F(X_k)||^2 > 0. Now, observe that

\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle = -s(\sigma_k)\langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),F(X_k)\rangle = -s(\sigma_k)\sigma_k = -\frac{\sigma_k^2}{|\sigma_k|} = -|\sigma_k| < 0.

Therefore, we conclude that Z_k is a descent direction for \mathcal{F}(\cdot) at X_k, for all k\geq 0.

4 Convergence Analysis

In this section, we analyse the global convergence of Algorithm 1 under mild assumptions. Our analysis consists of a generalization of the global convergence theory of line–search methods for unconstrained optimization presented in zhang2004nonmonotone , and an adaptation of Theorem 4.3.1 in absil2009optimization .

The following lemma establishes that the sequence \{C_k\} is bounded below by the sequence \{\mathcal{F}(X_k)\}.

Lemma 3

Let \{X_k\} be an infinite sequence generated by Algorithm 1. Then

\mathcal{F}(X_k) \leq C_k, \quad \forall k. (23)
Proof

Firstly, we define \psi_k:\mathbb{R}\to\mathbb{R} by

\psi_k(\alpha) = \frac{\alpha C_{k-1} + \mathcal{F}(X_k)}{\alpha+1},

and observe that the derivative of \psi_k(\alpha) is

\dot{\psi}_k(\alpha) = \frac{C_{k-1}-\mathcal{F}(X_k)}{(\alpha+1)^2}.

It follows from the non–monotone condition (21) that

\mathcal{F}(X_k) \leq C_{k-1} - \rho_1\epsilon_1\tau_{k-1}||F(X_{k-1})||^2 < C_{k-1}, (24)

which implies that \dot{\psi}_k(\alpha) \geq 0 for all \alpha\geq 0. Hence, the function \psi_k(\cdot) is nondecreasing and \mathcal{F}(X_k) = \psi_k(0) \leq \psi_k(\alpha) for all \alpha\geq 0. Then, taking \bar{\alpha} = \eta Q_{k-1}, we obtain

\mathcal{F}(X_k) = \psi_k(0) \leq \psi_k(\bar{\alpha}) = C_k, (25)

which completes the proof.

In order to prove the global convergence of our proposed algorithm, we need the following asymptotic property.

Lemma 4

Any infinite sequence \{X_k\} generated by Algorithm 1 verifies the following property:

\lim_{k\to\infty}\tau_k||F(X_k)||^2 = 0. (26)
Proof

By the construction of Algorithm 1 and using Lemma 2, we have

C_{k+1} = \frac{\eta Q_k C_k + \mathcal{F}(X_{k+1})}{Q_{k+1}} < \frac{(\eta Q_k + 1)C_k}{Q_{k+1}} = C_k. (27)

Hence, \{C_k\} is monotonically decreasing and bounded below by zero; therefore it converges to some limit C^{*}\geq 0. It follows from Step 7 and Step 12 in Algorithm 1 that

\sum_{k=0}^{\infty}\frac{\rho_1\epsilon_1\tau_k||F(X_k)||^2}{Q_{k+1}} \leq \sum_{k=0}^{\infty}(C_k - C_{k+1}) = C_0 - C^{*} < \infty. (28)

Merging this result with the fact that Q_{k+1} = 1+\eta Q_k = 1+\eta+\eta^2 Q_{k-1} = \cdots = \sum_{i=0}^{k}\eta^{i} < (1-\eta)^{-1}, we have

\lim_{k\to\infty}\tau_k||F(X_k)||^2 = 0, (29)

which proves the lemma.

The theorem below establishes a global convergence property of Algorithm 1. The proof can be seen as a modification of those of Theorem 3.4 in yao2020riemannian and Theorem 4.1 in oviedoimplicit .

Theorem 4.1

Algorithm 1 either terminates at a finite iteration j\in\mathbb{N}, where |\langle\nabla_{\mathcal{M}}\mathcal{F}(X_j),F(X_j)\rangle| < \epsilon_1||F(X_j)||^2, or it generates an infinite sequence \{X_k\} such that

\liminf_{k\to\infty} ||F(X_k)|| = 0.
Proof

Let us assume that Algorithm 1 does not terminate, and let X_{*} be any accumulation point of the sequence \{X_k\}. We may assume that \lim_{k\to\infty}X_k = X_{*}, taking a subsequence if necessary. By contradiction, suppose that there exists \epsilon_0 > 0 such that

||F(X_k)||^2 > \epsilon_0, \quad \forall k\geq 0. (30)

In view of (30) and Lemma 4, we have

\liminf_{k\to\infty}\tau_k = 0. (31)

Firstly, we define the curve Y_k(\tau) = R_{X_k}[\tau Z_k] for all k\in\mathbb{N}, which is smooth due to the differentiability of the retraction R_X[\cdot]. Since the parameter \tau_k > 0 is chosen by carrying out a backtracking process, we have \tau_k = \delta^{m_k}\tau_k^{RBB} for all k greater than some \bar{k}, where m_k\in\mathbb{N} is the smallest integer such that the relaxed non–monotone condition (21) is fulfilled. Thus, the scalar \bar{\tau} = \frac{\tau_k}{\delta} violates condition (21), i.e., it holds that

-\rho_1\epsilon_1\bar{\tau}||F(X_k)||^2 < \mathcal{F}(Y_k(\bar{\tau})) - C_k \leq \mathcal{F}(Y_k(\bar{\tau})) - \mathcal{F}(X_k), \quad \forall k\geq\bar{k}, (32)

where the last inequality is obtained using Lemma 3.

Let us set \psi_k(\tau) := \mathcal{F}\circ Y_k(\tau); then (32) is equivalent to

-\frac{\psi_k(\bar{\tau})-\psi_k(0)}{\bar{\tau}-0} < \rho_1\epsilon_1||F(X_k)||^2, \quad \forall k\geq\bar{k}. (33)

It follows from the mean value theorem that there exists t\in(0,\bar{\tau}) such that -\dot{\psi}_k(t) < \rho_1\epsilon_1||F(X_k)||^2 for all k\geq\bar{k}, or equivalently

-\langle\nabla_{\mathcal{M}}\mathcal{F}(Y_k(t)),\dot{Y}_k(t)\rangle < \rho_1\epsilon_1||F(X_k)||^2, \quad \forall k\geq\bar{k}. (34)

In view of the continuity of the functions s(\cdot), \nabla_{\mathcal{M}}\mathcal{F}(\cdot), F(\cdot) and Y_k(\cdot), the smoothness and local rigidity condition of the retraction R_X[\cdot], and taking limits in (34), we arrive at

-\langle\nabla_{\mathcal{M}}\mathcal{F}(X_{*}),Z_{*}\rangle \leq \rho_1\epsilon_1||F(X_{*})||^2,

or equivalently,

|\sigma_{*}| \leq \rho_1\epsilon_1||F(X_{*})||^2, (35)

where |\sigma_{*}| = |\langle\nabla_{\mathcal{M}}\mathcal{F}(X_{*}),F(X_{*})\rangle|. Since Algorithm 1 does not terminate, from Step 3 we have

|\sigma_k| \geq \epsilon_1||F(X_k)||^2. (36)

Taking limits in (36), we find that |\sigma_{*}| \geq \epsilon_1||F(X_{*})||^2. Merging this last result with (35), we arrive at

||F(X_{*})||^2 \leq \rho_1||F(X_{*})||^2.

Since 0 < \rho_1 < 1, we have ||F(X_{*})|| = 0; this contradicts (30), which completes the proof.

By Theorem 4.1, we obtain the following theoretical consequence under compactness assumptions.

Corollary 1

Let \{X_k\} be an infinite sequence generated by Algorithm 1. Suppose that the level set \mathcal{L} = \{X\in\mathcal{M}: \mathcal{F}(X)\leq\mathcal{F}(X_0)\} is compact (which holds, in particular, when the Riemannian manifold \mathcal{M} is compact). Then

\lim_{k\to\infty}||F(X_k)|| = 0.
Proof

Let \{X_k\} be an infinite sequence generated by Algorithm 1. It follows from Lemma 3, the strict inequality (27) and the construction of Algorithm 1 that

\mathcal{F}(X_{k+1}) \leq C_{k+1} < C_k < \cdots < C_1 < C_0 = \mathcal{F}(X_0),

which implies that X_k\in\mathcal{L}, for all k\geq 0.

Now, by contradiction, let us suppose that there exist a subsequence \{X_k\}_{k\in\mathcal{K}} and \epsilon_0 > 0 such that

||F(X_k)|| \geq \epsilon_0, (37)

for any k\in\mathcal{K}. Since X_k\in\mathcal{L} for all k\geq 0 and \mathcal{L} is a compact set, the subsequence \{X_k\}_{k\in\mathcal{K}} must have an accumulation point X_{*}\in\mathcal{L}. Taking limits in (37) and using the continuity of ||F(\cdot)||, we obtain ||F(X_{*})|| \geq \epsilon_0, which contradicts Theorem 4.1.

Remark 3

The main drawback of Algorithm 1 is that it can terminate prematurely with a bad breakdown (|\sigma_k| < \epsilon_1||F(X_k)||^2). One way to remedy this problem is to use Z_k = -\nabla_{\mathcal{M}}\mathcal{F}(X_k) as the search direction if a bad breakdown occurs at the k–th iteration, and then, in the next iteration, to retry using the steps of Algorithm 1. We may even use any tangent direction Z_k\in T_{X_k}\mathcal{M} such that \langle\nabla_{\mathcal{M}}\mathcal{F}(X_k),Z_k\rangle < 0 whenever a bad breakdown happens, in order to overcome this difficulty.

5 Numerical Experiments

In order to give further insight into the RSANE method, we present the results of some numerical experiments. We test our algorithm on some randomly generated gradient tangent vector fields on three different Riemannian manifolds: the unit sphere, the Stiefel manifold and the oblique manifold. All experiments have been performed on an Intel(R) Core(TM) i7–4770 CPU at 3.40 GHz with 1TB HD and 16GB of RAM. The algorithms were coded in Matlab with double precision. The running times are always given in CPU seconds. For numerical comparisons, we consider the SANE method proposed in cruz2003nonmonotone , and the recently published Riemannian derivative–free conjugate gradient Polak–Ribière–Polyak method (CGPR) for the numerical solution of tangent vector fields yao2020riemannian . The Matlab codes of our RSANE, SANE and CGPR are available at: http://www.optimization-online.org/DB_HTML/2020/09/8028.html

6 Implementation details

In our implementation, in addition to monitoring the residual norm ||F(X_k)||_F, we also check the relative changes of two consecutive iterates and of their corresponding merit function values,

\textrm{rel}_k^{X} := \frac{||X_{k+1}-X_k||_F}{||X_k||_F} \quad\textrm{and}\quad \textrm{rel}_k^{F} := \frac{|\mathcal{F}(X_{k+1})-\mathcal{F}(X_k)|}{\mathcal{F}(X_k)+1}. (38)

Here ||M||_F denotes the Frobenius norm of the matrix M. In the case when M is a vector, this norm reduces to the standard norm on \mathbb{R}^n. Although the residual ||X_{k+1}-X_k||_F is meaningless in the Riemannian context, for our numerical experiments we only consider Riemannian manifolds embedded in the Euclidean space \mathbb{R}^{n\times p}, and the residual \textrm{rel}_k^{X} is well defined for these types of manifolds.

We let all the algorithms run up to K iterations and stop them at iteration k<K if ||F(X_k)||_F < \epsilon, or \textrm{rel}_k^{X} < \epsilon_X and \textrm{rel}_k^{F} < \epsilon_F, or

\textrm{mean}([\textrm{rel}_{k-\min\{k,T\}+1}^{X},\ldots,\textrm{rel}_k^{X}]) \leq 10\epsilon_X \quad\textrm{and}\quad \textrm{mean}([\textrm{rel}_{k-\min\{k,T\}+1}^{F},\ldots,\textrm{rel}_k^{F}]) \leq 10\epsilon_F.

Here, the default values of \epsilon, \epsilon_X, \epsilon_F and T are 1e-5, 1e-15, 1e-15 and 5, respectively. In addition, in Algorithm 1 we use \eta=0.6, \tau=1e-3 (the initial step–size \tau), \tau_m=1e-10, \tau_M=1e+10, \delta=0.2, \epsilon_1=1e-8 and \rho_1=1e-4 as default values.
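The combined stopping test can be written compactly as below; this Matlab fragment is only a sketch, with hypothetical histories relX and relF standing for (38) and nrmF standing for ||F(X_k)||_F.

```matlab
% Sketch of the stopping test; relX, relF, nrmF carry illustrative values.
epsF = 1e-5;  epsX = 1e-15;  epsFun = 1e-15;  T = 5;
relX = [1e-3 1e-6 1e-16 1e-16 1e-16 1e-16];   % hypothetical history of rel_k^X
relF = [1e-2 1e-5 1e-16 1e-16 1e-16 1e-16];   % hypothetical history of rel_k^F
nrmF = 3e-5;                                  % hypothetical ||F(X_k)||_F
k = numel(relX);  w = max(k-T+1,1):k;         % window of the last T iterations
stop = nrmF < epsF || (relX(k) < epsX && relF(k) < epsFun) || ...
       (mean(relX(w)) <= 10*epsX && mean(relF(w)) <= 10*epsFun);
fprintf('stop = %d\n', stop);
```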

6.1 Considered manifolds and their geometric tools

In this subsection, we present the three Riemannian manifolds that we will use for the numerical experiments in the remaining subsections, as well as some tools associated with each manifold that are necessary for the algorithms, such as vector transports and retractions.

Firstly, we consider the unit sphere given by

S^{n-1} = \{x\in\mathbb{R}^n: ||x||_2 = 1\}. (39)

It is well–known that the tangent space of the unit sphere at x\in S^{n-1} is given by T_x S^{n-1} = \{z\in\mathbb{R}^n: z^{\top}x = 0\}. Let S^{n-1} be endowed with the inner product inherited from the classical inner product on \mathbb{R}^n,

\langle\xi_x,\eta_x\rangle_x := \xi_x^{\top}\eta_x, \quad \forall \xi_x,\eta_x\in T_x S^{n-1}, \, x\in S^{n-1}.

Then, S^{n-1} with this inner product defines an (n-1)–dimensional Riemannian sub–manifold of \mathbb{R}^n. The retraction R_x[\cdot] on S^{n-1} is chosen as in absil2009optimization ,

R_x[\xi_x] = \frac{x+\xi_x}{||x+\xi_x||_2},

for all \xi_x\in T_x S^{n-1} and x\in S^{n-1}. In addition, for this particular manifold, we consider the vector transport based on the orthogonal projection

\mathcal{T}_{\eta_x}[\xi_x] = \xi_x - R_x[\eta_x]R_x[\eta_x]^{\top}\xi_x.

Notice that this vector transport verifies the Ring–Wirth non-expansive condition (16).

The second Riemannian manifold considered in this section is the Stiefel manifold, which is defined as

St(n,p) = \{X\in\mathbb{R}^{n\times p}: X^{\top}X = I_p\}, (40)

where I_p\in\mathbb{R}^{p\times p} denotes the identity matrix of size p–by–p. By differentiating both sides of X(t)^{\top}X(t) = I_p, we obtain the tangent space of St(n,p) at X, given by T_X St(n,p) = \{Z\in\mathbb{R}^{n\times p}: Z^{\top}X + X^{\top}Z = 0\}. Let St(n,p) be endowed with the Riemannian metric induced from \mathbb{R}^{n\times p}, i.e.,

\langle\xi_X,\eta_X\rangle_X := tr(\xi_X^{\top}\eta_X), \quad \forall \xi_X,\eta_X\in T_X St(n,p), \, X\in St(n,p). (41)

The pair (St(n,p),\langle\cdot,\cdot\rangle), where \langle\cdot,\cdot\rangle is the inner product given in (41), forms a Riemannian sub–manifold of the Euclidean space \mathbb{R}^{n\times p}, and its dimension is equal to np-\frac{1}{2}p(p+1), absil2009optimization . For our numerical experiments concerning the Stiefel manifold, we will use the retraction, introduced in absil2009optimization , given by

R_X[\xi_X] = \textrm{qf}(X+\xi_X), (42)

and the retraction based on the matrix polar decomposition

R_X[\xi_X] = (X+\xi_X)\left((X+\xi_X)^{\top}(X+\xi_X)\right)^{-1/2}, (43)

for all \xi_X\in T_X St(n,p) and X\in St(n,p). In equation (42), \textrm{qf}(W) denotes the orthogonal factor Q obtained from the QR–factorization of W, such that W = QR, where Q belongs to St(n,p) and R\in\mathbb{R}^{p\times p} is the upper triangular matrix with strictly positive diagonal elements. Additionally, we consider the following vector transport

\mathcal{T}_{\eta_X}[\xi_X] = \xi_X - R_X[\eta_X]\,\textrm{sym}(R_X[\eta_X]^{\top}\xi_X), (44)

where R_X[\cdot] is either of the two retractions defined in (42)–(43), and \textrm{sym}(W) = \frac{1}{2}(W^{\top}+W) is the function that assigns to the matrix W its symmetric part. This vector transport is inspired by the orthogonal projection onto the tangent space of the Stiefel manifold. Moreover, the mapping (44) verifies the Ring–Wirth non–expansive condition (16).
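A minimal Matlab sketch of these Stiefel tools is given below; the point X and the tangent vector xi are generated randomly for illustration, and the qf factor is obtained by enforcing a positive diagonal in the R factor of Matlab's qr.

```matlab
% Sketch of the Stiefel tools (42)-(44); all data are illustrative.
n = 8;  p = 3;
[X,~] = qr(randn(n,p),0);                      % a point on St(n,p)
W  = randn(n,p);
xi = W - X*(X'*W + W'*X)/2;                    % project W onto T_X St(n,p)
% qf retraction (42): Q factor with positive diagonal in R
[Q,R] = qr(X + xi, 0);
Rqf   = Q*diag(sign(diag(R)));
% polar retraction (43)
Rpolar = (X + xi)/sqrtm((X + xi)'*(X + xi));
% vector transport (44), here transporting xi along eta = xi
Y   = Rqf;                                     % Y = R_X[eta]
Txi = xi - Y*((Y'*xi + xi'*Y)/2);              % xi - Y*sym(Y'*xi)
fprintf('feasibility of retractions: %.2e  %.2e\n', ...
        norm(Rqf'*Rqf - eye(p),'fro'), norm(Rpolar'*Rpolar - eye(p),'fro'));
```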

Our third example, the oblique manifold, is defined as

\mathcal{OB}(n,p) = \{X\in\mathbb{R}^{n\times p}: \textrm{ddiag}(X^{\top}X) = I_p\}, (45)

where \textrm{ddiag}(W) denotes the matrix W with all its off–diagonal entries set to zero. The tangent space associated with \mathcal{OB}(n,p) at X is given by T_X\mathcal{OB}(n,p) = \{Z\in\mathbb{R}^{n\times p}: \textrm{ddiag}(X^{\top}Z) = 0\}. Again, if we endow \mathcal{OB}(n,p) with the inner product inherited from the standard inner product on \mathbb{R}^{n\times p}, given by (41), then \mathcal{OB}(n,p) becomes an embedded Riemannian manifold in \mathbb{R}^{n\times p}. For this particular manifold, we consider the retraction, which appears in absil2006joint , defined by

R_X[\xi_X] = (X+\xi_X)\,\textrm{ddiag}\left((X+\xi_X)^{\top}(X+\xi_X)\right)^{-1/2}, (46)

for all \xi_X\in T_X\mathcal{OB}(n,p) and X\in\mathcal{OB}(n,p). We will use another vector transport based on the orthogonal projection operator onto T_X\mathcal{OB}(n,p),

\mathcal{T}_{\eta_X}[\xi_X] = \xi_X - R_X[\eta_X]\,\textrm{ddiag}(R_X[\eta_X]^{\top}\xi_X), (47)

for all \xi_X,\eta_X\in T_X\mathcal{OB}(n,p) and X\in\mathcal{OB}(n,p).

6.2 Eigenvalues computation on the sphere

For the first test problem, we consider the standard eigenvalue problem

Ax = \lambda x, (48)

where A\in\mathbb{R}^{n\times n} is a given symmetric matrix. The values of \lambda that verify (48) are known as eigenvalues of A, and the corresponding vectors x\in\mathbb{R}^n are the eigenvectors. A simple way to compute extreme eigenvalues of A is by minimizing (or maximizing) the associated Rayleigh quotient

r(x) = \frac{x^{\top}Ax}{x^{\top}x}. (49)

This function is a continuously differentiable map r:\mathbb{R}^n\setminus\{0\}\to\mathbb{R}, whose gradient is given by \nabla r(x) = \frac{2}{||x||_2^2}(Ax - r(x)x). It is clear that any eigenvector x and its corresponding eigenvalue \lambda satisfy r(x) = \lambda, and thus, in that case, x is a critical point of r(\cdot), i.e., \nabla r(x) = 0. By noting that \nabla r(x) = 0 if and only if x is a solution of the following nonlinear system of equations

G(x) \equiv Ax - r(x)x = 0, (50)

we can directly address this eigenvalue problem by using the SANE method. On the other hand, by introducing the constraint ||x||_2 = 1, the nonlinear system G(x) = 0 can be cast as the following tangent vector field F:S^{n-1}\to\mathcal{T}S^{n-1} defined on a Riemannian manifold (the unit sphere),

F(x) \equiv Ax - (x^{\top}Ax)x = 0_x. (51)

Note that for all x\in S^{n-1}, we have x^{\top}F(x) = 0. Therefore, F(\cdot) effectively maps each point on the sphere to its corresponding tangent space, i.e., F(\cdot) defines a tangent vector field.
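The field (51) is straightforward to evaluate; the Matlab sketch below, with a random symmetric A as an illustrative example, checks the tangency x^{\top}F(x) = 0 and the fact that F vanishes at an eigenvector.

```matlab
% Sketch of the tangent vector field (51) on the unit sphere.
n = 50;
A = randn(n);  A = (A + A')/2;             % a symmetric test matrix
F = @(x) A*x - (x'*A*x)*x;                 % field (51)
x = ones(n,1)/sqrt(n);                     % starting vector used in Section 6.2
fprintf('x''*F(x) = %.2e\n', x'*F(x));     % ~0: F(x) lies in T_x S^{n-1}
[V,~] = eig(A);  v = V(:,1);
fprintf('||F(v)|| at an eigenvector = %.2e\n', norm(F(v)));
```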

For illustrative purposes, in this subsection we compare the numerical performance of the SANE method solving (50) versus its generalized Riemannian version RSANE solving (51). To do this, we consider 40 instances of symmetric and positive definite sparse matrices taken from the UF Sparse Matrix Collection davis2011university (the SuiteSparse Matrix Collection toolbox is available at https://sparse.tamu.edu/), which contains several matrices that arise in real applications. For this experiment, we stop both algorithms when a vector x_k is found that satisfies the inequality ||G(x_k)||_2 < TOL, where TOL = \max\{\epsilon,\epsilon||G(x_0)||_2\}, with \epsilon = 2e-5. Note that F(x_k) = G(x_k) for all x_k generated by RSANE, since our proposal preserves the constraint ||x||_2 = 1; hence this stopping criterion is well defined for our algorithm. The starting vector x_0\in\mathbb{R}^n was generated by x_0 = v/\sqrt{n}, where v = (1,1,\ldots,1)^{\top}.

The numerical results associated with these experiments are summarized in Table 1. In this table, we report the number of iterations (Nitr); the CPU time in seconds (Time); the residual value \min\left\{\frac{||G(x_{*})||_2}{||x_0||_2}, ||G(x_{*})||_2\right\} (NrmF), where x_{*} denotes the estimated solution obtained by each algorithm; and the number of function evaluations (Nfe). In the case of SANE, this is the number of times that SANE evaluates the map G(\cdot), while for our RSANE, Nfe denotes the number of times that it evaluates the functional F(\cdot).

We observe from Table 1 that the RSANE algorithm is a robust option for solving linear eigenvalue problems. The performance of both SANE and RSANE varied over the different matrices. We note that our RSANE is very competitive for large–scale problems. Indeed, the RSANE algorithm outperforms its Euclidean version (SANE) in general terms, since RSANE took on average 3644.1 iterations and 31.167 seconds to solve the 40 problems, while SANE took 3864.3 iterations and 34.079 seconds on average. It is worth noting that although RSANE failed on problem bcsstk13, it succeeds on bcsstk16, while both methods take the maximum number of iterations on problems bcsstk27, bcsstk28 and nasa4704.

In Figure 1, we plot the convergence history of the SANE and RSANE methods for the particular instances "1138_bus" and "s3rmt3m1", respectively. From this figure, we note that RSANE can converge faster than SANE, as illustrated in Figure 1(b). However, the opposite behavior is observed in Figure 1(a).

Table 1: Numerical results for eigenvalues computation.
SANE RSANE
Name n Nfe NrmF Nitr Time Nfe NrmF Nitr Time
1138_bus 1138 10621 9.70e-6 2791 0.284 14778 7.21e-6 3781 0.449
af_0_k101 503625 2269 1.95e-5 692 51.875 2074 1.99e-5 644 49.273
af_1_k101 503625 1798 2.00e-5 560 39.915 1702 1.99e-5 451 33.397
af_2_k101 503625 2683 2.00e-5 796 59.179 1421 1.93e-5 451 34.259
af_3_k101 503625 2207 1.93e-5 664 49.175 1757 1.92e-5 541 42.234
af_4_k101 503625 2281 1.98e-5 694 51.156 1758 1.85e-5 545 43.613
af_5_k101 503625 2651 1.76e-5 789 58.730 1975 1.79e-5 607 47.731
af_shell3 504855 305 1.99e-5 119 6.989 328 1.79e-5 125 8.026
af_shell7 504855 452 1.14e-5 165 10.204 363 1.53e-5 133 8.851
apache1 80800 15377 1.88e-5 3992 14.466 34224 1.54e-5 8434 34.905
apache2 715176 58685 1.93e-5 13970 801.886 52821 6.89e-6 12516 785.019
bcsstk10 1086 2450 1.30e-5 737 0.063 1954 1.87e-5 603 0.054
bcsstk11 1473 883 1.76e-5 302 0.034 5421 2.00e-5 1501 0.199
bcsstk13 2003 19254 1.98e-5 3293 1.372 88608 3.08e-4 15000 6.523
bcsstk14 1806 76 7.15e-6 35 0.006 4674 1.94e-5 1341 0.283
bcsstk16 4884 105001 7.49e+0 15000 24.274 1430 1.41e-5 458 0.386
bcsstk27 1224 64534 3.43e-4 15000 2.937 63935 2.11e-4 15000 3.064
bcsstk28 4410 62724 2.32e-4 15000 13.300 63527 1.83e-4 15000 14.293
bcsstk36 23052 54355 2.00e-5 12861 76.108 26406 1.99e-5 6561 38.742
bcsstm12 1473 11960 1.94e-5 3017 0.295 8632 1.99e-5 2268 0.240
bcsstm23 3134 14536 1.98e-5 3670 0.500 14820 2.00e-5 3735 0.652
bcsstm24 3562 8255 2.00e-5 2228 0.312 6285 1.85e-5 1721 0.267
cfd1 70656 2693 2.00e-5 801 6.265 2201 1.83e-5 659 5.238
ex15 6867 197 7.92e-6 76 0.027 9578 1.97e-5 2606 1.334
fv1 9604 371 1.81e-5 136 0.061 474 1.85e-5 170 0.0949
fv2 9801 369 1.54e-5 138 0.066 334 1.51e-5 127 0.0637
fv3 9801 506 1.82e-5 183 0.077 371 1.52e-5 136 0.0664
Kuu 7102 5050 3.28e-6 1479 1.533 2674 4.82e-6 819 0.9281
mhd4800b 4800 1959 1.98e-5 605 0.146 1784 1.96e-5 561 0.1363
msc23052 23052 43296 1.99e-5 10410 62.815 37779 1.99e-5 9217 56.9276
Muu 7102 29 1.78e-5 13 0.006 28 1.70e-5 12 0.0052
nasa4704 4704 63663 8.21e-5 15000 7.418 63440 5.79e-5 15000 8.264
s1rmq4m1 5489 9000 2.00e-5 2411 2.394 11052 1.99e-5 2933 3.2589
s1rmt3m1 5489 23019 1.98e-5 5895 4.566 13688 1.93e-5 3592 3.1067
s2rmq4m1 5489 9839 2.00e-5 2564 2.384 10292 1.96e-5 2683 2.774
s2rmt3m1 5489 17422 2.00e-5 4437 3.504 21669 1.93e-5 5385 4.9953
s3rmq4m1 5489 6063 1.85e-5 1710 1.500 5422 1.20e-5 1517 1.4757
s3rmt3m1 5489 12250 1.76e-5 3295 2.467 6331 1.52e-5 1764 1.3742
s3rmt3m3 5357 13090 1.89e-5 3461 2.569 9911 1.83e-5 2669 2.1663
sts4098 4098 22153 1.89e-5 5583 2.335 17664 1.93e-5 4499 2.0278
Figure 1: Convergence history of SANE and RSANE for the instances "1138_bus" (panel (a)) and "s3rmt3m1" (panel (b)). The y–axis is in logarithmic scale.

6.3 A nonlinear eigenvalue problem on the Stiefel manifold

As a second experiment, we investigate the performance of our RSANE method applied to the following nonlinear eigenvalue problem constrained to the Stiefel manifold

H(X)X - X\Lambda = 0, (52a)
X^{\top}X - I_p = 0, (52b)

where H(X) = L + \mu\,\textrm{ddiag}(L^{\dagger}\rho(X)), L^{\dagger}\in\mathbb{R}^{n\times n} is the pseudo–inverse of the discrete Laplacian operator L, \mu > 0 and \rho(X) = \textrm{diag}(XX^{\top}); here \textrm{diag}(M)\in\mathbb{R}^n denotes the vector containing the diagonal elements of the matrix M\in\mathbb{R}^{n\times n}. Observe that \Lambda\in\mathbb{R}^{p\times p} can be seen as a block of eigenvalues of the nonlinear matrix H(X). In fact, in the special case that the operator H(X) is constant and p=1, this problem reduces to the linear eigenvalue problem (48). This kind of nonlinear eigenvalue problem appears frequently in total energy minimization in electronic structure calculations cedeno2018projected ; saad2010numerical . Pre–multiplying both sides of (52a) by X^{\top} and using (52b), we obtain \Lambda = X^{\top}H(X)X; substituting this result in (52a), we obtain the following tangent vector field defined on the Stiefel manifold

F(X) \equiv H(X)X - XX^{\top}H(X)X = 0_X, (53)

where F:St(n,p)\to\mathcal{T}St(n,p).
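A Matlab sketch of the evaluation of (53) is given below; the choice of a one–dimensional discrete Laplacian for L is an illustrative assumption (the paper does not specify the operator), ddiag applied to the vector L^{\dagger}\rho(X) is read as the diagonal matrix formed from it, and the tangency X^{\top}F(X)+F(X)^{\top}X = 0 is checked numerically.

```matlab
% Sketch of the nonlinear eigenvalue field (53); L is an illustrative
% 1D discrete Laplacian and mu = 1 as in Section 6.3.
n = 100;  p = 10;  mu = 1;
e = ones(n,1);
L = full(spdiags([-e 2*e -e], -1:1, n, n));    % discrete Laplacian (illustrative)
Ldag = pinv(L);                                % pseudo-inverse of L
[X,~] = qr(randn(n,p),0);                      % a point on St(n,p)
rho = sum(X.^2, 2);                            % rho(X) = diag(X*X')
HX  = L + mu*diag(Ldag*rho);                   % H(X) = L + mu*ddiag(Ldag*rho(X))
FX  = HX*X - X*(X'*HX*X);                      % tangent field (53)
fprintf('||X''*FX + FX''*X||_F = %.2e\n', norm(X'*FX + FX'*X,'fro'));  % ~0
```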

In Table 2, we report the results obtained by running the RSANE and CGPR methods on the problem of finding a zero of (53), with six possible choices for the pair (n,p), obtained by varying n in \{100,500,1000\} and p in \{10,50\}. For comparison purposes, we repeat our experiments over 30 different randomly generated starting points X_0\in St(n,p) for each pair (n,p), and report the averaged number of iterations (Nitr), the averaged number of evaluations of the functional F(\cdot) (Nfe), the averaged total computing time in seconds (Time), and the averaged residual (NrmF) given by \frac{1}{30}\sum_{i=1}^{30}||F(x_{*}^{i})||_F, where x_{*}^{i} denotes the estimated solution obtained by the respective algorithm for the i–th starting point, for all i\in\{1,2,\ldots,30\}. In addition, for all the experiments we fix \mu=1 and \epsilon=1e-4 as the tolerance for the termination rule based on the residual norm ||F(X_k)||_F < \epsilon. In this table, RSANE-polar and RSANE-qr denote our Algorithm 1 using the retractions defined in (43) and (42), respectively. Similarly, CGPR-polar and CGPR-qr denote the Riemannian derivative–free conjugate gradient Polak–Ribière–Polyak method developed in yao2020riemannian using the retractions (43) and (42), respectively.

As shown in Table 2, RSANE is superior to the Riemannian conjugate gradient method at solving nonlinear eigenvalue problems for the different choices of (n,p). In particular, we note that for problems with p=50, our proposal basically converges in half of the iterations that CGPR takes to reach the desired precision.

Table 2: Numerical results for nonlinear eigenvalue problems.
Method Nitr Time NrmF Nfe Nitr Time NrmF Nfe
(n,p)=(100,10)    (n,p)=(100,50)
RSANE-polar 70.0 0.022 7.85e-6 167.2 632.9 0.898 2.10e-4 2170.1
RSANE-qr 67.8 0.016 6.87e-6 161.9 620.6 0.505 4.10e-4 2126.9
CGPR-polar 79.0 0.029 8.30e-5 231.3 1209.4 2.747 1.46e-3 6586.4
CGPR-qr 78.2 0.024 8.05e-5 229.3 996.8 1.296 3.30e-4 5451.3
(n,p)=(500,10)    (n,p)=(500,50)
RSANE-polar 65.0 0.200 8.10e-6 151.6 311.6 2.048 7.79e-6 955.4
RSANE-qr 68.0 0.211 7.43e-6 160.0 311.5 1.928 8.79e-6 954.6
CGPR-polar 74.3 0.284 8.14e-5 209.3 737.5 7.757 8.86e-5 3601.4
CGPR-qr 73.6 0.280 8.17e-5 210.2 767.1 7.583 9.12e-5 3731.1
(n,p)=(1000,10)    (n,p)=(1000,50)
RSANE-polar 68.9 0.628 7.55e-6 159.8 316.3 5.272 8.12e-6 968.9
RSANE-qr 66.8 0.604 6.86e-6 153.4 321.3 5.294 8.03e-6 987.6
CGPR-polar 73.5 0.820 7.57e-5 208.5 790.3 21.224 9.36e-5 3880.0
CGPR-qr 77.4 0.860 6.87e-5 219.6 881.2 23.505 9.50e-5 4347.7

6.4 Joint diagonalization on the oblique manifold

In this subsection, we analyze the numerical behaviour of our RSANE method solving the nonlinear equation based on the tangent vector field obtained from the Riemannian gradient of the scalar function \mathcal{F}:\mathcal{OB}(n,p)\to\mathbb{R} defined by

\mathcal{F}(X) = \sum_{i=1}^{N}||\textrm{off}(X^{\top}C_i X)||_F^2, (54)

where \textrm{off}(W) = W - \textrm{ddiag}(W) and the C_i\in\mathbb{R}^{n\times n} are given symmetric matrices. The minimization of this function on \mathcal{OB}(n,p) is frequently used to perform independent component analysis, see absil2006joint . It is well–known that the problem of minimizing (54) is closely related to the problem of finding a zero of the Riemannian gradient of the function \mathcal{F}(\cdot), which is given by

\nabla_{\mathcal{OB}(n,p)}\mathcal{F}(X) = \sum_{i=1}^{N}4C_i X\,\textrm{off}(X^{\top}C_i X), (55)

for details see Section 3 in absil2006joint . In order to study the numerical performance of our RSANE compared with the CGPR method, we consider the following tangent vector field

F(X) \equiv \sum_{i=1}^{N}4C_i X\,\textrm{off}(X^{\top}C_i X) = 0_X, (56)

which is obtained from the Riemannian gradient of \mathcal{F}(\cdot).
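The objective (54) and the field (56) can be evaluated as in the Matlab sketch below; the sizes and the randomly generated symmetric matrices C_i are illustrative only.

```matlab
% Sketch of the objective (54) and the field (56) on the oblique manifold.
n = 30;  p = 5;  N = 3;
off = @(W) W - diag(diag(W));                  % off(W) = W - ddiag(W)
C = cell(N,1);
for i = 1:N, B = randn(n); C{i} = B + B'; end  % symmetric C_i (illustrative)
X = randn(n,p);  X = X ./ vecnorm(X);          % unit-norm columns: X in OB(n,p)
val = 0;  FX = zeros(n,p);
for i = 1:N
    M   = X'*C{i}*X;
    val = val + norm(off(M),'fro')^2;          % objective (54)
    FX  = FX + 4*C{i}*X*off(M);                % gradient field (56)
end
fprintf('F(X) = %.3e,  ||F(X)||_F = %.3e\n', val, norm(FX,'fro'));
```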

Now, we present a computational comparison between the RSANE and CGPR yao2020riemannian methods on the solution of the tangent vector field equation (56). To do this, we set N=5, and study the numerical behavior of both methods for the pairs of values (n,p)=(500,100) and (n,p)=(1000,100). For each pair (n,p), we repeat 30 independent runs of RSANE and CGPR, generating the symmetric matrices C_{i}\in\mathbb{R}^{n\times n} as follows

C_{i}=D+B_{i}+B_{i}^{\top},

where D\in\mathbb{R}^{n\times n} is a diagonal matrix whose diagonal elements are given by d_{ii}=\sqrt{n+i}, for all i\in\{1,2,\ldots,n\}, and each B_{i}\in\mathbb{R}^{n\times n} is a randomly generated matrix whose entries follow a standard normal distribution. This particular structure of the C_{i} matrices was taken from zhu2017riemannian. The starting point X_{0}\in\mathcal{OB}(n,p) was randomly generated using the following Matlab commands

X_{0}=\verb"zeros"(n,p);\quad M=\verb"randn"(n,p)\quad\textrm{and}\quad X_{0}(:,i)=M(:,i)/\verb"norm"(M(:,i)),

for all i=1,\ldots,p. For this comparison, we use \epsilon=1e-5 as the tolerance for the stopping rule based on the residual norm ||F(X_{k})||_{F}<\epsilon.
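The data generation described above can be summarized by the following Matlab sketch; it is a minimal illustration of the construction of the matrices C_i (here n and N are fixed to one of the tested configurations), while X_0 is obtained exactly by the column–normalization commands given earlier:

n = 500; N = 5;
D  = diag(sqrt(n + (1:n)));          % diagonal matrix with d_ii = sqrt(n+i)
Cs = cell(N,1);
for i = 1:N
    B     = randn(n);                % B_i with standard normal entries
    Cs{i} = D + B + B';              % C_i = D + B_i + B_i' (symmetric)
end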

The mean number of iterations, number of evaluations of F(\cdot), total computational time in seconds, and the residual “NrmF”, defined as in the previous subsection, are reported in Table 3. In addition, in Figure 2 we plot the average residual curve ||F(X_{k})||_{F} throughout the iterations, for both methods and for each pair (n,p).

From Table 3 and Figure 2, we can see that both RSANE and CGPR find an approximation of the solution of problem (56) with the pre–established precision for the residual norm, in the two cases of (n,p) considered. In addition, we clearly observe that RSANE performs better than CGPR in terms of the mean number of iterations and CPU time.

Table 3: Numerical results for joint diagonalization Riemannian gradient vector field.
Method Nitr Time NrmF Nfe
(n,p)=(500,100)
RSANE 157.8 3.950 6.85e-6 390.0
CGPR 238.8 7.589 7.84e-6 741.8
(n,p)=(1000,100)
RSANE 68.1 4.894 6.50e-6 142.5
CGPR 153.4 14.342 7.59e-6 416.1
Figure 2: Averaged convergence history of RSANE and CGPR, from the same initial point, for the joint diagonalization Riemannian gradient vector field. Both panels plot iterations vs ||F(X_{k})||_{F}: on the left, (n,p,N)=(500,100,5), and on the right, (n,p,N)=(1000,100,5). The y–axis is in logarithmic scale.

7 Concluding Remarks

In this paper, a Riemannian residual approach for finding a zero of a tangent vector field is proposed. The new approach can be seen as an extended version of the SANE method developed in cruz2003nonmonotone for the solution of large–scale nonlinear systems of equations. Since the proposed method systematically uses the tangent vector field to build the search direction, RSANE is very easy to implement and has low storage requirements, which makes it suitable for solving large–scale problems. In addition, our proposal uses a modification of the Riemannian Barzilai–Borwein step–sizes introduced in iannazzo2018riemannian, combined with the Zhang–Hager globalization strategy zhang2004nonmonotone, in order to guarantee the convergence of the associated residual sequence.
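To make this summary concrete, the overall iteration has roughly the following shape. The Matlab sketch below is a schematic simplification and not the paper's Algorithm 1: the retraction is abstracted as a handle R, the Barzilai–Borwein–type step is computed in the embedding space (whereas the Riemannian version relies on vector transports), the direction sign is fixed, and the derivative–free sufficient–decrease test shown is one common choice in spectral residual methods rather than the exact condition used in the paper:

function X = rsane_sketch(Fh, R, X0, epsilon, maxit)
% Schematic non-monotone spectral residual iteration on a manifold.
% Fh : handle returning the tangent vector field F(X)
% R  : handle R(X,V) implementing a retraction at X along V
eta = 0.85; delta = 1e-4; alpha = 1;          % illustrative parameters
X = X0; Fx = Fh(X); h = norm(Fx,'fro')^2;     % merit function h = ||F(X)||_F^2
C = h; Q = 1;                                 % Zhang–Hager reference value
for k = 1:maxit
    if sqrt(h) < epsilon, break; end
    d = -Fx;                                  % residual direction (sign fixed here)
    t = alpha;
    while true                                % non-monotone backtracking
        Xt = R(X, t*d); Ft = Fh(Xt); ht = norm(Ft,'fro')^2;
        if ht <= C - delta*t^2*h || t < 1e-10, break; end
        t = 0.5*t;
    end
    s = Xt - X; y = Ft - Fx;                  % crude BB data (no transport)
    alpha = min(max(abs((s(:)'*s(:))/(s(:)'*y(:))), 1e-10), 1e10);
    Qn = eta*Q + 1; C = (eta*Q*C + ht)/Qn; Q = Qn;   % Zhang–Hager update
    X = Xt; Fx = Ft; h = ht;
end
end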

The preliminary numerical results show that RSANE performs efficiently on several tangent vector fields, considering both real and simulated data and different Riemannian manifolds. In particular, RSANE is competitive against its Euclidean version (SANE) when solving large–scale eigenvalue problems. Additionally, RSANE outperforms the derivative–free conjugate gradient algorithm recently published in yao2020riemannian on the two matrix manifolds considered.

References

  • [1] P-A Absil and Kyle A Gallivan. Joint diagonalization on the oblique manifold for independent component analysis. In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, volume 5, pages V–V. IEEE, 2006.
  • [2] P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
  • [3] Roy L Adler, Jean-Pierre Dedieu, Joseph Y Margulies, Marco Martens, and Mike Shub. Newton's method on Riemannian manifolds and a geometric model for the human spine. IMA Journal of Numerical Analysis, 22(3):359–390, 2002.
  • [4] Jonathan Barzilai and Jonathan M Borwein. Two-point step size gradient methods. IMA Journal of Numerical Analysis, 8(1):141–148, 1988.
  • [5] Paul Breiding and Nick Vannieuwenhoven. Convergence analysis of Riemannian Gauss–Newton methods and its connection with the geometric condition number. Applied Mathematics Letters, 78:42–50, 2018.
  • [6] Oscar Susano Dalmau Cedeno and Harry Fernando Oviedo Leon. Projected nonmonotone search methods for optimization with orthogonality constraints. Computational and Applied Mathematics, 37(3):3118–3144, 2018.
  • [7] Timothy A Davis and Yifan Hu. The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1, 2011.
  • [8] Jean-Pierre Dedieu, Pierre Priouret, and Gregorio Malajovich. Newton's method on Riemannian manifolds: covariant alpha theory. IMA Journal of Numerical Analysis, 23(3):395–419, 2003.
  • [9] Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
  • [10] Luigi Grippo, Francesco Lampariello, and Stefano Lucidi. A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis, 23(4):707–716, 1986.
  • [11] Bruno Iannazzo and Margherita Porcelli. The Riemannian Barzilai–Borwein method with nonmonotone line search and the matrix geometric mean computation. IMA Journal of Numerical Analysis, 38(1):495–517, 2018.
  • [12] Effrosyni Kokiopoulou, Jie Chen, and Yousef Saad. Trace optimization and eigenproblems in dimension reduction methods. Numerical Linear Algebra with Applications, 18(3):565–602, 2011.
  • [13] William La Cruz and Marcos Raydan. Nonmonotone spectral methods for large-scale nonlinear systems. Optimization Methods and Software, 18(5):583–599, 2003.
  • [14] Chong Li and Jinhua Wang. Convergence of the Newton method and uniqueness of zeros of vector fields on Riemannian manifolds. Science in China Series A: Mathematics, 48(11):1465–1478, 2005.
  • [15] Jonathan H Manton. Optimization algorithms exploiting unitary constraints. IEEE Transactions on Signal Processing, 50(3):635–650, 2002.
  • [16] Richard M Martin. Electronic structure: basic theory and practical methods. Cambridge University Press, 2020.
  • [17] Erkki Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1(01):61–68, 1989.
  • [18] Harry Oviedo. Implicit steepest descent algorithm for optimization with orthogonality constraints.
  • [19] Harry Oviedo and Oscar Dalmau. A scaled gradient projection method for minimization over the Stiefel manifold. In Mexican International Conference on Artificial Intelligence, pages 239–250. Springer, 2019.
  • [20] Harry Oviedo, Oscar Dalmau, and Hugo Lara. Two adaptive scaled gradient projection methods for Stiefel manifold constrained optimization. Numerical Algorithms, pages 1–21, 2020.
  • [21] Harry Oviedo, Hugo Lara, and Oscar Dalmau. A non-monotone linear search algorithm with mixed direction on Stiefel manifold. Optimization Methods and Software, 34(2):437–457, 2019.
  • [22] Marcos Raydan. On the Barzilai and Borwein choice of steplength for the gradient method. IMA Journal of Numerical Analysis, 13(3):321–326, 1993.
  • [23] Marcos Raydan. The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem. SIAM Journal on Optimization, 7(1):26–33, 1997.
  • [24] Wolfgang Ring and Benedikt Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596–627, 2012.
  • [25] Yousef Saad, James R Chelikowsky, and Suzanne M Shontz. Numerical methods for electronic structure calculations of materials. SIAM Review, 52(1):3–54, 2010.
  • [26] Hiroyuki Sato. Riemannian Newton's method for joint diagonalization on the Stiefel manifold with application to ICA. arXiv preprint arXiv:1403.8064, 2014.
  • [27] Hiroyuki Sato and Toshihiro Iwai. A new, globally convergent Riemannian conjugate gradient method. Optimization, 64(4):1011–1031, 2015.
  • [28] Pavan Turaga, Ashok Veeraraghavan, and Rama Chellappa. Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8. IEEE, 2008.
  • [29] Zaiwen Wen, Chao Yang, Xin Liu, and Yin Zhang. Trace-penalty minimization for large-scale eigenspace computation. Journal of Scientific Computing, 66(3):1175–1203, 2016.
  • [30] Teng-Teng Yao, Zhi Zhao, Zheng-Jian Bai, and Xiao-Qing Jin. A Riemannian derivative-free Polak–Ribiére–Polyak method for tangent vector field. Numerical Algorithms, pages 1–31, 2020.
  • [31] Hongchao Zhang and William W Hager. A nonmonotone line search technique and its application to unconstrained optimization. SIAM Journal on Optimization, 14(4):1043–1056, 2004.
  • [32] Teng Zhang and Yi Yang. Robust PCA by manifold optimization. The Journal of Machine Learning Research, 19(1):3101–3139, 2018.
  • [33] Xin Zhang, Jinwei Zhu, Zaiwen Wen, and Aihui Zhou. Gradient type optimization methods for electronic structure calculations. SIAM Journal on Scientific Computing, 36(3):C265–C289, 2014.
  • [34] Xiaojing Zhu. A Riemannian conjugate gradient method for optimization on the Stiefel manifold. Computational Optimization and Applications, 67(1):73–110, 2017.