
Theoretical and Computable Optimal Subspace Expansions for Matrix Eigenvalue Problems
Supported by the National Natural Science Foundation of China (Nos. 11771249 and 12171273).

Zhongxiao Jia, Department of Mathematical Sciences, Tsinghua University, Beijing 100084, People's Republic of China, jiazx@tsinghua.edu.cn.
Abstract

Consider the optimal subspace expansion problem for the matrix eigenvalue problem $Ax=\lambda x$: Which vector $w$ in the current subspace $\mathcal{V}$, after being multiplied by $A$, provides an optimal subspace expansion for approximating a desired eigenvector $x$ in the sense that $x$ has the smallest angle with the expanded subspace $\mathcal{V}_{w}=\mathcal{V}+{\rm span}\{Aw\}$, i.e., $w_{opt}=\arg\max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)$? This problem is important as many iterative methods construct nested subspaces that successively expand $\mathcal{V}$ to $\mathcal{V}_{w}$. An expression of $w_{opt}$ was obtained by Ye (Linear Algebra Appl., 428 (2008), pp. 911–918) for $A$ general, but it could not be exploited to construct a computable (nearly) optimally expanded subspace. He instead derived a maximization characterization of $\cos\angle(\mathcal{V}_{w},x)$ for a given $w\in\mathcal{V}$ when $A$ is Hermitian. We generalize Ye's maximization characterization to the general case and find its maximizer. Our main contributions consist of explicit expressions of $w_{opt}$, $(I-P_{V})Aw_{opt}$ and the optimally expanded subspace $\mathcal{V}_{w_{opt}}$ for $A$ general, where $P_{V}$ is the orthogonal projector onto $\mathcal{V}$. These results are fully exploited to obtain computable optimally expanded subspaces within the framework of the standard, harmonic, refined, and refined harmonic Rayleigh–Ritz methods. We show how to efficiently implement the proposed subspace expansion approaches. Numerical experiments demonstrate the effectiveness of our computable optimal expansions.

keywords:
Eigenvalue problem, non-Krylov subspace, optimal expansion vector, optimal subspace expansion, Ritz vector, refined Ritz vector, harmonic Ritz vector, refined harmonic Ritz vector, computable optimally expanded subspace
AMS:
65F15, 15A18, 65F10.

1 Introduction

Consider the large scale matrix eigenproblem $Ax=\lambda x$ with $A\in\mathcal{C}^{n\times n}$ and $\|x\|=1$, where $\|\cdot\|$ is the 2-norm of a vector or matrix. We are interested in an exterior eigenvalue $\lambda$ and the associated eigenvector $x$. A number of numerical methods are available for solving this kind of problem [1, 14, 15, 21, 19]. Among them, Lanczos or Arnoldi type methods [1, 15, 19] are typical; they are projection methods on the sequence of nested $k$-dimensional Krylov subspaces

\mathcal{V}_{k}=\mathcal{K}_{k}(A,v_{1})={\rm span}\{v_{1},Av_{1},\ldots,A^{k-1}v_{1}\}

and compute approximations to $\lambda$ and $x$. For $A$ Hermitian, Arnoldi type methods reduce to Lanczos type methods [1, 14]. Suppose that $V_{k}=(v_{1},\ldots,v_{k})$ is column orthonormal and is generated by the Lanczos process in the Hermitian case or the Arnoldi process in the non-Hermitian case, and take the expansion vector $w_{k}=v_{k}\in\mathcal{V}_{k}$. Then the next basis vector $v_{k+1}$ is obtained by orthonormalizing $Aw_{k}$ against $V_{k}$, the projection matrix $V_{k}^{H}AV_{k}$ of $A$ onto $\mathcal{V}_{k}$ is Hermitian tridiagonal for $A$ Hermitian or upper Hessenberg for $A$ non-Hermitian, and the columns of $(V_{k},v_{k+1})$ form an orthonormal basis of the $(k+1)$-dimensional Krylov subspace $\mathcal{V}_{k+1}$, where the superscript $H$ denotes the conjugate transpose of a vector or matrix. If the expansion vector $v_{k}$ in $Av_{k}$ is replaced by any $w_{k}\in\mathcal{V}_{k}$ that is not deficient in $v_{k}$, we deduce from Proposition 1 of Ye [25] that the same $\mathcal{V}_{k+1}$ and $V_{k+1}$ are generated.
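For illustration, a minimal MATLAB sketch of one such expansion step is given below; it orthonormalizes $Aw$ against the current orthonormal basis $V$ by the modified Gram–Schmidt procedure with one reorthogonalization pass and appends the resulting unit vector. The function name and the reorthogonalization choice are assumptions made here for illustration only.

% One expansion step: expand span(V) with A*w, where V has orthonormal columns.
% Assumes A (n x n), V (n x k) with orthonormal columns, and w in span(V) are given.
function [Vnew, vnext] = expand_step(A, V, w)
    u = A*w;
    for pass = 1:2                       % modified Gram-Schmidt, one reorthogonalization
        for j = 1:size(V, 2)
            u = u - V(:, j)*(V(:, j)'*u);
        end
    end
    vnext = u/norm(u);                   % next orthonormal basis vector v_{k+1}
    Vnew  = [V, vnext];                  % orthonormal basis of the expanded subspace
end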

However, if we start with a non-Krylov subspace $\mathcal{V}_{k}$, then the expanded subspace $\mathcal{V}_{k+1}$ critically depends on the choice of the expansion vector $w_{k}\in\mathcal{V}_{k}$. It is well known that the success of a general projection method requires that the underlying subspace contain sufficiently accurate approximations to $x$, but a sufficiently accurate subspace cannot guarantee the convergence of the Ritz and harmonic Ritz vectors obtained by the standard and harmonic Rayleigh–Ritz methods with respect to the subspace [4, 10]. To fix this deficiency, the refined and refined harmonic Rayleigh–Ritz methods have been proposed, which are shown to ensure the convergence of the refined and refined harmonic Ritz vectors computed by them when the subspace is good enough [5, 7, 10, 19, 23].

The accuracy of a projection subspace $\mathcal{V}$ for eigenvector approximation can be measured by its angle with the desired unit length eigenvector $x$ [14, 15, 19]:

(1) \cos\angle(\mathcal{V},x)=\max_{z\in\mathcal{V}}\cos\angle(z,x)=\max_{z\in\mathcal{V}}\frac{|x^{H}z|}{\|z\|},

where $\angle(z,x)$ denotes the acute angle between $x$ and the nonzero vector $z$. For a general $k$-dimensional non-Krylov subspace $\mathcal{V}$ with the subscript $k$ dropped, in this paper we will consider the following optimal subspace expansion problem: Suppose $\cos\angle(\mathcal{V},x)\not=0$. For any nonzero $w\in\mathcal{V}$, multiply it by $A$, and define the $(k+1)$-dimensional expanded subspace

(2) \mathcal{V}_{w}=\mathcal{V}+{\rm span}\{Aw\}.

Then which vector $w_{opt}\in\mathcal{V}$ is an optimal expansion vector that, up to scaling, solves

(3) \max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)?

This problem was first considered by Ye [25] and later received attention from Wu and Zhang [24]. The choice of $w\in\mathcal{V}$ is essential to subspace expansion, and different $w$ may affect the quality of $\mathcal{V}_{w}$ substantially. At expansion iteration $k>1$, the Lanczos or Arnoldi type expansion takes $w=v_{k}$, the last column of $V_{k}$. The Ritz expansion method [24, 25], a variant of the Arnoldi method, uses the approximating Ritz vector as $w$ to expand the subspace. It is mathematically equivalent to the Residual Arnoldi (RA) method proposed in [11, 12].

The Shift-Invert Residual Arnoldi (SIRA) method is an alternative to the RA method for computing the eigenvalue $\lambda$ closest to a given target $\tau$. More variants have been developed in [8, 9], and they fall into the categories of the harmonic Rayleigh–Ritz, refined Rayleigh–Ritz, and refined harmonic Rayleigh–Ritz methods [1, 14, 15, 19, 21]. The SIRA type methods are mathematically equivalent to their counterparts among the Jacobi–Davidson (JD) type methods; see Theorem 2.1 of [8]. Just like the shift-invert Arnoldi (SIA) type methods applied to $B=(A-\tau I)^{-1}$, the SIRA type and JD type methods [18] use $B$ to construct nested subspaces but, unlike the SIA type methods, they project the original $A$ rather than $B$ onto the subspaces when finding approximations of $(\lambda,x)$ [8]. At iteration $k$, these methods and the SIA type methods all amount to solving a linear system

(4) (A-\tau I)u=w,

i.e., computing $u=Bw$, where $w=v_{k}$ in the SIA type methods and is mathematically equal to the current approximate eigenvector in the SIRA and JD type methods; see [8, Theorem 2.1 and its proof] for details.

Ye [25] finds an explicit expression of $w_{opt}$ for a general $A$, as stated below.

Ye's result [25, Theorem 1]: Let the columns of $V$ form an orthonormal basis of $\mathcal{V}$, define the residual

(5) R=AV-V(V^{H}AV),

and assume that ${\rm rank}(R)\geq 2$. Then

(6) w_{opt}=VR^{\dagger}x+Vc

for any $c\in\mathcal{N}(R)$, the null space of $R$, where the superscript $\dagger$ denotes the Moore–Penrose generalized inverse of a matrix.
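As an illustrative sanity check (not part of Ye's paper), the following minimal MATLAB sketch compares $\cos\angle(\mathcal{V}_{w_{opt}},x)$ for $w_{opt}=VR^{\dagger}x$ with the values attained by random expansion vectors $w\in\mathcal{V}$ on a small symmetric example; the test matrix, the sizes and the number of random trials are assumptions of the sketch.

% Illustrative check of w_opt = V*pinv(R)*x on a small random symmetric example.
n = 200; k = 10;
A = randn(n); A = (A + A')/2;                 % symmetric test matrix (illustrative)
[X, D] = eig(A); [~, i] = max(diag(D));
x = X(:, i);                                  % desired "exact" unit length eigenvector
V = orth(randn(n, k));                        % random k-dimensional subspace
R = A*V - V*(V'*A*V);                         % residual (5)
wopt = V*(pinv(R)*x);                         % Ye's optimal expansion vector (6), c = 0
cosang = @(W, v) norm(orth(W)'*v);            % cos of angle between unit v and span(W)
copt  = cosang([V, A*wopt], x);
cbest = 0;
for t = 1:100                                 % compare against random w in V
    w = V*randn(k, 1);
    cbest = max(cbest, cosang([V, A*w], x));
end
fprintf('optimal: %.6f, best of 100 random w: %.6f\n', copt, cbest);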

Unfortunately, Ye shows that (6) cannot be exploited to obtain a computable (nearly) optimal replacement of $w_{opt}$ as it involves the a priori $x$: (i) $Vc$ makes no contribution to the expansion of $\mathcal{V}$ as $AVc\in\mathcal{V}$; (ii) since $R^{H}V=0$, we must have $R^{\dagger}z=0$ when replacing $x$ by any $z\in\mathcal{V}$ and taking $c=0$, leading to $w=VR^{\dagger}z=0$. Therefore, he gives up (6) and instead turns to deriving a maximization characterization of $\cos\angle(\mathcal{V}_{w},x)$ for a given $w\in\mathcal{V}$ under the assumption that $A$ is Hermitian; see Theorem 2 of [25] and the first result of Theorem 2.1 to be presented in this paper. But he does not find a solution to the maximization characterization problem for a given $w\in\mathcal{V}$. Notice that one cannot obtain $w_{opt}$ from the maximization characterization. Without any other information, e.g., $w_{opt}$, Ye makes an approximate analysis of the maximization characterization of $\cos\angle(\mathcal{V}_{w},x)$ and argues that the Ritz vector from $\mathcal{V}$ might be a good approximate maximizer of (3) and thus might be a nearly optimal expansion vector. Unfortunately, as we shall see, Ye's analysis and arguments have evident defects; see the detailed explanation after Remark 3 in this paper. We stress that Ye's proof of the maximization characterization of $\cos\angle(\mathcal{V}_{w},x)$ holds only for $A$ Hermitian.

Wu and Zhang [24] do not consider the optimal expansion problem (3) itself. Instead, they focus on showing that the refined Ritz vector from $\mathcal{V}$ can be a better expansion vector than the Ritz vector from $\mathcal{V}$ for $A$ general. Based on this analysis, they have developed a refined variant of the RA method for a general $A$, which has been numerically confirmed to be more efficient than the RA method.

In this paper, we shall revisit problem (3) for $A$ general. Our theoretical contributions consist of two parts. The first part generalizes Ye's major result in [25], i.e., Theorem 2, to the non-Hermitian case. We first prove that $\cos\angle(\mathcal{V}_{w},x)$ for a given $w\in\mathcal{V}$ can be formulated as a maximization characterization problem, which extends Theorem 2 of [25] to the general case. Then we find its maximizer. This result is new even for $A$ Hermitian and can be exploited to explain the defect of Ye's approximate analysis. These results are secondary, and our major theoretical contributions are in the second part. We establish informative expressions for $w_{opt}$, $(I-P_{V})Aw_{opt}$ and the optimally expanded $\mathcal{V}_{w_{opt}}$, where $P_{V}=VV^{H}$ is the orthogonal projector onto $\mathcal{V}$. The results show that (i) generally $w_{opt}\not=P_{V}x$, where $P_{V}x$ is the orthogonal projection of $x$ onto $\mathcal{V}$ and $P_{V}x/\|P_{V}x\|$ is the best approximation to $x$ from $\mathcal{V}$, (ii) $(I-P_{V})Aw_{opt}=RR^{\dagger}x$, which is the orthogonal projection of $x$ onto the subspace ${\rm span}\{R\}$ and whose normalization $RR^{\dagger}x/\|RR^{\dagger}x\|$ is the best approximation to $x$ from ${\rm span}\{R\}$, (iii) $\mathcal{V}_{w_{opt}}=\mathcal{V}\oplus{\rm span}\{RR^{\dagger}x\}$ with $\oplus$ the orthogonal direct sum, and (iv) the orthogonal projection of $x$ onto $\mathcal{V}_{w_{opt}}$ is $(P_{V}+RR^{\dagger})x$.

For the purpose of practical computations, we make a clear and rigorous analysis of the theoretical results and obtain a number of computable optimally expanded subspaces $\mathcal{V}_{\widetilde{w}_{opt}}$, which depend on the chosen projection methods. As is already seen from (6) and the comments following it, it is hard to interpret $w_{opt}$, let alone to construct a computable optimal replacement of it. Fortunately, we are able to take a completely new approach to the computable optimal subspace expansion. Our key observation is that, when expanding $\mathcal{V}$ to a computable optimal subspace, it follows from the fundamental identity

(7) \mathcal{V}_{w_{opt}}=\mathcal{V}+{\rm span}\{Aw_{opt}\}=\mathcal{V}\oplus{\rm span}\{(I-P_{V})Aw_{opt}\}

that seeking a computable optimal replacement of $w_{opt}$ is unnecessary. This identity provides us with a new perspective and motivates us to find a computable optimal replacement of $(I-P_{V})Aw_{opt}$ as a whole rather than of $w_{opt}$ itself. As will become clear, such an optimal replacement can be defined precisely and is deterministic within the framework of each of the standard, harmonic, refined, and refined harmonic Rayleigh–Ritz methods, respectively. As it turns out, the computable optimal replacements are the Ritz vector, refined Ritz vector, harmonic Ritz vector and refined harmonic Ritz vector of $A$ from the subspace ${\rm span}\{R\}$ rather than from $\mathcal{V}$ for these four kinds of projection methods. Taking the standard Rayleigh–Ritz method as an example, we will describe how to expand $\mathcal{V}$ to the computable optimal subspace $\mathcal{V}_{\widetilde{w}_{opt}}$. We will also show that, based on our results, there is another novel optimal expansion that is independent of the desired $x$ in practical computations and is promising and more robust.

The paper is organized as follows. In Section 2, we consider the solution of the optimal subspace expansion problem (3) for a general $A$, present our theoretical results, and analyze them. In Section 3, we show how to obtain computable optimal replacements of $(I-P_{V})Aw_{opt}$ and construct the optimally expanded subspaces $\mathcal{V}_{\widetilde{w}_{opt}}$. We also present another, more robust optimal expansion approach. In Section 4, we report numerical experiments to demonstrate the effectiveness of the theoretical and computable optimal subspace expansion approaches, and compare them with the Lanczos or Arnoldi type expansion with $w=v_{k}$ and with the RA method, i.e., the Ritz expansion [24, 25] with $w$ being the Ritz vector from $\mathcal{V}$. In Section 5, we conclude the paper with some problems and issues that deserve future consideration.

Throughout the paper, denote by $I$ the identity matrix with its order clear from the context, by $\mathcal{C}^{k}$ the complex space of dimension $k$, and by $\mathcal{C}^{n\times k}$ or $\mathcal{R}^{n\times k}$ the set of $n\times k$ complex or real matrices.

2 The optimal $w_{opt}$, $(I-P_{V})Aw_{opt}$ and $\mathcal{V}_{w_{opt}}$

For a general $A$, we first establish two results on $\cos\angle(\mathcal{V}_{w},x)$ for a given $w$. The first characterizes it as a maximization problem and generalizes Ye's Theorem 2 in [25], and the second gives a maximizer of this problem. We remind the reader that these results are secondary. After them, we will present our major results.

Theorem 2.1.

For $w\in\mathcal{V}$ with $x^{H}w\neq 0$ and $\|x\|=1$, assume that $x\not\in\mathcal{V}$ and $Aw\not\in\mathcal{V}$, define $r_{w}=(A-\phi I)w$ with $\phi=\frac{x^{H}Aw}{x^{H}w}$, and let the columns of $Q_{w}$ form an orthonormal basis of the orthogonal complement of ${\rm span}\{r_{w}\}$ with respect to $\mathcal{C}^{n}$. Then

(8) \cos\angle(\mathcal{V}_{w},x)=\max_{b\in\mathcal{V}}\frac{\cos\angle(b,x)}{\sin\angle(b,r_{w})}=\frac{\cos\angle(b_{w},x)}{\sin\angle(b_{w},r_{w})},

where

(9) b_{w}=V(Q_{w}Q_{w}^{H}V)^{\dagger}x=V(V^{H}Q_{w}Q_{w}^{H}V)^{-1}V^{H}x.
Proof.

From the definition of $r_{w}$, we have

\mathcal{V}+{\rm span}\{Aw\}=\mathcal{V}+{\rm span}\{r_{w}\}.

By the assumptions, we have $r_{w}\not=0$. Notice that any nonzero $a\in\mathcal{V}_{w}$ with $a\not\in\mathcal{V}$ can be uniquely written as

a=b+\beta r_{w},

where $b\in\mathcal{V}$ and $\beta$ is a nonzero scalar. Since $Aw=r_{w}+\phi w$ and $x^{H}r_{w}=0$, we obtain

(10) \cos\angle(\mathcal{V}_{w},x) = \max_{a\in\mathcal{V}_{w}}\frac{|x^{H}a|}{\|a\|} = \max_{b\in\mathcal{V},\,b+\beta r_{w}\neq 0}\frac{|x^{H}(b+\beta r_{w})|}{\|b+\beta r_{w}\|}
= \max_{b\in\mathcal{V},\,b+\beta r_{w}\neq 0}\frac{|x^{H}b|}{\|b+\beta r_{w}\|}
= \max_{b\in\mathcal{V}}\max_{\beta\not=0}\frac{|x^{H}b|}{\|b+\beta r_{w}\|} = \max_{b\in\mathcal{V}}\frac{|x^{H}b|}{\min_{\beta\not=0}\|b+\beta r_{w}\|}
= \max_{b\in\mathcal{V}}\frac{|x^{H}b|}{\left\|b-\frac{r_{w}^{H}b}{\|r_{w}\|^{2}}r_{w}\right\|} = \max_{b\in\mathcal{V}}\frac{|x^{H}b|}{\|b\|\sin\angle(b,r_{w})}
= \max_{b\in\mathcal{V}}\frac{\cos\angle(b,x)}{\sin\angle(b,r_{w})},

which establishes the maximization characterization in (8).

We now seek a maximizer of the maximization problem in (8). Since $x^{H}r_{w}=0$, by the definition of $Q_{w}$, there exists a vector $z_{w}\in\mathcal{C}^{n-1}$ such that the unit length eigenvector $x=Q_{w}z_{w}$ with $\|z_{w}\|=1$. As a result, we obtain $Q_{w}^{H}x=z_{w}$, $Q_{w}Q_{w}^{H}x=Q_{w}z_{w}=x$, and

\cos\angle(b,x)=\frac{|x^{H}b|}{\|b\|}=\frac{|(Q_{w}z_{w})^{H}b|}{\|b\|}=\frac{|z_{w}^{H}(Q_{w}^{H}b)|}{\|b\|}.

Since $\frac{\|Q_{w}^{H}b\|}{\|b\|}=\sin\angle(b,r_{w})$, it follows from the above that

(11) \cos\angle(b,x)=\frac{\|Q_{w}^{H}b\|}{\|b\|}\,\frac{|z_{w}^{H}(Q_{w}^{H}b)|}{\|Q_{w}^{H}b\|}=\sin\angle(b,r_{w})\cos\angle(Q_{w}^{H}b,z_{w}).

Therefore, from (10), (11), the orthonormality of $Q_{w}$ and $x=Q_{w}z_{w}$, writing a nonzero $b=Vy$, we obtain

\cos\angle(\mathcal{V}_{w},x) = \max_{b\in\mathcal{V},\,b\not=0}\cos\angle(Q_{w}^{H}b,z_{w})
= \max_{b\in\mathcal{V},\,b\not=0}\cos\angle(Q_{w}Q_{w}^{H}b,Q_{w}z_{w})
= \max_{y\not=0}\cos\angle(Q_{w}Q_{w}^{H}Vy,x)
= \cos\angle((Q_{w}Q_{w}^{H}V)(Q_{w}Q_{w}^{H}V)^{\dagger}x,x)

since $(Q_{w}Q_{w}^{H}V)(Q_{w}Q_{w}^{H}V)^{\dagger}x$ is the orthogonal projection of $x$ onto the subspace ${\rm span}\{Q_{w}Q_{w}^{H}V\}$. As a result, $y_{w}=(Q_{w}Q_{w}^{H}V)^{\dagger}x$ solves $\max_{y\in\mathcal{C}^{k},y\neq 0}\cos\angle(Q_{w}Q_{w}^{H}Vy,x)$, and $b_{w}=Vy_{w}$ is the vector defined by the first relation in (9), which proves the second relation in the right-hand side of (8).

We next prove that $Q_{w}Q_{w}^{H}V$ has full column rank. This amounts to showing that the cross-product matrix $V^{H}Q_{w}Q_{w}^{H}V$ is positive definite, noting that $(Q_{w}Q_{w}^{H})^{2}=Q_{w}Q_{w}^{H}$. To this end, it suffices to prove that the only solution of the homogeneous linear system $Q_{w}Q_{w}^{H}Vz=0$ is zero. Since $Q_{w}Q_{w}^{H}=I-\frac{r_{w}r_{w}^{H}}{\|r_{w}\|^{2}}$, the system becomes

\left(I-\frac{r_{w}r_{w}^{H}}{\|r_{w}\|^{2}}\right)Vz=0,

which yields

(12) \frac{V^{H}r_{w}r_{w}^{H}V}{\|r_{w}\|^{2}}z=z.

Since

\left\|\frac{V^{H}r_{w}r_{w}^{H}V}{\|r_{w}\|^{2}}\right\|=\frac{\|V^{H}r_{w}\|^{2}}{\|r_{w}\|^{2}}=\cos^{2}\angle(\mathcal{V},r_{w}),

taking norms on both sides of (12) gives

\|z\|\cos^{2}\angle(\mathcal{V},r_{w})\geq\|z\|,

which holds only if $z=0$ or $\cos\angle(\mathcal{V},r_{w})=1$. The latter means that $r_{w}=Aw-\phi w\in\mathcal{V}$, i.e., $Aw\in\mathcal{V}$, a contradiction to our assumption. Hence we must have $z=0$, and $Q_{w}Q_{w}^{H}V$ has full column rank. Exploiting $Q_{w}Q_{w}^{H}x=x$, we obtain

y_{w}=(Q_{w}Q_{w}^{H}V)^{\dagger}x=(V^{H}Q_{w}Q_{w}^{H}V)^{-1}V^{H}x,

which proves the second relation in (9).

Remark 1.

This theorem holds for a general $A$. For $A$ Hermitian, we have $\phi=\lambda$, but for $A$ non-Hermitian or, more rigorously, non-normal, we have $\phi\not=\lambda$. In the Hermitian case, Theorem 2 in [25] is the first relation in the right-hand side of (8). The second relation in the right-hand side of (8) is new even for $A$ Hermitian, and gives an explicit expression of the maximizer $b_{w}$ of the maximization characterization problem in (8).

Remark 2.

From (3), relation (8) shows that $w_{opt}$ solves

(13) \max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)=\max_{w\in\mathcal{V}}\frac{\cos\angle(b_{w},x)}{\sin\angle(b_{w},r_{w})}.

However, because of the complicated nonlinear relationship between $b_{w}$ and $r_{w}$, it appears hard to solve this problem and derive an explicit expression of $w_{opt}$.

Remark 3.

By approximately maximizing the first relation in the right-hand side of (8) and taking $b$ in it to be the Ritz vector $z_{1}$ of $A$ from $\mathcal{V}$ that is used to approximate the desired $x$, Ye [25] makes an approximate analysis of $\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}$ and argues that, without any other information, $z_{1}$ may in practice be a good approximate solution of $\max_{w\in\mathcal{V}}\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}$. He thus suggests $z_{1}$ as an approximation to the solution $w_{opt}$ of (3). However, Ye's proof of the first relation in the right-hand side of (8) and his approximate analysis are inapplicable to a non-Hermitian $A$.

As a matter of fact, Ye's analysis in the Hermitian case does not seem theoretically sound and has some arbitrariness, so that his claim may be problematic, as will be shown below.

Since

\cos\angle(\mathcal{V}_{w_{opt}},x)\geq\cos\angle(\mathcal{V}_{w},x)

for any $w\in\mathcal{V}$, Ye attempts to seek a good approximation to the maximizer $w_{opt}$ of $\max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)$. To this end, in the first relation of the right-hand side in (8), Ye takes $b\in\mathcal{V}$ to be the Ritz vector $z_{1}$ of $A$ from $\mathcal{V}$ that is used to approximate the desired $x$. Then (8) shows that

\cos\angle(\mathcal{V}_{w},x)=\frac{\cos\angle(b_{w},x)}{\sin\angle(b_{w},r_{w})}\geq\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})},

where $b_{w}$ is defined by (9). Notice that $b_{w}$ is a function of $w\in\mathcal{V}$. However, since $z_{1}$ is a constant vector independent of $w\in\mathcal{V}$, there is no reason that $z_{1}$ is close to $b_{w}$ unless $b_{w}\approx z_{1}$ for some specific $w$ and $\mathcal{V}$, as we argue below.

Notice that $Q_{w}Q_{w}^{H}=I-\frac{r_{w}r_{w}^{H}}{\|r_{w}\|^{2}}$ and $r_{w}^{H}x=0$. Applying the Sherman–Morrison formula to $(V^{H}Q_{w}Q_{w}^{H}V)^{-1}$ and making use of $r_{w}^{H}P_{V}x=-r_{w}^{H}(I-P_{V})x$, by some manipulation we can justify that

(14) b_{w}=P_{V}(x+\alpha_{w}w+\beta_{w}Aw),

where

\alpha_{w}=\phi\frac{r_{w}^{H}(I-P_{V})x}{\|(I-P_{V})r_{w}\|^{2}},\qquad \beta_{w}=-\frac{r_{w}^{H}(I-P_{V})x}{\|(I-P_{V})r_{w}\|^{2}}

with $\phi=\frac{x^{H}Aw}{x^{H}w}$. Observe that $\|\alpha_{w}w\|$ and $\|\beta_{w}Aw\|$ do not depend on the size of $\|w\|$ and are generally not small. Suppose that $w$ is normalized. In this case, $|\alpha_{w}|$ and $|\beta_{w}|$ are generally not small. Recall that $P_{V}x/\|P_{V}x\|$ is the best approximation to $x$ from $\mathcal{V}$ and $z_{1}$ is the computable approximation to $x$ obtained by the standard Rayleigh–Ritz method. Therefore, it is seen from (14) that the constant vector $z_{1}$ is generally not a good approximation to $b_{w}$ in direction for all $w\in\mathcal{V}$ unless $w\approx x$, which can occur only when $P_{V}x/\|P_{V}x\|\approx x$, that is, when $\cos\angle(\mathcal{V},x)$ is already sufficiently close to one. This means that the functions

\frac{\cos\angle(b_{w},x)}{\sin\angle(b_{w},r_{w})}\qquad {\rm and}\qquad \frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}

generally have no similarity, and their difference is not close to zero unless $z_{1}\approx x$. As a result, the maximizer of $\max_{w\in\mathcal{V}}\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}$ generally bears no relation to the maximizer $w_{opt}$ of

\max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)=\max_{w\in\mathcal{V}}\frac{\cos\angle(b_{w},x)}{\sin\angle(b_{w},r_{w})}.

The above analysis indicates that taking $b=z_{1}$ in the first relation of (8) generally does not yield a good approximation to $b_{w}$, and the resulting $\max_{w\in\mathcal{V}}\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}$ is generally not a good replacement of $\max_{w\in\mathcal{V}}\cos\angle(\mathcal{V}_{w},x)$. In the Hermitian case, by using a few approximate and heuristic arguments, Ye [25] argues that $w=z_{1}$ may be a good approximate maximizer of $\max_{w\in\mathcal{V}}\frac{\cos\angle(z_{1},x)}{\sin\angle(z_{1},r_{w})}$ and then uses $z_{1}$ as a replacement of the $w_{opt}$ that solves (3).

In what follows we give up any possible further analysis of Theorem 2.1 and instead consider (3) from new perspectives. We will establish a number of important and insightful results. We derive a new expression of $w_{opt}$, which is essentially the same as but formally different from (6) with $c=0$. The new form of $w_{opt}$ will play a critical role in establishing explicit expressions of the a priori $(I-P_{V})Aw_{opt}$, the optimally expanded subspace $\mathcal{V}_{w_{opt}}$ and some other important quantities.

Note that $P_{V}=VV^{H}$ is the orthogonal projector onto $\mathcal{V}$. Assume that $x\not\in\mathcal{V}$, $w\in\mathcal{V}$ and $Aw\not\in\mathcal{V}$. Lemma 1 of [25] states that

(15) \cos\angle(\mathcal{V}_{w},x)=\sqrt{\cos^{2}\angle(\mathcal{V},x)+\cos^{2}\angle((I-P_{V})Aw,x)},

where $(I-P_{V})Aw=r=Ry$ with $R$ defined by (32) and $w=Vy$. We should remind the reader that Lemma 1 of [25] uses the form $r=Ry$, but we use the different form $(I-P_{V})Aw$ here by writing $R$ in the form $(I-P_{V})AV$. Identity (7) motivates this change in form and enables us to establish more informative theoretical results on the optimal subspace expansion problem (3). Relation (15) indicates that

(16) w_{opt}=\arg\max_{w\in\mathcal{V}}\cos\angle((I-P_{V})Aw,x).

Let the matrix $(V,V_{\perp})$ be unitary. Then $I-P_{V}=V_{\perp}V_{\perp}^{H}$ and $(I-P_{V})AV=V_{\perp}V_{\perp}^{H}AV$. Define the vector

(17) x_{\perp}=(I-P_{V})x,

which is the orthogonal projection of $x$ onto the orthogonal complement of $\mathcal{V}$ with respect to $\mathcal{C}^{n}$ and is nonzero for $x\not\in\mathcal{V}$. Then the matrix pair

(18) \{V^{H}A^{H}x_{\perp}x_{\perp}^{H}AV,\ V^{H}A^{H}(I-P_{V})AV\}=\{V^{H}A^{H}x_{\perp}x_{\perp}^{H}AV,\ V^{H}A^{H}V_{\perp}V_{\perp}^{H}AV\}

restricted to the orthogonal complement $\mathcal{N}^{\perp}(V_{\perp}^{H}AV)$ of $\mathcal{N}(V_{\perp}^{H}AV)$ is Hermitian definite, that is, the range restricted $V^{H}A^{H}(I-P_{V})AV=V^{H}A^{H}V_{\perp}V_{\perp}^{H}AV$ is Hermitian positive definite. Note that

\mathcal{N}^{\perp}(V_{\perp}^{H}AV)=\mathcal{R}(V^{H}A^{H}V_{\perp}),

the column space of $V^{H}A^{H}V_{\perp}$. We write the range restricted matrix pair (18) as

(19) \{V^{H}A^{H}x_{\perp}x_{\perp}^{H}AV,\ V^{H}A^{H}V_{\perp}V_{\perp}^{H}AV\}\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})}.
Theorem 2.2.

Assume that $x\not\in\mathcal{V}$, $w\in\mathcal{V}$ and $Aw\not\in\mathcal{V}$, and let $R$ be defined by (5). Then the optimal expansion vector, up to scaling, is

(20) w_{opt}=V(V_{\perp}^{H}AV)^{\dagger}V_{\perp}^{H}x=((I-P_{V})AP_{V})^{\dagger}x=VR^{\dagger}x,
(21) (I-P_{V})Aw_{opt}=RR^{\dagger}x,
(22) \mathcal{V}_{w_{opt}}=\mathcal{V}\oplus{\rm span}\{RR^{\dagger}x\},

and the columns of $(V,RR^{\dagger}x/\|RR^{\dagger}x\|)$ form an orthonormal basis of $\mathcal{V}_{w_{opt}}$. Furthermore,

(23) \cos\angle(\mathcal{V}_{w_{opt}},x)=\cos\angle(P_{V}x+RR^{\dagger}x,x),
(24) \cos\angle(\mathcal{V}_{w_{opt}},x)=\cos\angle(\mathcal{V}_{R},x),

where the subspace

(25) \mathcal{V}_{R}=\mathcal{V}\oplus{\rm span}\{R\}.
Proof.

Write $w=Vy$. Then by assumption, $\mathcal{V}$ is not an invariant subspace of $A$, so that $(I-P_{V})AV=V_{\perp}V_{\perp}^{H}AV\not=0$. From (17) and $(I-P_{V})^{2}=I-P_{V}$, we obtain

(26) \cos^{2}\angle((I-P_{V})Aw,x)=\cos^{2}\angle((I-P_{V})AVy,x)=\frac{|x^{H}(I-P_{V})AVy|^{2}}{\|(I-P_{V})AVy\|^{2}}=\frac{y^{H}V^{H}A^{H}x_{\perp}x_{\perp}^{H}AVy}{y^{H}V^{H}A^{H}(I-P_{V})AVy},

which is the Rayleigh quotient of the range restricted Hermitian definite pair (19) with respect to the nonzero vector $y$. Keep in mind that $w_{opt}$ satisfies (16). Therefore, by the min-max characterization of the eigenvalues of a Hermitian definite pair (cf. e.g., [20, pp. 281–282]), the solution to the maximization problem

(27) \max_{w\in\mathcal{V}}\cos\angle((I-P_{V})Aw,x)=\max_{y}\cos\angle((I-P_{V})AVy,x)

is an eigenvector $y_{opt}$ of the range restricted definite pair (19) associated with its largest eigenvalue $\mu_{opt}$, and the optimal expansion vector is $w_{opt}=Vy_{opt}$ up to scaling.

Notice that $V^{H}A^{H}x_{\perp}x_{\perp}^{H}AV\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})}$ is a rank-one Hermitian positive semi-definite matrix, and denote

(28) V^{H}A^{H}V_{\perp}V_{\perp}^{H}AV\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})}=B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})}.

Therefore, the range restricted Hermitian definite pair (19) has exactly one positive eigenvalue, and the other eigenvalues are zero. By (28), the eigenvalues $\mu$ of the matrix pair (19) are identical to those of the rank-one Hermitian matrix

(29) (B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})})^{-\frac{1}{2}}V^{H}A^{H}x_{\perp}x_{\perp}^{H}AV(B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})})^{-\frac{1}{2}},

and the eigenvectors $y$ of the pair (19) are related to the eigenvectors $\hat{y}$ of the above matrix by

(30) y=(B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})})^{-\frac{1}{2}}\hat{y}.

It is known that the unique positive eigenvalue $\mu_{opt}$ of the matrix in (29) is

(31) \mu_{opt}=\|(B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})})^{-\frac{1}{2}}V^{H}A^{H}x_{\perp}\|^{2}

and its associated (unnormalized) eigenvector is

\hat{y}_{opt}=(B\big|_{\mathcal{R}(V^{H}A^{H}V_{\perp})})^{-\frac{1}{2}}V^{H}A^{H}x_{\perp}.

Substituting this into (30) and $w_{opt}=Vy_{opt}$ gives the first relation in the right-hand side of (20). Exploiting $P_{V}=VV^{H}$ and $I-P_{V}=V_{\perp}V_{\perp}^{H}$, by the definition of the Moore–Penrose generalized inverse, we obtain the second relation in the right-hand side of (20).

By (5), the residual

(32) R=(I-P_{V})AV,

which shows that $(I-P_{V})AP_{V}=RV^{H}$. As a result, we have

(33) ((I-P_{V})AP_{V})^{\dagger}=VR^{\dagger},

which, together with the second relation in (20), proves

(34) w_{opt}=VR^{\dagger}x,

i.e., the third relation in (20). Therefore, from (32) and (34), we obtain

(I-P_{V})Aw_{opt}=(I-P_{V})AVR^{\dagger}x=RR^{\dagger}x,

which establishes (21).

Since

\mathcal{V}_{w_{opt}}=\mathcal{V}+{\rm span}\{Aw_{opt}\}=\mathcal{V}\oplus{\rm span}\{(I-P_{V})Aw_{opt}\},

it follows from (21) that (22) holds.

Relations (21) and (22) show that the unit length vector $RR^{\dagger}x/\|RR^{\dagger}x\|$ and the columns of $V$ form an orthonormal basis of the optimally expanded subspace $\mathcal{V}_{w_{opt}}=\mathcal{V}\oplus{\rm span}\{RR^{\dagger}x\}$, and the orthogonal projector onto $\mathcal{V}_{w_{opt}}$ is

(35) P_{V}+RR^{\dagger}xx^{H}RR^{\dagger}/\|RR^{\dagger}x\|^{2}.

Notice that $RR^{\dagger}$ itself is the orthogonal projector onto ${\rm span}\{R\}$. By right-multiplying the orthogonal projector in (35) with $x$ and exploiting $(RR^{\dagger})^{2}=RR^{\dagger}$, it is easily justified that the orthogonal projection of $x$ onto $\mathcal{V}_{w_{opt}}$ is $P_{V}x+RR^{\dagger}x$, which proves

\cos\angle(\mathcal{V}_{w_{opt}},x)=\cos\angle(P_{V}x+RR^{\dagger}x,x),

i.e., (23) holds.

Note that $V^{H}R=0$ and the orthogonal projector onto $\mathcal{V}_{R}$ is $P_{V}+RR^{\dagger}$. Therefore, $P_{V}x+RR^{\dagger}x=(P_{V}+RR^{\dagger})x$ is the orthogonal projection of $x$ onto the $\mathcal{V}_{R}$ defined by (25), and from (23) we obtain (24).

Remark 4.

Since $\mathcal{V}_{R}\supset\mathcal{V}_{w_{opt}}$, (23) and (24) show that the best approximation to $x$ from $\mathcal{V}_{w_{opt}}$ is identical to that from the higher dimensional $\mathcal{V}_{R}$ containing $\mathcal{V}_{w_{opt}}$. Remarkably, $\mathcal{V}_{R}$ is independent of $x$ but provides the same best approximation to $x$ as $\mathcal{V}_{w_{opt}}$ does, and all the other optimally expanded subspaces for eigenvectors $z$ of $A$ other than $x$ are also contained in $\mathcal{V}_{R}$ since $RR^{\dagger}z\in{\rm span}\{R\}$. Therefore, unlike the a priori $\mathcal{V}_{w_{opt}}$ that favors $x$ only, $\mathcal{V}_{R}$ favors all the eigenvectors of $A$ equally, and it is possible to use a suitable projection method to simultaneously compute approximations to $x$ and other eigenvectors of $A$ with respect to $\mathcal{V}_{R}$ rather than $\mathcal{V}_{w_{opt}}$. We will come back to this point at the end of the next section.

Remark 5.

It is seen from (34) that the components of $R^{\dagger}x$ are the coefficients of $w_{opt}$ in the basis $V$, while those of $V^{H}x$ are the coefficients of $P_{V}x=VV^{H}x$ in the basis $V$. Notice from (32) that $R^{H}V=0$. Then $R^{\dagger}V=0$ as $\mathcal{N}(R^{\dagger})=\mathcal{N}(R^{H})$. The relations $R^{\dagger}V=0$ and $V^{H}V=I$ show that $R^{\dagger}$ and $V^{H}$ are different, so that $R^{\dagger}x$ and $V^{H}x$ are generally not collinear. As a result, taking $w=P_{V}x$ and expanding $\mathcal{V}$ by $(I-P_{V})AP_{V}x$ is not theoretically optimal in general. In other words, although $P_{V}x/\|P_{V}x\|$ is the best approximation to $x$ from $\mathcal{V}$, relation (34) shows that generally $w_{opt}\not=P_{V}x$ up to scaling.

Finally, let us look at the particular case ${\rm rank}(R)=1$. In this case, we can write $R=\hat{v}u^{H}$ with $\|u\|=1$ and $\|R\|=\|\hat{v}\|$. Therefore,

(36) V_{\perp}^{H}AV=V_{\perp}^{H}R=(V_{\perp}^{H}\hat{v})u^{H}

is a rank-one matrix. Let $w=Vy$. Exploiting (36), $I-P_{V}=V_{\perp}V_{\perp}^{H}$, $I-P_{V}=(I-P_{V})^{2}$, and $\|(I-P_{V})\hat{v}\|=\|V_{\perp}^{H}\hat{v}\|$, by some elementary manipulation we can prove that

\cos\angle((I-P_{V})AVy,x)=\cos\angle((I-P_{V})\hat{v},x)

for any $y$ satisfying $u^{H}y\neq 0$. Therefore, $w_{opt}=Vy$ for any $y$ satisfying $u^{H}y\neq 0$, and there are infinitely many $w_{opt}$'s. However, $(I-P_{V})Aw_{opt}=(I-P_{V})\hat{v}(u^{H}y)$ is unique up to scaling for all $y$ satisfying $u^{H}y\not=0$, and $\mathcal{V}_{w_{opt}}$ is unique since

(I-P_{V})Aw_{opt}/\|(I-P_{V})Aw_{opt}\|=(I-P_{V})\hat{v}/\|(I-P_{V})\hat{v}\|

is unique and, together with the columns of $V$, forms an orthonormal basis of $\mathcal{V}_{w_{opt}}$.
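To make the relations (20)–(24) concrete, the following minimal MATLAB sketch numerically verifies on a small random symmetric example that $(I-P_{V})Aw_{opt}=RR^{\dagger}x$ for $w_{opt}=VR^{\dagger}x$ and that $\cos\angle(\mathcal{V}_{w_{opt}},x)=\cos\angle(\mathcal{V}_{R},x)$; the names, sizes and the test matrix are assumptions of the sketch, not part of the theory.

% Illustrative numerical check of Theorem 2.2 on a small example.
n = 300; k = 12;
A = randn(n); A = (A + A')/2;                  % symmetric test matrix (illustrative)
[X, D] = eig(A); [~, i] = max(diag(D)); x = X(:, i);
V = orth(randn(n, k));                         % non-Krylov subspace basis V
PV = V*V';
R = A*V - V*(V'*A*V);                          % residual (5), equal to (I - PV)*A*V
wopt = V*(pinv(R)*x);                          % (20): w_opt = V*R^dagger*x
lhs = (eye(n) - PV)*(A*wopt);
rhs = R*(pinv(R)*x);                           % (21): R*R^dagger*x
fprintf('|| (I-PV)*A*wopt - R*Rdag*x || = %.2e\n', norm(lhs - rhs));
cosang = @(W, v) norm(orth(W)'*v);             % cos angle(span(W), v), v unit length
c1 = cosang([V, A*wopt], x);                   % cos angle(V_{w_opt}, x)
c2 = cosang([V, R], x);                        % cos angle(V_R, x), cf. (24)
fprintf('cos angle(V_wopt, x) = %.8f, cos angle(V_R, x) = %.8f\n', c1, c2);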

3 Computable optimal replacements of $(I-P_{V})Aw_{opt}$ and computable optimally expanded subspaces $\mathcal{V}_{\widetilde{w}_{opt}}$

The results in Theorem 2.2 are a priori and are not directly applicable to a practical expansion of $\mathcal{V}$ as they involve the desired eigenvector $x$. Therefore, for a non-Krylov subspace $\mathcal{V}$ with ${\rm rank}(R)\geq 2$, one cannot use $w_{opt}$ in (6), i.e., the third result of (20), or a good approximation to it from $\mathcal{V}$ to expand $\mathcal{V}$ in computations, as we have elaborated in the introduction. However, just as (7) indicates, as far as the subspace expansion is concerned, it is $(I-P_{V})Aw_{opt}$ rather than $w_{opt}$ itself that is used to expand $\mathcal{V}$. Therefore, instead of $w_{opt}$ itself, the key is to consider $(I-P_{V})Aw_{opt}$ as a whole when expanding $\mathcal{V}$. This new perspective forms the basis of this section.

Theoretically, for a given subspace $\mathcal{V}$, the unit length vector $P_{V}x/\|P_{V}x\|$ is the theoretical best approximation to $x$ from $\mathcal{V}$. From the computational viewpoint, since $P_{V}x/\|P_{V}x\|\in\mathcal{V}$, projection methods are the only viable choices for seeking computable best approximations to $x$ from $\mathcal{V}$ using their own extraction approaches. It is commonly known from, e.g., [1, 14, 15, 19, 21], that the standard Rayleigh–Ritz, harmonic Rayleigh–Ritz, refined Rayleigh–Ritz, and refined harmonic Rayleigh–Ritz methods compute the Ritz, harmonic Ritz, refined Ritz, and refined harmonic Ritz vectors of $A$ from a given subspace $\mathcal{V}$ and use them to approximate the desired $x$. For the given $\mathcal{V}$, these approximate eigenvectors are computationally the best approximations to $x$ that these four kinds of projection methods can obtain, and one cannot do better once a projection method is chosen. In other words, these approximations are computable optimal replacements of the a priori best approximation $P_{V}x/\|P_{V}x\|$ to $x$ from $\mathcal{V}$ within the framework of the four kinds of projection methods, respectively. This naturally motivates us to introduce the following definition.

Definition 1.

For a chosen projection method, the approximation to $x$ extracted by it from $\mathcal{V}$ is called the computable optimal replacement of $P_{V}x/\|P_{V}x\|$.

In terms of this definition, for a chosen projection method we can fully exploit the theoretical results in Section 2 to expand $\mathcal{V}$ in its own computable optimal way. Notice that $RR^{\dagger}$ is the orthogonal projector onto the subspace ${\rm span}\{R\}$ and $RR^{\dagger}x/\|RR^{\dagger}x\|$ is the theoretical best approximation to $x$ from ${\rm span}\{R\}$, and write $v_{opt}=RR^{\dagger}x/\|RR^{\dagger}x\|$. Then the columns of $(V,v_{opt})$ form an orthonormal basis of $\mathcal{V}_{w_{opt}}$ in (22). Keeping these results in mind, regarding the optimal expansion $v_{opt}$, a chosen projection method obtains its own computable best approximation to $x$ from ${\rm span}\{R\}$, and we take it as the computable optimal replacement of $v_{opt}$ or, equivalently, of $(I-P_{V})Aw_{opt}=RR^{\dagger}x$ in direction. We then use such a replacement to construct the corresponding computable optimally expanded subspace.

Specifically, by Definition 1, for the standard Rayleigh–Ritz, the harmonic Rayleigh–Ritz, the refined Rayleigh–Ritz, and the refined harmonic Rayleigh–Ritz methods, the Ritz vector, the harmonic Ritz vector, the refined Ritz vector, and the refined harmonic Ritz vector of $A$ from ${\rm span}\{R\}$ obtained by them are the corresponding computable optimal replacements of the theoretical best approximation $v_{opt}$ to $x$ from ${\rm span}\{R\}$.

In what follows we show how to efficiently and reliably achieve this goal in computations. Before proceeding, let us investigate (21) and get more insight into the optimal subspace expansion. We present the following result.

Theorem 3.1.

Assume that $(V,V_{\perp})$ is unitary. Then it holds that

(37) \|RR^{\dagger}x\|\leq\|V_{\perp}V_{\perp}^{H}x\|=\|(I-P_{V})x\|.

If $RR^{\dagger}x=V_{\perp}V_{\perp}^{H}x$, then $x\in\mathcal{V}_{w_{opt}}$ and the subspace expansion terminates.

Proof.

Since $V^{H}R=0$, $V^{H}V_{\perp}=0$ and $(V,V_{\perp})$ is unitary, we have ${\rm span}\{R\}\subseteq{\rm span}\{V_{\perp}\}$, which proves (37) by noticing that $RR^{\dagger}$ and $V_{\perp}V_{\perp}^{H}$ are the orthogonal projectors onto ${\rm span}\{R\}$ and ${\rm span}\{V_{\perp}\}$, respectively.

For $R\in\mathcal{C}^{n\times k}$, since ${\rm span}\{R\}\subseteq{\rm span}\{V_{\perp}\}$, ${\rm rank}(R)=k_{1}\leq k$ and ${\rm rank}(V_{\perp})=n-k$, we must have ${\rm rank}(R)<n-k$ when $k<n-k$. On the other hand, if $k$ is sufficiently large such that $k>n-k$, then $R$ must be rank deficient since ${\rm span}\{R\}\subseteq{\rm span}\{V_{\perp}\}$ unconditionally. Therefore, the theoretical optimal expansion $RR^{\dagger}x$ is part of $(I-P_{V})x$. Notice that $x=P_{V}x+(I-P_{V})x$ and $P_{V}x\in\mathcal{V}$. Therefore, if $RR^{\dagger}x=(I-P_{V})x=V_{\perp}V_{\perp}^{H}x$, then $x\in\mathcal{V}_{w_{opt}}$ already and the subspace expansion terminates.

Observe that, for all four kinds of projection methods mentioned above, the computable optimal replacements of $v_{opt}$ are orthogonal to $\mathcal{V}$ because they lie in ${\rm span}\{R\}$ and $V^{H}R=0$. Write the unit length approximate eigenvector obtained by any chosen projection method applied to ${\rm span}\{R\}$ as $\widetilde{x}_{R}$, and take $\widetilde{v}_{opt}:=\widetilde{x}_{R}$. Then the columns of $(V,\widetilde{v}_{opt})$ form an orthonormal basis of the computable optimally expanded subspace, denoted by $\mathcal{V}_{\widetilde{w}_{opt}}$ hereafter, where $\widetilde{w}_{opt}$ is a corresponding replacement of the optimal expansion vector $w_{opt}$ and is not required in computations.

Mathematically, we can derive an explicit expression of $\widetilde{w}_{opt}$ in terms of $\widetilde{v}_{opt}$, as the following result shows.

Theorem 3.2.

It holds that

(38) \widetilde{w}_{opt}=V(V_{\perp}^{H}AV)^{\dagger}V_{\perp}^{H}\widetilde{v}_{opt}=VR^{\dagger}\widetilde{v}_{opt}.
Proof.

Since $\widetilde{w}_{opt}\in\mathcal{V}$, we have $P_{V}\widetilde{w}_{opt}=\widetilde{w}_{opt}$, which leads to the equation

(39) (I-P_{V})A\widetilde{w}_{opt}=(I-P_{V})AP_{V}\widetilde{w}_{opt}=\widetilde{v}_{opt}.

It then follows from $((I-P_{V})AP_{V})^{\dagger}=V(V_{\perp}^{H}AV)^{\dagger}V_{\perp}^{H}$ and (33) that (38) holds.

Compared with the expressions (20) and (34) of $w_{opt}$, we have replaced $V_{\perp}^{H}x$ and $R^{\dagger}x$ by $V_{\perp}^{H}\widetilde{v}_{opt}$ and $R^{\dagger}\widetilde{v}_{opt}$ in the expressions of $\widetilde{w}_{opt}$, respectively. We should remind the reader that the solution $\widetilde{w}_{opt}$ of (39) may not be unique and that the $\widetilde{w}_{opt}$ in (38) is the minimum 2-norm solution.

Next we focus on some computational details on the computable optimal subspace expansion approaches that correspond to the four kinds of projection methods.

Suppose that $(V,\widetilde{v}_{opt})$ is available. One goes to the next iteration and performs any one of the four projection methods. They all need to form the matrix

(40) (V,\widetilde{v}_{opt})^{H}A(V,\widetilde{v}_{opt})

explicitly. This can be done efficiently in an updated way. For brevity, hereafter suppose that all the quantities, such as $A$, $V$ and $\widetilde{v}_{opt}$, are real when counting the computational cost. For $V\in\mathcal{R}^{n\times k}$, the matrices $V^{H}AV$ and $AV$ are already available at iteration $k$. To form the matrix in (40), one needs to compute one matrix-vector product $A\widetilde{v}_{opt}$ and $2k+1$ vector inner products $V^{H}(A\widetilde{v}_{opt})$, $\widetilde{v}_{opt}^{H}(AV)$ and $\widetilde{v}_{opt}^{H}(A\widetilde{v}_{opt})$. The total cost is one matrix-vector product and $4nk+2n\approx 4nk$ flops.
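The following minimal MATLAB sketch (the variable names are illustrative assumptions) performs this update: it extends the stored $AV$ and $V^{H}AV$ to those of the expanded basis $(V,\widetilde{v}_{opt})$ without recomputing them from scratch.

% Update AV and H = V'*A*V after appending the new unit vector vtilde (v_opt_tilde).
% Assumes A, V (n x k, orthonormal columns), AV = A*V, H = V'*A*V and vtilde
% (orthogonal to the columns of V) are given.
Avt = A*vtilde;                        % one matrix-vector product
H   = [H,           V'*Avt;            % new last column:  V'*(A*vtilde)
       vtilde'*AV,  vtilde'*Avt];      % new last row: vtilde'*(A*V) and vtilde'*(A*vtilde)
AV  = [AV, Avt];                       % updated A*(V, vtilde)
V   = [V, vtilde];                     % orthonormal basis of the expanded subspace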

The situation changes for the other expansion vectors $w$, which include those used in the Lanczos or Arnoldi type expansion approach, the Ritz expansion approach, i.e., the mathematically equivalent RA method, and the refined Ritz expansion approach. When expanding $\mathcal{V}$ to $\mathcal{V}_{w}$, we need to compute $Aw$ first, which costs one matrix-vector product, and then to orthogonalize $Aw$ against $V$ to obtain the unit length vector $v_{w}$, which costs approximately $4nk$ flops when the (modified) Gram–Schmidt orthogonalization procedure is used. In finite precision arithmetic, some sort of reorthogonalization strategy may be required to ensure the numerical orthogonality between $v_{w}$ and $V$. This increases the orthogonalization cost up to a maximum of $8nk$ flops when complete reorthogonalization is used. After $v_{w}$ is obtained, one forms a projection matrix similar to that in (40), where $\widetilde{v}_{opt}$ is replaced by $v_{w}$. In contrast, once $\widetilde{v}_{opt}$ is available, the computable optimal subspace expansion approach described above does not perform such a Gram–Schmidt orthogonalization process and thus saves one matrix-vector product and $4nk\sim 8nk$ flops.

We now consider in detail how to obtain a computable optimal replacement $\widetilde{v}_{opt}$ and count its computational cost. First, we need to form the residual $R$, which can be updated recursively and efficiently, as shown below. Notice that $AV$ and $V^{H}AV$ are already available, and write $R=R_{k}=(\widetilde{R}_{k-1},r_{k})$ and $V=V_{k}=(V_{k-1},v_{k})$ at iteration $k$. Then

\widetilde{R}_{k-1} = AV_{k-1}-V_{k-1}(V_{k-1}^{H}AV_{k-1})-v_{k}(v_{k}^{H}AV_{k-1}) = R_{k-1}-v_{k}(v_{k}^{H}AV_{k-1}),
r_{k} = Av_{k}-V_{k-1}(V_{k-1}^{H}Av_{k})-(v_{k}^{H}Av_{k})v_{k},

where $R_{k-1}$ is the residual at iteration $k-1$. Since $R_{k-1}$ and the matrices in parentheses are already available when forming $V_{k}^{H}AV_{k}$, the above updates of $\widetilde{R}_{k-1}$ and $r_{k}$ each cost approximately $2nk$ flops, so that the total cost is approximately $4nk$ flops.
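A minimal MATLAB sketch of this recursive update (the names Rprev, AVprev, Vprev and vk are assumptions of the sketch) reads as follows.

% Recursive update of the residual R_k = A*V_k - V_k*(V_k'*A*V_k).
% Assumes Vprev = V_{k-1}, vk = k-th orthonormal basis vector, Rprev = R_{k-1},
% AVprev = A*V_{k-1} and Avk = A*vk are available from the projection-matrix update.
Rtilde = Rprev - vk*(vk'*AVprev);                 % update of the first k-1 columns
rk     = Avk - Vprev*(Vprev'*Avk) - (vk'*Avk)*vk; % new last column
Rk     = [Rtilde, rk];                            % residual at iteration k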

With $R$ available, we need to construct an orthonormal basis matrix $Q$ of ${\rm span}\{R\}$, so that ${\rm span}\{R\}={\rm span}\{Q\}$. One can compute $Q$ in a number of ways, for instance, by the QR factorization with column pivoting, which costs $4nkk_{1}-2k_{1}^{2}(n+k)+4k_{1}^{3}/3$ flops with ${\rm rank}(R)=k_{1}$, ranging approximately from $8nk$ flops for the smallest $k_{1}=2$ to $2nk^{2}$ flops for the biggest $k_{1}=k$ [2, pp. 302], or by the rank-revealing QR factorization [3], which costs nearly the same as the QR factorization with column pivoting for $k\ll n$. The most reliable but relatively expensive approach is the Chan R-SVD algorithm, which costs approximately $6nk^{2}$ flops [2, pp. 493]. Once $Q$ is computed, we obtain

(41) RR^{\dagger}x=QQ^{H}x.
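One possible MATLAB realization of this step is sketched below, using the built-in pivoted QR factorization with a simple rank cutoff; the tolerance and the names are assumptions of the sketch rather than prescriptions.

% Orthonormal basis Q of span(R_k) via QR factorization with column pivoting.
[Qfull, T, ~] = qr(Rk, 0);                                % economy-size pivoted QR
k1 = sum(abs(diag(T)) > max(size(Rk))*eps(abs(T(1, 1)))); % numerical rank estimate
Q  = Qfull(:, 1:k1);                                      % span(Q) = span(R_k) numerically
% The theoretical optimal expansion direction (not computable, since x is unknown):
% Q*(Q'*x) equals R_k*pinv(R_k)*x by (41).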

We point out that ${\rm rank}(R)=k_{1}<k$ is possible as $R$ may be (numerically) rank deficient. Suppose that $\|R\|$ is not small, that is, the columns of $V$ do not span a good approximate invariant subspace of $A$. Then $R$ must be numerically rank deficient once one of the $k$ Ritz pairs with respect to $\mathcal{V}$ converges to some eigenpair, e.g., $(\lambda,x)$, as shown below: Let $\widetilde{x}=Vy$ with $\|y\|=1$ be the Ritz vector approximating $x$ and $\mu$ be the corresponding Ritz value, that is, $(\mu,y)$ is an eigenpair of the projection matrix $V^{H}AV$. Then its associated residual norm is $\|Ry\|=\|A\widetilde{x}-\mu\widetilde{x}\|$. Whenever $\|Ry\|\leq tol$ with $tol$ sufficiently small, $R$ must be numerically rank deficient since its smallest singular value $\sigma_{\min}(R)\leq\|Ry\|\leq tol$. Moreover, if some $k-k_{1}$ Ritz pairs converge, i.e., $\|Ry\|\leq tol$ for $k-k_{1}$ corresponding $y$'s, then the numerical rank of $R$ equals $k_{1}$ since these $y$'s are linearly independent.

Indeed, we have observed in numerical experiments that the numerical rank $k_{1}$ of $R$ satisfies $k_{1}\leq k-1$ and becomes much smaller than $k$ as $k$ increases, in the sense that $R$ has exactly $k-k_{1}$ tiny singular value(s) no larger than the level of $\|R\|\epsilon_{\rm mach}$ with $\epsilon_{\rm mach}$ being the machine precision.

With $Q$ at hand, for each of the four projection methods, we can compute the corresponding eigenvector approximation to $x$ from ${\rm span}\{Q\}$, which is the corresponding computable optimal replacement of $QQ^{H}x/\|QQ^{H}x\|$. Taking the standard Rayleigh–Ritz method as an example, we need to form $Q^{H}AQ$, whose eigenvalues are the Ritz values of $A$ with respect to ${\rm span}\{Q\}$. Suppose that $Q$ is real, and notice that $Q\in\mathcal{R}^{n\times k_{1}}$. Then we need $k_{1}$ matrix-vector products $AQ$ and $2nk_{1}^{2}$ flops to compute $Q^{H}AQ$. We compute the eigendecomposition of $Q^{H}AQ$ using the QR algorithm, which costs approximately $25k_{1}^{3}$ flops for $A$ general and $9k_{1}^{3}$ flops for $A$ real symmetric. The desired Ritz vector is $\widetilde{v}_{opt}=Qy$ with $y$ being the unit length eigenvector of $Q^{H}AQ$ associated with its eigenvalue approximating the desired eigenvalue $\lambda$.
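As a concrete illustration of this step, here is a minimal MATLAB sketch of the standard Rayleigh–Ritz extraction from ${\rm span}\{Q\}$; the selection rule below picks the Ritz value of largest real part, which is an assumption made for illustration, and any other rule targeting the desired $\lambda$ works in the same way.

% Ritz extraction from span(Q): computable optimal replacement of Q*Q'*x (in direction).
AQ = A*Q;                                  % k1 matrix-vector products (level-3 BLAS)
G  = Q'*AQ;                                % projected matrix Q'*A*Q
[Y, Theta] = eig(G);                       % Ritz pairs of A with respect to span(Q)
[~, j] = max(real(diag(Theta)));           % pick the Ritz value targeting lambda (illustrative)
vtilde = Q*Y(:, j);                        % computable optimal replacement v_opt_tilde
vtilde = vtilde/norm(vtilde);              % unit length expansion vector, orthogonal to V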

In summary, suppose that the QR factorization with column pivoting is used to compute $Q$. When computing $\widetilde{v}_{opt}$ and using it to construct an orthonormal basis matrix $(V,\widetilde{v}_{opt})$ of $\mathcal{V}_{\widetilde{w}_{opt}}$, at iteration $k$ we need $k_{1}$ matrix-vector products plus approximately $2nk^{2}+2nk^{2}=4nk^{2}$ flops for $k_{1}\approx k$ and $8nk_{1}+2nk_{1}^{2}\approx 2nk_{1}^{2}$ flops for $k_{1}\ll k$. Remarkably, however, it is worthwhile to point out that the $k_{1}$ matrix-vector products form the matrix-matrix product $AQ$, which, together with $Q^{H}(AQ)$, should be computed using much more efficient level-3 BLAS operations.

For those subspace expansion approaches such as the Lanczos or Arnoldi type expansion approach and the Ritz and refined Ritz expansion approaches that are mathematically equivalent to the RA method and the refined RA method, when using the corresponding $w$ to form $Aw$ and orthonormalize it against $V$ to expand $\mathcal{V}$ to $\mathcal{V}_{w}$, we need one matrix-vector product and $4nk\sim 8nk$ flops. Therefore, the main cost of a $k$-step subspace expansion approach is $k$ matrix-vector products and $2nk^{2}\sim 4nk^{2}$ flops. However, successively updating $AV$ and $V^{H}(AV)$ consists of level-2 BLAS and level-1 BLAS operations. In contrast, since the matrix-matrix products $AQ$ and $Q^{H}AQ$ are computed using level-3 BLAS operations, the computable optimal subspace expansion approach is much more efficient than the usual $k$-step subspace expansion approaches even if $k_{1}\approx k$.

Finally, let us return to (23), (24), and Remark 4 in Section 2. Relations (23) and (24) show that the best approximations to $x$ from $\mathcal{V}_{w_{opt}}$ and $\mathcal{V}_{R}$ are the same and equal $(P_{V}+QQ^{H})x/\|(P_{V}+QQ^{H})x\|$ due to (41). Previously, we have proposed the expansion approach that obtains computable optimal replacements of $v_{opt}=QQ^{H}x/\|QQ^{H}x\|$ from ${\rm span}\{R\}$ and then expands $\mathcal{V}$ to $\mathcal{V}_{\widetilde{w}_{opt}}$, from which we compute new approximate eigenpairs in the next iteration. In the proposed approach, we first compute $Q$, and then form $AQ$ and $Q^{H}AQ$. Nevertheless, although the orthogonal projections of $x$ onto $\mathcal{V}_{w_{opt}}$ and $\mathcal{V}_{R}$ are identical, the computable optimal approximations to $x$ from $\mathcal{V}_{w_{opt}}$ and $\mathcal{V}_{R}$ are not the same. Furthermore, the approximate eigenpairs obtained by the four kinds of projection methods applied to $\mathcal{V}_{R}$ are generally more accurate than the corresponding counterparts applied to $\mathcal{V}_{w_{opt}}$ since $\mathcal{V}_{w_{opt}}\subset\mathcal{V}_{R}$. The same conclusions also hold when $\mathcal{V}_{w_{opt}}$ is replaced by $\mathcal{V}_{\widetilde{w}_{opt}}$ since $\mathcal{V}_{\widetilde{w}_{opt}}\subset\mathcal{V}_{R}$.

Regarding the cost, recall that the proposed computable optimal expansion approaches have computed $Q$, whose columns, together with those of $V$, form an orthonormal basis of $\mathcal{V}_{R}$. Therefore, we have already expanded $\mathcal{V}$ to $\mathcal{V}_{R}$. Naturally, this motivates us to compute possibly better eigenpair approximations of $(\lambda,x)$ with respect to the higher dimensional $\mathcal{V}_{R}$. This results in a new subspace expansion approach. In the next iteration, we need to form the projection matrix $(V,Q)^{H}A(V,Q)$. Since $AV$ and $AQ$ are already available in the previously proposed computable optimal expansion approaches, the extra cost is the computation of $V^{H}(AQ)$ and $Q^{H}(AV)$, which can be done using efficient level-3 BLAS operations.

More importantly, we can gain further insight into the computable optimal subspace expansion approaches. They construct approximations $\mathcal{V}_{\widetilde{w}_{opt}}$ to the optimal $\mathcal{V}_{w_{opt}}$, but, in subsequent expansion iterations, they expand these approximations themselves rather than the optimal $\mathcal{V}_{w_{opt}}$. In other words, only in the first expansion iteration do the theoretical and computable optimal subspace expansion approaches start with and expand the same subspace. After that, at each subsequent expansion iteration, they expand different subspaces; the theoretical optimal expansion approach always expands the optimal subspace at that iteration, whereas the computable optimal expansion approaches expand subspaces that are no longer optimally expanded ones, so that the subsequently expanded subspaces may deviate further and further from the theoretically optimal ones. We have indeed observed this phenomenon in experiments.

In contrast, expanding $\mathcal{V}$ to $\mathcal{V}_{R}$ does not involve $x$ or any other eigenvector of $A$, and $\mathcal{V}_{R}$ always contains the theoretically optimal $\mathcal{V}_{w_{opt}}$. Therefore, $\mathcal{V}_{R}$ is optimal both in theory and in computations. Moreover, as we have pointed out in Remark 4 of Section 2, unlike the a priori $\mathcal{V}_{w_{opt}}$ and its computable optimal approximations, the subspace $\mathcal{V}_{R}$ can be used to compute any other eigenpair(s) of $A$ because $\mathcal{V}_{R}$ itself contains the theoretically optimally expanded subspaces $\mathcal{V}_{w_{opt}}$ for all the eigenvectors of $A$ at each expansion iteration. As a result, such an expansion approach, though more expensive than our previously proposed optimal expansion approaches, is more robust, and may simultaneously compute more than one eigenpair of $A$ and obtain better approximations to them. We do not pursue this expansion approach in this paper and leave it as future work.

4 Numerical experiments

Recall that, for the standard Rayleigh–Ritz and the refined Rayleigh–Ritz methods, the Ritz vector and the refined Ritz vector from span{Rk}{\rm span}\{R_{k}\} are the computable optimal replacements of vopt=RkRkx/RkRkxv_{opt}=R_{k}R_{k}^{\dagger}x/\|R_{k}R_{k}^{\dagger}x\| at expansion iteration kk, respectively, where Rk=AVkVk(VkHAVk)R_{k}=AV_{k}-V_{k}(V_{k}^{H}AV_{k}). All the experiments have been performed on an Intel(R) Core(TM) i7-9700 CPU 3.00GHz with 16 GB RAM using Matlab R2018b with the machine precision ϵmach=2.22×1016\epsilon_{\rm mach}=2.22\times 10^{-16} under the Microsoft Windows 10 64-bit system. We have used the Matlab function orth to compute an orthonormal basis QkQ_{k} of span{Rk}{\rm span}\{R_{k}\}. It is known from, e.g., [6, 10, 19, 21], that the refined Ritz vector is generally more accurate and can be much more accurate than the Ritz vector. We have used the two computable optimal subspace expansion approaches resulting from the Ritz vector and the refined Ritz vector from span{Rk}{\rm span}\{R_{k}\}. We shall report numerical experiments to demonstrate the effectiveness of the theoretical optimal expansion approach and the above two computable optimal expansion approaches. In the meantime, we will compare them with the standard expansion approach, i.e., the Lanczos or Arnoldi type expansion approach with w=vkw=v_{k}, the last column of VkV_{k}, and the Ritz expansion approach [25, 24], i.e., the RA method, with ww being the Ritz vector from 𝒱k\mathcal{V}_{k} at expansion iteration kk, respectively.
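For orientation, the following MATLAB sketch shows one expansion iteration of the “RitzR” approach on a small random example; the problem size, the variable names, and the rule used here for selecting the targeted Ritz pair (largest real part) are illustrative assumptions rather than a description of the tested implementation. The “r-RitzR” variant would instead take the refined Ritz vector from the same span{R_k}.

```matlab
% One illustrative "RitzR" expansion step (hypothetical sizes and selection rule).
n = 2000; k = 20;
A  = randn(n);
Vk = orth(randn(n, k));         % orthonormal basis of the current subspace V_k
AV = A*Vk;
Rk = AV - Vk*(Vk'*AV);          % residual R_k = A V_k - V_k (V_k^H A V_k)
Qk = orth(Rk);                  % orthonormal basis of span{R_k}
[G, D] = eig(Qk'*(A*Qk));       % Rayleigh-Ritz projection of A onto span{R_k}
[~, j] = max(real(diag(D)));    % select the Ritz pair for the targeted exterior eigenvalue
u = Qk*G(:, j);                 % Ritz vector from span{R_k}: the computable replacement of v_opt
u = u - Vk*(Vk'*u);             % reorthogonalize; span{R_k} is orthogonal to V_k in exact arithmetic
u = u/norm(u);
Vk1 = [Vk, u];                  % orthonormal basis of the expanded subspace V_{k+1}
```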

We reiterate that the theoretical optimal subspace expansion approach cannot be used in computation since it involves the a priori xx. Here, purely for comparison purposes, we use it as the reference when evaluating the effectiveness of the other subspace expansion approaches and showing how well the computable optimal subspace expansion approaches work. To this end, we exploit the Matlab function eig to compute the desired eigenpair (λ,x)(\lambda,x) of a general AA whenever it is not given in advance, and we regard this eigenpair as “exact”. With the unit-length xx available, we are able to compute the error sin(𝒱k,x)\sin\angle(\mathcal{V}_{k},x), the distance between xx and a given 𝒱k\mathcal{V}_{k}.
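For illustration, the error can be evaluated as in the following MATLAB sketch; the matrix size and the choice of the eigenvalue with the largest real part as the target are assumptions made only for this example.

```matlab
% Sketch of the accuracy measure (illustrative size and target selection).
n = 500; k = 20;
A = randn(n);
[X, D] = eig(A);                % "exact" eigenpairs from eig
[~, j] = max(real(diag(D)));    % target: the eigenvalue with the largest real part
x = X(:, j)/norm(X(:, j));      % unit-length "exact" eigenvector
Vk  = orth(randn(n, k));        % orthonormal basis of a given subspace V_k
err = norm(x - Vk*(Vk'*x));     % sine of the angle between x and V_k
```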

For a given AA, we generate dd normally distributed random vectors, orthonormalize them to obtain the orthonormal vectors v1,v2,,vdv_{1},v_{2},\ldots,v_{d}, and construct the initial dd-dimensional subspace

𝒱d=span{v1,v2,,vd}.\mathcal{V}_{d}={\rm span}\{v_{1},v_{2},\ldots,v_{d}\}.

We then successively expand it to an mm-dimensional 𝒱m\mathcal{V}_{m} using a given expansion approach. For d>1d>1, 𝒱d\mathcal{V}_{d} is non-Krylov, and so are the expanded subspaces 𝒱k\mathcal{V}_{k}, k=d+1,,mk=d+1,\ldots,m. All the residual norms of approximate eigenpairs (μk,x~k)(\mu_{k},\tilde{x}_{k}) reported below are the relative residual norms

(42) Ax~kμkx~kA1,k=d+1,,m\frac{\|A\tilde{x}_{k}-\mu_{k}\tilde{x}_{k}\|}{\|A\|_{1}},\ k=d+1,\ldots,m

where A1\|A\|_{1} denotes the 1-norm of AA and (μk,x~k)(\mu_{k},\tilde{x}_{k}) is the approximate eigenpair at iteration kk, with μk\mu_{k} being the Ritz value and the unit-length x~k\tilde{x}_{k} being the Ritz vector or the refined Ritz vector, according to whether the standard Rayleigh–Ritz method or the refined Rayleigh–Ritz method is applied to 𝒱k,k=d+1,,m\mathcal{V}_{k},\ k=d+1,\ldots,m.

We test two symmetric matrices and one unsymmetric matrix. The first symmetric matrix is A=diag(1,12,,1n)A=\mathop{\operator@font diag}\nolimits(1,\frac{1}{2},\ldots,\frac{1}{n}) with n=10000n=10000, whose small eigenvalues are clustered. We compute the smallest eigenvalue λ=1n\lambda=\frac{1}{n} and the corresponding eigenvector x=enx=e_{n}, the last column of the identity matrix II of order nn. The second symmetric matrix is the Strakoš matrix [13, pp. XV], which is diagonal with the eigenvalues labeled in descending order:

(43) λi=λ1+(i1n1)(λnλ1)ρni,\lambda_{i}=\lambda_{1}+\left(\frac{i-1}{n-1}\right)(\lambda_{n}-\lambda_{1})\rho^{n-i},

i=1,2,,ni=1,2,\ldots,n. This matrix is extensively used to test the behavior of the Lanczos algorithm. The parameter ρ\rho controls the eigenvalue distribution: the large eigenvalues of AA are clustered for ρ\rho closer to one and better separated for smaller ρ\rho. We compute the largest eigenvalue λ1\lambda_{1} and the corresponding eigenvector x=e1x=e_{1}, the first column of II. In the experiment, we take n=10000n=10000, λ1=8\lambda_{1}=8, λn=2\lambda_{n}=-2, and ρ=0.99\rho=0.99.
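As a concrete illustration, both symmetric test matrices can be constructed as in the following MATLAB sketch; storing them as sparse diagonal matrices is an assumption made here only for convenience.

```matlab
% Sketch: the two symmetric diagonal test matrices (stored sparsely for convenience).
n  = 10000;
A1 = spdiags(1./(1:n)', 0, n, n);    % A = diag(1, 1/2, ..., 1/n)
lambda1 = 8; lambdan = -2; rho = 0.99;
i   = (1:n)';
lam = lambda1 + ((i-1)/(n-1)).*(lambdan - lambda1).*rho.^(n-i);  % Strakos eigenvalues
A2  = spdiags(lam, 0, n, n);         % Strakos matrix with eigenvalues in descending order
```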

The unsymmetric test matrix is cry2500 of order n=2500n=2500 from the non-Hermitian Eigenvalue Problem Collection in the Matrix Market (https://math.nist.gov/MatrixMarket). We are interested in the eigenvalue with the largest real part, which is clustered with some others, and the corresponding eigenvector xx. For these three eigenvalue problems, we have tested d=5,10,15,20,25d=5,10,15,20,25 and expanded 𝒱d\mathcal{V}_{d} to 𝒱m\mathcal{V}_{m} for m=100,150,200,300m=100,150,200,300 when comparing the different subspace expansion approaches. We only report the results for d=20d=20 and m=200m=200 since we have observed very similar phenomena for the other dd’s and mm’s.

For these three test problems, Figures 1–2 depict the decaying curves of the errors

(44) sin(𝒱k,x)=(IVkVkH)x=xVk(VkHx),k=d+1,,m,\sin\angle(\mathcal{V}_{k},x)=\|(I-V_{k}V_{k}^{H})x\|=\|x-V_{k}(V_{k}^{H}x)\|,\ k=d+1,\ldots,m,

obtained by the five subspace expansion approaches, and the residual norms (42) of the approximate eigenpairs obtained by the standard Rayleigh–Ritz method and the refined Rayleigh–Ritz method with respect to the corresponding subspaces, respectively.

Figure 1a, Figure 1c, and Figure 2a plot sin(𝒱k,x)\sin\angle(\mathcal{V}_{k},x) for k=d+1,,mk=d+1,\ldots,m. At expansion iteration k=d,d+1,,m1k=d,d+1,\ldots,m-1, “stand” denotes the standard subspace expansion approach using w=vkw=v_{k}, “RitzR” and “r-RitzR” denote the subspace expansion approaches using the Ritz vector and the refined Ritz vector from the corresponding span{Rk}{\rm span}\{R_{k}\}, respectively, “RitzV” is the subspace expansion approach [24, 25] using the Ritz vector from the corresponding 𝒱k\mathcal{V}_{k}, and “optimal” denotes the optimal subspace expansion approach using RkRkx/RkRkxR_{k}R_{k}^{\dagger}x/\|R_{k}R_{k}^{\dagger}x\|.

Figure 1b, Figure 1d, and Figure 2b depict the relative residual norms (42) of the approximate eigenpairs. At expansion iteration k=d,d+1,,m1k=d,d+1,\ldots,m-1, “stand”, “RitzR”, “RitzV”, and “optimal-R” perform the standard Rayleigh–Ritz method on the expanded subspaces 𝒱k+1\mathcal{V}_{k+1} resulting from the subspace expansion approaches “stand”, “RitzR”, “RitzV”, and “optimal”, while “r-RitzR” and “optimal-RR” perform the refined Rayleigh–Ritz method on the corresponding 𝒱k+1\mathcal{V}_{k+1} resulting from the subspace expansion approaches “r-RitzR” and “optimal”, respectively. In particular, we remind the reader that “optimal-R” and “optimal-RR” work on the same optimally expanded subspace 𝒱k+1\mathcal{V}_{k+1} at each expansion iteration k=d,d+1,,m1k=d,d+1,\ldots,m-1.

Fig. 1 (panels (a)–(d)): The symmetric problems.

Fig. 2 (panels (a)–(b)): The unsymmetric problem.

We now make several comments on the figures. Regarding sin(𝒱k,x)\sin\angle(\mathcal{V}_{k},x) for the different kinds of subspaces 𝒱k,k=d+1,,m\mathcal{V}_{k},\ k=d+1,\ldots,m, we clearly observe that the optimal subspace expansion approach “optimal” is always the best, the computable optimal subspace expansion approach “r-RitzR” is the second best, and the computable optimal subspace expansion approach “RitzR” is the third most effective, as is seen from Figure 1c and Figure 2a. They are all more effective than the Ritz subspace expansion approach “RitzV” and the standard expansion approach “stand”. Neither of the latter two clearly outperforms the other for the symmetric matrices, as indicated by Figure 1a and Figure 1c, but “RitzV” is not as good as “stand” for the unsymmetric matrix cry2500. More precisely, the computable optimal subspace expansion approach “RitzR” is better than “stand” and “RitzV” for the unsymmetric matrix and the symmetric Strakoš matrix, but behaves very much like “RitzV” for the symmetric matrix A=diag(1,1/2,,1/10000)A=\mathop{\operator@font diag}\nolimits(1,1/2,\ldots,1/10000). On the other hand, as the figures indicate, the three test problems are quite difficult since the desired λ\lambda is clustered with some other eigenvalues for each problem, causing sin(𝒱k,x)\sin\angle(\mathcal{V}_{k},x) to decay very slowly for all five subspace expansion approaches, including the optimal one. To summarize, in decreasing order of effectiveness the subspace expansion approaches are “optimal”, “r-RitzR”, and “RitzR”, and they are better than “RitzV” and “stand”; as for “RitzV” and “stand”, there is essentially no winner.

In contrast, the residual norms of the approximate eigenpairs exhibit more complicated features. Note that “stand”, “RitzR”, “RitzV”, and “optimal-R” all use the standard Rayleigh–Ritz method on the respective subspaces 𝒱k+1,k=d,,m1\mathcal{V}_{k+1},\ k=d,\ldots,m-1, to compute the Ritz approximations of (λ,x)(\lambda,x). As we have commented above and observed from the figures, sin(𝒱k,x)\sin\angle(\mathcal{V}_{k},x) decays slowly for all the subspace expansion approaches, and the residual norms of the approximate eigenpairs obtained by “stand”, “RitzR”, “RitzV” and “optimal-R” decrease in a similar pattern.

However, the situation changes dramatically for “r-RitzR” and “optimal-RR”. First, “r-RitzR” computes considerably more accurate approximate eigenpairs than the afore-mentioned four methods at each iteration. Second, “r-RitzR” behaves very much like “optimal-RR”, the ideally optimal one, meaning that the computable optimal “r-RitzR” computes nearly the best eigenpair approximations of (λ,x)(\lambda,x) at each iteration. Third, “optimal-RR” outperforms “optimal-R” substantially and converges much faster than the latter, even though they work on the same subspace at each iteration. This indicates that the refined Rayleigh–Ritz method can considerably outperform the standard Rayleigh–Ritz method on the same subspace.

In summary, our experiments on the three test problems have demonstrated that “r-RitzR” is the best among the computable subspace expansion approaches, and the refined Rayleigh–Ritz method on the subspaces it generates behaves very much like “optimal-RR”, i.e., the refined Rayleigh–Ritz method combined with the theoretical optimal subspace expansion approach.

5 Conclusions

We have considered the optimal subspace expansion problem for the general eigenvalue problem, and generalized the maximization characterization of cos(𝒱w,x)\cos\angle(\mathcal{V}_{w},x), one of Ye’s two main results, to the general case. Furthermore, we have found the solution bwb_{w} to the maximization characterization problem of cos(𝒱w,x)\cos\angle(\mathcal{V}_{w},x). Based on the expression of bwb_{w}, we have analyzed the Ritz expansion approach, i.e., the RA method, and argued that the Ritz vector may not be a good approximation to the optimal expansion vector woptw_{opt}.

Most importantly, we have established a few results on woptw_{opt}, (IPV)Awopt(I-P_{V})Aw_{opt}, the theoretical optimal subspace 𝒱wopt\mathcal{V}_{w_{opt}}, and cos(𝒱wopt,x)\cos\angle(\mathcal{V}_{w_{opt}},x) for general AA and have given their explicit expressions. We have analyzed the results in depth and obtained computable optimal replacements of (IPV)Awopt(I-P_{V})Aw_{opt} and 𝒱wopt\mathcal{V}_{w_{opt}} within the framework of the standard, refined, harmonic and refined harmonic Rayleigh–Ritz methods. Taking the standard Rayleigh–Ritz method as an example, we have given implementation details on how to obtain the computable optimally expanded subspace 𝒱w~opt\mathcal{V}_{\widetilde{w}_{opt}} and made a cost comparison with some other expansion approaches, such as the Lanczos or Arnoldi type expansion approach with w=vkw=v_{k}, the last column of VkV_{k}, and the Ritz expansion approach, i.e., the RA method, with ww being the Ritz vector of AA from 𝒱k\mathcal{V}_{k}.

Our numerical experiments have demonstrated the advantages of the computable optimal expansion approaches over the afore-mentioned expansion approaches. Particularly, we have observed that the computable optimal subspace expansion approach using the refined Ritz vector of AA from span{Rk}{\rm span}\{R_{k}\} behaves very much like the theoretical optimal expansion approach since the eigenpair approximations obtained by the former are almost as accurate as those computed by the latter. This indicates that such a computable optimal expansion approach is as good as the theoretical optimal one and is thus very promising when the accuracy of approximate eigenpairs is measured in terms of residual norms.

We have also proposed a potentially more robust subspace expansion approach that expands 𝒱\mathcal{V} to 𝒱span{R}\mathcal{V}\oplus{\rm span}\{R\} and computes approximate eigenpairs with respect to 𝒱span{R}\mathcal{V}\oplus{\rm span}\{R\} in the next iteration. For the sake of length and focus, we do not further probe this approach in the current paper and leave it as future work.

It is well known that the harmonic Rayleigh–Ritz method is more suitable than the standard Rayleigh–Ritz method for computing interior eigenvalues and their associated eigenvectors [1, 19, 21]. Furthermore, since the standard and harmonic Rayleigh–Ritz methods may have convergence problems when computing eigenvectors [7, 10], we may gain much by using the refined Rayleigh–Ritz method [5, 10] and the refined harmonic Rayleigh–Ritz method [7, 9], respectively. As a matter of fact, for a sufficiently accurate subspace 𝒱\mathcal{V}, the Ritz vector and the harmonic Ritz vector may be poor approximations to xx and can deviate from xx arbitrarily (cf. [19, pp. 284–285] and [7, Example 2]), but the refined Ritz vector and the refined harmonic Ritz vector are always excellent approximations to xx (cf. [19, p. 291] and [7, Example 2]), provided that the selected approximate eigenvalues converge to the desired eigenvalue λ\lambda correctly. The advantages of the refined Rayleigh–Ritz method over the standard Rayleigh–Ritz method have also been confirmed by the numerical experiments in the last section. Therefore, it is preferable to expand the subspace using the refined Ritz vector and the refined harmonic Ritz vector from span{Rk}{\rm span}\{R_{k}\} when exterior and interior eigenpairs of AA are required, respectively.

Based on the results in Section 3 and our numerical experiments, starting with a kk-dimensional non-Krylov subspace with the residual RkR_{k} defined by (5) and rank(Rk)2\mathop{\operator@font rank}\nolimits(R_{k})\geq 2, it is promising to develop four kinds of more practical restarted projection algorithms using computable optimal subspace expansion approaches.

As is known, there are remarkable distinctions between the Arnoldi type expansion with w=vkw=v_{k} and the RA method [11, 12], i.e., the Ritz expansion approach [24, 25], or the refined RA method, i.e., the refined Ritz expansion approach [24], as well as between SIRA or JD type methods and inexact SIA type methods: a large perturbation in AwAw of RA type methods or in BwBw of SIRA and JD type methods is allowed [8, 9, 22], while AvkAv_{k} in Arnoldi type methods or BvkBv_{k} in SIA type methods must be computed with high accuracy until the approximate eigenpairs start to converge, after which its solution accuracy can be relaxed in a manner inversely proportional to the residual norms of the approximate eigenpairs [16, 17]. With these distinctions in mind, there are many important problems and issues that deserve consideration. For example, how large an error is allowed in QQHx/QQHxQQ^{H}x/\|QQ^{H}x\| for the subspace to still be expanded effectively, where the columns of QQ form an orthonormal basis of span{Rk}{\rm span}\{R_{k}\}? This can be transformed into the problem of how large an error is allowed in the residual RkR_{k}. In practical computations, this problem becomes that of how large the errors are allowed to be in the computable optimal replacements of QQHx/QQHxQQ^{H}x/\|QQ^{H}x\| within the framework of the four kinds of projection methods, whenever perturbation errors exist in those replacements. Is it possible to introduce the computable optimal expansion approaches into JD and SIRA type methods? All these questions need systematic exploration and analysis, and will constitute our future work.

Acknowledgment

I thank two referees and Professor Marlis Hochbruck very much for their numerous valuable suggestions and comments, which inspired me to ponder the topic and improve the presentation substantially.

References

  • [1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. Van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, 2000.
  • [2] G. H. Golub and C. F. Van Loan, Matrix Computations, Fourth Edition, The Johns Hopkins University Press, Baltimore, 2013.
  • [3] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
  • [4] Z. Jia, The convergence of generalized Lanczos methods for large unsymmetric eigenproblems, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 843–862.
  • [5] Z. Jia, Refined iterative algorithms based on Arnoldi’s process for unsymmetric eigenproblems, Linear Algebra Appl., 259 (1997), pp. 1–23.
  • [6] Z. Jia, Some theoretical comparisons of refined Ritz vectors and Ritz vectors, Science in China Ser. A Math., 47 (2004), Suppl., pp. 222–233.
  • [7] Z. Jia, The convergence of harmonic Ritz values, harmonic Ritz vectors and refined harmonic Ritz vectors, Math. Comput., 74 (2005), pp. 1441–1456.
  • [8] Z. Jia and C. Li, Inner iterations in the shift-invert residual Arnoldi method and the Jacobi–Davidson method, Sci. China Math., 57 (2014), pp. 1733–1752.
  • [9] Z. Jia and C. Li, Harmonic and refined harmonic shift-invert residual Arnoldi and Jacobi–Davidson methods for interior eigenvalue problems, J. Comput. Appl. Math., 282 (2015), pp. 83–97.
  • [10] Z. Jia and G. W. Stewart, An analysis of the Rayleigh–Ritz method for approximating eigenspaces, Math. Comput., 70 (2001), pp. 637–648.
  • [11] C. Lee, Residual Arnoldi method: theory, package and experiments, Ph.D. thesis, TR-4515, Department of Computer Science, University of Maryland at College Park, 2007.
  • [12] C. Lee and G. W. Stewart, Analysis of the residual Arnoldi method, TR-4890, Department of Computer Science, University of Maryland at College Park, 2007.
  • [13] G. Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations, SIAM, Philadelphia, PA, 2006.
  • [14] B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, Philadelphia, PA, 1998.
  • [15] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Revised Edition, SIAM, Philadelphia, PA, 2011.
  • [16] V. Simoncini, Variable accuracy of matrix-vector products in projection methods for eigencomputation, SIAM J. Numer. Anal., 43 (2005), pp. 1155–1174.
  • [17] V. Simoncini and D. Szyld, Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM J. Sci. Comput., 25 (2003), pp. 454–477.
  • [18] G. Sleijpen and H. van der Vorst, A Jacobi–Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 401–425. Reprinted in SIAM Review, (2000), pp. 267–293.
  • [19] G. W. Stewart, Matrix Algorithms II: Eigensystems, SIAM, Philadelphia, PA, 2001.
  • [20] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press Inc., Boston, 1990.
  • [21] H. Van der Vorst, Computational Methods for Large Eigenvalue Problems, Elsevier, North-Holland, 2002.
  • [22] H. Voss, A new justification of the Jacobi–Davidson method for large eigenproblems, Linear Algebra Appl., 424 (2007), pp. 448–455.
  • [23] G. Wu, The convergence of harmonic Ritz vectors and harmonic Ritz values–revisited, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 118–133.
  • [24] G. Wu and L. Zhang, On expansion of search subspaces for large non-Hermitian eigenproblems, Linear Algebra Appl., 454 (2014), pp. 107–129.
  • [25] Q. Ye, Optimal expansion of subspaces for eigenvector approximations, Linear Algebra Appl., 428 (2008), pp. 911–918.