Theoretical and Computable Optimal Subspace Expansions for Matrix Eigenvalue Problems
Supported by the National Natural Science Foundation of China (Nos. 11771249 and 12171273).
Abstract
Consider the optimal subspace expansion problem for the matrix eigenvalue problem: which vector in the current subspace, after being multiplied by the matrix, provides an optimal expansion of the subspace for approximating a desired eigenvector, in the sense that the eigenvector has the smallest angle with the expanded subspace? This problem is important because many iterative methods construct nested subspaces that are successively expanded in this way. Ye (Linear Algebra Appl., 428 (2008), pp. 911–918) derived an expression of the optimal expansion vector for a general matrix, but it could not be exploited to construct a computable (nearly) optimally expanded subspace. He instead derived a maximization characterization of the expanded subspace for a given expansion vector in the Hermitian case. We generalize Ye's maximization characterization to the general case and find its maximizer. Our main contributions are explicit expressions of the optimal expansion vector and the optimally expanded subspace for a general matrix. These results are fully exploited to obtain computable optimally expanded subspaces within the framework of the standard, harmonic, refined, and refined harmonic Rayleigh–Ritz methods. We show how to implement the proposed subspace expansion approaches efficiently. Numerical experiments demonstrate the effectiveness of the computable optimal expansions.
keywords:
Eigenvalue problem, non-Krylov subspace, optimal expansion vector, optimal subspace expansion, Ritz vector, refined Ritz vector, harmonic Ritz vector, refined harmonic Ritz vector, computable optimally expanded subspace
AMS:
65F15, 15A18, 65F10
1 Introduction
Consider the large-scale matrix eigenproblem $Ax=\lambda x$ with $A\in\mathbb{C}^{n\times n}$, where $\|\cdot\|$ denotes the 2-norm of a vector or matrix. We are interested in an exterior eigenvalue $\lambda$ and the associated eigenvector $x$. A number of numerical methods are available for solving this kind of problem [1, 14, 15, 21, 19]. Among them, Lanczos and Arnoldi type methods [1, 15, 19] are typical; they are projection methods onto the sequence of nested $k$-dimensional Krylov subspaces $\mathcal{K}_k=\mathcal{K}_k(A,v_1)=\mathrm{span}\{v_1,Av_1,\ldots,A^{k-1}v_1\}$, $k=1,2,\ldots,$
and compute approximations to $\lambda$ and $x$. For $A$ Hermitian, Arnoldi type methods reduce to Lanczos type methods [1, 14]. Suppose that the basis matrix $V_k=[v_1,v_2,\ldots,v_k]$ is column orthonormal and generated by the Lanczos process in the Hermitian case or the Arnoldi process in the non-Hermitian case, and take the expansion vector $w=v_k$. Then the next basis vector $v_{k+1}$ is obtained by orthonormalizing $Av_k$ against $V_k$, the projection matrix $V_{k+1}^HAV_{k+1}$ of $A$ onto the expanded subspace is Hermitian tridiagonal for $A$ Hermitian or upper Hessenberg for $A$ non-Hermitian, and the columns of $V_{k+1}=[V_k,v_{k+1}]$ form an orthonormal basis of the $(k+1)$-dimensional Krylov subspace $\mathcal{K}_{k+1}$, where the superscript $H$ denotes the conjugate transpose of a vector or matrix. If the expansion vector $v_k$ is replaced by any $w\in\mathcal{K}_k$ that is not deficient in $v_k$, we deduce from Proposition 1 of Ye [25] that the same $v_{k+1}$ and $\mathcal{K}_{k+1}$ are generated.
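To fix ideas, the following minimal sketch (Python/NumPy; the function and variable names are illustrative, not taken from any reference implementation) performs one Lanczos/Arnoldi type expansion step: the last basis vector is multiplied by $A$ and the result is orthonormalized against the current basis.

```python
import numpy as np

def arnoldi_expand(A, V):
    """One Arnoldi-type expansion step (a sketch): orthonormalize A @ V[:, -1]
    against the orthonormal basis V and append the new direction."""
    w = A @ V[:, -1]                     # multiply the last basis vector by A
    w = w - V @ (V.conj().T @ w)         # Gram-Schmidt orthogonalization against span(V)
    w = w - V @ (V.conj().T @ w)         # one reorthogonalization step for numerical safety
    beta = np.linalg.norm(w)
    if beta == 0.0:                      # span(V) is already A-invariant
        return V
    return np.hstack([V, (w / beta).reshape(-1, 1)])
```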
However, if we start with a non-Krylov subspace, then the expanded subspace critically depends on the choice of the expansion vector. It is well known that the success of a general projection method requires the underlying subspace to contain sufficiently accurate approximations to the desired eigenvector, but an accurate subspace alone cannot guarantee the convergence of the Ritz and harmonic Ritz vectors obtained by the standard and harmonic Rayleigh–Ritz methods with respect to that subspace [4, 10]. To fix this deficiency, the refined and refined harmonic Rayleigh–Ritz methods have been proposed; they are shown to ensure the convergence of the computed refined and refined harmonic Ritz vectors when the subspace is good enough [5, 7, 10, 19, 23].
The accuracy of a projection subspace $\mathcal{V}$ for eigenvector approximation can be measured by its angle with the desired unit-length eigenvector $x$ [14, 15, 19]:
(1) $\sin\angle(x,\mathcal{V})=\min_{0\neq v\in\mathcal{V}}\sin\angle(x,v),$
where $\angle(x,v)$ denotes the acute angle between $x$ and the nonzero vector $v$. For a general $k$-dimensional non-Krylov subspace $\mathcal{V}$ (we drop the subscript $k$ for brevity), in this paper we consider the following optimal subspace expansion problem: for any nonzero $w\in\mathcal{V}$, multiply it by $A$ and define the expanded subspace
(2) $\mathcal{V}_w=\mathcal{V}+\mathrm{span}\{Aw\}.$
Which vector $w_{\mathrm{opt}}\in\mathcal{V}$ is then an optimal expansion vector that, up to scaling, solves
(3) $w_{\mathrm{opt}}=\arg\min_{0\neq w\in\mathcal{V}}\angle(x,\mathcal{V}_w)\,?$
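With an orthonormal basis matrix of the subspace at hand, the quantity in (1) is straightforward to evaluate; a minimal Python/NumPy sketch, assuming $x$ has unit length and the matrix V has orthonormal columns:

```python
import numpy as np

def sin_angle(x, V):
    """sin of the angle between the unit vector x and span(V), where V has
    orthonormal columns: the norm of the component of x orthogonal to span(V)."""
    return np.linalg.norm(x - V @ (V.conj().T @ x))
```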
This problem was first considered by Ye [25] and later received attention from Wu and Zhang [24]. The choice of the expansion vector $w$ is essential to subspace expansion, and different choices may affect the quality of $\mathcal{V}_w$ substantially. At expansion iteration $k$, the Lanczos or Arnoldi type expansion takes $w=v_k$, the last column of $V_k$. The Ritz expansion method [24, 25], a variant of the Arnoldi method, uses the approximating Ritz vector as $w$ to expand the subspace, as sketched below. It is mathematically equivalent to the Residual Arnoldi (RA) method proposed in [11, 12].
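The following Python/NumPy sketch illustrates one step of the Ritz expansion just described; the selection rule (largest real part) and the names are illustrative assumptions, and the sketch is not meant as a definitive implementation of the RA method.

```python
import numpy as np

def ritz_expand(A, V, which=np.real):
    """One Ritz expansion step (a sketch): take the expansion vector w as the
    Ritz vector of A from span(V) targeting the eigenvalue with largest
    `which` value, then expand span(V) with A @ w."""
    H = V.conj().T @ (A @ V)            # projection matrix of A onto span(V)
    theta, Z = np.linalg.eig(H)         # Ritz values and primitive Ritz vectors
    j = np.argmax(which(theta))         # select the targeted Ritz pair
    w = V @ Z[:, j]                     # Ritz vector used as the expansion vector
    u = A @ w
    u = u - V @ (V.conj().T @ u)        # orthogonalize A w against span(V)
    u = u - V @ (V.conj().T @ u)        # reorthogonalize
    return np.hstack([V, (u / np.linalg.norm(u)).reshape(-1, 1)])
```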
The Shift-Invert Residual Arnoldi (SIRA) method is an alternative to the RA method for computing the eigenvalue closest to a given target. More variants have been developed in [8, 9], and they fall into the categories of the harmonic Rayleigh–Ritz, refined Rayleigh–Ritz, and refined harmonic Rayleigh–Ritz methods [1, 14, 15, 19, 21]. The SIRA type methods are mathematically equivalent to the counterparts of the Jacobi–Davidson (JD) type methods; see Theorem 2.1 of [8]. Just as the shift-invert Arnoldi (SIA) type methods, the SIRA type and JD type methods [18] use shift-invert operations to construct nested subspaces but, unlike the SIA type methods, they project the original matrix $A$ rather than its shifted inverse onto the subspaces when computing approximations of the desired eigenpair [8]. At each iteration, these methods and the SIA type methods are mathematically common in solving a linear system
(4)
i.e., computing the solution of a shifted linear system whose right-hand side is the last basis vector in the SIA type methods and is mathematically equal to the current approximate eigenvector in the SIRA and JD type methods; see [8, Theorem 2.1 and its proof] for details.
Ye [25] gives an explicit expression of the optimal expansion vector $w_{\mathrm{opt}}$ for a general $A$, as stated below.
Ye’s result [25, Theorem 1]: Let the columns of form an orthonormal basis of , define the residual
(5)
and assume that the rank . Then
(6)
for any , the null space of , where the superscript denotes the Moore-Penrose generalized inverse of a matrix.
Unfortunately, Ye shows that (6) cannot be exploited to obtain a computable (nearly) optimal replacement of the optimal expansion vector as it involves the a priori eigenvector: (i) makes no contribution to the expansion of as ; (ii) since , we must have when replacing by any and taking , leading to . For this reason, he gives up (6) and instead turns to deriving a maximization characterization of the expanded subspace for a given expansion vector under the assumption that the matrix is Hermitian; see Theorem 2 of [25] and the first result of Theorem 2.1 to be presented in this paper. However, he does not find a solution to this maximization problem for a given expansion vector, and the optimal expansion vector cannot be obtained from the maximization characterization. Without any other information, Ye made an approximate analysis of the maximization characterization and argued that the Ritz vector from the current subspace might be a good approximate maximizer of (3) and thus a nearly optimal expansion vector. Unfortunately, as we shall see, Ye's analysis and arguments have evident defects; see the detailed explanation after Remark 3 in this paper. We stress that Ye's proof of the maximization characterization holds only for Hermitian matrices.
Wu and Zhang [24] do not consider the optimal expansion problem (3) itself. Instead, they focus on showing that, for a general matrix, the refined Ritz vector can be a better expansion vector than the Ritz vector. Based on this analysis, they have developed a refined variant of the RA method, which has been numerically confirmed to be more efficient than the RA method.
In this paper, we shall revisit problem (3) for a general matrix. Our theoretical contributions consist of two parts. The first part generalizes Ye's major result in [25], i.e., Theorem 2, to the non-Hermitian case. We first prove that the angle of the eigenvector with the expanded subspace for a given expansion vector can be formulated as a maximization characterization problem, which extends Theorem 2 of [25] to the general case. Then we find its maximizer. This result is new even in the Hermitian case and can be exploited to explain the defect of Ye's approximate analysis. These results are secondary, and our major theoretical contributions are in the second part. We establish informative expressions of the optimal expansion vector and the optimally expanded subspace. The results show that (i) generally where is the orthogonal projection of onto and is the best approximation to from , (ii) , which is the orthogonal projection of onto the subspace and whose normalization is the best approximation to from , (iii) with the orthogonal direct sum, and (iv) the orthogonal projection of onto is .
For the purpose of practical computations, we make a clear and rigorous analysis of the theoretical results and obtain a number of computable optimally expanded subspaces, which depend on the chosen projection methods. As has already been seen from (6) and the comments that follow it, it is hard to interpret the optimal expansion vector, let alone obtain a computable optimal replacement of it. Fortunately, we are able to take a completely new approach to a computable optimal subspace expansion. Our key observation is: when expanding to a computable optimal subspace, it follows from the fundamental identity
(7)
that seeking a computable optimal replacement of the optimal expansion vector itself is unnecessary. This identity provides a new perspective and motivates us to find a computable optimal replacement of the expansion term as a whole rather than of the expansion vector itself. As will become clear, such an optimal replacement can be defined precisely and is deterministic within the framework of each of the standard, harmonic, refined, and refined harmonic Rayleigh–Ritz methods. As it turns out, the computable optimal replacements are the Ritz vector, refined Ritz vector, harmonic Ritz vector, and refined harmonic Ritz vector of the matrix from a certain related subspace rather than from the current one for these four kinds of projection methods. Taking the standard Rayleigh–Ritz method as an example, we will describe how to expand the current subspace to the computable optimal subspace. We will also show that, based on our results, there is another novel optimal expansion that is independent of the desired eigenvector in practical computations and is promising and more robust.
The paper is organized as follows. In Section 2, we consider the solution of the optimal subspace expansion problem (3) for a general matrix, present our theoretical results, and analyze them. In Section 3, we show how to obtain computable optimal replacements and construct the computable optimally expanded subspaces. We also present another robust optimal expansion approach. In Section 4, we report numerical experiments to demonstrate the effectiveness of the theoretical and computable optimal subspace expansion approaches, and compare them with the Lanczos or Arnoldi type expansion and the RA method, i.e., the Ritz expansion [24, 25] with the Ritz vector as the expansion vector. In Section 5, we conclude the paper with some problems and issues that deserve future consideration.
Throughout the paper, denote by the identity matrix with the order clear from the context, by the complex space of dimension , and by or the set of complex or real matrices.
2 The optimal , and
For a general , we first establish two results on for a given . The first characterizes it as a maximization problem and generalizes Ye’s Theorem 2 in [25] and the second gives a maximizer of this problem. We remind the reader that these results are secondary. After them, we will present our major results.
Theorem 2.1.
For with and , assume that and , define with , and let the columns of be an orthonormal basis of the orthogonal complement of with respect to . Then
(8)
where
(9)
From the definition of , we have
By assumptions, we have . Notice that any nonzero but can be uniquely written as
where and is a nonzero scalar. Since and , we obtain
(10)
which establishes the maximization characterization in (8).
We now seek a maximizer of the maximization problem in (8). Since , by the definition of , there exists a vector such that the unit length eigenvector with . As a result, we obtain , , and
Since , it follows from the above that
(11)
Therefore, from (10), (11), the orthonormality of and , writing a nonzero , we obtain
since is the orthogonal projection of onto the subspace . As a result, solves , and is the first relation defined in (9), which proves the second relation in the right-hand side of (8).
We next prove that has full column rank. This amounts to showing that the cross-product matrix is positive definite by noting that . To this end, it suffices to prove that the solution of the homogeneous linear system is zero. Since , the system becomes
which yields
(12)
Since
taking norms in the two sides of (12) gives
which holds only if or . The latter means that , i.e., , a contradiction to our assumption. Hence we must have , and has full column rank. Exploiting , we obtain
which proves the second relation in (9).
Remark 1.
This theorem holds for a general . For Hermitian, we have , but for non-Hermitian or, more rigorously, non-normal, we have . In the Hermitian case, Theorem 2 in [25] is the first relation in the right-hand side of (8). The second relation in the right-hand side of (8) is new even for Hermitian, and gives an explicit expression of the maximizer of the maximization characterization problem in (8).
Remark 2.
Remark 3.
By approximately maximizing the first relation in the right-hand side of (8) and taking in it the Ritz vector that is used to approximate the desired eigenvector, Ye [25] makes an approximate analysis and argues that, without any other information, the Ritz vector may in practice be a good approximate solution of (3); he thus suggests it as an approximation to the solution of (3). However, Ye's proof of the first relation in the right-hand side of (8) and his approximate analysis are inapplicable to a non-Hermitian matrix.
As a matter of fact, Ye's analysis in the Hermitian case does not seem theoretically sound and has some arbitrariness, so that his claim may be problematic, as will be shown below.
Since
for any , Ye attempts to seek a good approximation to the maximizer of . To this end, in the first relation of the right-hand side in (8), Ye takes to be the Ritz vector of from that is used to approximate the desired . Then (8) shows that
where is defined by (9). Notice that is a function of . However, since is a constant vector and independent of , there is no reason that is close to unless for some specific and , as we argue below.
Notice that and . Applying the Sherman–Morrison formula to and making use of , by some manipulation we can justify that
(14)
where
with . Observe that and do not depend on the size of and are generally not small. Suppose that is normalized. In this case, and are generally not small. Recall that is the best approximation to from and is the computable approximation to obtained by the standard Rayleigh–Ritz method. Therefore, it is seen from (14) that the constant vector is generally not a good approximation to in direction for all unless , which can occur only when , that is, is already sufficiently small. These mean that the functions
generally have no similarity and their difference is not close to zero unless . As a result, the maximizer of generally bears no relation to the maximizer of
The above analysis indicates that in the first relation of (8) is generally not a good approximation to and the resulting is generally not a good replacement of . In the Hermitian case, by using a few approximate and heuristic arguments, Ye [25] argues that may be a good approximate maximizer of and then uses as a replacement of that solves (3).
In what follows we forgo any further analysis of Theorem 2.1 and instead consider (3) from new perspectives. We will establish a number of important and insightful results. We derive a new expression of the optimal expansion vector, which is essentially the same as but formally different from (6). The new form will play a critical role in establishing explicit expressions of the a priori optimal expansion vector, the optimally expanded subspace, and some other important quantities.
Note that is the orthogonal projector onto . Assume that , and . Lemma 1 of [25] states that
(15)
where with defined by (32) and . We should remind the reader that Lemma 1 of [25] uses the form , but here we use a different form by writing it as . Identity (7) motivates this change in form and enables us to establish more informative theoretical results on the optimal subspace expansion problem (3). Relation (15) indicates that
(16)
Let the matrix be unitary. Then and . Define the vector
(17)
which is the orthogonal projection of onto the orthogonal complement of with respect to and is nonzero for . Then the matrix pair
(18)
restricted to the orthogonal complement of is Hermitian definite, that is, the range restricted is Hermitian positive definite. Note that
the column space of . We write the range restricted matrix pair (18) as
(19)
Theorem 2.2.
Assume that , and , and let be defined by (5). Then the optimal expansion vector, up to scaling, is
(20)
(21)
(22)
and the columns of form an orthonormal basis of . Furthermore,
(23)
(24)
where the subspace
(25)
Write . Then by assumption, is not an invariant subspace of , so that . From (17) and , we obtain
(26)
which is the Rayleigh quotient of the range restricted Hermitian definite pair (19) with respect to the nonzero vector . Keep in mind that satisfies (16). Therefore, by the min-max characterization of the eigenvalues of a Hermitian definite pair (cf. e.g., [20, pp. 281-2]), the solution to the maximization problem
(27)
is an eigenvector of the range restricted definite pair (19) associated with its largest eigenvalue , and the optimal expansion vector up to scaling.
Notice that is a rank-one Hermitian positive semi-definite matrix, and denote
(28)
Therefore, the range restricted Hermitian definite pair (19) has exactly one positive eigenvalue, and the other eigenvalues are zeros. By (28), the eigenvalues of the matrix pair (19) are identical to those of the rank-one Hermitian matrix
(29)
and the eigenvectors of the pair (19) are related to the eigenvectors of the above matrix by
(30)
It is known that the unique positive eigenvalue of the matrix in (29) is
(31)
and its associated (unnormalized) eigenvector is
Substituting this into (30) and gives the first relation in the right-hand side of (20). Exploiting and , by the definition of Moore–Penrose generalized inverse, we obtain the second relation in the right-hand side of (20).
By (5), the residual
(32)
which shows that . As a result, we have
(33)
which, together with the second relation in (20), proves
(34)
i.e., the third relation in (20). Therefore, from (32) and (34), we obtain
which establishes (21).
Relations (21) and (22) show that the unit length and the columns of form an orthonormal basis of the optimally expanded subspace and the orthogonal projector onto is
(35)
Notice that itself is the orthogonal projector onto . By right-multiplying the orthogonal projector in (35) with and exploiting , it is easily justified that the orthogonal projection of onto is , which proves
i.e., (23) holds.
Note that and the orthogonal projector onto is . Therefore, is the orthogonal projection of onto defined by (25), and from (23) we obtain (24).
Remark 4.
Since , (23) and (24) show that the best approximation to from is identical to that from the higher dimensional containing . Remarkably, is independent of but provides the same best approximation to as does, and all the other optimally expanded subspaces for eigenvectors of other than are also contained in since . Therefore, unlike the a priori that favors only, favors all the eigenvectors of equally, and it is possible to use a suitable projection method to simultaneously compute approximations to and other eigenvectors of with respect to rather than . We will come back to this point at the end of the next section.
Remark 5.
It is seen from (34) that the components of are the coefficients of in the basis , while those of are the coefficients of in the basis . Notice from (32) that . Then as . Relations and show that and are different, so that and are generally not collinear. As a result, taking and expanding by are not theoretically optimal in general. In other words, although is the best approximation to from , relation (34) shows that generally up to scaling.
Finally, let us look at the particular case . In this case, we can write as with and . Therefore,
(36)
is a rank-one matrix. Let . Exploiting (36), , , and , by some elementary manipulation, we can prove that
for any satisfying . Therefore, for any satisfying , and there are infinitely many ’s. However, is unique up to scaling for all satisfying , and is unique since
is unique and it, together with the columns of , forms an orthonormal basis of .
3 Computable optimal replacements of and computable optimally expanded subspaces
The results in Theorem 2.2 are a priori and are not directly applicable to a practical expansion of the subspace as they involve the desired eigenvector. Therefore, for a non-Krylov subspace, one cannot use (6), i.e., the third result of (20), or a good approximation to it to expand the subspace in computations, as we have elaborated in the introduction. However, just as (7) indicates, as far as the subspace expansion is concerned, it is the image of the expansion vector under the matrix, rather than the expansion vector itself, that is used to expand the subspace. Therefore, the key is to consider this image as a whole when expanding the subspace. This new perspective forms the basis of this section.
Theoretically, for a given subspace, the unit-length vector defined above is the theoretical best approximation to the desired eigenvector from it. From the computational viewpoint, since the eigenvector is unknown, projection methods are the only viable choices; they seek computable best approximations from the subspace using their own extraction approaches. It is commonly known from, e.g., [1, 14, 15, 19, 21], that the standard Rayleigh–Ritz, harmonic Rayleigh–Ritz, refined Rayleigh–Ritz, and refined harmonic Rayleigh–Ritz methods compute the Ritz, harmonic Ritz, refined Ritz, and refined harmonic Ritz vectors from a given subspace and use them to approximate the desired eigenvector. For the given subspace, these approximate eigenvectors are computationally the best approximations that these four kinds of projection methods can obtain, and one cannot do better once a projection method is chosen. In other words, these approximations are computable optimal replacements of the a priori best approximation within the framework of the four kinds of projection methods, respectively. This naturally motivates us to introduce the following definition.
Definition 1.
For a chosen projection method, the approximation to extracted by it from is called the computable optimal replacement of .
In terms of this definition, for a chosen projection method we can fully exploit the theoretical results in Section 2 to expand the subspace in its own computable optimal way. Notice that is the orthogonal projector onto the subspace and is the theoretical best approximation to the eigenvector from it, and write . Then the columns of form an orthonormal basis of in (22). Keeping these results in mind, regarding the optimal expansion, a chosen projection method obtains its computable best approximation to it, and we take this as the computable optimal replacement. We then use such a replacement to construct the corresponding computable optimally expanded subspace.
Specifically, by Definition 1, for the standard Rayleigh–Ritz, the harmonic Rayleigh–Ritz, the refined Rayleigh–Ritz, and the refined harmonic Rayleigh–Ritz methods, the Ritz vector, the harmonic Ritz vector, the refined Ritz vector, and the refined harmonic Ritz vector obtained by them are the corresponding computable optimal replacements of the theoretical best approximation.
In what follows we show how to efficiently and reliably achieve this goal in computations. Before proceeding, let us investigate (21) and get more insight into the optimal subspace expansion. We present the following result.
Theorem 3.1.
Assume that is unitary. Then it holds that
(37)
If , then and the subspace expansion terminates.
Since , and is unitary, we have , which proves (37) by noticing that and are the orthogonal projectors onto and , respectively.
For , since , and , we must have when . On the other hand, if is sufficiently large such that , then must be rank deficient as unconditionally. Therefore, the theoretical optimal expansion is part of . Notice that and . Therefore, if , then already and the subspace expansion terminates.
Observe that, for all the afore-mentioned four kinds of projection methods, the computable optimal replacements of are orthogonal to because they lie in and . Write the unit-length approximate eigenvector obtained by any chosen projection method applied to as , and take . Then the columns of form an orthonormal basis of the computable optimally expanded subspace, denoted by hereafter, where is a corresponding replacement of the optimal expansion vector and is not required in computations.
Mathematically, we can derive an explicit expression of in terms of , as the following result shows.
Theorem 3.2.
It holds that
(38)
Since , we have , which leads to the equation
(39)
Compared with the expressions (20) and (34) of , we have replaced and by and in the expressions of , respectively. We should remind the reader that the solution to (39) may not be unique and in (38) is the minimum 2-norm one.
Next we focus on some computational details on the computable optimal subspace expansion approaches that correspond to the four kinds of projection methods.
Suppose that is available. One goes to the next iteration, and performs any one of the four projection methods. They all need to form the matrix
(40)
explicitly. This can be done efficiently in an updated way. For brevity, hereafter suppose that all the quantities, such as , and , are real when counting the computational cost. For , the matrices and are already available at iteration . To form the matrix in (40), one needs to compute one matrix-vector product , and the vector inner products , and . The total cost is one matrix-vector product and flops; see the sketch below.
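The bordered update can be sketched as follows in Python/NumPy. It assumes that the products of the matrix with the basis and the previous projection matrix are stored, which is one possible bookkeeping choice rather than the only one; the names are illustrative.

```python
import numpy as np

def update_projection(A, V, AV, H, u):
    """Append a new unit-length direction u (orthogonal to span(V)) to the basis
    and update H = V^H A V to [V, u]^H A [V, u] with a single matrix-vector
    product (a sketch). AV = A @ V is assumed to be kept from earlier iterations."""
    Au = A @ u                                        # the only new matrix-vector product
    k = H.shape[0]
    H_new = np.empty((k + 1, k + 1), dtype=np.result_type(H, Au))
    H_new[:k, :k] = H                                 # old block V^H A V
    H_new[:k, k] = V.conj().T @ Au                    # new column V^H A u
    H_new[k, :k] = u.conj() @ AV                      # new row u^H A V
    H_new[k, k] = u.conj() @ Au                       # scalar u^H A u
    V_new = np.hstack([V, u.reshape(-1, 1)])
    AV_new = np.hstack([AV, Au.reshape(-1, 1)])
    return V_new, AV_new, H_new
```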
The situation changes for other expansion vectors, which include those used in the Lanczos or Arnoldi type expansion approach, the Ritz expansion approach, i.e., the mathematically equivalent RA method, and the refined Ritz expansion approach. When expanding the subspace, we need to compute the corresponding matrix-vector product first, which costs one matrix-vector product, and then orthogonalize it against the current basis to obtain the next unit-length basis vector, which costs approximately flops when the (modified) Gram–Schmidt orthogonalization procedure is used. In finite precision arithmetic, some sort of reorthogonalization strategy may be required to ensure numerical orthogonality, which increases the orthogonalization cost to at most flops when complete reorthogonalization is used. After the new basis vector is obtained, one forms the projection matrix similar to that in (40). In contrast, the afore-described computable optimal subspace expansion approach does not perform such a Gram–Schmidt orthogonalization process and thus saves one matrix-vector product and flops.
We now consider how to obtain a computable optimal replacement in detail and count its computational cost. First, we need to form the residual , which can be updated recursively and efficiently, as shown below. Notice that and are already available, and write and at iteration . Then
where is the residual at iteration . Since and the matrices in parentheses are already available when forming , the above updates cost approximately and flops, respectively, so that the total cost is approximately flops; a sketch of this update follows.
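One plausible realization of this recursive update is sketched below in Python/NumPy, assuming that the residual has the form R = (I - V V^H) A V, consistent with the description above, and that A V and the new column A u are available; the names are illustrative.

```python
import numpy as np

def update_residual(R, V_new, AV, Au):
    """Update R = (I - V V^H) A V after the basis V has been expanded to
    V_new = [V, u] (a sketch). The new residual is
    [(I - V_new V_new^H) A V, (I - V_new V_new^H) A u]."""
    u = V_new[:, -1]
    R_old = R - np.outer(u, u.conj() @ AV)            # remove the component along the new direction u
    r_new = Au - V_new @ (V_new.conj().T @ Au)        # project A u onto the complement of span(V_new)
    return np.hstack([R_old, r_new.reshape(-1, 1)])
```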
With the residual available, we need to construct an orthonormal basis matrix of its column space. One can compute such a basis in a number of ways: for instance, by the QR factorization with column pivoting, whose cost depends on the numerical rank of the residual [2, pp. 302], or by the rank-revealing QR factorization [3], which costs nearly the same as the QR factorization with column pivoting. The most reliable but relatively expensive approach is the Chan R-SVD algorithm [2, pp. 493]. Once the orthonormal basis matrix is computed, we obtain
(41)
We point out that is possible as may be (numerically) rank deficient. Suppose that is not small, that is, the columns of do not span a good approximate invariant subspace of . Then must be numerically rank deficient as one of the Ritz pairs with respect to converges to some eigenpair, e.g., , as shown below: Let with be the Ritz vector approximating and be the corresponding Ritz value, that is, is an eigenpair of the projection matrix . Then its associated residual norm is . Whenever with sufficiently small, must be numerically rank deficient since its smallest singular value . Moreover, if some Ritz pairs converge, i.e., for corresponding ’s, then the numerical rank of equals since these ’s are linearly independent.
Indeed, we have observed in numerical experiments that the numerical rank of satisfies and becomes much smaller than as increases, in the sense that has exactly tiny singular value(s) no more than the level of with being the machine precision.
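A minimal sketch of this step, using SciPy's column-pivoted QR together with a tolerance-based numerical rank, is given below; the tolerance choice is illustrative only.

```python
import numpy as np
from scipy.linalg import qr

def orthonormal_range(R, tol_factor=1e-14):
    """Orthonormal basis of the (numerical) column space of R via QR with column
    pivoting (a sketch); columns whose diagonal entry of the triangular factor
    falls below a tolerance are discarded."""
    Q, T, piv = qr(R, mode='economic', pivoting=True)
    d = np.abs(np.diag(T))
    tol = tol_factor * max(R.shape) * (d[0] if d.size else 0.0)
    rank = int(np.count_nonzero(d > tol))
    return Q[:, :rank]
```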
With at hand, for each of the four projection methods, we can compute the corresponding eigenvector approximation to from , which is the corresponding computable optimal replacement of . Taking the standard Rayleigh–Ritz method as an example, we need to form , whose eigenvalues are the Ritz values of with respect to . Suppose that is real, and notice that . Then we need matrix-vector products and flops to compute . We compute the eigendecomposition of using the QR algorithm, which costs approximately flops for general and flops for real symmetric. The desired Ritz vector with being the unit length eigenvector of associated with its eigenvalue approximating the desired eigenvalue .
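Putting these pieces together for the standard Rayleigh–Ritz method, the sketch below extracts a Ritz vector from the subspace spanned by the orthonormal basis matrix of the residual's column space and appends it to the current basis; the selection rule and names are illustrative assumptions rather than a definitive procedure.

```python
import numpy as np

def optimal_expansion_rr(A, V, U, which=np.real):
    """Computable optimal expansion with the standard Rayleigh-Ritz method
    (a sketch): extract a Ritz vector of A from span(U), where the columns of U
    are orthonormal and orthogonal to span(V), and use it as the new basis vector."""
    B = U.conj().T @ (A @ U)            # projection matrix of A onto span(U)
    theta, Z = np.linalg.eig(B)         # Ritz values and primitive Ritz vectors
    j = np.argmax(which(theta))         # target, e.g., the eigenvalue of largest real part
    u_tilde = U @ Z[:, j]
    u_tilde = u_tilde / np.linalg.norm(u_tilde)
    return np.hstack([V, u_tilde.reshape(-1, 1)])   # orthonormal basis of the expanded subspace
```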
In summary, suppose that the QR factorization with column pivoting is used to compute the orthonormal basis of the residual's column space. When computing the residual and using it to construct this basis at iteration , we need matrix-vector products plus approximately flops for and flops for . Remarkably, the matrix-vector products amount to a matrix–matrix product, which, together with , should be computed using the much more efficient level-3 BLAS operations.
For the subspace expansion approaches such as the Lanczos or Arnoldi type expansion approach and the Ritz and refined Ritz expansion approaches that are mathematically equivalent to the RA method and the refined RA method, when using the corresponding expansion vector to form the new basis vector and orthonormalize it against the current basis, we need one matrix-vector product and flops per step. Therefore, the main cost of the usual -step subspace expansion approach is matrix-vector products and flops. However, successively updating and consists of level-2 and level-1 BLAS operations. In contrast, since the matrix–matrix products and are computed using level-3 BLAS operations, the computable optimal subspace expansion approach is much more efficient than the usual -step subspace expansion approach even if .
Finally, let us return to (23), (24), and Remark 4 in Section 2. Relations (23) and (24) have proved that the best approximations to from and are the same and equal to due to (41). Previously, we have proposed the expansion approach that obtains computable optimal replacements of from and then expands to , from which we compute new approximate eigenpairs in the next iteration. In the proposed approach, we first compute , and then form and . Nevertheless, although the orthogonal projections of onto and are identical, the computable optimal approximations to from and are not the same. Furthermore, the approximate eigenpairs obtained by the four kinds of projection methods applied to are generally more accurate than the corresponding counterparts applied to since . The same conclusions also hold when is replaced by since .
Regarding the cost, recall that the proposed computable optimal expansion approaches have computed , which, together with , form an orthonormal basis of . Therefore, we have already expanded to . Naturally, this motivates us to compute possibly better eigenpair approximations of with respect to the higher dimensional . This results in a new subspace expansion approach. In the next iteration, we need to form the projection matrix . Since and are already available in the previously proposed computable optimal expansion approaches, the extra cost is the computation of and , which can be done using efficient level-3 BLAS operations.
More importantly, we can get more insight into the computable optimal subspace expansion approaches. They construct approximations to the optimal expansion but, in subsequent expansion iterations, they expand these approximations themselves rather than the optimal subspaces. In other words, only in the first expansion iteration do the theoretical and computable optimal subspace expansion approaches start with the same subspace and expand it. After that, at each subsequent expansion iteration they expand different subspaces: the theoretical optimal expansion approach always expands the optimal subspace at that iteration, whereas the computable optimal expansion approaches expand subspaces that are no longer optimally expanded ones, so that the subsequently expanded subspaces may deviate further and further from the theoretically optimal ones. We have indeed observed such phenomena in the experiments.
In contrast, expanding to does not involve the desired eigenvector or any other eigenvectors, and always contains the theoretically optimal subspace. Therefore, this expansion is optimal both in theory and in computations. Moreover, as we have pointed out in Remark 4 of Section 2, unlike the a priori optimally expanded subspace and its computable optimal approximations, the subspace can be used to compute any other eigenpair(s) because it contains the theoretical optimally expanded subspaces for all the eigenvectors at each expansion iteration. As a result, such an expansion approach, though more expensive than our previously proposed optimal expansion approaches, is more robust, and may simultaneously compute more than one eigenpair and obtain better approximations to them. We do not pursue this expansion approach in the paper and leave it as future work.
4 Numerical experiments
Recall that, for the standard Rayleigh–Ritz and the refined Rayleigh–Ritz methods, the Ritz vector and the refined Ritz vector are the computable optimal replacements at expansion iteration , respectively. All the experiments were performed on an Intel(R) Core(TM) i7-9700 CPU 3.00GHz with 16 GB RAM using Matlab R2018b under the Microsoft Windows 10 64-bit system. We used the Matlab function orth to compute an orthonormal basis of the residual's column space. It is known from, e.g., [6, 10, 19, 21], that the refined Ritz vector is generally more accurate, and can be much more accurate, than the Ritz vector. We have used the two computable optimal subspace expansion approaches resulting from the Ritz vector and the refined Ritz vector. We shall report numerical experiments to demonstrate the effectiveness of the theoretical optimal expansion approach and the above two computable optimal expansion approaches. In the meantime, we compare them with the standard expansion approach, i.e., the Lanczos or Arnoldi type expansion with the last column of the basis matrix as the expansion vector, and with the Ritz expansion approach [25, 24], i.e., the RA method, with the Ritz vector as the expansion vector at each expansion iteration.
We reiterate that the theoretical optimal subspace expansion approach cannot be used in computations since it involves the a priori eigenvector. Here, purely for comparison purposes, we use it as the reference when evaluating the effectiveness of the other subspace expansion approaches and showing how well the computable optimal subspace expansion approaches work. To this end, we exploit the Matlab function eig to compute the desired eigenpair of a general matrix whenever it is not known in advance, and regard the computed eigenpair as "exact". With the unit-length eigenvector available, we are able to compute the error, i.e., the distance between the eigenvector and a given subspace.
For a given initial dimension, we generate vectors with normally distributed entries, orthonormalize them to obtain orthonormal vectors, and construct the initial subspace
We then successively expand it to an -dimensional subspace using a given expansion approach. For , the initial subspace is non-Krylov, and so are the expanded subspaces. All the residual norms of approximate eigenpairs refer to the relative residual norms
where denotes the 1-norm of and is the approximate eigenpair at iteration with being the Ritz value and the unit length being the Ritz vector or the refined Ritz vector when the standard Rayleigh–Ritz method or the refined Rayleigh–Ritz method is applied to .
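A sketch of this experimental loop in Python/NumPy is given below; the sizes, the seed, and the error measure (the sine of the angle in (1)) are placeholders, and any expansion routine with the signature expand_step(A, V) can be plugged in. This is not the Matlab code actually used.

```python
import numpy as np

def run_expansion(A, x, k0, m, expand_step, seed=0):
    """Build a random k0-dimensional initial subspace, expand it to dimension m
    with expand_step(A, V), and record sin of the angle between the desired unit
    eigenvector x and each expanded subspace (a sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    V, _ = np.linalg.qr(rng.standard_normal((n, k0)))     # random orthonormal initial basis
    errors = []
    while V.shape[1] < m:
        k_prev = V.shape[1]
        V = expand_step(A, V)                              # e.g., arnoldi_expand or ritz_expand
        if V.shape[1] == k_prev:                           # expansion terminated early
            break
        errors.append(np.linalg.norm(x - V @ (V.conj().T @ x)))
    return V, errors
```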
We test two symmetric matrices and one unsymmetric matrix. The first symmetric matrix is with , whose small eigenvalues are clustered. We compute the smallest eigenvalue and the corresponding eigenvector , the last column of with order . The second symmetric matrix is the Strakoš matrix [13, pp. XV], which is diagonal with the eigenvalues labeled in descending order:
(42)
and is extensively used to test the behavior of the Lanczos algorithm. The parameter controls the eigenvalue distribution: the large eigenvalues are clustered for values closer to one and better separated for smaller values. We compute the largest eigenvalue and the corresponding eigenvector , the first column of . In the experiment, we take , , , and .
The test unsymmetric matrix is cry2500 from the non-Hermitian Eigenvalue Problem Collection in the Matrix Market (https://math.nist.gov/MatrixMarket). We are interested in the eigenvalue with the largest real part, which is clustered with some others, and in the corresponding eigenvector. For these three eigenvalue problems, we have tested several initial and final subspace dimensions when comparing the different subspace expansion approaches. We only report the results for one pair of dimensions since we have observed very similar phenomena for the other choices.
For these three test problems, Figures 1–2 depict the decaying curves of the errors
(43)
obtained by the five subspace expansion approaches and the residual norms (42) of the approximate eigenpairs obtained by the standard Rayleigh–Ritz method and the refined Rayleigh–Ritz method with respect to corresponding subspaces, respectively.
Figure 1a, Figure 1c, and Figure 2a depict the errors (43). At expansion iteration , "stand" denotes the standard subspace expansion approach using the last basis vector, "RitzR" and "r-RitzR" denote the subspace expansion approaches using the Ritz vector and the refined Ritz vector from the corresponding subspace, respectively, "RitzV" is the subspace expansion approach [24, 25] using the Ritz vector from the current subspace, and "optimal" denotes the theoretical optimal subspace expansion approach.
Figure 1b, Figure 1d, and Figure 2b depict relative residual norms (42) of approximate eigenpairs. At expansion iteration , “stand”, “RitzR”, “RitzV”, and “optimal-R” perform the standard Rayleigh–Ritz method on the resulting expanded subspaces using the subspace expansion approaches “stand”, “RitzR”, “RitzV”, and “optimal”, but “r-RitzR” and “optimal-RR” perform the refined Rayleigh–Ritz method on the corresponding using the subspace expansion approaches “r-RitzR”, and “optimal”, respectively. Particularly, we remind the reader that “optimal-R” and “optimal-RR” work on the same optimally expanded subspace at each expansion iteration .
[Figure 1: panels (a) and (c) show the errors (43), and panels (b) and (d) show the relative residual norms, for the two symmetric test matrices.]
[Figure 2: panel (a) shows the errors (43), and panel (b) shows the relative residual norms, for the unsymmetric matrix cry2500.]
We now make several comments on the figures. Regarding the errors for the different kinds of subspaces, we clearly observe that the theoretical optimal subspace expansion approach "optimal" is always the best, the computable optimal subspace expansion approach "r-RitzR" is the second best, and the computable optimal subspace expansion approach "RitzR" is the third most effective, as is seen from Figure 1c and Figure 2a. They are all more effective than the Ritz subspace expansion approach "RitzV" and the standard subspace expansion approach "stand". Neither of the latter two wins over the other for the symmetric matrices, as indicated by Figure 1a and Figure 1c, but "RitzV" is not as good as "stand" for the unsymmetric matrix cry2500. More precisely, the computable optimal subspace expansion approach "RitzR" is better than "stand" and "RitzV" for the unsymmetric matrix and the symmetric Strakoš matrix, but behaves very much like "RitzV" for the first symmetric matrix. On the other hand, as the figures indicate, the three test problems are quite difficult since the desired eigenvalue is clustered with some other eigenvalues for each problem, causing the errors to decay very slowly for the five subspace expansion approaches, including the optimal one. To summarize, in decreasing order of effectiveness the subspace expansion approaches are "optimal", "r-RitzR", and "RitzR", and they are better than "RitzV" and "stand"; as for "RitzV" and "stand", there is essentially no winner.
In contrast, the residual norms of the approximate eigenpairs exhibit more complicated features. Note that "stand", "RitzR", "RitzV", and "optimal-R" all use the standard Rayleigh–Ritz method on the respective subspaces to compute the Ritz approximations. As we have commented above and observed from the figures, the errors decay slowly for all the subspace expansion approaches, and the residual norms of the approximate eigenpairs obtained by "stand", "RitzR", "RitzV", and "optimal-R" decrease in a similar pattern.
However, the situation changes dramatically for "r-RitzR" and "optimal-RR". First, "r-RitzR" computes considerably more accurate approximate eigenpairs than the afore-mentioned four approaches at each iteration. Second, "r-RitzR" behaves very much like "optimal-RR", the ideally optimal one, meaning that the computable optimal "r-RitzR" computes nearly the best eigenpair approximations at each iteration. Third, "optimal-RR" outperforms "optimal-R" substantially and converges much faster, even though they work on the same subspace at each iteration. This indicates that the refined Rayleigh–Ritz method can outperform the standard Rayleigh–Ritz method on the same subspace considerably.
In summary, our experiments on the three test problems have demonstrated that "r-RitzR" is the best among the computable subspace expansion approaches, and the refined Rayleigh–Ritz method on the resulting subspaces behaves very much like "optimal-RR", i.e., the theoretical optimal subspace expansion approach.
5 Conclusions
We have considered the optimal subspace expansion problem for the general eigenvalue problem, and generalized the maximization characterization, one of Ye's two main results, to the general case. Furthermore, we have found the solution to this maximization characterization problem. Based on the resulting expression, we have analyzed the Ritz expansion approach, i.e., the RA method, and argued that the Ritz vector may not be a good approximation to the optimal expansion vector.
Most importantly, we have established several results on the optimal expansion vector and the theoretical optimal subspace for a general matrix and given their explicit expressions. We have analyzed the results in depth and obtained computable optimal replacements and expanded subspaces within the framework of the standard, refined, harmonic, and refined harmonic Rayleigh–Ritz methods. Taking the standard Rayleigh–Ritz method as an example, we have given implementation details on how to obtain the computable optimally expanded subspace and made a cost comparison with some other expansion approaches, such as the Lanczos or Arnoldi type expansion with the last basis vector as the expansion vector and the Ritz expansion approach, i.e., the RA method, with the Ritz vector as the expansion vector.
Our numerical experiments have demonstrated the advantages of the computable optimal expansion approaches over the afore-mentioned expansion approaches. In particular, we have observed that the computable optimal subspace expansion approach using the refined Ritz vector behaves very much like the theoretical optimal expansion approach, since the eigenpair approximations obtained by the former are almost as accurate as those computed by the latter. This indicates that such a computable optimal expansion approach is as good as the theoretical optimal one and is thus very promising when the accuracy of approximate eigenpairs is measured in terms of residual norms.
We have also proposed a potentially more robust subspace expansion approach that expands the subspace further and computes approximate eigenpairs with respect to the larger subspace in the next iteration. For the sake of length and focus, we do not probe this approach further in the current paper and leave it as future work.
It is well known that the harmonic Rayleigh–Ritz method is more suitable to compute interior eigenvalues and their associated eigenvectors than the standard Rayleigh–Ritz method [1, 19, 21]. Furthermore, since the standard and harmonic Rayleigh–Ritz methods may have convergence problems when computing eigenvectors [7, 10], we may gain much when using the refined Rayleigh–Ritz method [5, 10] and the refined harmonic Rayleigh–Ritz method [7, 9] correspondingly. As a matter of fact, for a sufficiently accurate subspace , the Ritz vector and the harmonic Ritz vector may be poor approximations to and can deviate from arbitrarily (cf. [19, pp. 284-5] and [7, Example 2]), but the refined Ritz vector and the refined harmonic Ritz vector are always excellent approximations to [19, pp. 291] and [7, Example 2], provided that the selected approximate eigenvalues converge to the desired eigenvalue correctly. The advantages of the refined Rayleigh–Ritz method over the standard Rayleigh–Ritz method have also been justified by the numerical experiments in the last section. Therefore, it is preferable to expand the subspace using the refined Ritz vector and the refined harmonic Ritz vector from when exterior and interior eigenpairs of are required, respectively.
Based on the results in Section 3 and our numerical experiments, starting with a -dimensional non-Krylov subspace with the residual defined by (5) and , it is promising to develop four kinds of more practical restarted projection algorithms using computable optimal subspace expansion approaches.
As is known, there are remarkable distinctions between the Arnoldi type expansion and the RA method [11, 12], i.e., the Ritz expansion approach [24, 25], or the refined RA method, i.e., the refined Ritz expansion approach [24], as well as between SIRA or JD type methods and inexact SIA type methods: a large perturbation is allowed in the relevant matrix-vector products of RA type methods or in the solutions of the inner linear systems of SIRA and JD type methods [8, 9, 22], while the corresponding quantities in Arnoldi type methods or in SIA type methods must be computed with high accuracy until the approximate eigenpairs start to converge, and afterwards the solution accuracy can be relaxed in a manner inversely proportional to the residual norms of the approximate eigenpairs [16, 17]. Keeping these in mind, there are many important problems and issues that deserve consideration. For example, how large errors are allowed in the quantities used to expand the subspace effectively? This can be transformed into the problem of how large errors are allowed in the residual. In practical computations, this becomes the question of how large errors are allowed in the computable optimal replacements within the framework of the four kinds of projection methods, whenever perturbation errors exist in them. Is it possible to introduce the computable optimal expansion approaches into JD and SIRA type methods? All these need systematic exploration and analysis and will constitute our future work.
Acknowledgment
I thank two referees and Professor Marlis Hochbruck very much for their numerous valuable suggestions and comments, which inspired me to ponder the topic and improve the presentation substantially.
References
- [1] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. Van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, 2000.
- [2] G. H. Golub and C. F. Van Loan, Matrix Computations, Fourth Edition, The Johns Hopkins University Press, Baltimore, 2013.
- [3] M. Gu and S. C. Eisenstat, Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17 (1996), pp. 848–869.
- [4] Z. Jia, The convergence of generalized Lanczos methods for large unsymmetric eigenproblems, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 843–862.
- [5] Z. Jia, Refined iterative algorithms based on Arnoldi's process for unsymmetric eigenproblems, Linear Algebra Appl., 259 (1997), pp. 1–23.
- [6] Z. Jia, Some theoretical comparisons of refined Ritz vectors and Ritz vectors, Science in China Ser. A Math., 47 (2004), Supp. 222–233.
- [7] Z. Jia, The convergence of harmonic Ritz values, harmonic Ritz vectors and refined harmonic Ritz vectors, Math. Comput., 74 (2005), pp. 1441–1456.
- [8] Z. Jia and C. Li, Inner iterations in the shift-invert residual Arnoldi method and the Jacobi–Davidson method, Sci. China Math., 57 (2014), pp. 1733–1752.
- [9] Z. Jia and C. Li, Harmonic and refined harmonic shift-invert residual Arnoldi and Jacobi–Davidson methods for interior eigenvalue problems, J. Comput. Appl. Math., 282 (2015), pp. 83–97.
- [10] Z. Jia and G. W. Stewart, An analysis of the Rayleigh–Ritz method for approximating eigenspaces, Math. Comput., 70 (2001), pp. 637–648.
- [11] C. Lee, Residual Arnoldi method: theory, package and experiments, Ph.D. thesis, TR-4515, Department of Computer Science, University of Maryland at College Park, 2007.
- [12] C. Lee and G. W. Stewart, Analysis of the residual Arnoldi method, TR-4890, Department of Computer Science, University of Maryland at College Park, 2007.
- [13] G. Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations, SIAM, Philadelphia, PA, 2006.
- [14] B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, Philadelphia, PA, 1998.
- [15] Y. Saad, Numerical Methods for Large Eigenvalue Problems, Revised Edition, SIAM, Philadelphia, PA, 2011.
- [16] V. Simoncini, Variable accuracy of matrix-vector products in projection methods for eigencomputation, SIAM J. Numer. Anal., 43 (2005), pp. 1155–1174.
- [17] V. Simoncini and D. Szyld, Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM J. Sci. Comput., 25 (2003), pp. 454–477.
- [18] G. Sleijpen and H. van der Vorst, A Jacobi–Davidson iteration method for linear eigenvalue problems, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 401–425. Reprinted in SIAM Rev., 42 (2000), pp. 267–293.
- [19] G. W. Stewart, Matrix Algorithms II: Eigensystems, SIAM, Philadelphia, PA, 2001.
- [20] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press Inc., Boston, 1990.
- [21] H. Van der Vorst, Computational Methods for Large Eigenvalue Problems, Elsevier, North Holland, 2002.
- [22] H. Voss, A new justification of the Jacobi–Davidson method for large eigenproblems, Linear Algebra Appl., 424 (2007), pp. 448–455.
- [23] G. Wu, The convergence of harmonic Ritz vectors and harmonic Ritz values–revisited, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 118–133.
- [24] G. Wu and L. Zhang, On expansion of search subspaces for large non-Hermitian eigenproblems, Linear Algebra Appl., 454 (2014), pp. 107–129.
- [25] Q. Ye, Optimal expansion of subspaces for eigenvector approximations, Linear Algebra Appl., 428 (2008), pp. 911–918.