Testing Positive Semidefiniteness Using Linear Measurements
Thanks: Authors DN and WS were partially supported by NSF DMS 2011140 and NSF DMS 2108479. DW was supported by NSF CCF 1815840 and Office of Naval Research Grant N00014-18-1-2562.
Abstract
We study the problem of testing whether a symmetric input matrix $A$ is positive semidefinite (PSD), or is $\varepsilon$-far from the PSD cone, meaning that $\lambda_{\min}(A) \le -\varepsilon\|A\|_p$, where $\|A\|_p$ is the Schatten-$p$ norm of $A$. In applications one often needs to quickly tell if an input matrix is PSD, and a small distance from the PSD cone may be tolerable. We consider two well-studied query models for measuring efficiency, namely, the matrix-vector and vector-matrix-vector query models. We first consider one-sided testers, which are testers that correctly classify any PSD input, but may fail on a non-PSD input with a tiny failure probability. Up to logarithmic factors, we show a tight bound in the matrix-vector query model and a tight bound in the vector-matrix-vector query model, for every $p$. We also show a strong separation between one-sided and two-sided testers in the vector-matrix-vector model, where a two-sided tester can fail on both PSD and non-PSD inputs with a tiny failure probability. In particular, for the important case of the Frobenius norm, we show that any one-sided tester requires a large number of queries. However, we introduce a bilinear sketch for two-sided testing, from which we construct a Frobenius norm tester achieving the optimal number of queries. We also give a number of additional separations between adaptive and non-adaptive testers. Our techniques have implications beyond testing, providing new methods to approximate the spectrum of a matrix with Frobenius norm error using dimensionality reduction in a way that preserves the signs of eigenvalues.
1 Introduction
A real-valued matrix $A$ is said to be Positive Semi-Definite (PSD) if it defines a non-negative quadratic form, namely, if $x^\top A x \ge 0$ for all vectors $x$. If $A$ is symmetric, the setting on which we focus, this is equivalent to all eigenvalues of $A$ being non-negative. Multiple works [KS03, Han+17, BCJ20] have studied the problem of testing whether a real matrix is PSD, or is far from being PSD, and this testing problem has numerous applications, including faster algorithms for linear systems and linear algebra problems, detecting the existence of community structure, ascertaining local convexity, and differential equations; we refer the reader to [BCJ20] and the references therein.
We study two fundamental query models. In the matrix-vector model, one is given implicit access to a matrix $A$ and may query $A$ by choosing a vector $v$ and receiving the vector $Av$. In the vector-matrix-vector model, one chooses a pair of vectors $u, v$ and queries the bilinear form associated to $A$; in other words, the value of the query is $u^\top A v$. In both models, multiple, adaptively-chosen queries can be made, and the goal is to minimize the number of queries needed to solve a given task. These models are standard computational models in the numerical linear algebra community, see, e.g., [Han+17], where PSD testing was studied in the matrix-vector query model. These models were recently formalized in the theoretical computer science community in [Sun+19, RWZ20], though similar models have been studied in numerous fields, for example as the number of measurements in compressed sensing, or the sketching dimension of a streaming algorithm. The matrix-vector and vector-matrix-vector query models are particularly relevant when the input matrix is not given explicitly.
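To make the two access models concrete, the following minimal Python sketch (ours, not from the paper; the function names are illustrative) wraps an explicit symmetric matrix behind the two query interfaces. In applications the matrix is never materialized, and these closures stand in for whatever implicit access is available.

```python
import numpy as np

def make_oracles(A):
    """Wrap a symmetric matrix A behind the two query interfaces."""
    def matvec(v):              # matrix-vector query: returns the full vector A v
        return A @ v

    def vec_mat_vec(u, v):      # vector-matrix-vector query: returns the scalar u^T A v
        return u @ (A @ v)

    return matvec, vec_mat_vec

# Example usage on a small random symmetric matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2
matvec, vmv = make_oracles(A)
v = rng.standard_normal(5)
assert np.isclose(v @ matvec(v), vmv(v, v))
```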
A natural situation occurs when $A$ is presented implicitly as the Hessian of a function $f$ at a point $x$, where $f$ could be, for example, the loss function of a neural network. One might want to quickly distinguish between a proposed optimum of $f$ truly being a minimum, or being a saddle point with a direction of steep downward curvature. Our query model is quite natural in this context. A Hessian-vector product is efficient to compute using automatic differentiation techniques. A vector-matrix-vector product corresponds to a single second derivative computation, $u^\top \nabla^2 f(x)\, v$. This can be approximated using four function queries by the finite difference approximation $\frac{1}{h^2}\left[f(x+hu+hv) - f(x+hu) - f(x+hv) + f(x)\right]$, where $h$ is small.
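As an illustration of the preceding paragraph, the sketch below (ours; the loss `f` and the step `h` are placeholders) approximates the quadratic form of the Hessian using four function evaluations. The approximation is exact for quadratic functions, and for general smooth functions `h` must balance truncation against round-off error.

```python
import numpy as np

def hessian_quadratic_form(f, x, u, v, h=1e-4):
    """Approximate u^T (grad^2 f(x)) v with four evaluations of f.

    Uses the finite difference
        [f(x+hu+hv) - f(x+hu) - f(x+hv) + f(x)] / h^2,
    which converges to the mixed second directional derivative as h -> 0.
    """
    return (f(x + h*u + h*v) - f(x + h*u) - f(x + h*v) + f(x)) / h**2

# Example: f(x) = 0.5 x^T M x has Hessian M, so the estimate should match u^T M v.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)); M = (M + M.T) / 2
f = lambda x: 0.5 * x @ M @ x
x, u, v = rng.standard_normal((3, 4))
print(hessian_quadratic_form(f, x, u, v), u @ M @ v)
```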
While there are numerically stable methods for computing the spectrum of a symmetric matrix, and thus determining if it is PSD, these methods can be prohibitively slow for very large matrices, and require a large number of matrix-vector or vector-matrix-vector products. Our goal is to obtain significantly more efficient algorithms in these models, and we approach this problem from a property testing perspective. In particular, we focus on the following version of the PSD-testing problem. In what follows, $\|A\|_p = \left(\sum_i \sigma_i^p\right)^{1/p}$ is the Schatten-$p$ norm of $A$, where the $\sigma_i$ are the singular values of $A$.
Definition 1.
For a given $\varepsilon > 0$ and $p$, an $(\varepsilon, p)$-tester is an algorithm that makes either matrix-vector or vector-matrix-vector queries to a real symmetric matrix $A$, outputs True with at least constant probability if $A$ is PSD, and outputs False with at least constant probability if $A$ is $\varepsilon$-far in spectral distance from the PSD cone, or equivalently, if the minimum eigenvalue satisfies $\lambda_{\min}(A) \le -\varepsilon\|A\|_p$. If the tester is guaranteed to output True on all PSD inputs (even if the input is generated by an adversary with access to the random coins of the tester), then the tester has one-sided error. Otherwise it has two-sided error. When $p$ is clear from the context we will often drop it and simply refer to an $\varepsilon$-tester.
Our work fits more broadly into the growing body of work on property testing for linear algebra problems, see, for example, [Bal+19, BCJ20, BMR21]. However, a key difference is that we focus on the matrix-vector and vector-matrix-vector query models, which may be more appropriate than the model in the above works, which charges unit cost for reading a single entry. Indeed, such models need to assume that the entries of the input are bounded by a constant or a slow-growing function of the dimension, as otherwise strong impossibility results hold. This can severely limit the applicability of such algorithms to real-life matrices that do not have bounded entries; indeed, even a graph Laplacian matrix with a single large degree would not fit into the above models. In contrast, we use the matrix-vector and vector-matrix-vector models, which are ideally suited for modern machines such as graphics processing units and for settings where the input matrix cannot fit into RAM, and which are standard models in scientific computing, see, e.g., [BFG96].
While we focus on vector-matrix-vector queries, our results shed light on several other natural settings. Many of our results are in fact tight for general linear measurements, which vectorize the input matrix and apply adaptively chosen linear forms to it. For long enough streams, the best known single-pass or multi-pass algorithms for any problem in the turnstile streaming model form a sketch using general linear measurements, and with some additional restrictions, it can be shown that the optimal multi-pass streaming algorithm simply adaptively chooses general linear measurements [Ai+16]. Therefore, it is quite plausible that many of our vector-matrix-vector algorithms give tight single-pass streaming bounds, given that vector-matrix-vector queries are a special case of general linear measurements, and that many of our lower bounds are tight even for general linear measurements.
Moreover our vector-matrix-vector algorithms lead to efficient communication protocols for deciding whether a distributed sum of matrices is PSD, provided that exact vector-matrix-vector products may be communicated. While we expect our methods to be stable under small perturbations (i.e. when the vector-matrix-vector products are slightly inexact), we leave the full communication complexity analysis to future work.
We note that our PSD-testing problem is closely related to that of approximating the largest eigenvalue of a PSD matrix. Indeed, by appropriately negating and shifting the input matrix, it is essentially equivalent to estimating the largest eigenvalue of a PSD matrix up to a small additive error. However, this problem is much less natural, as real-world matrices often have many small eigenvalues but only a few large eigenvalues.
1.1 Our Contributions
| Vector-matrix-vector queries | Bound | Reference |
|---|---|---|
| Adaptive, one-sided | | Corollary 3, Theorem 15 |
| Non-adaptive, one-sided | | Corollary 42, Theorem 13 |
| Adaptive, two-sided | | Proposition 29, Corollary 33 |
| Non-adaptive, two-sided | | Theorem 28, Theorem 31 |
| Adaptive, two-sided | | Corollary 30, Corollary 33 |

| Matrix-vector queries | Bound | Reference |
|---|---|---|
| Adaptive, one-sided | | Theorem 17, Theorem 20 |
| Adaptive, one-sided | | Theorem 17, Theorem 20 |
| Non-adaptive, one-sided | | Proposition 43, Corollary 46 |
We study PSD-testing in the matrix-vector and vector-matrix-vector models. In particular, given a real symmetric matrix $A$ and $\varepsilon > 0$, we are interested in deciding between (i) $A$ is PSD, and (ii) $A$ has an eigenvalue less than $-\varepsilon\|A\|_p$, where $\|A\|_p$ is the Schatten $p$-norm of $A$.
Tight Bounds for One-sided Testers.
We make particular note of the distinction between one-sided and two-sided testers. In some settings one is interested in a tester with one-sided error. When such a tester outputs False, it must be able to produce a proof that $A$ is not PSD. The simplest such proof is a witness vector $v$ such that $v^\top A v < 0$, and indeed we observe that in the matrix-vector model, any one-sided tester can produce such a $v$ when it outputs False. This may be a desirable feature if one wishes to apply these techniques to saddle point detection, for example: given a point that is not a local minimum, it would be useful to produce a descent direction so that optimization may continue. In the vector-matrix-vector model the situation is somewhat more complicated in general, but all of our one-sided testers produce a witness vector whenever they output False.
We provide optimal bounds for one-sided testers in both the matrix-vector and vector-matrix-vector models. The bounds below are stated for constant probability algorithms.
1. In the matrix-vector query model, we show that up to a factor of , queries are necessary and sufficient for an -tester for any . In the case, we note that the factor may be removed.

2. In the vector-matrix-vector query model, we show that queries are necessary and sufficient for an -tester for any . Note that when we obtain a very efficient -query algorithm. In particular, our tester for has query complexity independent of the matrix dimensions, and we show a sharp phase transition for , showing in some sense that is the largest value of possible for one-sided queries.
The matrix-vector query complexity is very different from the vector-matrix-vector query complexity for any $p$, which captures the fact that each matrix-vector query response reveals more information than a vector-matrix-vector query response, though a priori it was not clear that such responses in the matrix-vector model could not be compressed using vector-matrix-vector queries.
An Optimal Bilinear Sketch for Two-Sided Testing.
Our main technical contribution for two-sided testers is a bilinear sketch for PSD-testing with respect to the Frobenius norm, i.e. $p = 2$. We consider a Gaussian sketch of $A$, formed by multiplying $A$ on the left and right by a Gaussian matrix with a small number of columns. By examining the smallest eigenvalue of the sketch, we are able to distinguish between $A$ being PSD and $A$ being $\varepsilon$-far from PSD. Notably, this tester may reject even when the sketched matrix is itself PSD, which results in a two-sided error guarantee. This sketch allows us to obtain tight two-sided bounds in the vector-matrix-vector model for $p = 2$, both for adaptive and non-adaptive queries.
Separation Between One-Sided and Two-Sided Testers.
Surprisingly, we show a separation between one-sided and two-sided testers in the vector-matrix-vector model. For the important case of the Frobenius norm, i.e., $p = 2$, we utilize our bilinear sketch to construct a two-sided tester making few queries, whereas by our results above, any adaptive one-sided tester requires substantially more queries.
We also show that for any , any possibly adaptive two-sided tester requires queries for constant , and thus in some sense, is the largest value of possible for two-sided queries.
On the Importance of Adaptivity.
We also study the role of adaptivity in both matrix-vector and vector-matrix-vector models. In both the one-sided and two-sided vector-matrix-vector models we show a quadratic separation between adaptive and non-adaptive testers, which is the largest gap possible for any vector-matrix-vector problem [Sun+19].
In the matrix-vector model, each query reveals more information about than in the vector-matrix-vector model, allowing for even better choices for future queries. Thus we have an even larger gap between adaptive and non-adaptive testers in this setting.
Spectrum Estimation.
While the two-sided tester discussed above yields optimal bounds for PSD testing, it does not immediately give a way to estimate the negative eigenvalue when it exists. Via a different approach, we show how to give such an approximation with additive error. In fact, we show how to approximate all of the top eigenvalues of using non-adaptive vector-matrix-vector queries, which may be of independent interest.
We note that this gives an space streaming algorithm for estimating the top eigenvalues of to within additive Frobenius error. Prior work yields a similar guarantee for the singular values [AN13], but cannot recover the signs of eigenvalues.
1.2 Our Techniques
Matrix-Vector Queries.
For the case of adaptive matrix-vector queries, we show that Krylov iteration starting with a single random vector yields an optimal -tester for all . Interestingly, our analysis is able to beat the usual Krylov matrix-vector query bound for approximating the top eigenvalue, as we modify the usual polynomial analyzed for eigenvalue estimation to implicitly implement a deflation step of all eigenvalues above a certain threshold. We do not need to explicitly know the values of the large eigenvalues in order to deflate them; rather, it suffices that there exists a low degree polynomial in the Krylov space that implements this deflation.
We note that this idea of implicitly deflating the top eigenvalues first appeared in [AL86]. More recently this observation was applied in a setting very similar to ours by [SW09] who deflated eigenvalues that are large relative to the trace.
Further, we show that this technique is tight for all by showing that any smaller number of matrix-vector products would violate a recent lower bound of [Bra+20] for approximating the smallest eigenvalue of a Wishart matrix. This lower bound applies even to two-sided testers.
Vector-Matrix-Vector Queries.
We start by describing our result for a single value of $p$; the general case follows by a reduction. We give one of the first examples of an algorithm in the vector-matrix-vector query model that leverages adaptivity in an interesting way. Most known algorithms in this model work non-adaptively, either by applying a bilinear sketch to the matrix, or by making many independent queries, as in the case of Hutchinson's trace estimator [Hut89]. Indeed, the algorithm of [AN13] works by computing $G^\top A G$ for a Gaussian matrix $G$ with an appropriate number of columns, and arguing that all sufficiently large eigenvalues can be estimated from the sketch. The issue with this approach is that it uses more queries than necessary, and this bound is tight for non-adaptive testers! One could improve this by running our earlier matrix-vector algorithm on top of this sketch, without ever explicitly forming the matrix $G^\top A G$; however, this would still give a suboptimal query algorithm.
To achieve our optimal complexity, our algorithm instead performs a novel twist on Oja's algorithm [Oja82], the latter being a stochastic gradient descent (SGD) algorithm applied to optimizing the quadratic form $x \mapsto x^\top A x$ over the unit sphere. In typical applications, the randomness of SGD arises via randomly sampling from a set of training data. In our setting, we instead artificially introduce randomness at each step, by computing the projection of the gradient onto a randomly chosen direction. This idea is implemented via the iteration
$$x_{t+1} = x_t - \eta\,(g_t^\top A x_t)\, g_t, \qquad g_t \sim \mathcal{N}(0, I_n) \text{ i.i.d.}, \tag{1}$$
for a well-chosen step size $\eta$. If $x_t^\top A x_t$ ever becomes negative before reaching the maximum number of iterations, then the algorithm outputs False, otherwise it outputs True. For this value of $p$, we show that this scheme results in an optimal tester (up to logarithmic factors). Our proof uses a second moment analysis of a random walk that is similar in style to [Jai+16], though our analysis is quite different. Whereas [Jai+16] considers an arbitrary i.i.d. stream of unbiased estimators of the matrix (with bounded variance), our estimators are simply $g_t g_t^\top A$, which do not seem to have been considered before. We leverage this special structure to obtain a better variance bound on the iterates throughout the initial iterations, where each iteration can be implemented with a single vector-matrix-vector query. Our algorithm and analysis give a new method for the fundamental problem of approximating eigenvalues.
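The following Python sketch (ours) illustrates the shape of this iteration. The step size and iteration count are placeholders rather than the tuned values from the analysis below, and one extra query per step is spent checking the sign of the quadratic form, which is omitted from the query accounting here.

```python
import numpy as np

def oja_style_psd_test(vmv, n, eta, num_iters, rng=None):
    """One-sided randomized test using only vector-matrix-vector queries.

    vmv(u, v) returns u^T A v for the (implicit) symmetric matrix A.
    Returns (False, x) with a witness x satisfying x^T A x < 0 if one is
    found, and (True, None) otherwise.  eta and num_iters stand in for the
    values dictated by the analysis (roughly, eta scales inversely with the
    Frobenius norm of A).
    """
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        if vmv(x, x) < 0:                # one extra query per step, kept for clarity
            return False, x              # x is a witness: x^T A x < 0
        g = rng.standard_normal(n)
        x = x - eta * vmv(g, x) * g      # project the gradient of x^T A x onto g
    return True, None
```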
Our result for general follows by relating the Schatten- norm to the Schatten- norm and invoking the algorithm above with a different setting of . We show our method is optimal by proving an lower bound for non-adaptive one-sided testers, and then using a theorem in [RWZ20] which shows that adaptive one-sided testers can give at most a quadratic improvement. We note that one could instead use a recent streaming lower bound of [INW22] to prove this lower bound, though such a lower bound would depend on the bit complexity.
Two-Sided Testers.
The key technical ingredient behind all of our two-sided testers is a bilinear sketch for PSD-testing. Specifically, we show that a bilinear Gaussian sketch of $A$ with a small number of columns is sufficient for obtaining a two-sided tester for $p = 2$. In contrast to the one-sided case, we do not simply output False when the sketched matrix fails to be PSD, as such an algorithm would automatically be one-sided. Instead we require a criterion to detect when its smallest eigenvalue is suspiciously small. For this we require two results.
The first is a concentration inequality for the smallest eigenvalue of the sketch when $A$ is PSD. We show that it is not too small with very good probability. This result is equivalent to bounding the smallest singular value of a Gaussian matrix whose rows have different variances. Although many similar bounds for constant variances exist in the literature [Lit+05, Ver18], we were not able to find a bound for general covariances. In particular, most existing bounds do not seem to give the concentration around the mean that we require.
When $A$ has a sufficiently negative eigenvalue, we show that the smallest sketched eigenvalue is significantly smaller. By combining these two results, we are able to take the sketching dimension to be small, yielding a tight bound for non-adaptive testers in the vector-matrix-vector model. In fact this bound is even tight for general linear sketches, as we show by applying the results in [LW16].
We also utilize this bilinear sketch to give tight bounds for adaptive vector-matrix-vector queries, and indeed for general linear measurements. By first (implicitly) applying the sketch, and then shifting by an appropriate multiple of the identity, we are able to reduce to the testing problem considered above, which as described earlier may be solved with few queries.
Spectrum Estimation.
A natural approach for approximating the eigenvalues of an $n \times n$ symmetric matrix $A$ is to first compute a sketch $G^\top A G$ or a sketch $G^\top A H$ for Gaussian matrices $G$ and $H$ with a small number of columns. Both of these sketches appear in [AN13]. As noted above, $G^\top A G$ is a useful non-adaptive sketch for spectrum approximation, but the error in approximating each eigenvalue is proportional to a larger Schatten norm of $A$. One could instead try to make the error depend on the Frobenius norm of $A$ by instead computing $G^\top A H$ for independent Gaussian matrices $G$ and $H$, but now the sketch is no longer symmetric and it is not clear how to extract the signs of the eigenvalues of $A$ from it. Indeed, [AN13] are only able to show that the singular values of $G^\top A H$ are approximately the same as those of $A$, up to additive error. We thus need a new way to preserve sign information of eigenvalues.
To do this, we show how to use results for providing the best PSD low rank approximation to an input matrix , where need not be PSD and need not even be symmetric. In particular, in [CW17a] it was argued that if is a Gaussian matrix with columns, then if one sets up the optimization problem , then the cost will be at most , where is the best rank- PSD approximation to . By further sketching on the left and right with so-called affine embeddings and , which have rows and columns respectively, one can reduce this problem to , and now , and are all matrices so can be computed with a number of vector-matrix-vector products. At this point the optimal can be found with no additional queries and its cost can be evaluated. By subtracting this cost from , we approximate , and for all , which in turn allows us to produce (signed) estimates for the eigenvalues of .
When is PSD, we note that Theorem 1.2 in [AN13] is able to reproduce our spectral approximation guarantee using sketching dimension , compared to our sketch of dimension . However as mentioned above, our guarantee is stronger in that it allows for the signs of the eigenvalues to be recovered, i.e. our guarantee holds even when is not PSD. Additionally, we are able to achieve using just a single round of adaptivity.
Lower Bounds for One-sided Testers.
To prove lower bounds for one-sided non-adaptive testers, we first show that a one-sided tester must be able to produce a witness whenever it outputs False. In the matrix-vector model, the witness is a vector $v$ with $v^\top A v < 0$, and in the vector-matrix-vector model, the witness is a PSD matrix with negative inner product against $A$. In both cases we show that, even for the simplest non-PSD spectrum, it takes many queries to produce a witness when the negative eigenvalue is small in magnitude. In the matrix-vector model, our approach is simply to show that the relevant eigenvector is typically far from the span of all queried vectors when the number of queries is small. This implies that the quadratic form is non-negative on the queried subspace, which precludes the tester from producing a witness. In the vector-matrix-vector model our approach is similar, however now the queries take the form of inner products against rank-one matrices. We therefore need to work within the space of symmetric matrices, and this requires a more delicate argument.
1.3 Additional Related Work
Numerous other works have considered matrix-vector queries and vector-matrix-vector queries, see, e.g., [Mey+21, Bra+20, Sun+19, SER18, MM15, WWZ14]. We outline a few core areas here.
Oja’s Algorithm.
PSD Testing.
As mentioned above, PSD-testing has been investigated in the bounded entry model, where one assumes that the entries of $A$ are bounded by a constant [BCJ20], and one is allowed to query the entries of $A$. This is a restriction of the vector-matrix-vector model that we consider, where only coordinate vectors may be queried. However, since we consider a more general query model, we are able to give a better adaptive tester: for us, far fewer vector-matrix-vector queries suffice, beating the lower bound given in [BCJ20] for entry queries.
Another work on PSD-testing is that of [Han+17], who construct a PSD-tester in the matrix-vector model. They first show how to approximate a general trace function $\operatorname{tr}(f(A))$ for sufficiently smooth $f$, by using a Chebyshev polynomial construction to approximate $f$ in the sup-norm over an interval. This allows them to construct an $\varepsilon$-tester by taking $f$ to be a smooth approximation of a shifted Heaviside function. Unfortunately this approach is limited to a particular choice of norm, and does not achieve the optimal bound; they require more matrix-vector queries than the number achieved by Krylov iteration.
Spectrum Estimation.
1.4 Notation
A symmetric matrix is positive semi-definite (PSD) if all eigenvalues are non-negative. We use to represent the PSD-cone, which is the subset of symmetric matrices that are PSD.
For a matrix $A$ we use $\|A\|_p$ to denote the Schatten $p$-norm, which is the $\ell_p$ norm of the vector of singular values of $A$. The Frobenius norm ($p = 2$) will play a special role in several places, so we sometimes use the notation $\|A\|_F$ to emphasize this. Additionally, $\|A\|$ without a subscript indicates the operator norm (which is equal to the Schatten-$\infty$ norm).
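For concreteness, a small numpy snippet (ours) computing the quantities just defined:

```python
import numpy as np

def schatten_norm(A, p):
    """Schatten p-norm of A: the l_p norm of its singular values.
    p = 2 gives the Frobenius norm; p = inf gives the operator norm."""
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() if np.isinf(p) else (s**p).sum() ** (1.0 / p)

def is_psd(A, tol=1e-10):
    """A symmetric matrix is PSD iff its smallest eigenvalue is non-negative."""
    return np.linalg.eigvalsh(A).min() >= -tol
```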
We always use to indicate the dimension of the matrix being tested, and use to indicate the parameter in Definition 1.
When applied to vectors, $\langle \cdot, \cdot \rangle$ indicates the standard inner product on $\mathbb{R}^n$. When applied to matrices, it indicates the Frobenius inner product $\langle A, B \rangle = \operatorname{tr}(A^\top B)$.
$S^{n-1}$ indicates the set of all unit vectors in $\mathbb{R}^n$.
We use the notation $A^{+}$ to indicate the Moore-Penrose pseudoinverse of $A$.
For a symmetric matrix with eigenvalues , we let denote the matrix with all but the top eigenvalues zeroed out. Formally, if is an orthogonal matrix diagonalizing , then We also let
Throughout, we use to indicate an absolute constant. The value of may change between instances.
2 Vector-matrix-vector queries
2.1 An optimal one-sided tester.
To construct our vector-matrix-vector tester, we analyze the iteration
$$x_{t+1} = x_t - \eta\,(g_t^\top A x_t)\, g_t, \tag{2}$$
where the $g_t$ are independent standard Gaussian vectors and $\eta > 0$ is a step size.
Our algorithm is essentially to run this scheme for a fixed number of iterations with a well-chosen step size $\eta$. If the value of $x_t^\top A x_t$ ever becomes negative, then we output False; otherwise we output True. Using this approach we prove the following.
Theorem 2.
There exists a one-sided adaptive -tester, that makes vector-matrix-vector queries to
As an immediate corollary we obtain a bound for -testers.
Corollary 3.
There is a one-sided adaptive -tester that makes vector-matrix-vector queries.
Proof.
This follows from the previous result along with the bound ∎
We now turn to the proof of Theorem 2. Since our iterative scheme is rotation-invariant, we assume without loss of generality that $A$ is diagonal. For now, we make two simplifying assumptions: a normalization of $A$, and a specific value for its smallest eigenvalue. We consider running the algorithm for a prescribed number of iterations, and we will show that within these iterations our iteration finds an $x$ with $x^\top A x < 0$. We will use generic letters to denote absolute constants that we do not track, and that may vary between uses.
Our key technical lemma shows that the first coordinate of the iterate (which is associated to the negative eigenvalue) grows fairly quickly with good probability.
Lemma 4.
Suppose and satisfy the following list of assumptions: (1) , (2) , (3) , (4) . Then with probability at least .
Proof.
Following [Jai+16] we define the matrix , where the are independent gaussians. Note that We will show that has large norm with good probability (in fact we will show that is large). This will then imply that is large with high probability, where
Step 1: Deriving a recurrence for the second moments.
Let and let be the second moment of the coordinate . Note that (where is the Dirac delta). To simplify the notation, we drop the superscript on the . We compute .
Next we observe that (after grouping terms) the coefficients of the terms are pairwise uncorrelated. Using this, along with the fact that the ’s are independent of the ’s gives
Let , and Then we can write the recurrence as . Iterating this recurrence gives
(3) |
Step 2: Bounding .
Summing the above equation over allows us to write a recurrence for the ’s: , where we define .
We split the sum into two parts, corresponding to terms where the summand is positive or negative respectively. We now use the recurrence to bound the second part. First, by Hölder's inequality,
We calculate
where we used that (which is a consequence of Assumption 1), that (which holds since ) and that Since we assume that is the smallest eigenvalue,
Let Then combining our bounds gives
The next step is to use this recurrence to bound For this, define such that Plugging in to the above and dividing through by , we get that satisfies
where we used the fact that Now set By assumptions 1 and 2, This gives
Note that so a straightforward induction using the above recurrence shows that for all . It follows that .
Step 3: Bounding the second moment. Plugging the bound above in to (3) gives
Step 4: Applying Chebyshev. We focus on the first coordinate, Note that has expectation , so a straightforward induction shows that .
Using the bound for the second moment of the first coordinate, we get
By Assumptions 2 and 4, and so we get that
Thus by Chebyshev’s inequality, . So with probability at least , .
Under assumption 4, which means that with at least probability.
Step 5: Concluding the argument. We showed that with probability at least . In particular this implies that Now since is distributed as , which is at least in magnitude with probability. It follows that with probability at least . ∎
Let We next understand how the value of is updated on each iteration.
Proposition 5.
For , we have .
Proof.
Plugging in the update rule and expanding gives
from which the proposition follows. ∎
A consequence of this update is that the sequence is almost guaranteed to be decreasing as long as is chosen small enough.
Proposition 6.
Assume that and that . After iterations, with probability at least provided that
Proof.
We show something stronger; namely that for the first iterations, the sequence is decreasing. By Proposition 5, as long as The probability that this does not occur is .
The terms are independent subexponential random variables. So by Bernstein’s inequality (see [Ver18] Theorem 2.8.2 for the version used here), this probability is bounded by as long as is a sufficiently small constant. Taking a union bound gives that with probability at least , which is at least under the conditions given. ∎
Theorem 7.
Suppose that , , and that has as an eigenvalue. If we take such that , then for some we have with at least probability.
Proof.
Given an as in the statement of the theorem, choose which satisfies the assumptions of Lemma 4. Then with probability at least . By proposition 6, with at least probability, using the fact that for an appropriately chosen absolute constant , such that the hypothesis of Proposition 6 holds.
If , then the algorithm has already terminated. Otherwise conditioned on the events in the above paragraph, we have with at least probability that and
Then by Proposition 5 it follows that ∎
We also observe that we can reduce the dimension of the problem by using a result of Andoni and Nguyen. This allows us to avoid a dependence.
Proposition 8.
Suppose that satisfies and let have independent entries. Then we can choose such that and
Proof.
For the first claim, we simply apply Theorem 1.1 in [AN13] and (in their notation) set and
To show that the Schatten -norm does not grow too much under the sketch, we first write where the nonzero eigenvalues of are exactly the positive eigenvalues of . Then using the usual analysis of Hutchinson’s trace estimator (see [Mey+21] for example), we have
∎
We are now ready to give the proof of Theorem 2.
Proof.
The above result applies after scaling the given in Theorem 7 by . So it suffices to choose to be bounded above by
and within a constant factor of this value.
To choose an , pick a standard normal , and compute using vector-matrix-vector queries. Then with constant probability, Given this, we have
(4) |
which allows us to approximate to within a factor of with constant probability. Given this, one may simply try the above algorithm with an at each of different scales, with the cost of an extra factor.
Finally, we may improve the factor to a factor by using Proposition 8 to sketch , and then applying the above analysis to Note that the sketch may be used implicitly; once is chosen, a vector-matrix-vector query to can be simulated with a single vector-matrix-vector query to ∎
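The following wrapper (ours; the scaling of the step size is a placeholder for the choice dictated by Theorem 7) illustrates the strategy in the proof above: guess the Frobenius norm from a single Gaussian query, using that $\mathbb{E}[(g^\top A h)^2] = \|A\|_F^2$ for independent Gaussian $g, h$, and then run the iteration (for example, the `oja_style_psd_test` sketch from Section 1.2) at several geometrically spaced step sizes.

```python
import numpy as np

def run_over_eta_scales(tester, vmv, n, base_iters, num_scales, rng=None):
    """Run a step-size-dependent tester at geometrically spaced eta scales.

    tester(vmv, n, eta, num_iters, rng) should return (looks_psd, witness);
    rejecting at any scale rejects overall.  The Frobenius-norm guess below
    is a crude constant-probability estimate, used only to set the scale of eta.
    """
    rng = np.random.default_rng(rng)
    g, h = rng.standard_normal((2, n))
    fro_guess = max(abs(vmv(g, h)), 1e-12)      # rough stand-in for ||A||_F
    for j in range(num_scales):
        eta = (2.0 ** -j) / fro_guess           # placeholder scaling of the step size
        ok, _ = tester(vmv, n, eta, base_iters, rng=rng)
        if not ok:
            return False
    return True
```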
2.2 Lower bounds
We will show a bound for two-sided testers which will imply that the bound for -testers given in Theorem 2 is tight up to factors. If we require the tester to have one-sided error, then we additionally show that the bound in Corollary 3 is tight for all . Note that this distinction between one-sided and two-sided testers is necessary given Theorem 29.
In order to obtain these lower bounds for adaptive testers, we first show corresponding lower bounds for non-adaptive testers. A minor modification to Lemma 3.1 in [Sun+19] shows that an adaptive tester can have at most quadratic improvement over a non-adaptive tester. This will allow us to obtain our adaptive lower bounds as a consequence of the non-adaptive bounds.
2.2.1 Non-adaptive lower bounds
We first observe that a one-sided tester must always be able to produce a witness matrix , that at least certifies that is not positive definite.
Proposition 9.
If a one-sided tester makes a series of symmetric linear measurements of , and outputs False on a given instance, then there must exist nonzero such that is PSD and
Proof.
We work within the space of symmetric matrices. Let , and let be the linear functional associated with Now suppose that is strictly positive for all nonzero . We will construct that agrees with on and is non-negative on .
Let , and note that . Now by convexity of , there exists a hyperplane and associated half-space such that (i) contains (ii) (iii) and (iv) is non-negative on . Moreover, since intersects trivially, can be chosen such that Now let be a projection onto that maps to , and choose .
The linear functional is represented by the inner product against some symmetric matrix By construction of , we have for all , and also for all PSD . So in particular for all , which implies that is PSD. Given the existence of the PSD matrix consistent with all measurements, the one-sided tester must not reject.
∎
We are now able to give an explicit non-PSD spectrum which is hard for any one-sided tester. Specifically, we show that it is hard for any vector-matrix-vector query algorithm to produce a witness in the sense of the proposition above.
Theorem 10.
Let and suppose for all matrices with spectrum that a non-adaptive one-sided tester outputs False with probability. Then must make at least vector-matrix-vector queries.
Proof.
By the polarization identity
$$u^\top A v = \tfrac{1}{2}\left[(u+v)^\top A (u+v) - u^\top A u - v^\top A v\right],$$
valid for symmetric $A$, we may assume that all queries are of the form $x^\top A x$, at the cost of at most a factor of three increase in the number of queries.
We set where is uniform over , and let By Proposition 9, the tester may only reject if there is an in with such that For such an we have
(5) |
But since and both have unit norm and , this condition implies that
Now we turn to understanding Indeed we have the following:
Lemma 11.
Let be drawn uniformly from and let be a -dimensional subspace of the symmetric matrices. Let and Then
where is the identity matrix.
Proof.
Let be an orthonormal basis for . By the Pythagorean theorem,
(6) |
For fixed we have
(7) |
Since is symmetric, we can diagonalize to in some orthonormal basis. Since has unit norm, Then we have
Finally, observe that by the Pythagorean theorem, which finishes the proof. ∎
Remark 12.
While approximations would suffice, this result gives a quick way to compute and Set to be the entire space of symmetric matrices, and The previous result gives
On the other hand, by expanding we have
Solving the system yields and
To finish the proof of the theorem, we recall that is spanned by the matrices each of which has rank one. Therefore each matrix in , and in particular , has rank at most .
We recall for a general matrix that is gotten by truncating all but the largest singular values of Applying this to the identity matrix, when we see that
since Since we always have
Combining this fact with Lemma 11 gives
and by Markov’s inequality,
So with probability But for to be correct, we saw that we must have with probability . It follows that
which implies that
∎
In particular, this result implies that for non-adaptive one-sided testers, a -tester can only exist for
Theorem 13.
A one-sided non-adaptive -tester must make at least vector-matrix-vector queries.
Proof.
This follows as a corollary of Theorem 10; simply apply that result to the spectrum where there are ’s. ∎
2.2.2 Adaptive lower bounds
As remarked earlier, our adaptive lower bounds follow as a corollary of our non-adaptive bounds, and a slightly modified version of Lemma 3.1 in [Sun+19], which we give here.
Lemma 14.
Let be a random symmetric real-valued matrix, with diagonal, and where is orthonormal and sampled from the rotationally invariant distribution. Any adaptive vector-matrix-vector queries to may be simulated by non-adaptive vector-matrix-vector queries.
Proof.
(Sketch) First note that the adaptive protocol may be simulated by adaptive quadratic form queries of the form $x^\top A x$, by the polarization identity
$$u^\top A v = \tfrac{1}{2}\left[(u+v)^\top A (u+v) - u^\top A u - v^\top A v\right]. \tag{8}$$
These queries in turn may be simulated by non-adaptive queries by following exactly the same proof as Lemma 3.1 in [Sun+19] (but now with in their proof). ∎
As a direct consequence of this fact and our Theorem 13 we obtain the following.
Theorem 15.
An adaptive one-sided -tester must make at least vector-matrix-vector queries.
3 Adaptive matrix-vector queries
We analyze random Krylov iteration. Namely we begin with a random and construct the sequence of iterates using adaptive matrix-vector queries. The span of these vectors is denoted and referred to as the Krylov subspace.
Krylov iteration suggests a very simple algorithm. First compute the Krylov subspace. If it contains a vector $x$ such that $x^\top A x < 0$, then output False, otherwise output True. (Note that one can compute $Ax$, and hence $x^\top A x$, for all such $x$, given the matrix-vector queries already made.) We show that this simple algorithm is in fact optimal.
As a point of implementation, we note that the above condition on the Krylov subspace can be checked algorithmically. One first uses Gram-Schmidt to compute the projection onto the Krylov subspace. The existence of an $x$ with $x^\top A x < 0$ is equivalent to the projected matrix having a negative eigenvalue. When $A$ is $\varepsilon$-far from PSD, the proof below will show that this eigenvalue is in fact bounded away from zero, so it suffices to estimate it to within moderate accuracy.
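A minimal numpy sketch (ours; the number of iterations `q` stands in for the value dictated by Theorem 17) of random Krylov iteration and the projection-based check just described:

```python
import numpy as np

def krylov_psd_test(matvec, n, q, tol=0.0, rng=None):
    """One-sided PSD test from a single random Krylov space.

    Builds the Krylov basis {g, Ag, ..., A^q g} with q+1 matrix-vector queries,
    orthonormalizes it, and rejects iff the projected matrix Q^T A Q has a
    negative eigenvalue, i.e. iff the Krylov space contains x with x^T A x < 0.
    """
    rng = np.random.default_rng(rng)
    g = rng.standard_normal(n)
    V = [g / np.linalg.norm(g)]
    products = [matvec(V[0])]                  # store A v_j; reused below
    for _ in range(q):
        w = products[-1].copy()
        for v in V:                            # Gram-Schmidt against current basis
            w -= (v @ w) * v
        nrm = np.linalg.norm(w)
        if nrm < 1e-12:                        # Krylov space stopped growing
            break
        V.append(w / nrm)
        products.append(matvec(V[-1]))
    Q = np.column_stack(V)
    AQ = np.column_stack(products)
    T = (Q.T @ AQ + (Q.T @ AQ).T) / 2          # projected matrix Q^T A Q, symmetrized
    return np.linalg.eigvalsh(T).min() >= -tol  # True = "looks PSD"
```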
Proposition 16.
For , and there exists a polynomial of degree , such that and for all
Proof.
Recall that the degree Chebyshev polynomial is bounded by in absolute value on and satisfies
(See [MM15] for example.) The proposition follows by shifting and scaling . ∎
Theorem 17.
Suppose that has an eigenvalue with When , the Krylov subspace contains a vector with for When , the same conclusion holds for
Proof.
Without loss of generality, assume that . Fix a value to be determined later, effectively corresponding to the number of top eigenvalues that we deflate. By Proposition 16 we can construct a polynomial , such that and for with
(9) |
where is an absolute constant.
Now set
(10) |
Since we assume there at most terms in the product, so
(11) |
By setting we get
(12) |
As long as is at least , then lies in and
(13) |
By construction, . Also for all in
Therefore the matrix has at least one eigenvalue less than , and the positive eigenvalues sum to at most
(14) |
by using Hölder's inequality along with the fact that . So with at least probability, as desired. ∎
Remark 18.
For , the dependence can be removed by simply applying the tester to as a matrix-vector query to may be simulated via matrix-vector queries to However this comes at the cost of a dependence, and is therefore only an improvement when is extremely large.
Remark 19.
While we observe that deflation of the top eigenvalues can be carried out implicitly within the Krylov space, this can also be done explicitly using block Krylov iteration, along with the guarantee given in Theorem 1 of [MM15].
We showed above that we could improve upon the usual analysis of Krylov iteration in our context. We next establish a matching lower bound that shows our analysis is tight up to factors. This is a corollary of the proof of Theorem 3.1 presented in [Bra+20].
Theorem 20.
A two-sided, adaptive -tester in the matrix-vector model must in general make at least queries.
Proof.
We make use of the proof of Theorem 3.1 given in [Bra+20]. We consider an algorithm that receives a matrix sampled from the Wishart distribution, makes at most queries, and outputs either True or False, depending on whether is greater or less than (where is defined as in [Bra+20]). We say that the algorithm fails on a given instance if either (i) it outputs True and or (ii) it outputs False and . Exactly the same proof given in [Bra+20] shows that it must fail with probability at least where is an absolute constant, as long as is chosen sufficiently large. Taking say, means that any such algorithm fails with probability at least as long as is a large enough constant.
Now consider an -tester with , applied to the random matrix While our definition allows to fail with probability we can reduce this failure probability to by running a constant number of independent instances and taking a majority vote. So from here on we assume that fails on a given instance with probability at most .
First recall that where each entry of is i.i.d. Then with high probability, the operator norm of is bounded, say, by , and the eigenvalues of are bounded by
Therefore with high probability, and so It follows that This means that can solve the problem above, and by correctness of the tester, fails with at most probability. For sufficiently small, the above analysis implies that must make at least queries.
∎
4 An optimal bilinear sketch
In this section we analyze a bilinear sketch for PSD-testing which will also yield an optimal -tester in the vector-matrix-vector model.
Our sketch is very simple. We choose to have independent entries and take our sketch to be In parallel we construct estimates and for the trace and Frobenius norm of respectively, such that is accurate to within a multiplicative error of , and is accurate to with additive error. (Note that this may be done at the cost of increasing the sketching dimension by .)
If the sketched matrix is not PSD then we automatically reject. Otherwise, we consider the quantity
(15)
If this quantity is at most a certain threshold involving an absolute constant, then the tester outputs False; otherwise it outputs True.
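The following numpy sketch (ours) illustrates the overall shape of the tester. The exact statistic in (15) and its threshold are not reproduced here, so the rejection criterion below is a placeholder built from Gaussian estimates of the trace and Frobenius norm; the constant `c` is likewise hypothetical.

```python
import numpy as np

def bilinear_sketch_psd_test(vmv, n, k, c=0.1, rng=None):
    """Two-sided test from a k x k bilinear Gaussian sketch (illustrative only).

    Forms the sketch W^T A W with k^2 vector-matrix-vector queries, estimates
    tr(A) and ||A||_F with O(k) further Gaussian queries, and rejects if the
    sketch is not PSD or if its smallest eigenvalue is suspiciously small.
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((n, k))
    S = np.array([[vmv(W[:, i], W[:, j]) for j in range(k)] for i in range(k)])
    S = (S + S.T) / 2
    lam_min = np.linalg.eigvalsh(S).min()
    if lam_min < 0:                       # the sketch itself certifies a negative direction
        return False

    # Estimators: E[g^T A g] = tr(A) and E[(g^T A h)^2] = ||A||_F^2
    # for independent standard Gaussian vectors g, h.
    m = max(k, 8)
    tr_est = np.mean([vmv(g, g) for g in rng.standard_normal((m, n))])
    fro_est = np.sqrt(np.mean([vmv(g, h) ** 2
                               for g, h in zip(rng.standard_normal((m, n)),
                                               rng.standard_normal((m, n)))]))

    # Placeholder criterion: for PSD A the smallest sketched eigenvalue should not
    # fall far below the level suggested by the trace and Frobenius estimates.
    return lam_min >= c * max(tr_est - np.sqrt(k) * fro_est, 0.0)
```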
We first show that a large negative eigenvalue of $A$ causes the smallest sketched eigenvalue to be small. On the other hand, when $A$ is PSD, we will show that the smallest sketched eigenvalue is substantially larger.
4.1 Upper bound on
We start with the following result on trace estimators which we will need below.
Proposition 21.
Let be a symmetric matrix with eigenvalues and let be a random unit vector with respect to the spherical measure. Then
Proof.
By the spectral theorem, it suffices to prove the result when is diagonal. Then
By Remark 12, we have and for The result follows by expanding using linearity of expectation, and then applying these facts. ∎
The next two results will give an upper bound on the smallest eigenvalue of the Gaussian sketch. For the proof of Lemma 23 we will start with random orthogonal projections, from which the Gaussian result will quickly follow. We include a technical hypothesis that essentially enforces non-negativity of We write the hypothesis in the form below simply to streamline the argument.
Lemma 22.
Suppose that and that is an eigenvector of with associated eigenvalue Let be a projection onto a random dimensional subspace of , sampled from the rotationally invariant measure. Also suppose that with probability when is a random unit vector. Then
with probability at least .
Proof.
Let The subspace was chosen randomly, so with probability at least ,
Let be the projection of onto the hyperplane orthogonal to Observe by symmetry that is distributed uniformly over the sphere in .
Let be the matrix with the eigenvalue zeroed out. Then
as long as , which holds with probability at least as a consequence of the similar hypothesis. The latter term is a trace estimator for with variance bounded by (for example by Proposition 21). So with probability
and the result follows. ∎
In the following lemma, we introduce the technical assumption that . However this will be unimportant later, as any sketch with might as well have sketching dimension , at which point the testing problem is trivial.
Lemma 23.
Suppose that has an eigenvalue of , , and with has iid entries. Also suppose that with probability at least for a random unit vector and that for some absolute constant Then with probability at least ,
Proof.
Let denote projection onto the image of .
Let be an eigenvector of with associated eigenvalue smaller or equal to and set . We then have
We also have
By Theorem 4.6.1 in [Ver18], with probability at least . Conditional on this occurring,
from which it follows that
as long as the quantity on the right-hand side is non-negative. If this quantity is negative, then we similarly have
using the analogous bound on the smallest singular value of
Since is Gaussian, the image of is distributed with respect to the rotationally invariant measure on -dimensional subspaces. Therefore Lemma 22 applies, and the result follows after collecting terms, and using the assumption that in the negative case. ∎
4.2 Lower bound on
We follow a standard protocol for bounding the extreme eigenvalues of a random matrix. We first show that is reasonably large for a fixed vector with high probability. Then by taking a union bound over an -net we upgrade this to a uniform bound over the sphere.
We require two additional tricks. Our lower bound arises from Bernstein's inequality, which is hampered by the existence of large eigenvalues of $A$. Therefore, in order to get a guarantee that holds with high enough probability, we first prune the large eigenvalues of $A$.
Second, the mesh size of our net needs to be inversely proportional to the Lipschitz constant of the quadratic form as its argument ranges over the sphere. A priori, the Lipschitz constant might be as bad as a quantity that is typically larger than we can afford, which would ultimately give rise to an additional factor in the final sketching dimension. However, we show that the Lipschitz constant is in fact suitably bounded with good probability, avoiding the need for any additional dependence in the sketching dimension.
Proposition 24.
Let be a symmetric matrix, and let and be unit vectors. Then
Proof.
We first reduce to the -dimensional case. Let be a -dimensional subspace passing through and . The largest and smallest eigenvalues of the restriction to of the quadratic form associated to are bounded from above and below by and respectively. It therefore suffices to prove the result when has dimension
Since the result we wish to show is invariant under shifting by multiples of the identity, it suffices to consider the case when After these reductions, we have
Since and are unit vectors, and and the result follows. ∎
Lemma 25.
Let where has iid entries and Then
with probability at least .
Proof.
Consider the random quantity , where is a random unit vector in , independent from . Note that is distributed as a standard Gaussian, so is a trace estimator for with variance [AT11].
Lemma 26.
Suppose that is PSD with and that consists of iid entries. Then for we have
Proof.
We have
(17)
(18)
Now note that , and that since Additionally, These bounds imply that
and
for When , the latter expression is smaller, and the conclusion follows.
∎
Theorem 27.
Suppose that is PSD with , and that has iid entries and that . Then with at least probability,
for some absolute constant .
Proof.
For any fixed unit vector , is distributed as a standard Gaussian, and so Lemma 26 applies. Therefore for a choice of constant,
(19) |
with probability at least
Let be a net for the sphere in with mesh size which can be taken to have at most elements [Ver18]. By taking a union bound, equation 19 holds over with probability at least
for
By choosing in Lemma 25, and applying Proposition 24, we get that
with probability at least . Since has mesh size , we have that
for all unit vectors in with probability at least
∎
Theorem 28.
There is a bilinear sketch with sketching dimension that yields a two-sided -tester that is correct with at least probability.
Proof.
If the sketch is not PSD then we automatically reject. Otherwise, we first use some columns of the sketching matrices to estimate the trace of $A$ to within a small additive error, with small failure probability. We then use additional columns of the sketching matrices to construct a constant-factor approximation of the Frobenius norm of $A$, again with small failure probability (see for example [MSW19]).
Now consider the quantity
On the other hand, if $A$ has a negative eigenvalue less than or equal to the stated threshold, then by Lemma 23
which implies that
for some absolute constant
Finally by taking , we have , which implies that the tester is correct if it outputs True precisely when
∎
Note that this result immediately gives a non-adaptive vector-matrix-vector tester which makes queries.
4.3 Application to adaptive vector-matrix-vector queries
By combining our bilinear sketch with Theorem 2 we achieve tight bounds for adaptive queries.
Theorem 29.
There is a two-sided adaptive -tester in the vector-matrix-vector model, which makes queries.
Proof.
To handle the technical condition in Lemma 23, we first compute the quadratic form of $A$ on a constant number of independent Gaussian vectors. If any of these values is negative, then we automatically reject.
We showed in the proof of Theorem 28 that with probability, if is PSD, and if is -far from PSD. By choosing some we can arrange for and also for
Next we compute estimates and of and as above, and (implicitly) form the matrix
If is PSD, then with very good probability,
Similarly, if is -far from PSD, then
Thus it suffices to distinguish being PSD from having a negative eigenvalue less than or equal to
For this we will utilize our adaptive -tester, so we must bound Note that is a trace estimator for with variance [AN13]. Therefore Define , so that The negative eigenvalues of sum to at most in magnitude, and so the bound on implies that Write , so that Note that Therefore From this we have
as long as , which it is by assumption.
Therefore Theorem 2 gives an adaptive vector-matrix-vector tester for which requires only queries.
∎
As a consequence we also obtain a two-sided -tester for all
Corollary 30.
For , there is a two-sided adaptive -tester in the vector-matrix-vector model, which makes queries.
Proof.
Apply Theorem 29 along with the bound ∎
4.4 Lower bounds for two-sided testers
Our lower bounds for two-sided testers come from the spiked Gaussian model introduced in [LW16]. As before, our adaptive lower bounds will come as a consequence of the corresponding non-adaptive bounds.
Theorem 31.
A two-sided -tester that makes non-adaptive vector-matrix-vector queries requires at least
- queries for
- queries for as long as can be taken to be
- queries for
Proof.
First, take to be a matrix with entries, where . Also let where and have entries, and is to be chosen later. We will show that a PSD-tester can be used to distinguish and , while this is hard for any algorithm that uses only a linear sketch.
Recall that has spectral norm at most with probability at least where is an absolute constant. (We will use throughout to indicate absolute constants that we do not track – it may have different values between uses, even within the same equation.) Set
(20) |
and define similarly. Note that the eigenvalues of are precisely where the are singular values of Therefore is PSD with high probability.
On the other hand, with high probability so
(21) |
which implies that has a negative eigenvalue with magnitude at least
We also have that since the operator norm of is bounded by with high probability. Hence if
(22) |
then a two-sided PSD-tester can distinguish between and with constant probability of failure.
On the other hand, Theorem 4 in [LW16] implies that any sketch that distinguishes these distributions with constant probability, must have sketching dimension at least
It remains to choose values of and for which the inequality in equation (22) holds. When , we take and giving a lower bound of When , we take and giving a lower bound of Finally, when we take giving a lower bound of
∎
Remark 32.
The argument above applies equally well to arbitrary linear sketches, of which a series of non-adaptive vector-matrix-vector queries is a special case.
Corollary 33.
A two-sided adaptive -tester in the vector-matrix-vector model requires at least
- queries for
- queries for as long as can be taken to be
- queries for
For adaptive measurements, we supply a second proof via communication complexity which has the advantage of applying to general linear measurements, albeit at the cost of an additional bit complexity term.
Proposition 34.
Let , and . An adaptive two-sided -tester taking general linear measurements of , where each has integer entries in , must make at least queries.
Proof.
We reduce from the multiplayer set disjointness problem [KPW21]. Let the players have sets which either are (i) all pairwise disjoint or (ii) all share precisely one common element. Distinguishing between (i) and (ii) with probability in the blackboard model of communication requires bits. We will choose
For each let be the characteristic vector of and let Consider the matrix .
In situation (i), is PSD. In situation (ii), and We have
and
Given query access to , an -tester can therefore distinguish between (i) and (ii) with probability. Note that a single linear measurement may be simulated in the blackboard model using bits; each player simply computes and communicates , and the players add the resulting measurements. The players therefore need at least bits of communication to solve the PSD-testing problem.
∎
5 Spectrum Estimation
We make use of the following result, which is Lemma 11 of [CW17a] specialized to our setting.
Lemma 35.
For a symmetric matrix , there is a distribution over an oblivious sketching matrix with so that with at least constant probability,
(23) |
where is the optimal rank-one PSD approximation to in Frobenius norm.
Remark 36.
In our setting one can simply take to be Gaussian since the guarantee above must hold when is drawn from a rotationally invariant distribution. In many situations, structured or sparse matrices are useful, but we do not need this here.
We also recall the notion of an affine embedding [CW17].
Definition 37.
is an affine embedding for matrices and if for all matrices of the appropriate dimensions, we have
(24) |
We also recall that when is promised to have rank at most , there is a distribution over with rows such that (24) holds with constant probability for any choice of and [CW17].
Lemma 38.
There is an algorithm which makes vector-matrix-vector queries to and with at least probability outputs an approximation of , accurate to within additive error.
Proof.
We run two subroutines in parallel.
Subroutine 1. Approximate up to multiplicative error.
Our algorithm first draws affine embedding matrices and for , and with distortion, each with rows. We also draw a matrix as in Lemma 35 with columns.
We then compute and , each requiring vector-matrix-vector queries, and compute requiring queries.
Let be arbitrary with the appropriate dimensions (later we will optimize over rank PSD matrices). By using the affine embedding property along with the fact that has rank at most , we have
As a consequence of this, and the property held by , we have
(25)
(26)
Thus by computing the quantity in the left-hand-side above, our algorithm computes an multiplicative approximation using vector-matrix-vector queries.
Subroutine 2. Approximate up to multiplicative error.
We simply apply Theorem 2.2. of [MSW19], set and note that the entries of the sketch correspond to vector-matrix-vector products. By their bound we require vector-matrix-vector queries.
Since , we obtain an additive approximation to by running the two subroutines above and subtracting their results.
Finally, by repeating the above procedure several times in parallel and taking the median of the trials, we obtain a failure probability of at most
∎
The matrices and in Subroutine 1 each have rank whereas the dimensions of are The matrix therefore contains a large amount of data that will not play a role when optimizing over . If and were known ahead of time, then we could choose to compute only the portion of that is relevant to the optimization step, and simply estimate the Frobenius error incurred by the rest. This allows us to construct a slightly more efficient two-pass protocol.
Proposition 39.
By using a single round of adaptivity, the guarantee of Lemma 38 may be achieved using fewer vector-matrix-vector queries.
Proof.
As described above, we modify Subroutine 1. Write for and for Instead of computing , and at once, we instead compute and first using vector-matrix-vector queries.
We wish to estimate , where the minimum is over PSD matrices of rank at most . Let denote orthogonal projection onto the image of , and set Then for fixed , we use the Pythagorean theorem to write
(27)
(28)
(29)
(30)
Note that each of the last three terms can be estimated to within multiplicative error using Subroutine 2, since a vector-matrix-vector query to one of these matrices may be simulated with a single query to . Also since each has rank , the ’s are projections onto dimensional subspaces. Since the ’s are known to the algorithm, we may compute explicitly using vector-matrix-vector queries, as it suffices to query over the Cartesian product of bases for the images of and By optimizing the first term over , we thus obtain an multiplicative approximation to as desired. This gives a version of Subroutine 1 that makes queries. ∎
We note that we immediately obtain a query -tester by applying Lemma 38 to approximate . However this yields a worse dependence than Theorem 28. Perhaps more interestingly, these techniques also give a way to approximate the top (in magnitude) eigenvalues of while preserving their signs. We note a minor caveat. If and are very close in magnitude, but have opposite signs, then we cannot guarantee that we approximate . Therefore in the statement below, we only promise to approximate eigenvalues with magnitude at least .
Theorem 40.
Let be the (signed) eigenvalues of sorted in decreasing order of magnitude.
There is an algorithm that makes non-adaptive vector-matrix-vector queries to , and with probability at least , outputs such that
(i) There exists a permutation on so that for all with ,
(ii) For all , there exists with and
With one additional round of adaptivity the number of measurements can be reduced to
Proof.
We set in Lemma 38 and use it to approximate , along with , each to within additive error. Note that we may use the same sketching matrices for each of these tasks, and then take a union bound to obtain a failure probability of at most . Thus we require only queries in total. With an additional round of adaptivity, Proposition 39 reduces this bound to
Let be the largest positive eigenvalue of if it exists, and otherwise. Define similarly. Note that for , and that . This allows us to compute approximations such that , and similarly for the ’s with . Note that this bound implies
Our algorithm then simply returns the largest-magnitude elements of . ∎
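The last step of the proof differences the estimated partial sums and keeps the largest-magnitude values. A minimal sketch of this post-processing, assuming (hypothetically, since the exact quantities are defined in the proof) that pos_partial[i] estimates the sum of the top i positive eigenvalues and neg_partial[i] estimates the sum of magnitudes of the i most negative eigenvalues, with both arrays starting at 0:

import numpy as np

def recover_signed_eigenvalues(pos_partial, neg_partial, k):
    # Differencing the partial sums recovers per-eigenvalue estimates,
    # with the sign attached according to which list they came from.
    pos = np.diff(pos_partial)      # estimates of the positive eigenvalues
    neg = -np.diff(neg_partial)     # estimates of the negative eigenvalues
    candidates = np.concatenate([pos, neg])
    order = np.argsort(-np.abs(candidates))
    return candidates[order[:k]]    # the k largest-magnitude signed estimates

# Hypothetical partial-sum estimates: positive eigenvalues ~ 3.1, 1.9, 0.9
# and negative eigenvalues ~ -2.2, -0.6.
print(recover_signed_eigenvalues(np.array([0.0, 3.1, 5.0, 5.9]),
                                 np.array([0.0, 2.2, 2.8]), 3))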
6 Non-adaptive testers
6.1 Non-adaptive vector-matrix-vector queries
We gave a lower bound for one-sided testers earlier in Theorem 13. Here we observe that the sketch of Andoni and Nguyen [AN13] provides a matching upper bound.
Proposition 41.
There is a one-sided non-adaptive -tester that makes vector-matrix-vector queries to .
Proof.
We simply apply Proposition 8. Note that the sketch is of the form , where with in our case. Each entry of , of which there are , can be computed with a single vector-matrix-vector query. ∎
Corollary 42.
There is a one-sided non-adaptive -tester that makes vector-matrix-vector queries to .
Proof.
Apply the previous proposition along with the bound ∎
6.2 Non-adaptive matrix-vector queries
As a simple corollary of the algorithm given by Corollary 42, we have the following.
Proposition 43.
There exists a one-sided non-adaptive tester making matrix-vector queries.
Proof.
Simply note that a bilinear sketch may be simulated with matrix-vector queries. ∎
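The simulation in the proof is direct: each column of the right sketching matrix costs one matrix-vector query, and the left sketching matrix is applied offline. A minimal numpy sketch of this simulation (the Gaussian sketching matrices are illustrative placeholders, not the [AN13] construction):

import numpy as np

def matvec_query(A, v):
    # Oracle access to A: one matrix-vector query returning A v.
    return A @ v

def bilinear_sketch_via_matvec(A, S, T):
    # A T^T costs one matrix-vector query per row of T; multiplying by S
    # afterwards requires no further access to A.
    AT = np.column_stack([matvec_query(A, t) for t in T])
    return S @ AT    # equals S A T^T, using only T.shape[0] queries

rng = np.random.default_rng(3)
n, m = 60, 8
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
S = rng.standard_normal((m, n))
T = rng.standard_normal((m, n))
print(np.allclose(bilinear_sketch_via_matvec(A, S, T), S @ A @ T.T))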
We next show that this bound is tight. While we consider the case where the tester queries the standard basis vectors, this is done essentially without loss of generality, as any non-adaptive tester may be implemented by querying on an orthonormal set.
Proposition 44.
Suppose that a one-sided matrix-vector tester queries on the standard basis vectors and outputs False. Let be the top submatrix of . Then if is non-singular, there must exist a “witness vector” such that .
Proof.
Let be the matrix with columns and decompose it as
(31)
where and . Suppose that there does not exist a as in the statement of the proposition. Note that this implies that is PSD, and in fact positive definite, by the assumption that was non-singular. Now consider the block matrix
(32)
for some choice of . For arbitrary and of the appropriate dimensions, we have
(33)–(34)
Since , this expression, viewed as a quadratic form in and , is positive definite for large enough . This implies that is positive definite as well. Since by construction, this shows that the queries are consistent with a PSD matrix. So a one-sided tester that cannot produce a witness vector in this case must not output False. ∎
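Proposition 44 suggests the natural one-sided decision rule for a non-adaptive matrix-vector tester that queries the standard basis vectors: output False only when the observed top block has a negative direction, which then serves as the witness. A minimal numpy sketch of this rule (assuming, as throughout, that the input matrix is symmetric):

import numpy as np

def one_sided_matvec_test(A, k):
    n = A.shape[0]
    queries = np.eye(n)[:, :k]     # the first k standard basis vectors
    responses = A @ queries        # k matrix-vector queries: the first k columns of A
    B = responses[:k, :]           # observed top-left k x k block of A
    eigvals, eigvecs = np.linalg.eigh((B + B.T) / 2)
    if eigvals[0] < 0:
        # Witness: padding the negative eigenvector with zeros gives x with
        # x^T A x = x[:k]^T B x[:k] < 0, certifying that A is not PSD.
        witness = np.zeros(n)
        witness[:k] = eigvecs[:, 0]
        return False, witness
    # Otherwise the responses are consistent with some PSD matrix,
    # so a one-sided tester must accept.
    return True, None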
Theorem 45.
Set , let be a random orthogonal matrix, and take . In the matrix-vector model, a one-sided non-adaptive tester must make at least queries to be correct on this distribution with probability.
Proof.
Given this distribution, we may assume without loss of generality that the tester queries on , whose span we call . Let denote the eigen-direction of , which is distributed uniformly over . For unit vectors , the quadratic form associated to is negative exactly when . Also, the as in Proposition 44 is non-singular with probability . In this case, by Proposition 44, the tester can only succeed if . On the other hand, , so by Markov's inequality, with probability at least . Therefore a tester that succeeds with probability must have . ∎
Corollary 46.
In the matrix-vector model, a one-sided non-adaptive -tester must make at least queries.
Proof.
Apply Theorem 45 with ∎
7 Conclusion and Open Problems
We gave a series of tight bounds for PSD testing in both the matrix-vector and vector-matrix-vector models, as well as a separation between one-sided and two-sided testers in the latter model. There are a number of additional questions that may yield interesting future work.
• Our adaptive vector-matrix-vector algorithm for uses rounds of adaptivity, but this may not always be desirable in practice, since the queries cannot be run in parallel. Are there good algorithms that use less adaptivity? What is the optimal trade-off between query complexity and the number of rounds of adaptivity?
• One could modify our testing model and consider testers which should output False whenever the norm of the negative eigenvalues is at least an fraction of the norm of positive eigenvalues. Is it possible to give tight bounds for this problem in the models that we considered?
• Is it possible to use the ideas behind our two-sided bilinear sketch to give better bounds for spectral estimation with additive Frobenius error?
8 Acknowledgements
References
- [Oja82] Erkki Oja “Simplified neuron model as a principal component analyzer” In Journal of mathematical biology 15.3 Springer, 1982, pp. 267–273
- [AL86] Owe Axelsson and Gunhild Lindskog “On the rate of convergence of the preconditioned conjugate gradient method” In Numerische Mathematik 48 Springer, 1986, pp. 499–523
- [Hut89] Michael F Hutchinson “A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines” In Communications in Statistics-Simulation and Computation 18.3 Taylor & Francis, 1989, pp. 1059–1076
- [BFG96] Zhaojun Bai, Mark Fahey and Gene Golub “Some large-scale matrix computation problems” In Journal of Computational and Applied Mathematics 74.1-2 Elsevier, 1996, pp. 71–89
- [KS03] Robert Krauthgamer and Ori Sasson “Property testing of data dimensionality” In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA ACM/SIAM, 2003, pp. 18–27
- [Lit+05] Alexander E Litvak, Alain Pajor, Mark Rudelson and Nicole Tomczak-Jaegermann “Smallest singular value of random matrices and geometry of random polytopes” In Advances in Mathematics 195.2 Elsevier, 2005, pp. 491–523
- [SW09] Daniel A Spielman and Jaeoh Woo “A note on preconditioning by low-stretch spanning trees” In arXiv preprint arXiv:0903.2816, 2009
- [AT11] Haim Avron and Sivan Toledo “Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix” In Journal of the ACM (JACM) 58.2 ACM New York, NY, USA, 2011, pp. 1–34
- [AN13] Alexandr Andoni and Huy L. Nguyen “Eigenvalues of a matrix in the streaming model” In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 2013, pp. 1729–1737 SIAM
- [WWZ14] Karl Wimmer, Yi Wu and Peng Zhang “Optimal query complexity for estimating the trace of a matrix” In International Colloquium on Automata, Languages, and Programming, 2014, pp. 1051–1062 Springer
- [MM15] Cameron Musco and Christopher Musco “Randomized block Krylov methods for stronger and faster approximate singular value decomposition” In arXiv preprint arXiv:1504.05477, 2015
- [Ai+16] Yuqing Ai, Wei Hu, Yi Li and David P Woodruff “New characterizations in turnstile streams with applications” In 31st Conference on Computational Complexity (CCC 2016), 2016 Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
- [Jai+16] Prateek Jain et al. “Streaming PCA: matching matrix Bernstein and near-optimal finite sample guarantees for Oja’s algorithm” In Conference on Learning Theory, 2016, pp. 1147–1164 PMLR
- [LW16] Yi Li and David P Woodruff “Tight bounds for sketching the operator norm, Schatten norms, and subspace embeddings” In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2016), 2016 Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik
- [Sha16] Ohad Shamir “Convergence of stochastic gradient descent for PCA” In International Conference on Machine Learning, 2016, pp. 257–265 PMLR
- [AL17] Zeyuan Allen-Zhu and Yuanzhi Li “First efficient convergence for streaming k-PCA: a global, gap-free, and near-optimal rate” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 487–492 IEEE
- [CW17] Kenneth L Clarkson and David P Woodruff “Low-rank approximation and regression in input sparsity time” In Journal of the ACM (JACM) 63.6 ACM New York, NY, USA, 2017, pp. 1–45
- [CW17a] Kenneth L Clarkson and David P Woodruff “Low-rank PSD approximation in input-sparsity time” In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 2061–2072 SIAM
- [Han+17] Insu Han, Dmitry Malioutov, Haim Avron and Jinwoo Shin “Approximating spectral sums of large-scale matrices using stochastic chebyshev approximations” In SIAM Journal on Scientific Computing 39.4 SIAM, 2017, pp. A1558–A1585
- [SER18] Max Simchowitz, Ahmed El Alaoui and Benjamin Recht “Tight query complexity lower bounds for PCA via finite sample deformed Wigner law” In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, pp. 1249–1259
- [Ver18] Roman Vershynin “High-dimensional probability: An introduction with applications in data science” Cambridge university press, 2018
- [Bal+19] Maria-Florina Balcan, Yi Li, David P Woodruff and Hongyang Zhang “Testing matrix rank, optimally” In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019, pp. 727–746 SIAM
- [MSW19] Michela Meister, Tamas Sarlos and David Woodruff “Tight dimensionality reduction for sketching low degree polynomial kernels”, 2019
- [Sun+19] Xiaoming Sun, David P Woodruff, Guang Yang and Jialin Zhang “Querying a matrix through matrix-vector products” In arXiv preprint arXiv:1906.05736, 2019
- [BCJ20] Ainesh Bakshi, Nadiia Chepurko and Rajesh Jayaram “Testing positive semi-definiteness via random submatrices” In arXiv preprint arXiv:2005.06441, 2020
- [Bra+20] Mark Braverman, Elad Hazan, Max Simchowitz and Blake Woodworth “The gradient complexity of linear regression” In Conference on Learning Theory, 2020, pp. 627–647 PMLR
- [RWZ20] Cyrus Rashtchian, David P Woodruff and Hanlin Zhu “Vector-matrix-vector queries for solving linear algebra, statistics, and graph problems” In arXiv preprint arXiv:2006.14015, 2020
- [BMR21] Rajarshi Bhattacharjee, Cameron Musco and Archan Ray “Sublinear Time Eigenvalue Approximation via Random Sampling” In CoRR abs/2109.07647, 2021
- [BMR21a] Rajarshi Bhattacharjee, Cameron Musco and Archan Ray “Sublinear Time Eigenvalue Approximation via Random Sampling” In arXiv preprint arXiv:2109.07647, 2021
- [KPW21] Akshay Kamath, Eric Price and David P Woodruff “A simple proof of a new set disjointness with applications to data streams” In arXiv preprint arXiv:2105.11338, 2021
- [Mey+21] Raphael A Meyer, Cameron Musco, Christopher Musco and David P Woodruff “Hutch++: Optimal Stochastic Trace Estimation” In Symposium on Simplicity in Algorithms (SOSA), 2021, pp. 142–155 SIAM
- [INW22] Piotr Indyk, Shyam Narayanan and David P. Woodruff “Frequency Estimation with One-Sided Error” In SODA, 2022