Necessary and Sufficient Conditions for Convergence to the Semicircle Distribution
Abstract.
We consider random Hermitian matrices with independent upper triangular entries. Wigner’s semicircle law says that, under certain additional assumptions, the empirical spectral distribution converges to the semicircle distribution. We characterize convergence to the semicircle distribution in terms of the variances of the entries, under natural assumptions such as the Lindeberg condition. The result extends to certain matrices whose entries have infinite second moments. As a corollary, we give another characterization of semicircle convergence in terms of convergence in distribution of the row sums to the standard normal distribution.
1. Introduction
Let , for each , be a random Hermitian matrix whose upper triangular entries are independent. We call a Hermitian Wigner ensemble. In case is real symmetric for all , we call a symmetric Wigner ensemble. We write throughout this paper. If are the eigenvalues of counted with multiplicity, then the empirical spectral distribution of is defined by
Since is a random measure, we can think of the mean measure , which is defined and treated in Appendix A.
Let us use the term semicircle law to refer to a class of theorems that state, under certain conditions, that converges in some sense to the semicircle distribution on given by
(We let .) Wigner initiated the spectral study of random matrices by proving the following very first version of the semicircle law in [Wig55, Wig58].
Theorem 1.1 (semicircle law, Wigner).
Let be a symmetric Wigner ensemble such that the upper triangular entries of have identical symmetric distribution with mean zero and variance . If for each we have
(1.1) |
then . Here denotes convergence in distribution.
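As a numerical illustration of Theorem 1.1 (not part of the original argument), the following sketch samples one symmetric matrix with i.i.d. Rademacher entries, scales it by $1/\sqrt{n}$, and compares the second and fourth moments of its empirical spectral distribution with the values 1 and 2 expected for the semicircle distribution on $[-2,2]$. The scaling, the entry distribution, the size `n = 400`, and the helper names `sample_wigner` and `esd_moments` are assumptions chosen for illustration only.

```python
import numpy as np


def sample_wigner(n, rng):
    """Symmetric n x n matrix with i.i.d. Rademacher upper triangular entries,
    scaled by 1/sqrt(n) (an assumed normalization)."""
    x = rng.choice([-1.0, 1.0], size=(n, n))
    upper = np.triu(x)                 # upper triangle, diagonal included
    sym = upper + np.triu(x, 1).T      # mirror the strict upper triangle
    return sym / np.sqrt(n)


def esd_moments(w, orders=(2, 4)):
    """Moments of the empirical spectral distribution of a Hermitian matrix."""
    eig = np.linalg.eigvalsh(w)
    return {k: float(np.mean(eig ** k)) for k in orders}


rng = np.random.default_rng(0)
w = sample_wigner(400, rng)
moments = esd_moments(w)
print(moments)  # second moment close to 1, fourth moment close to 2
```

For Rademacher entries the second moment equals 1 up to rounding, because every squared entry is 1; the fourth moment is only approximately 2 at finite `n`.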
Subsequent works by [Arn71], [Pas73], and others led to the following much more general semicircle law.
Theorem 1.2 (semicircle law, [BS10, Theorem 2.9]).
Let be a Hermitian Wigner ensemble such that the upper triangular entries of are of mean zero and variance . If
(1.2) |
then a.s.
Note that (1.1) for implies (1.2) since
Let us call (1.2) the Lindeberg condition, following the Lindeberg–Feller central limit theorem. Girko [Gir90, Theorem 9.4.1] states that the converse of Theorem 1.2 holds.
Rather surprisingly, we have the following:
Lemma 1.3 (a.s. convergence).
Let be a Hermitian Wigner ensemble. Then
A proof of this fact using a concentration-of-measure inequality is given in Appendix A. Thanks to this equivalence, we will be able to go back and forth freely between the two types of convergence throughout the paper.
Theorem 1.2 suggests an extension of the semicircle law to the case where the entries of have variances other than . Here is one possible approach to such an extension. Assume that the underlying probability space is the product
of two probability spaces and . Then let and be random real symmetric matrices defined on and having i.i.d. upper triangular entries. If is standard normal and
then it is not difficult to show that given by
satisfies the conditions of Theorem 1.2.
Since for -a.e. , Tonelli’s theorem implies that for -a.e. , we have -a.s. Note that the -entry of the random matrix defined on has variance which can deviate by any amount from .
A problem with this approach is that we do not know for which we have the a.s. convergence , even though we know this happens for almost all . For instance, the above discussion does not tell us whether a.s. is true when is a symmetric Wigner ensemble such that
and if is odd.
Götze, Naumov, and Tikhomirov [GNT15] covered this case by proving the following:
Theorem 1.4 (semicircle law, [GNT15, Corollary 1]).
Let be a symmetric Wigner ensemble such that and for . If the Lindeberg condition (1.2) holds, and
(1.3) |
and
(1.4) |
then a.s.
From our main result (Theorem 1.6) it will follow that (1.4) is not needed in Theorem 1.4, and that can be assumed to be Hermitian, not necessarily real symmetric.
To illustrate that (1.3) is needed in Theorem 1.4, the authors of [GNT15] considered the random symmetric block matrix
where and are of size and , and the upper triangular entries of are independent. They let all entries of except the non-diagonal entries of be normal with mean and variance , and simulated the spectrum of for to see that does not look like a semicircle. Note that (1.3) does not hold.
Our main theorem will let us prove what was suggested by the simulation in [GNT15], namely that . More generally, we will prove for a large class of Hermitian Wigner ensembles that (or a.s., equivalently) holds if and only if (1.3) is true.
One thing we should notice is that changing rows of has no effect on the limit of due to the following:
Lemma 1.5 (rank inequality).
Let and be Hermitian matrices. If and are the distribution functions of and (defined in the same way as ), then
Proof.
See [BS10, Theorem A.43]. ∎
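The rank inequality can be checked numerically. The sketch below assumes the bound takes the usual form $\sup_x |F^A(x) - F^B(x)| \le \operatorname{rank}(A-B)/n$ for $n \times n$ Hermitian matrices; the matrix size, the number `r` of modified rows and columns, and the helper name `kolmogorov_distance` are illustrative assumptions.

```python
import numpy as np


def kolmogorov_distance(eig_a, eig_b):
    """sup_x |F_A(x) - F_B(x)| for the empirical distribution functions of two
    eigenvalue lists. Both functions are right-continuous step functions, so
    the supremum is attained at one of the jump points."""
    grid = np.sort(np.concatenate([eig_a, eig_b]))
    f_a = np.searchsorted(np.sort(eig_a), grid, side="right") / len(eig_a)
    f_b = np.searchsorted(np.sort(eig_b), grid, side="right") / len(eig_b)
    return float(np.max(np.abs(f_a - f_b)))


rng = np.random.default_rng(1)
n, r = 200, 5
g = rng.standard_normal((n, n))
a = (g + g.T) / np.sqrt(2 * n)   # real symmetric test matrix
b = a.copy()
b[:r, :] = 0.0                   # zero out r rows ...
b[:, :r] = 0.0                   # ... and the matching r columns

rank_diff = int(np.linalg.matrix_rank(a - b))
dist = kolmogorov_distance(np.linalg.eigvalsh(a), np.linalg.eigvalsh(b))
print(rank_diff, dist, rank_diff / n)
```

Changing `r` rows and the matching columns perturbs the matrix by rank at most `2 * r`, so the distance between the two distribution functions is at most `2 * r / n`.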
We want to say that for certain Hermitian Wigner ensembles with , we have (1.3). However, without further restriction on , we can always change rows and columns of so that (1.3) becomes false, while leaving the limiting distribution of unchanged. To avoid this problem, we assume that
(1.5) |
Notice that this condition is weaker than (1.4). If satisfies (1.3) and (1.5), and we change rows and columns of it to obtain Hermitian which also satisfies (1.5), then also satisfies (1.3). The following is our first main theorem:
Theorem 1.6 (characterization of semicircle convergence).
Remark 1.7.
We can actually go beyond Theorem 1.6 and allow the entries of to have infinite variances, for example when where has a density
and is a real number close to . To achieve this, instead of and the Lindeberg condition (1.2), we assume
(1.6) |
and
(1.7) |
If and (1.2) hold, then (1.6) follows due to
and (1.7) follows by
Finally, (1.5) is replaced by
(1.8) |
The following is our second main theorem:
Theorem 1.8 (characterization, general version).
Remark 1.9.
- (1)
- (2)
- (3)
- (4) Our full proof of the sufficiency is a careful consideration of Wigner’s moment method proof of the original semicircle law. This is arguably more elementary than the proof of Theorem 1.4 in [GNT15], which first deals with matrices with Gaussian entries using combinatorial arguments, and then generalizes the result to symmetric Wigner ensembles using Lindeberg’s universality scheme for random matrices.
The following corollary relates to the convergence in distribution of the sum of a row of to the standard normal random variable. The sufficiency direction when the entries of are identically distributed was covered by [Jun18]. We denote the Lévy metric by .
Corollary 1.10 (characterization, Gaussian convergence).
Under the hypotheses of Theorem 1.8, we have if and only if
(1.10) |
where and are the distribution functions of and the standard normal random variable. The signs are independent Rademacher random variables independent from .
The rest of the paper is organized as follows. Section 2 is a short section that introduces Theorem 2.1. This theorem is a reduction of Theorems 1.6 and 1.8, and will ultimately imply them as shown in Appendix B. This section also proves the sufficiency part of Theorem 2.1 in the case when the entries of are real.
Section 3 contains the essence of the proof of the necessity part of Theorem 2.1. We add the assumption that the sixth moment of is bounded, but in return we obtain a clean proof that goes straight to the point. The idea is to express the second and the fourth moments of in terms of the variances of the entries of .
Section 4 shows that we can remove the additional assumption on the sixth moments, assuming that a certain lemma (Lemma 4.1) holds. The condition (1.5) is used in this section.
Section 5 proves Lemma 4.1 by a systematic computation of the moments of . The computation is a variant of Wigner’s original moment method, but it can handle the case when the entries have non-identical variances.
In Section 6, we prove the sufficiency part of Theorem 2.1 using the results of Section 5. The classical argument involving Dyck paths is discussed for completeness.
2. Proof of sufficiency for symmetric Wigner ensembles
It is enough to prove the following in order to prove Theorems 1.6 and 1.8. The justification for the reduction is fairly standard, and is covered by Lemmas B.1 and B.4 in the appendix.
Theorem 2.1 (characterization, reduced form).
Lemma 2.2.
3. Proof of necessity under bounded sixth moments
Assume (2.1) throughout this section. In this section, we present a relatively simple proof of necessity in Theorem 2.1 under the following additional assumption:
(3.1) |
The number 6 is used only because it is an even number greater than 4. Our proof is based on an examination of the second and the fourth moments of . If are the eigenvalues of , then for each we have
and thus
(3.2) |
(See Lemma A.1.)
The second moment of can be easily expressed in terms of the variances of .
Lemma 3.1 (computation of the second moment).
We have
Proof.
It follows from (3.2) and
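The deterministic identity behind Lemma 3.1 is that, for a Hermitian matrix $W$, the trace of $W^2$ equals the sum of the squared moduli of its entries. The sketch below checks this for one sample; the matrix `w`, its size, and its entry distribution are illustrative assumptions, and no particular normalization of the paper is implied.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w = (z + z.conj().T) / 2          # a Hermitian test matrix

eig = np.linalg.eigvalsh(w)
# Second moment of the empirical spectral distribution.
second_moment_from_spectrum = float(np.mean(eig ** 2))
# Same quantity computed from the entries: (1/n) * sum_{i,j} |w_ij|^2.
second_moment_from_entries = float(np.sum(np.abs(w) ** 2) / n)

print(second_moment_from_spectrum, second_moment_from_entries)
```

Taking expectations of the entrywise expression turns the squared moduli into second moments of the entries, which is how the variances enter.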
Computing the fourth moment requires more effort, but is still tractable.
Lemma 3.2 (computation of the fourth moment).
If
(3.3) |
then
Proof.
Note that
Since the upper triangular entries of are independent and have mean zero, in order for not to vanish,
should either be all the same, or be partitioned into two groups, where each group consists of two identical sets. This implies either or or both. Notice that, for instance, cannot happen because would then appear only once among , , , and .
Thus, the sum on the right side equals
where the last term corresponds to the case where both and are true. Since the first and the second sum both equal
we have
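The case analysis in the proof of Lemma 3.2 can be verified by brute force. The sketch below enumerates all index quadruples $(i,j,k,l)$ for a small vertex count and checks that, whenever none of the four unordered pairs $\{i,j\}$, $\{j,k\}$, $\{k,l\}$, $\{l,i\}$ occurs exactly once, we have $i=k$ or $j=l$. The index names and the choice `n = 5` are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def edge(a, b):
    """Unordered pair {a, b}, stored as a sorted tuple (loops allowed)."""
    return (min(a, b), max(a, b))


def no_edge_used_once(i, j, k, l):
    """True if every edge of the closed walk i -> j -> k -> l -> i is used at
    least twice, which is necessary for the expectation not to vanish."""
    counts = Counter([edge(i, j), edge(j, k), edge(k, l), edge(l, i)])
    return all(c >= 2 for c in counts.values())


n = 5
survivors = [t for t in product(range(n), repeat=4) if no_edge_used_once(*t)]
violations = [(i, j, k, l) for (i, j, k, l) in survivors if i != k and j != l]

print(len(survivors), len(violations))
```

The surviving quadruples are exactly those with $i=k$ or $j=l$, so their number is $2n^3 - n^2$ by inclusion and exclusion.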
Now we are ready to prove the necessity part of Theorem 2.1 assuming (3.1). Assume . By Skorokhod’s representation theorem [Bil12, Theorem 25.6], we can take real-valued random variables on a common probability space such that is the distribution of , is the distribution of , and a.s.
4. Lifting the bounded sixth moment condition
In this section, we prove the necessity part of Theorem 2.1 without assuming the bounded sixth moment condition (3.1). We rely on the following lemma, which will be proved in the next section.
The number is here just because it is even and greater than . In fact, our proof easily extends to any even natural number. Given for all , let be the matrix obtained from by replacing with for all .
Lemma 4.2.
Proof.
We may assume . Suppose that the claim is false, and let be the set of size consisting of with smallest . Notice that
along some subsequence , where . For all such that the left side in the previous display is at least , let be any subset of such that
Then follows from . However,
for for which is defined. If we let for all for which is undefined, then contradicts (1.5). ∎
Given for each , let be the matrix obtained from by replacing with for all .
Proof.
Let , and and be as in the preceding lemma. Suppose that
for some . By Lemmas 4.1 and 4.2,
This implies that is tight, thus it has a subsequence weakly convergent to some , which we still, by abuse of notation, denote by . By Skorokhod’s theorem and the uniform integrability argument that followed (3.4), we have
and thus .
If has eigenvalues outside , then the Cauchy interlacing law [Tao12, Exercise 1.3.14] implies that has at least eigenvalues outside . Thus
and therefore
by Lemma A.1. Since , the portmanteau theorem implies
but this contradicts the fact that is supported on . Thus, we have
Since is arbitrary, we have actually proved that for each we can choose such that and
Choose positive integers so that
and let for and for . (We are redefining by abuse of notation.) Then we have and (4.1). ∎
We are ready to prove the necessity part of Theorem 2.1. Assume (1.5), (2.1), and . By Lemma 1.3, we have a.s. If are as in Lemma 4.3, then
and thus a.s. by Lemma 1.5. By another application of Lemma 1.3, . As we have (4.1), the previous section tells us that
Since , the assumption (1.5) implies (1.3). Thus, the necessity part of Theorem 2.1 is proved assuming that Lemma 4.1 holds.
5. Computation of moments
The goal of this section is to prove Lemma 4.1 and also establish some arguments needed in the next section. We use a variant of Wigner’s original moment method that can handle entries with non-identical variances. Readers who are very familiar with these arguments may want to jump ahead to the proof of Lemma 4.1.
Assume (1.4) and (2.1) throughout this section. Recall that
(3.2) |
for all . In this section, we compute the asymptotics of as .
Fix . The boldface lower case letters will denote , , and so on. Let us call a -tuple with a closed walk of length . For any closed walk with , let
Notice that
where ranges over all closed walks (of length ) with .
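The expansion of a trace of a matrix power as a sum over closed walks can be sketched as follows. The code sums the products $w_{i_0 i_1} w_{i_1 i_2} \cdots w_{i_{k-1} i_0}$ over all closed walks of length $k$ and compares the result with $\operatorname{tr}(W^k)$. The function name `trace_power_via_walks` and the small test matrix are assumptions for illustration; the enumeration has $n^k$ terms and is only feasible for small sizes.

```python
import itertools

import numpy as np


def trace_power_via_walks(w, k):
    """Sum of w[i0,i1] * w[i1,i2] * ... * w[i_{k-1},i0] over all closed walks."""
    n = w.shape[0]
    total = 0.0 + 0.0j
    for start in itertools.product(range(n), repeat=k):
        walk = start + (start[0],)       # close the walk at its first vertex
        term = 1.0 + 0.0j
        for a, b in zip(walk[:-1], walk[1:]):
            term *= w[a, b]
        total += term
    return total


rng = np.random.default_rng(3)
n, k = 4, 4
z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
w = (z + z.conj().T) / 2                 # a Hermitian test matrix

walk_sum = trace_power_via_walks(w, k)
direct = np.trace(np.linalg.matrix_power(w, k))
print(walk_sum, direct)
```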
Now we gather together the closed walks which have the same “shape.” Let us say that two closed walks and are isomorphic if for any we have if and only if . A canonical closed walk of length on vertices is a closed walk such that
- (1) ,
- (2) , and
- (3) for each .
Let denote the set of such walks. It is straightforward to show that any closed walk is isomorphic to exactly one canonical closed walk. For each , let denote the set of all closed walks with which are isomorphic to . Then we have
(5.1) |
where the upper bound of is (rather arbitrarily) set to since is empty for all .
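The grouping of closed walks by shape can be sketched in code. The function below relabels the vertices of a walk by order of first appearance, starting from 1; this is the usual convention for a canonical representative and is assumed here rather than taken from the definition above. Two walks have the same equality pattern exactly when their relabelings agree. The names `canonical_form` and `are_isomorphic` are hypothetical.

```python
def canonical_form(walk):
    """Relabel vertices by order of first appearance: 1, 2, 3, ..."""
    labels = {}
    out = []
    for v in walk:
        if v not in labels:
            labels[v] = len(labels) + 1
        out.append(labels[v])
    return tuple(out)


def are_isomorphic(walk_a, walk_b):
    """Walks are isomorphic if positions agree in one walk exactly when they
    agree in the other, i.e. if their canonical forms coincide."""
    return len(walk_a) == len(walk_b) and canonical_form(walk_a) == canonical_form(walk_b)


print(canonical_form((5, 3, 5, 7, 5)))                    # (1, 2, 1, 3, 1)
print(are_isomorphic((5, 3, 5, 7, 5), (2, 9, 2, 4, 2)))   # True
print(are_isomorphic((1, 2, 1, 2, 1), (1, 2, 3, 2, 1)))   # False
```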
We will fix and , and compute . As a first step, we get an easy case out of the way.
Lemma 5.1 (zeroed out terms).
If crosses some edge exactly once, i.e., for exactly one , then for any and .
Proof.
Since we have (2.1) and the upper triangular entries of are independent, is the product of (or ) and a bounded random variable which is independent from . Since , we have . ∎
Now assume that does not cross any edge exactly once, i.e., for each there is some distinct from such that . To compute , we introduce some notation. Let be the graph (possibly having loops but no multiple edges) with the vertex set
and the edge set
For a tree , let and denote the vertex set and the edge set of . Given a finite tree and , we let denote the set of injections from the vertex set of to . For each , let us write
where denote the endpoints of . We are omitting the dependence of on , but there should be no confusion. Note that the value is well-defined because is Hermitian.
Now we get back to the problem of computing . As each edge of is crossed at least twice by , there are at most edges in . As is a connected graph with vertices, we have , and has a spanning tree with edges. Choose some . For each , consider the injection given by for each .
First assume . Since has edges and each edge of is crossed twice by , we have . As each edge of is traversed exactly once in each direction, and the map given by is a bijection, we have
(5.2) |
Now assume . By and the fact that crosses any edge of at least twice, we have
Note that since . By using the bijection again, we have
(5.3) |
The right side tends to by the following.
Lemma 5.2 (contribution of a tree).
If is a finite tree with edges, , , and , then
(5.4) |
where is as in (1.4). Note that it follows that
Proof.
Lemma 5.3 (contribution of a canonical walk).
Let and . Assume that does not cross any edge exactly once, i.e., for each there is some distinct from such that . Then we have . If , we have
If , then is a tree, and we have
Combining (3.2), (5.1), Lemma 5.1, and Lemma 5.3, we obtain the following approximation to the moments of .
Lemma 5.4 (computation of moments).
Let be the set of all that cross each edge of twice. (Note that should be a tree, and that is finite.) Then,
We now prove Lemma 4.1 as promised.
6. Proof of sufficiency
We continue to use the notation introduced in Section 5. On top of (1.3) and (2.1), we assume (1.4), which is possible by Lemma 2.2. We are one lemma away from proving the sufficiency part of Theorem 2.1. Our proof is essentially a manifestation of Wigner’s original idea.
Lemma 6.1 (each tree contributes one).
Proof.
There is nothing to prove if has no edges. To proceed by induction, assume that (6.1) holds if has edges, and let be a tree with edges. Let be a leaf of , and be the only vertex of that is adjacent to . Note that
by and Lemma 5.2 (or the induction hypothesis). Applying Lemma 5.2 once again, we have
by (1.3). The claimed result follows from the previous two displays and
Assume that is even. A Dyck path of length is a finite sequence satisfying
- (1) ,
- (2) for all , and
- (3) for all .
Given , let where is the distance between and in . Then it is clear that is indeed a Dyck path, and it is not difficult to see that is a bijection from to the set of all Dyck paths of length . It is well known that there are exactly Dyck paths of length ; see [vLW01, Example 14.8]. Thus, we have .
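The count of Dyck paths can be confirmed by enumeration for small lengths. The sketch below assumes the standard definition (a path of heights that starts and ends at 0, never goes below 0, and changes by exactly 1 at each step) and compares the number of such paths of length $2m$ with the Catalan number $\binom{2m}{m}/(m+1)$. The function names are illustrative.

```python
from itertools import product
from math import comb


def dyck_paths(length):
    """All height sequences (s_0, ..., s_length) with s_0 = s_length = 0,
    s_j >= 0, and |s_j - s_{j-1}| = 1 (assumed standard definition)."""
    paths = []
    for steps in product((1, -1), repeat=length):
        heights = [0]
        for s in steps:
            heights.append(heights[-1] + s)
        if min(heights) >= 0 and heights[-1] == 0:
            paths.append(tuple(heights))
    return paths


def catalan(m):
    """m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


for m in range(1, 6):
    print(2 * m, len(dyck_paths(2 * m)), catalan(m))
```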
A direct computation (see [AGZ10, 2.1.1]) yields
where the odd moments of are all zero due to the symmetry. Thus,
Since
by the ratio test for all , the probability measure is determined by its moments by [Bil12, Theorem 30.1]. Therefore, the moment convergence theorem [Bil12, Theorem 30.2] tells us that .
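The moment computation can be checked numerically. The sketch below assumes the semicircle density $\frac{1}{2\pi}\sqrt{4-x^2}$ on $[-2,2]$ and integrates $x^k$ against it with a midpoint rule; the even moments should match the Catalan numbers and the odd moments should vanish. The function name `semicircle_moment` and the grid size are assumptions for illustration.

```python
from math import comb

import numpy as np


def semicircle_moment(k, num_points=200_000):
    """Midpoint-rule approximation of the k-th moment of the semicircle
    distribution with density sqrt(4 - x^2) / (2 * pi) on [-2, 2]."""
    h = 4.0 / num_points
    x = -2.0 + h * (np.arange(num_points) + 0.5)   # midpoints of the cells
    density = np.sqrt(4.0 - x ** 2) / (2.0 * np.pi)
    return float(np.sum(x ** k * density) * h)


for m in range(5):
    print(2 * m, semicircle_moment(2 * m), comb(2 * m, m) // (m + 1))
```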
7. Gaussian convergence
Assume that satisfies the conditions of Theorem 1.8. In this section, we prove Corollary 1.10 by showing that (1.9) and (1.10) are equivalent. We need the following two simple facts.
Lemma 7.1 (converging averages).
For each , let . If
then we can take nonempty for each so that
Proof.
For each , we have
We can take positive with such that
Let be if it is nonempty, and let otherwise. ∎
Lemma 7.2 (uniform convergence).
Let be a metric space, , and . Then the following are equivalent:
- (1) for any choice of .
- (2) .
Proof.
We omit the easy proof. ∎
First assume (1.9). By Lemma 7.1, we have nonempty such that
By (7.1), we can make smaller so that
also holds while retaining .
Let for each . By Lemma 7.2, we have
Since
for all , the Lindeberg–Feller central limit theorem [Kal02, Theorem 5.12] implies
where is standard normal. As , it follows that .
Now we assume (1.10). By Lemma 7.1 and (7.1), we have nonempty with ,
Let for each . By Lemma 7.2, we have
Since , we have
(7.2) |
Let
If along some subsequence, then
along that subsequence, but this contradicts (7.2). If along some subsequence, then
along that subsequence by the Lindeberg–Feller central limit theorem, and so . Thus, we have .
Appendix A Mean probability measures
In this section, we clarify what we mean by , and prove that is equivalent to a.s. if is a Hermitian Wigner ensemble.
Let be the set of all Borel probability measures on . Equip with the smallest $\sigma$-field that makes measurable for all .
For any random element of , it is straightforward to show that is a distribution function of some Borel probability measure on . Let denote that measure. Then has the following property:
Lemma A.1 (change of order).
Let be a random element of , and be (Borel) measurable.
- (1) If is nonnegative, then is measurable, and we have
(A.1) |
- (2) If , then is a.s. finite and measurable, and we have (A.1).
Proof.
Since (2) follows immediately from (1), we will prove (1) only. As the statement of (1) holds for for all , Dynkin’s $\pi$-$\lambda$ theorem implies that the statement holds for all measurable . By the simple function approximation argument, the statement extends to all nonnegative measurable . ∎
In order to talk about , we first need to establish the measurability of for each .
Lemma A.2 (measurability).
If is a random Hermitian matrix, then is measurable.
Proof.
Let . Given an interval where , the event of having an eigenvalue of with multiplicity at least (where ) in is equal to
which is indeed measurable. Using this and by partitioning into many small intervals, one can show that is measurable for each and . Since
is measurable. ∎
Now we turn to the proof of Lemma 1.3. We need the following inequality, which was found independently by Guntuboyina and Leeb [GL09], and Bordenave, Caputo, and Chafaï [BCC11].
Lemma A.3 (concentration for spectral measures).
Let be a Hermitian Wigner ensemble. If the total variation of is less than or equal to , then
Proof.
See [BCC11, Lemma C.2]. ∎
Proof of Lemma 1.3.
Assume . For each with , let be on , on , and linear and continuous on . Then
By Lemma A.3 and the Borel-Cantelli lemma,
This proves a.s.
To show the converse, assume that a.s., and let be continuous and bounded. Since is bounded, we can apply the dominated convergence theorem to obtain
This shows . ∎
Appendix B Reductions
In this section, we prove that Theorem 1.6 follows from Theorem 1.8 (Lemma B.1), and that Theorem 1.8 follows from Theorem 2.1 (Lemma B.4).
To prove this, we use the following two lemmas.
Lemma B.2 (perturbation inequality).
If and are Hermitian matrices, and and are the distribution functions of and , then
where is the Lévy metric.
Proof.
See [BS10, Theorem A.41]. ∎
Lemma B.3.
Proof.
From (1.2) it follows that
Proof of Lemma B.1.
Lemma B.4.
We need the following lemma.
Lemma B.5 (reduction to vanishing bounds).
The proof of (1) will be based on the following lemma.
Lemma B.6 (Bernstein’s inequality).
Suppose that are independent real-valued random variables with and for . If , then
Proof.
The proof of [Bil99, M20] works with a slight modification. ∎
Proof of Lemma B.5.
Choose with such that
(B.5) |
Proof of (1). By Lemma 1.3, it is enough to show that a.s. if and only if a.s. Let . We will first show that a.s. if and only if a.s. Note that
By Lemma 1.5, it is enough to show that the right side tends to a.s.
Let be given. By (B.5), we have some such that
Since , , are independent, Bernstein’s inequality (Lemma B.6) implies
As
the Borel-Cantelli lemma implies
This implies that a.s. if and only if a.s. as explained above.
To show that a.s. if and only if a.s., use Lemma B.2 to note that
Since , the difference between the right side and
is bounded above by
Thus, by (1.6), we have a.s.
Proof of (2). Since , we have
The first term on the right side is bounded above by
and we have shown just above that the second term also tends to .
Proof of Lemma B.4.
Assume that is given as in Theorem 1.8, and define as in Lemma B.5. If we let , then satisfies the conditions of Theorem 2.1. In particular, (1.5) for follows from (3) of Lemma B.5.
References
- [AGZ10] Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2010.
- [Arn71] Ludwig Arnold. On Wigner’s semicircle law for the eigenvalues of random matrices. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 19(3):191–198, 1971.
- [BCC11] Charles Bordenave, Pietro Caputo, and Djalil Chafaï. Spectrum of non-Hermitian heavy tailed random matrices. Comm. Math. Phys., 307(2):513–560, 2011.
- [Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, second edition, 1999. A Wiley-Interscience Publication.
- [Bil12] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., Hoboken, NJ, 2012. Anniversary edition [of MR1324786], With a foreword by Steve Lalley and a brief biography of Billingsley by Steve Koppes.
- [BS10] Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional random matrices. Springer Series in Statistics. Springer, New York, second edition, 2010.
- [Gir90] V. L. Girko. Theory of random determinants, volume 45 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1990. Translated from the Russian.
- [GL09] Adityanand Guntuboyina and Hannes Leeb. Concentration of the spectral measure of large Wishart matrices with dependent entries. Electron. Commun. Probab., 14:334–342, 2009.
- [GNT15] F. Götze, A. A. Naumov, and A. N. Tikhomirov. Limit theorems for two classes of random matrices with dependent entries. Theory Probab. Appl., 59(1):23–39, 2015.
- [Jun18] Paul Jung. Lévy-Khintchine random matrices and the Poisson weighted infinite skeleton tree. Trans. Amer. Math. Soc., 370(1):641–668, 2018.
- [Kal02] Olav Kallenberg. Foundations of modern probability. Probability and its Applications (New York). Springer-Verlag, New York, second edition, 2002.
- [Pas73] Leonid A. Pastur. Spectra of random self adjoint operators. Russian Mathematical Surveys, 28(1):1, 1973.
- [Tao12] Terence Tao. Topics in random matrix theory, volume 132 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2012.
- [vLW01] J. H. van Lint and R. M. Wilson. A course in combinatorics. Cambridge University Press, Cambridge, second edition, 2001.
- [Wig55] Eugene P. Wigner. Characteristic vectors of bordered matrices with infinite dimensions. Ann. of Math. (2), 62:548–564, 1955.
- [Wig58] Eugene P. Wigner. On the distribution of the roots of certain symmetric matrices. Ann. of Math. (2), 67:325–327, 1958.