
The strong convergence phenomenon

Ramon van Handel Department of Mathematics, Princeton University, Princeton, NJ 08544, USA rvan@math.princeton.edu
Abstract.

In a seminal 2005 paper, Haagerup and Thorbjørnsen discovered that the norm of any noncommutative polynomial of independent complex Gaussian random matrices converges to that of a limiting family of operators that arises from Voiculescu’s free probability theory. In recent years, new methods have made it possible to establish such strong convergence properties in much more general situations, and to obtain even more powerful quantitative forms of the strong convergence phenomenon. These, in turn, have led to a number of spectacular applications to long-standing open problems on random graphs, hyperbolic surfaces, and operator algebras, and have provided flexible new tools that enable the study of random matrices in unexpected generality. This survey aims to provide an introduction to this circle of ideas.

2010 Mathematics Subject Classification:
60B20; 15B52; 46L53; 46L54

1. Introduction

The aim of this survey is to discuss recent developments surrounding the following phenomenon, which has played a central role in a series of breakthroughs in the study of random graphs, hyperbolic surfaces, and operator algebras.

Definition 1.1.

Let $\bm{X}^N=(X_1^N,\ldots,X_r^N)$ be a sequence of $r$-tuples of random matrices of increasing dimension, and let $\bm{x}=(x_1,\ldots,x_r)$ be an $r$-tuple of bounded operators on a Hilbert space. Then $\bm{X}^N$ is said to converge strongly to $\bm{x}$ if

\[ \lim_{N\to\infty}\|P(\bm{X}^N,\bm{X}^{N*})\|=\|P(\bm{x},\bm{x}^*)\|\quad\text{in probability} \]

for every $D\in\mathbb{N}$ and $P\in\mathrm{M}_D(\mathbb{C})\otimes\mathbb{C}\langle x_1,\ldots,x_{2r}\rangle$.

Here we recall that a noncommutative polynomial with matrix coefficients $P\in\mathrm{M}_D(\mathbb{C})\otimes\mathbb{C}\langle x_1,\ldots,x_r\rangle$ of degree $q$ is a formal expression

\[ P(x_1,\ldots,x_r)=A_0\otimes\mathbf{1}+\sum_{k=1}^{q}\sum_{i_1,\ldots,i_k=1}^{r}A_{i_1,\ldots,i_k}\otimes x_{i_1}\cdots x_{i_k}, \]

where $A_{i_1,\ldots,i_k}\in\mathrm{M}_D(\mathbb{C})$ are $D\times D$ complex matrices. Such a polynomial defines a bounded operator whenever bounded operators are substituted for the free variables $x_1,\ldots,x_r$. When $D=1$, this reduces to the classical notion of a noncommutative polynomial (we will then write $P\in\mathbb{C}\langle x_1,\ldots,x_r\rangle$).
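As a concrete numerical illustration (not part of the original text), the substitution of matrices into a matrix-coefficient polynomial can be carried out with Kronecker products standing in for the tensor product. The sketch below evaluates a degree-one polynomial $P(X_1,\ldots,X_r)=A_0\otimes\mathbf{1}+\sum_i A_i\otimes X_i$; the helper name `eval_linear_poly` is ours, purely for illustration.

```python
import numpy as np

def eval_linear_poly(A, X):
    """Evaluate the degree-one matrix-coefficient polynomial
    P(X_1, ..., X_r) = A_0 (x) 1 + sum_i A_i (x) X_i
    for coefficients A = [A_0, ..., A_r] and matrices X = [X_1, ..., X_r],
    using Kronecker products to realize the tensor product."""
    N = X[0].shape[0]
    out = np.kron(A[0], np.eye(N))
    for Ai, Xi in zip(A[1:], X):
        out = out + np.kron(Ai, Xi)
    return out  # a (D*N) x (D*N) matrix

# toy example with D = 2, N = 3, r = 2
rng = np.random.default_rng(0)
A = [rng.standard_normal((2, 2)) for _ in range(3)]
X = [rng.standard_normal((3, 3)) for _ in range(2)]
P = eval_linear_poly(A, X)
assert P.shape == (6, 6)
```

Higher-degree monomials are handled the same way, with $X_{i_1}\cdots X_{i_k}$ in place of $X_i$.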

The significance of the strong convergence phenomenon may not be immediately obvious when it is encountered for the first time. Let us therefore begin with a very brief discussion of its origins.

The modern study of the spectral theory of random matrices arose from the work of physicists, especially that of Wigner and Dyson in the 1950s and 60s [129]. Random matrices arise here as generic models for real physical systems that are too complicated to be understood in detail, such as the energy level structure of complex nuclei. It is believed that universal features of such systems are captured by random matrix models that are chosen essentially uniformly within the appropriate symmetry class. Such questions have led to numerous developments in probability and mathematical physics, and the spectra of such models are now understood in stunning detail down to microscopic scales (see, e.g., [46]).

In contrast to these physically motivated models, random matrices that arise in other areas of mathematics often possess a much less regular structure. One way to build complex models is to consider arbitrary noncommutative polynomials of independent random matrices drawn from simple random matrix ensembles. It was shown in the seminal work of Voiculescu [126] that the limiting empirical eigenvalue distribution of such matrices can be described in terms of a family of limiting operators obtained by a free product construction. This is a fundamentally new perspective: while traditional random matrix methods are largely based on asymptotic explicit expressions or self-consistent equations satisfied by certain spectral statistics, Voiculescu’s theory provides us with genuine limiting objects whose spectral statistics are, in many cases, amenable to explicit computations. The interplay between random matrices and their limiting objects has proved to be of central importance, and will play a recurring role in the sequel.

While Voiculescu’s theory is extremely useful, it yields rather weak information in that it can only describe the asymptotics of the trace of polynomials of random matrices. It was a major breakthrough when Haagerup and Thorbjørnsen [60] showed, for complex Gaussian (GUE) random matrices, that the norm of arbitrary polynomials also converges to that of the corresponding limiting object. This much more powerful property, which was the first instance of strong convergence, opened the door to many subsequent developments.

The works of Voiculescu and Haagerup–Thorbjørnsen were directly motivated by applications to the theory of operator algebras. The fact that polynomials of a family of operators can be approximated by matrices places strong constraints on the operator (von Neumann or $C^*$-)algebras generated by this family: roughly speaking, it ensures that such algebras are “approximately finite-dimensional” in a certain sense. These properties have led to the resolution of important open problems in the theory of operator algebras which do not a priori have anything to do with random matrices; see, e.g., [128, 99, 60].

The interplay between operator algebras and random matrices continues to be a rich source of problems in both areas; an influential recent example is the work of Hayes [65] on the Peterson–Thom conjecture (cf. section 5.4). In recent years, however, the notion of strong convergence has led to spectacular new applications in several other areas of mathematics. Broadly speaking, the importance of the strong convergence phenomenon is twofold.

• Noncommutative polynomials are highly expressive: many complex structures can be encoded in terms of spectral properties of noncommutative polynomials.

• Norm convergence is an extremely strong property, which provides access to challenging features of complex models.

Furthermore, new mathematical methods have made it possible to establish novel quantitative forms of strong convergence, which enable the treatment of even more general random matrix models that were previously out of reach.

We will presently highlight a number of themes that illustrate recent applications and developments surrounding strong convergence. The remainder of this survey is devoted to a more detailed introduction to this circle of ideas.

It should be emphasized at the outset that while I have aimed to give a general introduction to the strong convergence phenomenon and related topics, this survey is selectively focused on recent developments that are closest to my own interests, and is by no means comprehensive or complete. The interested reader may find complementary perspectives in surveys of Magee [88] and Collins [39], and is warmly encouraged to further explore the research literature on this subject.

1.1. Optimal spectral gaps

Let $G^N$ be a $d$-regular graph with $N$ vertices. By the Perron–Frobenius theorem, its adjacency matrix $A^N$ has largest eigenvalue $\lambda_1(A^N)=d$ with eigenvector $1$ (the vector all of whose entries are one). The remaining eigenvalues are bounded by

\[ \|A^N|_{1^\perp}\|=\max_{i=2,\ldots,N}|\lambda_i(A^N)|\leq d. \]

The smaller this quantity, the faster a random walk on $G^N$ mixes. The following classical lemma yields a lower bound that holds for any sequence of $d$-regular graphs; it provides a speed limit on how fast such random walks can mix.

Lemma 1.2 (Alon–Boppana).

For any $d$-regular graphs $G^N$ with $N$ vertices,

\[ \|A^N|_{1^\perp}\|\geq 2\sqrt{d-1}-o(1)\quad\text{as}\quad N\to\infty. \]

Given a universal lower bound on the nontrivial eigenvalues, the obvious question is whether there exist graphs that attain this bound. Such graphs have the largest possible spectral gap. One may expect that such heavenly graphs must be very special, and indeed the first examples of such graphs were carefully constructed using deep number-theoretic ideas by Lubotzky–Phillips–Sarnak [87] and Margulis [97]. It may therefore seem surprising that this property turns out not to be special at all: random graphs have an optimal spectral gap [50]. (Here we gloss over an important distinction between the explicit and random constructions: the former yields the so-called Ramanujan property $\|A^N|_{1^\perp}\|\leq 2\sqrt{d-1}$, while the latter yields only $\|A^N|_{1^\perp}\|\leq 2\sqrt{d-1}+o(1)$, which is the natural converse to Lemma 1.2; cf. section 6.5.)

Theorem 1.3 (Friedman).

For a random $d$-regular graph $G^N$ on $N$ vertices,

\[ \|A^N|_{1^\perp}\|\leq 2\sqrt{d-1}+o(1)\quad\text{with probability}\quad 1-o(1)\quad\text{as}\quad N\to\infty. \]

We now explain that Theorem 1.3 may be viewed as a very special instance of strong convergence. This viewpoint will open the door to establishing optimal spectral gaps in much more general situations.

Let us begin by recalling that the proof of Lemma 1.2 is based on the simple observation that for any graph $G$, the number of closed walks with a given length and starting vertex is lower bounded by the number of such walks in its universal cover $\tilde G$. When $G$ is $d$-regular, its universal cover $\tilde G$ is the infinite $d$-regular tree. From this, it is not difficult to deduce that the maximum nontrivial eigenvalue of a $d$-regular graph is asymptotically lower bounded by the spectral radius of the infinite $d$-regular tree, which is $2\sqrt{d-1}$ [71, §5.2.2].

Theorem 1.3 therefore states, in essence, that the support of the nontrivial spectrum of a random $d$-regular graph behaves as that of the infinite $d$-regular tree. To make the connection more explicit, it is instructive to construct both the random graph and the infinite tree in a parallel manner. For simplicity, we assume $d$ is even (the construction can be modified to the odd case as well).

Figure 1.1. Left figure: $4$-regular graph generated by two permutations. Right figure: $4$-regular tree generated by two free generators $a,b$ of the free group $\mathbf{F}_2$. The edges defined by the two generators are colored red and blue, respectively.

Given a permutation $\sigma\in\mathbf{S}_N$, we can define edges between $N$ vertices by connecting each vertex $k\in[N]$ to its neighbors $\sigma(k)$ and $\sigma^{-1}(k)$. This defines a $2$-regular graph. To define a $d$-regular graph, we repeat this process with $r=\frac{d}{2}$ permutations. If the permutations are chosen independently and uniformly at random from $\mathbf{S}_N$, we obtain a random $d$-regular graph with adjacency matrix

\[ A^N=U_1^N+U_1^{N*}+\cdots+U_r^N+U_r^{N*}, \]

where $U_i^N$ are i.i.d. random permutation matrices of dimension $N$. (This is the permutation model of random graphs; see [50, p. 3] for its relation to other models.)
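The permutation model is easy to simulate. The following Python sketch (our own illustration, with deliberately loose tolerances) builds $A^N$ from $r$ uniform permutations and checks $d$-regularity, the Perron–Frobenius eigenvalue $d$, and the presence of a nontrivial spectral gap.

```python
import numpy as np

def random_regular_adjacency(N, r, rng):
    """Adjacency matrix A^N = U_1 + U_1^T + ... + U_r + U_r^T of the
    permutation model: r independent uniform N x N permutation matrices.
    (Loops and multi-edges are counted with multiplicity.)"""
    A = np.zeros((N, N))
    for _ in range(r):
        U = np.eye(N)[rng.permutation(N)]   # a uniform permutation matrix
        A += U + U.T
    return A

rng = np.random.default_rng(1)
d, N = 4, 200
A = random_regular_adjacency(N, d // 2, rng)
eig = np.linalg.eigvalsh(A)
assert np.allclose(A.sum(axis=1), d)   # d-regular (with multiplicity)
assert abs(eig[-1] - d) < 1e-8         # Perron-Frobenius eigenvalue d
assert eig[-2] < d - 0.01              # a nontrivial spectral gap
```

Friedman's theorem predicts that the second eigenvalue is in fact close to $2\sqrt{d-1}\approx 3.46$ for large $N$; the loose bound asserted above only witnesses a gap.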

To construct the infinite $d$-regular tree in a parallel manner, we identify the vertices of the tree with the free group $\mathbf{F}_r$ with $r=\frac{d}{2}$ free generators $g_1,\ldots,g_r$. Each vertex $w\in\mathbf{F}_r$ is then connected to its neighbors $g_iw$ and $g_i^{-1}w$ for $i=1,\ldots,r$. This defines a $d$-regular tree with adjacency matrix

\[ a=u_1+u_1^*+\cdots+u_r+u_r^*, \]

where $u_i=\lambda(g_i)$ is defined by the left-regular representation $\lambda:\mathbf{F}_r\to B(l^2(\mathbf{F}_r))$, i.e., $\lambda(g)\delta_w=\delta_{gw}$, where $\delta_w\in l^2(\mathbf{F}_r)$ is the coordinate vector of $w\in\mathbf{F}_r$.
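To make the role of the tree concrete, one can check numerically that finite balls in the $d$-regular tree have spectral radius approaching $2\sqrt{d-1}$ from below. The sketch below (our own construction, not from the text) does this for $d=4$, where $2\sqrt3\approx 3.46$.

```python
import numpy as np

def tree_ball_adjacency(d, depth):
    """Adjacency matrix of the ball of the given depth in the infinite
    d-regular tree: the root has d children, every other vertex d - 1."""
    edges, frontier, n = [], [0], 1
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            children = d if v == 0 else d - 1
            for _ in range(children):
                edges.append((v, n))
                new_frontier.append(n)
                n += 1
        frontier = new_frontier
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

A = tree_ball_adjacency(4, 6)            # ball of radius 6: 1457 vertices
top = np.linalg.eigvalsh(A)[-1]
# the spectral radius of the infinite 4-regular tree is 2*sqrt(3) ~ 3.46;
# finite balls approach it from below
assert 3.0 < top < 2 * np.sqrt(3)
```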

These models are illustrated in Figure 1.1 for $r=2$, where the edges are colored according to their generator; e.g., $U_1+U_1^*$ and $u_1+u_1^*$ are the adjacency matrices of the red edges in the left and right figures, respectively.

In these terms, Theorem 1.3 states that

\[ \lim_{N\to\infty}\|(U_1^N+U_1^{N*}+\cdots+U_r^N+U_r^{N*})|_{1^\perp}\|=\|u_1+u_1^*+\cdots+u_r+u_r^*\| \]

in probability. This is precisely the kind of convergence described by Definition 1.1, but only for one very special polynomial

\[ P(\bm{x},\bm{x}^*)=x_1+x_1^*+\cdots+x_r+x_r^*. \]

Making a leap of faith, we can now ask whether the same conclusion might even hold for every noncommutative polynomial PP. That this is indeed the case is a remarkable result of Bordenave and Collins [19].

Theorem 1.4 (Bordenave–Collins).

Let $\bm{U}^N=(U_1^N,\ldots,U_r^N)$ and $\bm{u}=(u_1,\ldots,u_r)$ be defined as above. Then $\bm{U}^N|_{1^\perp}$ converges strongly to $\bm{u}$, that is,

\[ \lim_{N\to\infty}\|P(\bm{U}^N,\bm{U}^{N*})|_{1^\perp}\|=\|P(\bm{u},\bm{u}^*)\|\quad\text{in probability} \]

for every $D\in\mathbb{N}$ and $P\in\mathrm{M}_D(\mathbb{C})\otimes\mathbb{C}\langle x_1,\ldots,x_{2r}\rangle$. (The reason that we must restrict to $1^\perp$ is elementary: the matrices $U_i^N$ have a Perron–Frobenius eigenvector $1$, but the operators $u_i$ do not, as $1\not\in l^2(\mathbf{F}_r)$ (an infinite vector of ones is not in $l^2$). Thus we must remove the Perron–Frobenius eigenvector to achieve strong convergence.)

Theorem 1.4 is a powerful tool because many problems can be encoded as special cases of this theorem by a suitable choice of PP. To illustrate this, let us revisit the optimal spectral gap phenomenon in a broader context.

In general terms, the optimal spectral gap phenomenon is the following. The spectrum of various kinds of geometric objects admits a universal bound in terms of that of their universal covering space. The question is then whether there exist such objects which meet this bound. In particular, we may ask whether that is the case for random constructions. Lemma 1.2 and Theorem 1.3 establish this for regular graphs. An analogous picture in other situations is much more recent:

• It was shown by Greenberg [71, Theorem 6.6] that for any sequence of finite graphs $G^N$ with diverging number of vertices that have the same universal cover $\tilde G$, the maximal nontrivial eigenvalue of $G^N$ is asymptotically lower bounded by the spectral radius of $\tilde G$. On the other hand, given any (not necessarily regular) finite graph $G$, there is a natural construction of random lifts $G^N$ with the same universal cover $\tilde G$ [71, §6.1]. It was shown by Bordenave and Collins [19] that an optimal spectral gap phenomenon holds for random lifts of any graph $G$.

• Any hyperbolic surface $X$ has the hyperbolic plane $\mathbb{H}$ as its universal cover. Huber [77] and Cheng [35] showed that for any sequence $X^N$ of closed hyperbolic surfaces with diverging diameter, the first nontrivial eigenvalue of the Laplacian $\Delta_{X^N}$ is upper bounded by the bottom of the spectrum of $\Delta_{\mathbb{H}}$. Whether this bound can be attained was an old question of Buser [29]. An affirmative answer was obtained by Hide and Magee [69] by showing that an optimal spectral gap phenomenon holds for random covering spaces of hyperbolic surfaces.

The key ingredient in these breakthroughs is that the relevant spectral properties can be encoded as special instances of Theorem 1.4. How this is accomplished will be sketched in sections 5.1 and 5.2. In section 5.5, we will sketch another remarkable application due to Song [122] to minimal surfaces.

Another series of developments surrounding optimal spectral gaps arises from a different perspective on Theorem 1.4. The map $\mathrm{std}_N:\mathbf{S}_N\to\mathrm{M}_{N-1}(\mathbb{C})$ that assigns to each permutation $\sigma\in\mathbf{S}_N$ the restriction of the associated $N\times N$ permutation matrix to $1^\perp$ defines an irreducible representation of the symmetric group $\mathbf{S}_N$ of dimension $N-1$, called the standard representation. Thus

\[ U_i^N|_{1^\perp}=\mathrm{std}_N(\sigma_i^N), \]

where $\sigma_1^N,\ldots,\sigma_r^N$ are independent uniformly distributed elements of $\mathbf{S}_N$. One may ask whether strong convergence remains valid if $\mathrm{std}_N$ is replaced by other representations $\pi_N$ of $\mathbf{S}_N$. This and related optimal spectral gap phenomena in representation-theoretic settings are the subject of long-standing questions and conjectures; see, e.g., [119] and the references therein.

Recent advances in the study of strong convergence have led to major progress in the understanding of such questions [21, 32, 33, 90, 30]. One of the most striking results in this direction to date is the recent work of Cassidy [30], who shows that strong convergence for the symmetric group holds uniformly for all nontrivial irreducible representations of $\mathbf{S}_N$ of dimension up to $\exp(N^{\frac{1}{12}-\delta})$. (For comparison, all irreducible representations of $\mathbf{S}_N$ have dimension $\exp(O(N\log N))$.) This makes it possible, for example, to study natural models of random regular graphs that achieve optimal spectral gaps using far less randomness than is required by Theorem 1.4. We will discuss these results in more detail in section 5.3.

1.2. Intrinsic freeness

We now turn to an entirely different development surrounding strong convergence that has enabled a sharp understanding of a very large class of random matrices in unexpected generality.

To set the stage for this development, let us begin by recalling the original strong convergence result of Haagerup and Thorbjørnsen [60]. Let $G_1^N,\ldots,G_r^N$ be independent GUE matrices, that is, $N\times N$ self-adjoint complex Gaussian random matrices whose off-diagonal elements have variance $\frac{1}{N}$ and whose distribution is invariant under unitary conjugation. The associated limiting object is a free semicircular family $s_1,\ldots,s_r$ (cf. section 4.1). Define the random matrix

\[ X^N=A_0\otimes\mathbf{1}+\sum_{i=1}^{r}A_i\otimes G_i^N \]

and the limiting operator

\[ X_{\rm free}=A_0\otimes\mathbf{1}+\sum_{i=1}^{r}A_i\otimes s_i, \]

where $A_0,\ldots,A_r\in\mathrm{M}_D(\mathbb{C})$ are self-adjoint matrix coefficients.

Theorem 1.5 (Haagerup–Thorbjørnsen).

For $X^N$ and $X_{\rm free}$ defined as above (here $\mathrm{sp}(X)$ denotes the spectrum of $X$, and we recall that the Hausdorff distance between sets $A,B\subseteq\mathbb{R}$ is defined as $\mathrm{d_H}(A,B)=\inf\{\varepsilon>0:A\subseteq B+[-\varepsilon,\varepsilon]\text{ and }B\subseteq A+[-\varepsilon,\varepsilon]\}$),

\[ \lim_{N\to\infty}\mathrm{d_H}\big(\mathrm{sp}(X^N),\mathrm{sp}(X_{\rm free})\big)=0\quad\text{a.s.} \]

It is a nontrivial fact, known as the linearization trick, that Theorem 1.5 implies that $\bm{G}^N=(G_1^N,\ldots,G_r^N)$ converges strongly to $\bm{s}=(s_1,\ldots,s_r)$; see section 2.5. This conclusion was used by Haagerup and Thorbjørnsen to prove an old conjecture that the $\mathrm{Ext}$ invariant of the reduced $C^*$-algebra of any countable free group with at least two generators is not a group. For our present purposes, however, the above formulation of Theorem 1.5 will be the most natural.
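Theorem 1.5 can be illustrated numerically in the simplest case $r=1$, $D=1$, $A_0=0$, $A_1=1$, where $X^N$ is a single GUE matrix and $\mathrm{sp}(X_{\rm free})=[-2,2]$ is the support of the semicircle distribution. The sketch below (our own; the normalization matches the variance-$\frac{1}{N}$ convention above) estimates the Hausdorff distance against a fine grid standing in for $[-2,2]$.

```python
import numpy as np

def gue(N, rng):
    """N x N GUE matrix, normalized so off-diagonal entries have
    variance 1/N; the spectrum then converges to [-2, 2]."""
    G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (G + G.conj().T) / (2 * np.sqrt(N))

def hausdorff(A, B):
    """Hausdorff distance between two finite subsets of the reals."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.abs(A[:, None] - B[None, :])
    return max(d.min(axis=1).max(), d.min(axis=0).max())

rng = np.random.default_rng(2)
spec = np.linalg.eigvalsh(gue(1000, rng))
grid = np.linspace(-2.0, 2.0, 4001)   # stand-in for sp(X_free) = [-2, 2]
assert hausdorff(spec, grid) < 0.2
```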

The result of Haagerup–Thorbjørnsen may be viewed as a strong incarnation of Voiculescu’s asymptotic freeness principle [126]. The limiting operators $s_1,\ldots,s_r$ arise from a free product construction and are thus algebraically free (in fact, they are freely independent in the sense of Voiculescu). This makes it possible to compute the spectral statistics of $X_{\rm free}$ by means of closed-form equations, cf. section 4.1. The explanation for Theorem 1.5 provided by the asymptotic freeness principle is that since the random matrices $G_i^N$ have independent uniformly random eigenbases (due to their unitary invariance), they become increasingly noncommutative as $N\to\infty$, which leads them to behave freely in the limit.

From the perspective of applications, however, the most interesting case of this model is the special case $N=1$, that is, the random matrix $X=X^1$ defined by

\[ X=A_0+\sum_{i=1}^{r}A_ig_i \]

where $g_i$ are independent standard Gaussians. Indeed, any $D\times D$ self-adjoint random matrix with jointly Gaussian entries (with arbitrary mean and covariance) can be expressed in this form. This model therefore captures almost arbitrarily structured random matrices: if one could understand random matrices at this level of generality, one would capture in one fell swoop a huge class of models that arise in applications. However, since the $1\times1$ matrices $g_i=G_i^1$ commute, the asymptotic freeness principle has no bearing on such matrices, and there is no reason to expect that $X_{\rm free}$ has any significance for the behavior of $X$.

It is therefore rather surprising that the spectral properties of arbitrarily structured Gaussian random matrices $X$ are nonetheless captured by those of $X_{\rm free}$ in great generality. This phenomenon, developed in joint works of the author with Bandeira, Boedihardjo, Cipolloni, and Schröder [10, 11], is captured by the following counterpart of Theorem 1.5 (stated here in simplified form).

Theorem 1.6 (Intrinsic freeness).

For $X$ and $X_{\rm free}$ defined as above, we have

\[ \mathbf{P}\big[\mathrm{d_H}\big(\mathrm{sp}(X),\mathrm{sp}(X_{\rm free})\big)>C\tilde{v}(X)\big((\log D)^{\frac{3}{4}}+t\big)\big]\leq e^{-t^2} \]

for all $t\geq0$. Here $C$ is a universal constant,

\[ \tilde{v}(X)=\|\mathbf{E}[(X-\mathbf{E}X)^2]\|^{\frac{1}{4}}\,\|\mathrm{Cov}(X)\|^{\frac{1}{4}}, \]

and $\mathrm{Cov}(X)$ is the $D^2\times D^2$ covariance matrix of the entries of $X$.
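The parameter $\tilde{v}(X)$ can be computed directly from the coefficients $A_1,\ldots,A_r$: $\mathbf{E}[(X-\mathbf{E}X)^2]=\sum_i A_i^2$ and $\mathrm{Cov}(X)=\sum_i \mathrm{vec}(A_i)\mathrm{vec}(A_i)^*$. The following sketch (the helper `v_tilde` is ours, for real self-adjoint coefficients) verifies that for a Wigner-type model with entry variance $\frac{1}{D}$ one gets $\|\mathbf{E}(X-\mathbf{E}X)^2\|=1$ and $\|\mathrm{Cov}(X)\|=\frac{2}{D}$, so $\tilde{v}(X)=(2/D)^{1/4}\to0$.

```python
import numpy as np

def v_tilde(coeffs):
    """The parameter v~(X) for X = A_0 + sum_i A_i g_i with real
    self-adjoint coefficients: ||E[(X-EX)^2]||^(1/4) * ||Cov(X)||^(1/4)."""
    second_moment = sum(A @ A for A in coeffs)
    vecs = [A.reshape(-1) for A in coeffs]
    cov = sum(np.outer(v, v) for v in vecs)   # D^2 x D^2 covariance matrix
    return (np.linalg.norm(second_moment, 2) ** 0.25
            * np.linalg.norm(cov, 2) ** 0.25)

# Wigner-type model: X has centered entries of variance 1/D
D = 16
coeffs = []
for i in range(D):
    for j in range(i, D):
        E = np.zeros((D, D))
        E[i, j] = E[j, i] = 1.0
        coeffs.append(E / np.sqrt(D))
assert abs(v_tilde(coeffs) - (2 / D) ** 0.25) < 1e-10
```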

Remark 1.7.

While Theorem 1.6 captures the edges of the spectrum of $X$, analogous results for other spectral parameters (such as the spectral distribution) may be found in [10, 11]. These results are further extended to a large class of non-Gaussian random matrices in joint work of the author with Brailovskaya [23].

Theorem 1.6 states that the spectrum of $X$ behaves like that of $X_{\rm free}$ as soon as the parameter $\tilde{v}(X)$ is small. Unlike for the model $X^N$, where noncommutativity is overtly introduced by means of unitarily invariant matrices, the mechanism for $X$ to behave as $X_{\rm free}$ can only arise from the structure of the matrix coefficients $A_0,\ldots,A_r$. We call this phenomenon intrinsic freeness. It should not be obvious at this point why the parameter $\tilde{v}(X)$ captures intrinsic freeness; the origin of this phenomenon (which was inspired in part by [60, 125]) and the mechanism that gives rise to it will be discussed in section 4.

In practice, Theorem 1.6 proves to be a powerful result as $\tilde{v}(X)$ is indeed small in numerous applications, while the result applies without any structural assumptions on the random matrix $X$. This is especially useful in questions of applied mathematics, where messy random matrices are par for the course. Several such applications are illustrated, for example, in [11, §3].

Figure 1.2. Example of $P(H_1^N,H_2^N,H_3^N)$ where $H_i^N$ are independent $N\times N$ GUE (left plot) or $w$-sparse band matrices (right plot) with $N=1000$ and $w=27$. The histograms show the eigenvalues of a single realization of the random matrix; the solid line is the spectral density of $P(s_1,s_2,s_3)$ (computed using NCDist.jl).

Another consequence of Theorem 1.6 is that the Haagerup–Thorbjørnsen strong convergence result extends to far more general situations. We give only one example for the sake of illustration. A $w$-sparse Wigner matrix $H$ is a self-adjoint real random matrix such that each row has exactly $w$ nonzero entries, each of which is an independent (modulo symmetry $H_{ij}=H_{ji}$) centered Gaussian with variance $w^{-1}$. In this case $\tilde{v}(H)\sim w^{-\frac{1}{4}}$. Theorem 1.6 shows that if $H_1^N,\ldots,H_r^N$ are independent $w$-sparse Wigner matrices of dimension $N$, then $\bm{H}^N=(H_1^N,\ldots,H_r^N)$ converges strongly to $\bm{s}=(s_1,\ldots,s_r)$ as soon as $w\gg(\log N)^3$, regardless of the choice of sparsity pattern. Unlike GUE matrices, such models need not possess any invariance and can have localized eigenbases. Even though this appears to dramatically violate the classical intuition behind asymptotic freeness, this model exhibits precisely the same strong convergence property as GUE (see Figure 1.2).
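A minimal simulation of one such sparsity pattern (a cyclic band, in the spirit of the right panel of Figure 1.2; the construction below is our own sketch) shows the spectrum already hugging the semicircle support $[-2,2]$ at the parameters $N=1000$, $w=27$.

```python
import numpy as np

def band_sparse_wigner(N, w, rng):
    """w-sparse Wigner matrix with a cyclic band pattern (w odd): row i
    has nonzero entries in columns i - w//2, ..., i + w//2 (mod N), each an
    independent (modulo symmetry) centered Gaussian of variance 1/w."""
    half = w // 2
    H = np.zeros((N, N))
    for i in range(N):
        for k in range(half + 1):     # fill the upper half-band, mirror it
            j = (i + k) % N
            H[i, j] = H[j, i] = rng.standard_normal() / np.sqrt(w)
    return H

rng = np.random.default_rng(3)
N, w = 1000, 27
H = band_sparse_wigner(N, w, rng)
assert (np.count_nonzero(H, axis=1) == w).all()  # exactly w entries per row
eig = np.linalg.eigvalsh(H)
# the spectrum should concentrate near the semicircle support [-2, 2]
assert 1.8 < eig[-1] < 2.6 and -2.6 < eig[0] < -1.8
```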

1.3. New methods in random matrix theory

The development of strong convergence has gone hand in hand with the discovery of new methods in random matrix theory. For example, Haagerup and Thorbjørnsen [60] pioneered the use of self-adjoint linearization (section 2.5), which enabled them to make effective use of Schwinger–Dyson equations to capture general polynomials (their work was extended to various classical random matrix models in [121, 7, 41]); while Bordenave and Collins [19, 21, 20] developed operator-valued nonbacktracking methods in order to efficiently apply the moment method to strong convergence.

More recently, however, two new methods for proving strong convergence have proved to be especially powerful, and have opened the door both to obtaining strong quantitative results and to achieving strong convergence in new situations that were previously out of reach. These methods are rather different in spirit from those used in classical random matrix theory.

1.3.1. The interpolation method

The general principle captured by strong convergence (and by the earlier work of Voiculescu) is that the spectral statistics of families of random matrices behave as those of a family of limiting operators. In classical approaches to random matrix theory, however, the limiting operators do not appear directly: rather, one shows that the spectral statistics of these operators admit explicit expressions or closed-form equations, and that the random matrices obey these same expressions or equations approximately.

In contrast, the existence of limiting operators suggests that these may be exploited explicitly as a method of proof in random matrix theory. This is the basic idea behind the interpolation method, which was developed independently by Collins, Guionnet, and Parraud [40] to obtain a quantitative form of the Haagerup–Thorbjørnsen theorem, and by Bandeira, Boedihardjo, and the author [10] to prove the intrinsic freeness principle (Theorem 1.6).

Roughly speaking, the method works as follows. We aim to show that the spectral statistics of a random matrix $X$ behave as those of a limiting operator $X_{\rm free}$. To this end, one introduces a certain continuous interpolation $(X(t))_{t\in[0,1]}$ between these objects, so that $X(1)=X$ and $X(0)=X_{\rm free}$. To bound the discrepancy between the spectral statistics of $X$ and $X_{\rm free}$, one can then estimate

\[ |\mathbf{E}[\mathop{\mathrm{tr}}h(X)]-\tau(h(X_{\rm free}))|\leq\int_0^1\bigg|\frac{d}{dt}\mathbf{E}[\mathop{\mathrm{tr}}h(X(t))]\bigg|\,dt, \]

where $\tau$ denotes the limiting trace (see section 2.1). If a good bound can be obtained for sufficiently general $h$ (we will choose $h(x)=|z-x|^{-2p}$ for $p\in\mathbb{N}$ and $z\in\mathbb{C}\backslash\mathbb{R}$), convergence of the norm will follow as a consequence.

As stated above, this procedure does not make much sense. Indeed, $X$ (a random matrix) and $X_{\rm free}$ (a deterministic operator) do not even live in the same space, so it is unclear what it means to interpolate between them. Moreover, the general approach outlined above does not in itself explain why the derivative along the interpolation should be small: the latter is the key part of the argument that requires one to understand the mechanism that gives rise to free behavior. Both these issues will be explained in more detail in section 4, where we will sketch the main ideas behind the proof of Theorem 1.6.

1.3.2. The polynomial method

We now describe an entirely different method, developed in the recent work of Chen, Garza-Vargas, Tropp, and the author [32], which has led to a series of new developments.

Consider a sequence of self-adjoint random matrices $X^N$ with limiting operator $X_{\rm F}$; one may keep in mind the example $X^N=P(\bm{U}^N,\bm{U}^{N*})|_{1^\perp}$ and $X_{\rm F}=P(\bm{u},\bm{u}^*)$ in the context of Theorem 1.4. In many natural models, it turns out to be the case that spectral statistics of polynomial test functions $h$ can be expressed as

\[ \mathbf{E}[\mathop{\mathrm{tr}}h(X^N)]=\Phi_h(\tfrac{1}{N}), \]

where $\Phi_h$ is a rational function whose degree is controlled by the degree $q$ of the polynomial $h$. Whenever this is the case, the fact that

\[ \tau(h(X_{\rm F}))=\Phi_h(0) \]

is generally an immediate consequence. However, such soft information does not in itself suffice to reason about the norm.

The key observation behind the polynomial method is that classical results in the analytic theory of polynomials (due to Chebyshev, Markov, Bernstein, …) can be exploited to “upgrade” the above soft information to strong quantitative bounds, merely by virtue of the fact that $\Phi_h$ is rational. The basic idea is to write

\[ |\mathbf{E}[\mathop{\mathrm{tr}}h(X^N)]-\tau(h(X_{\rm F}))|=|\Phi_h(\tfrac{1}{N})-\Phi_h(0)|\leq\frac{1}{N}\|\Phi_h'\|_{L^\infty[0,\frac{1}{N}]}. \]

This is reminiscent of the interpolation method, where now instead of an interpolation parameter we “differentiate with respect to $\frac{1}{N}$”. In contrast to the interpolation method, however, the surprising feature of the present approach is that the derivative of $\Phi_h$ can be controlled by means of completely general tools that do not require any understanding of the random matrix model. In particular, the analysis makes use of the following two classical facts about polynomials [34].

  1. \bullet

    An inequality of A. Markov states that fL[1,1]q2fL[1,1]\|f^{\prime}\|_{L^{\infty}[-1,1]}\leq q^{2}\|f\|_{L^{\infty}[-1,1]} for every real polynomial ff of degree at most qq.

• A corollary of the Markov inequality states that fL[1,1]2maxxI|f(x)|\|f\|_{L^{\infty}[-1,1]}\leq 2\max_{x\in I}|f(x)| for every such polynomial ff and any discretization II of [1,1][-1,1] with spacing at most 1q2\frac{1}{q^{2}}.
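Both facts are easy to verify numerically. The following Python sketch (purely illustrative, not part of the argument) checks them on the Chebyshev polynomial T_q, which attains equality in Markov's inequality at the endpoints; the degree and grid spacing are chosen arbitrarily for the illustration.

```python
import numpy as np

q = 8
Tq = np.polynomial.Chebyshev.basis(q)   # Chebyshev polynomial T_q, |T_q| <= 1 on [-1, 1]
dTq = Tq.deriv()

x = np.linspace(-1, 1, 200001)
sup_f = np.abs(Tq(x)).max()             # = 1
sup_df = np.abs(dTq(x)).max()           # = q^2, attained at the endpoints

# Markov's inequality: sup |f'| <= q^2 sup |f| (equality for T_q)
print(sup_df, "<=", q**2 * sup_f)

# Discretization corollary: a grid of spacing 1/q^2 already sees
# at least half of the sup norm
grid = np.arange(-1.0, 1.0 + 1e-12, 1.0 / q**2)
print(sup_f, "<=", 2 * np.abs(Tq(grid)).max())
```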

Applying these results to the numerator and denominator of the rational function Φh\Phi_{h} yields with minimal effort a bound of the form

ΦhL[0,1N]q4maxN|Φh(1N)|=q4maxN|𝐄[trh(XN)]|\|\Phi_{h}^{\prime}\|_{L^{\infty}[0,\frac{1}{N}]}\lesssim q^{4}\max_{N}|\Phi_{h}(\tfrac{1}{N})|=q^{4}\max_{N}|\mathbf{E}[\mathop{\mathrm{tr}}h(X^{N})]|

(the additional factor q2q^{2} arises since we must restrict to the part of the interval where the spacing between the points {1N}\{\frac{1}{N}\} is sufficiently small). Thus we obtain a strong quantitative bound in a completely soft manner.

In this form, the above method does not suffice to achieve strong convergence. To this end, two additional ingredients must be added.

1. The above analysis requires the test function hh to be a polynomial. However, since the bound depends only polynomially on the degree of hh, one can use a Fourier-analytic argument to extend the bound to arbitrary smooth hh.

2. The 1N\frac{1}{N} rate obtained above does not suffice to deduce convergence of the norm, since it can only ensure that XNX^{N} has a bounded (rather than vanishing) number of eigenvalues outside the support of XFX_{\rm F}. To achieve strong convergence, we must expand Φh\Phi_{h} to second (or higher) order and control the additional term(s).

Nonetheless, all these ingredients are essentially elementary and can be implemented with minimal problem-specific inputs.

The polynomial method will be discussed in detail in section 3, where we will illustrate it by giving an essentially complete proof of Theorem 1.4. That an elementary proof is possible at all is surprising in itself, given that previous methods required delicate and lengthy computations.

Figure 1.3. Staircase pattern of the large deviation probabilities in Friedman’s Theorem 1.3 for the permutation model of random regular graphs. Here we define I(x)=limNlog𝐏[ANx]logNI(x)=\lim_{N\to\infty}\frac{\log\mathbf{P}[\|A^{N}\|\,\geq\,x]}{\log N}, m=12(d1+1)m_{*}=\lfloor\frac{1}{2}(\sqrt{d-1}+1)\rfloor, and ρm=2m1+d12m1\rho_{m}=2m-1+\frac{d-1}{2m-1}.

When it is applicable, the polynomial method has typically provided the best known quantitative results and has made it possible to address previously inaccessible questions. To date, this includes works of Magee and de la Salle [90] and of Cassidy [30] on strong convergence of high dimensional representations of the unitary and symmetric groups (see also [33]); strong convergence for polynomials with coefficients in subexponential operator spaces [33]; strong convergence of the tensor GUE model of graph products of semicircular variables (ibid.); a characterization of the unusual large deviations in Friedman’s theorem [32] as illustrated in Figure 1.3; and work of Magee, Puder and the author on strong convergence of uniformly random permutation representations of surface groups [92].

1.4. Organization of this survey

The rest of this survey is organized as follows.

Section 2 collects a number of basic but very useful properties of strong (and weak) convergence that are scattered throughout the literature. These properties also illustrate the fundamental interplay between strong convergence and the operator algebraic properties of the limiting objects.

Section 3 provides a detailed illustration of the polynomial method: we will give an essentially self-contained proof of Theorem 1.4.

Section 4 is devoted to further discussion of the intrinsic freeness phenomenon. In particular, we aim to explain the mechanism that gives rise to it.

Section 5 discusses in more detail various applications of the strong convergence phenomenon that we introduced above. In particular, we aim to explain how the strong convergence property is used in these applications.

Finally, section 6 is devoted to a brief exposition of various open problems surrounding the strong convergence phenomenon.

2. Strong convergence

The aim of this section is to collect various general properties of strong convergence that are often useful. Many of these properties rely on operator algebraic properties of the limiting objects. We have aimed to make the presentation accessible to readers without any prior background in operator algebras.

2.1. CC^{*}-probability spaces

Let XX be an N×NN\times N self-adjoint (usually random) matrix. We will be interested in understanding the empirical spectral distribution

μX=1Nk=1Nδλk(X)\mu_{X}=\frac{1}{N}\sum_{k=1}^{N}\delta_{\lambda_{k}(X)}

(that is, μX(I)\mu_{X}(I) is the fraction of the eigenvalues of XX that lie in the set II\subseteq\mathbb{R}); and the spectral edges, that is, the extreme eigenvalues

X=max1kN|λk(X)|\|X\|=\max_{1\leq k\leq N}|\lambda_{k}(X)|

or, more generally, the full spectrum sp(X)\mathrm{sp}(X) as a set. In the models that we will consider, both these spectral features are well described by the corresponding features of a limiting operator XFX_{\rm F} as NN\to\infty: convergence of the spectral distribution is weak convergence, and convergence of the spectral edges is strong convergence. These notions will be formally defined in the next section. (These notions capture the macroscopic features of the spectrum. A large part of modern random matrix theory is concerned with understanding the spectrum at the microscopic, or local, scale, that is, understanding the limit of the eigenvalues viewed as a point process. Such questions are rather different in spirit, as the behavior of the local statistics is expected to be universal and is not described by the spectral properties of limiting operators.)

To do so, we must first give meaning to the spectral distribution and edges of the limiting operator XFX_{\rm F}. For the spectral edges, we may simply consider the norm or spectrum of XFX_{\rm F} which are well defined for bounded operators on any Hilbert space HH. However, the meaning of the spectral distribution of XFX_{\rm F} is not clear a priori. Indeed, since the empirical spectral distribution

f𝑑μX=1Nk=1Nf(λk(X))=tr(f(X))\int f\,d\mu_{X}=\frac{1}{N}\sum_{k=1}^{N}f(\lambda_{k}(X))=\mathop{\mathrm{tr}}(f(X))

is defined by the normalized trace (we denote by trX=1NTrX\mathop{\mathrm{tr}}X=\frac{1}{N}\mathop{\mathrm{Tr}}X the normalized trace of an N×NN\times N matrix XX, and define f(X)f(X) by functional calculus, i.e., by applying ff to the eigenvalues of XX while keeping the eigenvectors fixed), defining the spectral distribution of XFX_{\rm F} requires us to make sense of the normalized trace of infinite-dimensional operators. This is impossible in general, as any linear functional τ:B(H)\tau:B(H)\to\mathbb{C} with the trace property τ(xy)=τ(yx)\tau(xy)=\tau(yx) for all x,yB(H)x,y\in B(H) must be trivial, τ0\tau\equiv 0 (this follows immediately by noting that when HH is infinite-dimensional, every element of B(H)B(H) can be written as the sum of two commutators [61]).
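In finite dimensions all of these notions are concrete. The following Python sketch (illustrative only, with an arbitrary Wigner-type matrix and arbitrary parameters) computes the normalized trace of f(X) by functional calculus, i.e., by applying f to the eigenvalues and averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300
G = rng.standard_normal((N, N)) / np.sqrt(N)
X = (G + G.T) / np.sqrt(2)        # a self-adjoint random matrix

eigs = np.linalg.eigvalsh(X)

def tr_f(f):
    """Normalized trace of f(X): apply f to the eigenvalues and average."""
    return np.mean(f(eigs))

print(tr_f(lambda t: t**2))       # second moment of the spectral distribution
print(tr_f(lambda t: 1.0 + 0*t))  # tr f(X) = 1 for f = 1: the trace is unital
```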

This situation is somewhat reminiscent of a basic issue of probability theory: one cannot define a probability measure on arbitrary subsets of an uncountable set, but must rather work with a suitable σ\sigma-algebra of sets for which the notion of measure makes sense. In the present setting, we cannot define a normalized trace for all bounded operators on an infinite-dimensional Hilbert space HH, but must rather work with a suitable algebra 𝒜B(H)\mathcal{A}\subset B(H) of operators on which the trace τ:𝒜\tau:\mathcal{A}\to\mathbb{C} can be defined. These objects must satisfy some basic axioms. (We are a bit informal in our terminology: CC^{*}-algebras are usually defined as Banach algebras rather than as subalgebras of B(H)B(H). However, as any CC^{*}-algebra may be represented in the latter form, our definition entails no loss of generality. What we call a trace should more precisely be called a tracial state. A crash course on the basic notions may be found in [106].)

Definition 2.1 (CC^{*}-algebra).

A unital CC^{*}-algebra is an algebra 𝒜\mathcal{A} of bounded operators on a complex Hilbert space that is self-adjoint (a𝒜a\in\mathcal{A} implies a𝒜a^{*}\in\mathcal{A}), contains the identity 𝟏𝒜\mathbf{1}\in\mathcal{A}, and is closed in the operator norm topology.

Definition 2.2 (Trace).

A trace on a unital CC^{*}-algebra 𝒜\mathcal{A} is a linear functional τ:𝒜\tau:\mathcal{A}\to\mathbb{C} that is positive τ(aa)0\tau(a^{*}a)\geq 0, unital τ(𝟏)=1\tau(\mathbf{1})=1, and tracial τ(ab)=τ(ba)\tau(ab)=\tau(ba). A trace is called faithful if τ(aa)=0\tau(a^{*}a)=0 implies a=0a=0.

Definition 2.3 (CC^{*}-probability space).

A CC^{*}-probability space is a pair (𝒜,τ)(\mathcal{A},\tau), where 𝒜\mathcal{A} is a unital CC^{*}-algebra and τ:𝒜\tau:\mathcal{A}\to\mathbb{C} is a faithful trace.

The simplest example of a CC^{*}-probability space is the algebra of N×NN\times N matrices with its normalized trace (MN(),tr)(\mathrm{M}_{N}(\mathbb{C}),\mathop{\mathrm{tr}}). One may conceptually think of general CC^{*}-probability spaces as generalizations of this example.

Remark 2.4.

Most of the axioms in the above definitions are obvious analogues of the properties of (MN(),tr)(\mathrm{M}_{N}(\mathbb{C}),\mathop{\mathrm{tr}}). What may not be obvious at first sight is why we require 𝒜\mathcal{A} to be closed in the norm topology. The reason is that it ensures that f(a)𝒜f(a)\in\mathcal{A} for any self-adjoint a𝒜a\in\mathcal{A} not only when ff is a polynomial (which follows merely from the fact that 𝒜\mathcal{A} is an algebra), but also when ff is a continuous function, since the latter can be approximated in norm by polynomials. This property will presently be needed to define the spectral distribution.

Remark 2.5.

If we make the stronger assumption that 𝒜\mathcal{A} is closed in the strong operator topology, 𝒜\mathcal{A} is called a von Neumann algebra. This ensures that f(a)𝒜f(a)\in\mathcal{A} even when ff is a bounded measurable function. Von Neumann algebras form a major research area in their own right, but appear in this survey only in section 5.4.

Given a CC^{*}-probability space (𝒜,τ)(\mathcal{A},\tau), we can now associate to each self-adjoint element a𝒜a\in\mathcal{A}, a=aa=a^{*} a spectral distribution μa\mu_{a} by defining

f𝑑μa=τ(f(a))\int f\,d\mu_{a}=\tau(f(a))

for every continuous function f:f:\mathbb{R}\to\mathbb{C}. Indeed, that τ\tau is positive and unital implies that fτ(f(a))f\mapsto\tau(f(a)) is a positive and normalized linear functional on C0()C_{0}(\mathbb{R}), so μa\mu_{a} exists by the Riesz representation theorem.

This survey is primarily concerned with strong convergence, that is, with norms and not with spectral distributions. Nonetheless, it is generally the case that the only spectral statistics of random matrices that are directly computable are trace statistics (such as the moments 𝐄[trXp]\mathbf{E}[\mathop{\mathrm{tr}}X^{p}]), so that a good understanding of weak convergence is a prerequisite for proving strong convergence. In particular, we must understand how to recover the spectrum from the spectral distribution. It is here that the faithfulness of the trace τ\tau plays a key role.

Lemma 2.6 (Spectral distribution and spectrum).

Let (𝒜,τ)(\mathcal{A},\tau) be a CC^{*}-probability space. Then for any self-adjoint a𝒜a\in\mathcal{A}, a=aa=a^{*}, we have suppμa=sp(a)\mathop{\mathrm{supp}}\mu_{a}=\mathrm{sp}(a) and thus

a=limpτ(a2p)12p.\|a\|=\lim_{p\to\infty}\tau(a^{2p})^{\frac{1}{2p}}.
Proof.

By the definition of support, xsuppμax\not\in\mathop{\mathrm{supp}}\mu_{a} if and only if there is a continuous nonnegative function ff so that f(x)>0f(x)>0 and f𝑑μa=0\int f\,d\mu_{a}=0. On the other hand, by the spectral theorem, xsp(a)x\not\in\mathrm{sp}(a) if and only if there is a continuous nonnegative function ff so that f(x)>0f(x)>0 and f(a)=0f(a)=0. That suppμa=sp(a)\mathop{\mathrm{supp}}\mu_{a}=\mathrm{sp}(a) therefore follows as τ(f(a))=0\tau(f(a))=0 if and only if f(a)=0f(a)=0, since τ\tau is faithful and f0f\geq 0. ∎
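For a matrix, the norm formula of Lemma 2.6 can be observed numerically. In the following sketch (illustrative, with arbitrary parameters), the quantities tr(X^{2p})^{1/2p} increase to the largest absolute eigenvalue as p grows.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
G = rng.standard_normal((N, N)) / np.sqrt(N)
X = (G + G.T) / np.sqrt(2)        # self-adjoint, with trace tau = tr

eigs = np.linalg.eigvalsh(X)
norm = np.abs(eigs).max()

# tau(a^{2p})^{1/(2p)} increases to ||a|| as p -> infinity
for p in (1, 4, 16, 64, 256):
    approx = np.mean(eigs ** (2 * p)) ** (1.0 / (2 * p))
    print(p, approx, norm)
```

The convergence is slow (the gap at finite p reflects the 1/N weight given to the top eigenvalue), which is one reason why quantitative methods are needed in the infinite-dimensional setting.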

We now introduce one of the most important examples of a CC^{*}-probability space. Another important example will appear in section 4.1.

Example 2.7 (Reduced group CC^{*}-algebras).

Let 𝐆\mathbf{G} be a finitely generated group with generators g1,,grg_{1},\ldots,g_{r}, and λ:𝐆B(l2(𝐆))\lambda:\mathbf{G}\to B(l^{2}(\mathbf{G})) be its left-regular representation, i.e., λ(g)δw=δgw\lambda(g)\delta_{w}=\delta_{gw}, where δwl2(𝐆)\delta_{w}\in l^{2}(\mathbf{G}) denotes the coordinate vector of w𝐆w\in\mathbf{G}. Then

Cred(𝐆)=cl(span{λ(g):g𝐆})=C(λ(g1),,λ(gr))C^{*}_{\rm red}(\mathbf{G})=\mathrm{cl}_{\|\cdot\|}\big{(}\mathop{\mathrm{span}}\{\lambda(g):g\in\mathbf{G}\}\big{)}=C^{*}(\lambda(g_{1}),\ldots,\lambda(g_{r}))

is called the reduced CC^{*}-algebra of 𝐆\mathbf{G}. Here and in the sequel, C(𝒖)C^{*}(\bm{u}) denotes the CC^{*}-algebra generated by a family of operators 𝒖=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) (that is, the norm-closure of the set of all noncommutative polynomials in 𝒖,𝒖\bm{u},\bm{u}^{*}).

The reduced CC^{*}-algebra of any group always comes equipped with a canonical trace τ:Cred(𝐆)\tau:C^{*}_{\rm red}(\mathbf{G})\to\mathbb{C} that is defined for all aCred(𝐆)a\in C^{*}_{\rm red}(\mathbf{G}) by

τ(a)=δe,aδe,\tau(a)=\langle\delta_{e},a\,\delta_{e}\rangle,

where e𝐆e\in\mathbf{G} is the identity element. Note that:

• It is straightforward to check that τ\tau is indeed tracial: by linearity, it suffices to show that τ(λ(g)λ(h))=1gh=e=τ(λ(h)λ(g))\tau(\lambda(g)\lambda(h))=1_{gh=e}=\tau(\lambda(h)\lambda(g)) for all g,h𝐆g,h\in\mathbf{G}.

• τ\tau is also faithful: if τ(aa)=0\tau(a^{*}a)=0, then aδg2=τ(λ(g)aaλ(g))=τ(aa)=0\|a\delta_{g}\|^{2}=\tau(\lambda(g)^{*}a^{*}a\lambda(g))=\tau(a^{*}a)=0 for all g𝐆g\in\mathbf{G} by the trace property (since λ(g)λ(g)=𝟏\lambda(g)\lambda(g)^{*}=\mathbf{1}), and thus a=0a=0.

Thus (Cred(𝐆),τ)(C^{*}_{\rm red}(\mathbf{G}),\tau) defines a CC^{*}-probability space.

Example 2.8 (Free group).

In the case that 𝐆=𝐅r\mathbf{G}=\mathbf{F}_{r} is a free group, we implicitly encountered the above construction in section 1.1. We argued there that the adjacency matrix of a random 2r2r-regular graph is modelled by the operator

a=λ(g1)+λ(g1)++λ(gr)+λ(gr)Cred(𝐅r).a=\lambda(g_{1})+\lambda(g_{1})^{*}+\cdots+\lambda(g_{r})+\lambda(g_{r})^{*}\in C^{*}_{\rm red}(\mathbf{F}_{r}).

It follows immediately from the definition that the moments of the spectral distribution μa\mu_{a} (defined by the canonical trace τ\tau) are given by

xp𝑑μa=τ(ap)=#{words of length p in g1,g11,,gr,gr1 that reduce to e}.\int x^{p}\,d\mu_{a}=\tau(a^{p})=\#\{\text{words of length $p$ in }g_{1},g_{1}^{-1},\ldots,g_{r},g_{r}^{-1}\text{ that reduce to }e\}.

As the moments grow at most exponentially in pp, this uniquely determines μa\mu_{a}. The density of μa\mu_{a} was computed in a classic paper of Kesten [80, proof of Theorem 3], and is known as the Kesten distribution. Since the explicit formula for the density shows that suppμa=[22r1,22r1]\mathop{\mathrm{supp}}\mu_{a}=[-2\sqrt{2r-1},2\sqrt{2r-1}], Lemma 2.6 yields

a=22r1.\|a\|=2\sqrt{2r-1}.

This explains the value of the norm that appears in Theorem 1.3.
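The moments τ(a^p) of Example 2.8 can be computed exactly by counting closed walks at the root of the 2r-regular tree, the Cayley graph of F_r. The following sketch (illustrative, taking r = 2) does so by a distance-from-identity recursion and exhibits the convergence τ(a^{2p})^{1/2p} → 2√(2r−1) guaranteed by Lemma 2.6.

```python
import math

r = 2                 # free group F_r; a involves the 2r generators and inverses
deg = 2 * r

def moment(p):
    """tau(a^p): words of length p in g_1^{±1},...,g_r^{±1} that reduce to e,
    i.e. closed walks of length p at the root of the 2r-regular tree."""
    walks = [1] + [0] * p   # walks[d] = number of walks ending at distance d from e
    for _ in range(p):
        new = [0] * (p + 1)
        for d, w in enumerate(walks):
            if w == 0:
                continue
            if d == 0:
                new[1] += deg * w                # any letter moves away from e
            else:
                new[d - 1] += w                  # the unique cancelling letter
                if d + 1 <= p:
                    new[d + 1] += (deg - 1) * w  # any non-cancelling letter
        walks = new
    return walks[0]

print(moment(2))   # 4: the words g g^{-1}, one for each of the 2r letters g
for p in (5, 20, 80):
    print(moment(2 * p) ** (1 / (2 * p)), 2 * math.sqrt(2 * r - 1))
```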

2.2. Strong and weak convergence

We can now formally define the notions of weak and strong convergence of families of random matrices.

Definition 2.9 (Weak and strong convergence).

Let 𝑿N=(X1N,,XrN)\bm{X}^{N}=(X_{1}^{N},\ldots,X_{r}^{N}) be a sequence of rr-tuples of random matrices, and let 𝒙=(x1,,xr)\bm{x}=(x_{1},\ldots,x_{r}) be an rr-tuple of elements of a CC^{*}-probability space (𝒜,τ)(\mathcal{A},\tau).

• 𝑿N\bm{X}^{N} is said to converge weakly to 𝒙\bm{x} if for every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limNtr(P(𝑿N,𝑿N))=τ(P(𝒙,𝒙))in probability.\lim_{N\to\infty}\mathop{\mathrm{tr}}(P(\bm{X}^{N},\bm{X}^{N*}))=\tau(P(\bm{x},\bm{x}^{*}))\quad\text{in probability}. (2.1)
• 𝑿N\bm{X}^{N} is said to converge strongly to 𝒙\bm{x} if for every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limNP(𝑿N,𝑿N)=P(𝒙,𝒙)in probability.\lim_{N\to\infty}\|P(\bm{X}^{N},\bm{X}^{N*})\|=\|P(\bm{x},\bm{x}^{*})\|\quad\text{in probability}. (2.2)

This definition appears to be slightly weaker than our initial definition of strong convergence in Definition 1.1, where we allowed for polynomials PP with matrix rather than scalar coefficients. We will show in section 2.4 that the apparently weaker definition in fact already implies the stronger one.

We begin by spelling out some basic properties, see for example [41, §2.1].

Lemma 2.10 (Equivalent formulations of weak convergence).

The following are equivalent.

a. 𝑿N\bm{X}^{N} converges weakly to 𝒙\bm{x}.

b. Eq. (2.1) holds for every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle.

c. For every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, the empirical spectral distribution μP(𝑿N,𝑿N)\mu_{P(\bm{X}^{N},\bm{X}^{N*})} converges weakly to μP(𝒙,𝒙)\mu_{P(\bm{x},\bm{x}^{*})} in probability.

Proof.

Since every polynomial Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle can be written as P=P1+iP2P=P_{1}+iP_{2} for self-adjoint polynomials P1,P2P_{1},P_{2}, the equivalence aba\Leftrightarrow b is immediate by linearity of the trace. Moreover, the implication cbc\Rightarrow b is trivial since τ(a)=xμa(dx)\tau(a)=\int x\,\mu_{a}(dx) by the definition of the spectral distribution (and as μa\mu_{a} is compactly supported).

On the other hand, since Ppx1,,x2rP^{p}\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle for every pp\in\mathbb{N}, (2.1) implies

xpμP(𝑿N,𝑿N)(dx)=tr(P(𝑿N,𝑿N)p)Nτ(P(𝒙,𝒙)p)=xpμP(𝒙,𝒙)(dx)\int x^{p}\,\mu_{P(\bm{X}^{N},\bm{X}^{N*})}(dx)=\mathop{\mathrm{tr}}(P(\bm{X}^{N},\bm{X}^{N*})^{p})\xrightarrow{N\to\infty}\tau(P(\bm{x},\bm{x}^{*})^{p})=\int x^{p}\,\mu_{P(\bm{x},\bm{x}^{*})}(dx)

in probability. As μP(𝒙,𝒙)\mu_{P(\bm{x},\bm{x}^{*})} is compactly supported, convergence of moments implies weak convergence, and the implication bcb\Rightarrow c follows. ∎
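As a concrete instance of the moment convergence used in this proof, the even trace moments of a single Wigner-type matrix approach the Catalan numbers, which are the even moments of the semicircle distribution (a standard fact, used here purely as a numerical illustration).

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)
N = 1000
G = rng.standard_normal((N, N)) / np.sqrt(N)
X = (G + G.T) / np.sqrt(2)        # Wigner-type self-adjoint matrix
eigs = np.linalg.eigvalsh(X)

for p in (1, 2, 3, 4):
    empirical = np.mean(eigs ** (2 * p))    # tr(X^{2p})
    catalan = comb(2 * p, p) // (p + 1)     # 2p-th moment of the semicircle law
    print(p, empirical, catalan)
```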

A parallel result holds for strong convergence.

Lemma 2.11 (Equivalent formulations of strong convergence).

The following are equivalent.

a. 𝑿N\bm{X}^{N} converges strongly to 𝒙\bm{x}.

b. Eq. (2.2) holds for every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle.

c. For every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle and fC()f\in C(\mathbb{R}), we have

    limNf(P(𝑿N,𝑿N))=f(P(𝒙,𝒙))in probability.\lim_{N\to\infty}\|f(P(\bm{X}^{N},\bm{X}^{N*}))\|=\|f(P(\bm{x},\bm{x}^{*}))\|\quad\text{in probability}.
d. For every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, we have

    limNdH(sp(P(𝑿N,𝑿N)),sp(P(𝒙,𝒙)))=0in probability.\lim_{N\to\infty}\mathrm{d_{H}}\big{(}\mathrm{sp}(P(\bm{X}^{N},\bm{X}^{N*})),\mathrm{sp}(P(\bm{x},\bm{x}^{*}))\big{)}=0\quad\text{in probability}.
Proof.

Since X2=XX\|X\|^{2}=\|X^{*}X\| for any operator XX and as PPx1,,x2rP^{*}P\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle is self-adjoint, it is immediate that aba\Leftrightarrow b. That dbd\Rightarrow b is immediate as |XY|dH(sp(X),sp(Y))|\|X\|-\|Y\||\leq\mathrm{d_{H}}(\mathrm{sp}(X),\mathrm{sp}(Y)) for any bounded self-adjoint operators X,YX,Y.

To show that bcb\Rightarrow c, we may choose for any ε>0\varepsilon>0 a univariate real polynomial hh so that fhL[K,K]ε\|f-h\|_{L^{\infty}[-K,K]}\leq\varepsilon, where K=2P(𝒙,𝒙)K=2\|P(\bm{x},\bm{x}^{*})\|. Since (2.2) implies that P(𝑿N,𝑿N)K\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq K with probability 1o(1)1-o(1) as NN\to\infty, we obtain

f(P(𝒙,𝒙))4εf(P(𝑿N,𝑿N))f(P(𝒙,𝒙))+4ε\|f(P(\bm{x},\bm{x}^{*}))\|-4\varepsilon\leq\|f(P(\bm{X}^{N},\bm{X}^{N*}))\|\leq\|f(P(\bm{x},\bm{x}^{*}))\|+4\varepsilon

with probability 1o(1)1-o(1) as NN\to\infty by applying (2.2) to hPx1,,x2rh\circ P\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle. That bcb\Rightarrow c follows by taking ε0\varepsilon\downarrow 0.

To show that cdc\Rightarrow d, choose fC()f\in C(\mathbb{R}) so that f(x)=0f(x)=0 for xsp(P(𝒙,𝒙))x\in\mathrm{sp}(P(\bm{x},\bm{x}^{*})) and f(x)>0f(x)>0 otherwise. Since cc implies that f(P(𝑿N,𝑿N))0\|f(P(\bm{X}^{N},\bm{X}^{N*}))\|\to 0 in probability,

sp(P(𝑿N,𝑿N))sp(P(𝒙,𝒙))+[ε,ε]\mathrm{sp}(P(\bm{X}^{N},\bm{X}^{N*}))\subseteq\mathrm{sp}(P(\bm{x},\bm{x}^{*}))+[-\varepsilon,\varepsilon]

with probability 1o(1)1-o(1) as NN\to\infty for every ε>0\varepsilon>0. On the other hand, for any ysp(P(𝒙,𝒙))y\in\mathrm{sp}(P(\bm{x},\bm{x}^{*})), we may choose fC()f\in C(\mathbb{R}) so that f(y)=1f(y)=1 and f(x)<1f(x)<1 for xyx\neq y. Since cc implies that f(P(𝑿N,𝑿N))1\|f(P(\bm{X}^{N},\bm{X}^{N*}))\|\to 1 in probability,

y+[ε2,ε2]sp(P(𝑿N,𝑿N))+[ε,ε]y+[-\tfrac{\varepsilon}{2},\tfrac{\varepsilon}{2}]\subseteq\mathrm{sp}(P(\bm{X}^{N},\bm{X}^{N*}))+[-\varepsilon,\varepsilon]

with probability 1o(1)1-o(1) as NN\to\infty for every ε>0\varepsilon>0. As sp(P(𝒙,𝒙))\mathrm{sp}(P(\bm{x},\bm{x}^{*})) can be covered by a finite number of such sets y+[ε2,ε2]y+[-\tfrac{\varepsilon}{2},\tfrac{\varepsilon}{2}], the implication cdc\Rightarrow d follows. ∎

The elementary equivalent formulations of weak and strong convergence discussed above are all concerned with the (real) eigenvalues of self-adjoint polynomials. In contrast, what implications weak or strong convergence may have for the empirical distributions of the complex eigenvalues of non-self-adjoint (or non-normal) polynomials is poorly understood; see section 6.9. We nonetheless record one easy observation in this direction [100, Remark 3.6].

Lemma 2.12 (Spectral radius).

Suppose that 𝐗N\bm{X}^{N} converges strongly to 𝐱\bm{x}. Then

ϱ(P(𝑿N,𝑿N))ϱ(P(𝒙,𝒙))+o(1)with probability1o(1)\varrho\big{(}P(\bm{X}^{N},\bm{X}^{N*})\big{)}\leq\varrho\big{(}P(\bm{x},\bm{x}^{*})\big{)}+o(1)\quad\text{with probability}\quad 1-o(1)

for every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, where ϱ(a)=sup{|λ|:λsp(a)}\varrho(a)=\sup\{|\lambda|:\lambda\in\mathrm{sp}(a)\} denotes the spectral radius of any (not necessarily normal) operator aa.

Proof.

This follows immediately from the fact that the spectral radius is upper semicontinuous with respect to the operator norm [62, §104]. ∎

2.3. Strong implies weak

While we have formulated weak and strong convergence as distinct phenomena, it turns out that strong convergence—or even merely a one-sided form of it—often automatically implies weak convergence. Such a statement should be viewed with suspicion, since the definition of weak convergence requires us to specify a trace while the definition of strong convergence is independent of the trace. However, it turns out that many CC^{*}-algebras have a unique trace, and this is precisely the setting we will consider.

Lemma 2.13 (Strong implies weak).

Let 𝐗N=(X1N,,XrN)\bm{X}^{N}=(X_{1}^{N},\ldots,X_{r}^{N}) be a sequence of rr-tuples of random matrices, and let 𝐱=(x1,,xr)\bm{x}=(x_{1},\ldots,x_{r}) be an rr-tuple of elements of a CC^{*}-probability space (𝒜,τ)(\mathcal{A},\tau). Consider the following conditions.

a. For every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    P(𝑿N,𝑿N)P(𝒙,𝒙)+o(1)with probability1o(1).\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq\|P(\bm{x},\bm{x}^{*})\|+o(1)\quad\text{with probability}\quad 1-o(1).
b. 𝑿N\bm{X}^{N} converges weakly to 𝒙\bm{x}.

c. For every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    P(𝑿N,𝑿N)P(𝒙,𝒙)o(1)with probability1o(1).\|P(\bm{X}^{N},\bm{X}^{N*})\|\geq\|P(\bm{x},\bm{x}^{*})\|-o(1)\quad\text{with probability}\quad 1-o(1).

Then bcb\Rightarrow c, and aba\Rightarrow b if in addition C(𝐱)C^{*}(\bm{x}) has a unique trace.

Proof.

To prove bcb\Rightarrow c, note that weak convergence implies

P(𝑿N,𝑿N)tr(|P(𝑿N,𝑿N)|2p)12p=τ(|P(𝒙,𝒙)|2p)12po(1)\|P(\bm{X}^{N},\bm{X}^{N*})\|\geq\mathop{\mathrm{tr}}\big{(}|P(\bm{X}^{N},\bm{X}^{N*})|^{2p}\big{)}^{\frac{1}{2p}}=\tau\big{(}|P(\bm{x},\bm{x}^{*})|^{2p}\big{)}^{\frac{1}{2p}}-o(1)

with probability 1o(1)1-o(1) for every pp\in\mathbb{N}, as |P|2p=(PP)px1,,x2r|P|^{2p}=(P^{*}P)^{p}\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle. The conclusion follows by letting pp\to\infty and applying Lemma 2.6.

To prove aba\Rightarrow b, let us first consider the special case that 𝑿N\bm{X}^{N} are nonrandom. Define a linear functional N:x1,,x2r\ell_{N}:\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle\to\mathbb{C} by

N(P)=trP(𝑿N,𝑿N).\ell_{N}(P)=\mathop{\mathrm{tr}}P(\bm{X}^{N},\bm{X}^{N*}).

This is called the law of the family 𝑿N\bm{X}^{N}; it has the same properties as the trace in Definition 2.2, but is defined only on polynomials. Note that by linearity, N\ell_{N} is fully determined by its values on monomials.

Since |N(Q)|maxN,iXiNdeg(Q)|\ell_{N}(Q)|\leq\max_{N,i}\|X_{i}^{N}\|^{\deg(Q)} for every monomial QQ, the sequence N\ell_{N} is precompact in the topology of pointwise convergence. Thus for every subsequence of the indices NN, there is a further subsequence so that N\ell_{N}\to\ell pointwise for some law \ell that satisfies the properties of a trace. On the other hand, condition aa ensures that

|(P)|=lim|N(P)|lim supP(𝑿N,𝑿N)P(𝒙,𝒙)|\ell(P)|=\lim|\ell_{N}(P)|\leq\limsup\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq\|P(\bm{x},\bm{x}^{*})\|

where the limits are taken along the subsequence. Thus \ell extends by continuity to a trace on C(𝒙)C^{*}(\bm{x}). Since the latter has the unique trace property, we must have (P)=τ(P(𝒙,𝒙))\ell(P)=\tau(P(\bm{x},\bm{x}^{*})), and thus we have proved weak convergence.

When 𝑿N\bm{X}^{N} are random, we note that condition aa implies (by Borel-Cantelli and as x1,,x2r\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle is separable) that for every subsequence of indices NN, we can find a further subsequence along which P(𝑿N,𝑿N)P(𝒙,𝒙)+o(1)\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq\|P(\bm{x},\bm{x}^{*})\|+o(1) for every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle a.s. The proof now proceeds as in the nonrandom case. ∎

The unique trace property turns out to arise frequently in practice. In particular, that Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) has a unique trace for r2r\geq 2 is a classical result of Powers [118], and a general characterization of countable groups 𝐆\mathbf{G} so that Cred(𝐆)C^{*}_{\rm red}(\mathbf{G}) has a unique trace is given by Breuillard–Kalantar–Kennedy–Ozawa [24]. In such situations, Lemma 2.13 shows that a strong convergence upper bound (condition aa) already suffices to establish both strong and weak convergence in full. Establishing such an upper bound is the main difficulty in proofs of strong convergence.

Remark 2.14.

The implication aca\Rightarrow c of Lemma 2.13 also holds under the alternative hypothesis that C(𝒙)C^{*}(\bm{x}) is a simple CC^{*}-algebra; see [86, pp. 16–19].

2.4. Scalar, matrix, and operator coefficients

In Definition 2.9, we have defined the weak and strong convergence properties for polynomials PP with scalar coefficients. However, applications often require polynomials with matrix or even operator coefficients to encode the models of interest. We now show that such properties are already implied by their counterparts for scalar polynomials.

(Let (𝒜,τ)(\mathcal{A},\tau) and (,σ)(\mathcal{B},\sigma) be CC^{*}-probability spaces. If 𝒙=(x1,,xr)\bm{x}=(x_{1},\ldots,x_{r}) are elements of 𝒜\mathcal{A} and Px1,,x2rP\in\mathcal{B}\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle is a polynomial with coefficients in \mathcal{B}, then P(𝒙,𝒙)P(\bm{x},\bm{x}^{*}) lies in the algebraic tensor product 𝒜alg\mathcal{A}\otimes_{\rm alg}\mathcal{B}. This viewpoint suffices for weak convergence. To make sense of strong convergence, however, we must define a norm on the tensor product. We will do so in the obvious way: given 𝒜B(H1)\mathcal{A}\subseteq B(H_{1}) and B(H2)\mathcal{B}\subseteq B(H_{2}), we define the CC^{*}-algebra 𝒜B(H1H2)\mathcal{A}\otimes\mathcal{B}\subseteq B(H_{1}\otimes H_{2}) by 𝒜=cl(span{ab:a𝒜,b})\mathcal{A}\otimes\mathcal{B}=\mathrm{cl}_{\|\cdot\|}\big{(}\mathop{\mathrm{span}}\{a\otimes b:a\in\mathcal{A},b\in\mathcal{B}\}\big{)}, and extend the trace τσ:𝒜\tau\otimes\sigma:\mathcal{A}\otimes\mathcal{B}\to\mathbb{C} accordingly. This construction is called the minimal tensor product of CC^{*}-probability spaces, and is often denoted min\otimes_{\rm min}. In this survey, the notation \otimes will always denote the minimal tensor product.)

For weak convergence, this situation is easy.

Lemma 2.15 (Operator-valued weak convergence).

The following are equivalent.

a. 𝑿N\bm{X}^{N} converges weakly to 𝒙\bm{x}, i.e., for all Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limNtr(P(𝑿N,𝑿N))=τ(P(𝒙,𝒙))in probability.\lim_{N\to\infty}\mathop{\mathrm{tr}}(P(\bm{X}^{N},\bm{X}^{N*}))=\tau(P(\bm{x},\bm{x}^{*}))\quad\text{in probability}.
b. For any CC^{*}-probability space (,σ)(\mathcal{B},\sigma) and Px1,,x2rP\in\mathcal{B}\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limN(σtr)(P(𝑿N,𝑿N))=(στ)(P(𝒙,𝒙))in probability.\lim_{N\to\infty}(\sigma\otimes\mathop{\mathrm{tr}})(P(\bm{X}^{N},\bm{X}^{N*}))=(\sigma\otimes\tau)(P(\bm{x},\bm{x}^{*}))\quad\text{in probability}.
Proof.

That bab\Rightarrow a is obvious. To prove aba\Rightarrow b, let us express Px1,,x2rP\in\mathcal{B}\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle concretely as P(x1,,x2r)=b0𝟏+k=1qi1,,ik=12rbi1,,ikxi1xikP(x_{1},\ldots,x_{2r})=b_{0}\otimes\mathbf{1}+\sum_{k=1}^{q}\sum_{i_{1},\ldots,i_{k}=1}^{2r}b_{i_{1},\ldots,i_{k}}\otimes x_{i_{1}}\cdots x_{i_{k}} with operator coefficients bi1,,ikb_{i_{1},\ldots,i_{k}}\in\mathcal{B}. Then clearly

(σtr)(P(𝑿N,𝑿N))=σ(b0)+k=1qi1,,ik=12rσ(bi1,,ik)tr(XNi1XNik)(\sigma\otimes\mathop{\mathrm{tr}})(P(\bm{X}^{N},\bm{X}^{N*}))=\sigma(b_{0})+\sum_{k=1}^{q}\sum_{i_{1},\ldots,i_{k}=1}^{2r}\sigma(b_{i_{1},\ldots,i_{k}})\,\mathop{\mathrm{tr}}(X^{N}_{i_{1}}\cdots X^{N}_{i_{k}})

where we denote Xr+iN=XiNX_{r+i}^{N}=X_{i}^{N*} for i=1,,ri=1,\ldots,r. Since aa yields tr(XNi1XNik)τ(xi1xik)\mathop{\mathrm{tr}}(X^{N}_{i_{1}}\cdots X^{N}_{i_{k}})\to\tau(x_{i_{1}}\cdots x_{i_{k}}) for all k,i1,,ikk,i_{1},\ldots,i_{k}, the conclusion follows. ∎

Unfortunately, the analogous equivalence for strong convergence is simply false at this level of generality; a counterexample can be constructed as in [33, Appendix A]. Nonetheless, strong convergence extends in complete generality to polynomials with matrix (as opposed to operator) coefficients. This justifies the apparently more general Definition 1.1 given in the introduction.

Lemma 2.16 (Matrix-valued strong convergence).

The following are equivalent.

a. 𝑿N\bm{X}^{N} converges strongly to 𝒙\bm{x}, i.e., for all Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limNP(𝑿N,𝑿N)=P(𝒙,𝒙)in probability.\lim_{N\to\infty}\|P(\bm{X}^{N},\bm{X}^{N*})\|=\|P(\bm{x},\bm{x}^{*})\|\quad\text{in probability}.
b. For every DD\in\mathbb{N} and PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle

    limNP(𝑿N,𝑿N)=P(𝒙,𝒙)in probability.\lim_{N\to\infty}\|P(\bm{X}^{N},\bm{X}^{N*})\|=\|P(\bm{x},\bm{x}^{*})\|\quad\text{in probability}.
Proof.

That bab\Rightarrow a is obvious. To prove aba\Rightarrow b, express PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle as P=i,j=1DeiejPijP=\sum_{i,j=1}^{D}e_{i}e_{j}^{*}\otimes P_{ij} with Pijx1,,x2rP_{ij}\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, where e1,,eDe_{1},\ldots,e_{D} denotes the standard basis of D\mathbb{C}^{D}. We can therefore estimate

maxi,jPij(𝒙,𝒙)P(𝒙,𝒙)D2maxi,jPij(𝒙,𝒙),\max_{i,j}\|P_{ij}(\bm{x},\bm{x}^{*})\|\leq\|P(\bm{x},\bm{x}^{*})\|\leq D^{2}\max_{i,j}\|P_{ij}(\bm{x},\bm{x}^{*})\|,

and analogously for P(𝑿N,𝑿N)P(\bm{X}^{N},\bm{X}^{N*}). Here we used Pij=(eiei𝟏)P(ejej𝟏)\|P_{ij}\|=\|(e_{i}e_{i}^{*}\otimes\mathbf{1})P(e_{j}e_{j}^{*}\otimes\mathbf{1})\| for the first inequality and the triangle inequality for the second. Thus aa yields

D2P(𝒙,𝒙)o(1)P(𝑿N,𝑿N)D2P(𝒙,𝒙)+o(1)D^{-2}\|P(\bm{x},\bm{x}^{*})\|-o(1)\leq\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq D^{2}\|P(\bm{x},\bm{x}^{*})\|+o(1)

with probability 1o(1)1-o(1) as NN\to\infty for every PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle. Now note that since P2p=(PP)p\|P\|^{2p}=\|(P^{*}P)^{p}\| and (PP)pMD()x1,,x2r(P^{*}P)^{p}\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle for every pp\in\mathbb{N}, applying the above inequality to (PP)p(P^{*}P)^{p} implies a fortiori that

D1/pP(𝒙,𝒙)o(1)P(𝑿N,𝑿N)D1/pP(𝒙,𝒙)+o(1)D^{-1/p}\|P(\bm{x},\bm{x}^{*})\|-o(1)\leq\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq D^{1/p}\|P(\bm{x},\bm{x}^{*})\|+o(1)

with probability 1o(1)1-o(1) as NN\to\infty. Taking pp\to\infty completes the proof. ∎
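The power trick at the end of this proof rests on the identity ||P||^{2p} = ||(P*P)^p||, which washes out the dimensional constant D^2 into D^{1/p} after taking 2p-th roots. For matrices this is just a property of the spectral norm and can be sanity-checked numerically; a minimal sketch, assuming numpy (the matrix A is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
norm = np.linalg.norm(A, 2)                      # operator norm ||A||

# ||A||^{2p} = ||(A*A)^p||, so a bound known only up to a constant D^2
# improves to the constant D^{1/p} after taking 2p-th roots.
for p in [1, 2, 8, 32]:
    est = np.linalg.norm(np.linalg.matrix_power(A.conj().T @ A, p), 2) ** (1 / (2 * p))
    assert abs(est - norm) < 1e-6
```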

Strong convergence of polynomials with operator coefficients requires additional assumptions. For example, if the coefficients are compact operators, strong convergence follows easily from Lemma 2.16 since compact operators can be approximated in norm by finite rank operators (i.e., by matrices).

A much weaker requirement is provided by the following property of CC^{*}-algebras. We give the definition in the form that is most relevant for our purposes; its equivalence to the original more algebraic definition (in terms of short exact sequences) is a nontrivial fact due to Kirchberg, see [115, Chapter 17] or [25].

Definition 2.17 (Exact CC^{*}-algebra).

A CC^{*}-algebra \mathcal{B} is called exact if for every finite-dimensional subspace 𝒮\mathcal{S}\subseteq\mathcal{B} and ε>0\varepsilon>0, there exists DD\in\mathbb{N} and a linear embedding u:𝒮MD()u:\mathcal{S}\to\mathrm{M}_{D}(\mathbb{C}) such that

(uid)(x)x(1+ε)(uid)(x)\|(u\otimes\mathrm{id})(x)\|\leq\|x\|\leq(1+\varepsilon)\|(u\otimes\mathrm{id})(x)\|

for every CC^{*}-algebra 𝒜\mathcal{A} and x𝒮𝒜x\in\mathcal{S}\otimes\mathcal{A}.

We can now prove the following.

Lemma 2.18 (Operator-valued strong convergence).

Suppose that 𝐗N\bm{X}^{N} converges strongly to 𝐱\bm{x}. Then we have

limNP(𝑿N,𝑿N)=P(𝒙,𝒙)in probability\lim_{N\to\infty}\|P(\bm{X}^{N},\bm{X}^{N*})\|=\|P(\bm{x},\bm{x}^{*})\|\quad\text{in probability}

for every Px1,,x2rP\in\mathcal{B}\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle with coefficients in an exact CC^{*}-algebra \mathcal{B}.

Proof.

Fix Px1,,x2rP\in\mathcal{B}\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, let 𝒮\mathcal{S}\subseteq\mathcal{B} be the linear span of the operator coefficients of PP, and let ε>0\varepsilon>0. Let u:𝒮MD()u:\mathcal{S}\to\mathrm{M}_{D}(\mathbb{C}) be the embedding provided by Definition 2.17. Since Q=(uid)(P)MD()x1,,x2rQ=(u\otimes\mathrm{id})(P)\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, we obtain

Q(𝒙,𝒙)o(1)P(𝑿N,𝑿N)(1+ε)Q(𝒙,𝒙)+o(1)\|Q(\bm{x},\bm{x}^{*})\|-o(1)\leq\|P(\bm{X}^{N},\bm{X}^{N*})\|\leq(1+\varepsilon)\|Q(\bm{x},\bm{x}^{*})\|+o(1)

with probability 1o(1)1-o(1) as NN\to\infty by Lemma 2.16, while

Q(𝒙,𝒙)P(𝒙,𝒙)(1+ε)Q(𝒙,𝒙).\|Q(\bm{x},\bm{x}^{*})\|\leq\|P(\bm{x},\bm{x}^{*})\|\leq(1+\varepsilon)\|Q(\bm{x},\bm{x}^{*})\|.

The conclusion follows by letting ε0\varepsilon\downarrow 0. ∎

The exactness property turns out to arise frequently in practice. In particular, Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) is exact [115, Corollary 17.10], as is Cred(𝐆)C^{*}_{\rm red}(\mathbf{G}) for many other groups 𝐆\mathbf{G}. For an extensive discussion, see [25, Chapter 5] or [6].

One reason that exactness is very useful in a strong convergence context is that it enables us to construct complex strong convergence models by combining simpler building blocks, as will be explained briefly in section 5.4. Another useful application of exactness is that it enables an improved form of Lemma 2.13 with uniform bounds over polynomials with matrix coefficients of any dimension [90, §5.3].

2.5. Linearization

In the previous section, we showed that strong convergence of polynomials with scalar coefficients implies strong convergence of polynomials with matrix coefficients. If we allow for matrix coefficients, however, we can achieve a different kind of simplification: to establish strong convergence, it suffices to consider only polynomials with matrix coefficients of degree one. This nontrivial fact is often referred to as the linearization trick.

We first develop a version of the linearization trick for unitary families.

Theorem 2.19 (Unitary linearization).

Let 𝐔N=(U1N,,UrN)\bm{U}^{N}=(U_{1}^{N},\ldots,U_{r}^{N}) be a sequence of rr-tuples of unitary random matrices, and let 𝐮=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) be an rr-tuple of unitaries in a CC^{*}-algebra 𝒜\mathcal{A}. Then the following are equivalent.

  1. a.

    For every DD\in\mathbb{N} and self-adjoint PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\dots,x_{2r}\rangle of degree one,

    limNP(𝑼N,𝑼N)=P(𝒖,𝒖)in probability.\lim_{N\to\infty}\|P(\bm{U}^{N},\bm{U}^{N*})\|=\|P(\bm{u},\bm{u}^{*})\|\quad\text{in probability}.
  2. b.

    𝑼N\bm{U}^{N} converges strongly to 𝒖\bm{u}.

Theorem 2.19 is due to Pisier [114, 117], but the elementary proof we present here is due to Lehner [84, §5.1]. We will need a classical lemma.

Lemma 2.20.

For any operator XX in a CC^{*}-algebra 𝒜\mathcal{A}, define its self-adjoint dilation X~=e1e2X+e2e1X\tilde{X}=e_{1}e_{2}^{*}\otimes X+e_{2}e_{1}^{*}\otimes X^{*} in M2()𝒜\mathrm{M}_{2}(\mathbb{C})\otimes\mathcal{A}. Then X=X~\|X\|=\|\tilde{X}\| and sp(X~)=−sp(X~)\mathrm{sp}(\tilde{X})=-\mathrm{sp}(\tilde{X}).

Proof.

We first note that X~2=X~2=e1e1XX+e2e2XX=X2\|\tilde{X}\|^{2}=\|\tilde{X}^{2}\|=\|e_{1}e_{1}^{*}\otimes XX^{*}+e_{2}e_{2}^{*}\otimes X^{*}X\|=\|X\|^{2}. To show that the spectrum is symmetric, it suffices to note that X~\tilde{X} is unitarily conjugate to X~-\tilde{X} since UX~U=X~U\tilde{X}U^{*}=-\tilde{X} with U=(e1e1e2e2)𝟏U=(e_{1}e_{1}^{*}-e_{2}e_{2}^{*})\otimes\mathbf{1}. ∎
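Lemma 2.20 is easy to verify numerically; a small sketch, assuming numpy (X is an arbitrary complex matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Z = np.zeros((4, 4))

# self-adjoint dilation X~ = e1 e2* (x) X + e2 e1* (x) X* in M_2 (x) M_4
Xt = np.block([[Z, X], [X.conj().T, Z]])

assert np.allclose(Xt, Xt.conj().T)                              # self-adjoint
assert np.isclose(np.linalg.norm(Xt, 2), np.linalg.norm(X, 2))   # same norm

# sp(X~) = -sp(X~): the eigenvalues of X~ are the +- singular values of X
ev = np.linalg.eigvalsh(Xt)                                      # ascending order
assert np.allclose(ev, -ev[::-1])
```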

The main step in the proof of Theorem 2.19 is as follows.

Lemma 2.21.

Fix D,rD,r\in\mathbb{N} and AijMD()A_{ij}\in\mathrm{M}_{D}(\mathbb{C}) for i,j[r]i,j\in[r]. Then there exist C0C\geq 0, DD^{\prime}\in\mathbb{N} and AiMD()A_{i}^{\prime}\in\mathrm{M}_{D^{\prime}}(\mathbb{C}) for i[r]i\in[r] such that

i,j=1rAijUiUj=i=1rAiUi2C\Bigg{\|}\sum_{i,j=1}^{r}A_{ij}\otimes U_{i}^{*}U_{j}\Bigg{\|}=\Bigg{\|}\sum_{i=1}^{r}A_{i}^{\prime}\otimes U_{i}\Bigg{\|}^{2}-C

for any family of unitaries U1,,UrU_{1},\ldots,U_{r} in any CC^{*}-algebra 𝒜\mathcal{A}.

Proof.

Let X=i,j=1rAijUiUjX=\sum_{i,j=1}^{r}A_{ij}\otimes U_{i}^{*}U_{j}. Lemma 2.20 yields X~=i,j=1rA~ijUiUj\tilde{X}=\sum_{i,j=1}^{r}\tilde{A}_{ij}\otimes U_{i}^{*}U_{j} with A~ij=e1e2Aij+e2e1Aji\tilde{A}_{ij}=e_{1}e_{2}^{*}\otimes A_{ij}+e_{2}e_{1}^{*}\otimes A_{ji}^{*}. We therefore obtain for any c>0c>0

X+rc=X~+rc=X~+rc𝟏=i,j=1r(A~ij+c1i=j𝟏)UiUj,\|X\|+rc=\|\tilde{X}\|+rc=\|\tilde{X}+rc\mathbf{1}\|=\Bigg{\|}\sum_{i,j=1}^{r}(\tilde{A}_{ij}+c1_{i=j}\mathbf{1})\otimes U_{i}^{*}U_{j}\Bigg{\|},

where the second equality used that X~\tilde{X} has a symmetric spectrum.

Now note that the r×rr\times r block matrix A~=(A~ij+c1i=j𝟏)i,j[r]M2Dr()\tilde{A}=(\tilde{A}_{ij}+c1_{i=j}\mathbf{1})_{i,j\in[r]}\in\mathrm{M}_{2Dr}(\mathbb{C}) is self-adjoint, and we can choose cc sufficiently large so that it is positive definite. Then we may write A~=BB\tilde{A}=B^{*}B for BM2Dr()B\in\mathrm{M}_{2Dr}(\mathbb{C}). Now view BB as a 1×r1\times r block matrix with 2Dr×2D2Dr\times 2D blocks B1,,BrB_{1},\ldots,B_{r}, so that A~ij+c1i=j𝟏=BiBj\tilde{A}_{ij}+c1_{i=j}\mathbf{1}=B_{i}^{*}B_{j}. Therefore X+rc=YY=Y2\|X\|+rc=\|Y^{*}Y\|=\|Y\|^{2} with Y=i=1rBiUiY=\sum_{i=1}^{r}B_{i}\otimes U_{i}. To conclude we let C=rcC=rc, D=2DrD^{\prime}=2Dr, and define AiA_{i}^{\prime} by padding BiB_{i} with 2D(r1)2D(r-1) zero columns. ∎
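The construction in this proof (dilate, shift to positive definite, factor, split into block columns) is entirely concrete and can be carried out numerically. A sketch assuming numpy; the dimensions D, r, n, the coefficients A_ij, and the unitaries (Q factors of Gaussian matrices) are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
D, r, n = 2, 3, 5   # coefficient size, number of unitaries, matrix dimension

def rand_unitary(n):
    # any unitaries work; here, the Q factor of a complex Gaussian matrix
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q

A = [[rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))
      for j in range(r)] for i in range(r)]
U = [rand_unitary(n) for _ in range(r)]

X = sum(np.kron(A[i][j], U[i].conj().T @ U[j]) for i in range(r) for j in range(r))
lhs = np.linalg.norm(X, 2)

# dilated coefficients: A~_ij = e1 e2* (x) A_ij + e2 e1* (x) A_ji*
e12 = np.array([[0, 1], [0, 0]])
At = np.block([[np.kron(e12, A[i][j]) + np.kron(e12.T, A[j][i].conj().T)
                for j in range(r)] for i in range(r)])

c = np.linalg.norm(At, 2) + 1.0                  # make A~ + c 1 positive definite
B = np.linalg.cholesky(At + c * np.eye(2 * D * r)).conj().T    # A~ + c 1 = B* B
Bi = [B[:, 2 * D * i: 2 * D * (i + 1)] for i in range(r)]      # block columns

Y = sum(np.kron(Bi[i], U[i]) for i in range(r))
assert np.isclose(np.linalg.norm(Y, 2) ** 2 - r * c, lhs)      # ||X|| = ||Y||^2 - rc
```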

We can now conclude the proof of Theorem 2.19.

Proof of Theorem 2.19.

Fix any PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree at most 2q2^{q}, and let 𝒖=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) be unitaries in any CC^{*}-algebra 𝒜\mathcal{A}. Denote by U1,,URU_{1},\ldots,U_{R} all monomials of degree at most 2q12^{q-1} in the variables 𝒖,𝒖\bm{u},\bm{u}^{*}. Then we may clearly express P(𝒖,𝒖)=i,j=1RAijUiUjP(\bm{u},\bm{u}^{*})=\sum_{i,j=1}^{R}A_{ij}\otimes U_{i}^{*}U_{j} for some matrix coefficients AijMD()A_{ij}\in\mathrm{M}_{D}(\mathbb{C}). Lemma 2.21 yields PMD()x1,,x2rP^{\prime}\in\mathrm{M}_{D^{\prime}}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree at most 2q12^{q-1} so that

P(𝒖,𝒖)=P(𝒖,𝒖)2C.\|P(\bm{u},\bm{u}^{*})\|=\|P^{\prime}(\bm{u},\bm{u}^{*})\|^{2}-C.

Iterating this procedure qq times and using Lemma 2.20, we obtain a self-adjoint QMD()x1,,x2rQ\in\mathrm{M}_{D^{\prime\prime}}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree at most one and a real polynomial hh so that

P(𝒖,𝒖)=h(Q(𝒖,𝒖))\|P(\bm{u},\bm{u}^{*})\|=h(\|Q(\bm{u},\bm{u}^{*})\|)

for any rr-tuple of unitaries 𝒖=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) in any CC^{*}-algebra 𝒜\mathcal{A}. As this identity therefore applies also to 𝑼N\bm{U}^{N}, the implication aba\Rightarrow b follows immediately. The converse implication bab\Rightarrow a follows from Lemma 2.16. ∎

We have included a full proof of Theorem 2.19 to give a flavor of how the linearization trick comes about. In the rest of this section, we briefly discuss two additional linearization results without proof.

The proof of Theorem 2.19 relied crucially on the unitary assumption. It is tempting to conjecture that its conclusion extends to the non-unitary case. Unfortunately, a simple example shows that this cannot be true.

Example 2.22.

Consider any DD\in\mathbb{N} and PMD()x1P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1}\rangle of degree one, that is, P(x1)=A0𝟏+A1x1P(x_{1})=A_{0}\otimes\mathbf{1}+A_{1}\otimes x_{1}. Then the spectral theorem yields

P(x)=supλsp(x)A0+λA1\|P(x)\|=\sup_{\lambda\in\mathrm{sp}(x)}\|A_{0}+\lambda A_{1}\|

for every self-adjoint operator xx. Now let x,yx,y be self-adjoint operators with sp(x)=[1,1]\mathrm{sp}(x)=[-1,1] and sp(y)={1,1}\mathrm{sp}(y)=\{-1,1\}. Since the right-hand side of the above identity is the supremum of a convex function of λ\lambda, it is clear that P(x)=P(y)\|P(x)\|=\|P(y)\| for every PMD()x1P\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1}\rangle of degree one. But clearly 1x2=1\|1-x^{2}\|=1 while 1y2=0\|1-y^{2}\|=0.
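This degree-one blindness is easy to see numerically. A sketch assuming numpy: the spectra of x and y are modeled by a fine grid of [-1,1] and by {-1,1}, and A0, A1 are arbitrary coefficients; since the map from the spectral parameter to ||A0 + lambda A1|| is convex, the supremum over [-1,1] sits at the endpoints:

```python
import numpy as np

rng = np.random.default_rng(3)
A0 = rng.standard_normal((3, 3))
A1 = rng.standard_normal((3, 3))

grid = np.linspace(-1, 1, 201)      # sp(x) modeled by a grid of [-1, 1]
ends = np.array([-1.0, 1.0])        # sp(y) = {-1, 1}

def deg1_norm(spectrum):
    # ||A0 (x) 1 + A1 (x) z|| = sup over sp(z) of ||A0 + lam A1||  (spectral theorem)
    return max(np.linalg.norm(A0 + lam * A1, 2) for lam in spectrum)

# convexity: the sup over [-1,1] is attained at the endpoints, so degree-one
# polynomials cannot tell x and y apart
assert np.isclose(deg1_norm(grid), deg1_norm(ends))

# but the degree-two polynomial 1 - z^2 separates them
x, y = np.diag(grid), np.diag(ends)
assert np.isclose(np.linalg.norm(np.eye(201) - x @ x, 2), 1.0)
assert np.isclose(np.linalg.norm(np.eye(2) - y @ y, 2), 0.0)
```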

This example shows that the norms of polynomials of degree one cannot detect gaps in the spectrum of a self-adjoint operator, while higher degree polynomials can. Thus the norm of degree one polynomials does not suffice for strong convergence in the self-adjoint setting. However, it was realized by Haagerup and Thorbjørnsen [60] that this issue can be surmounted by requiring convergence not just of the norm, but rather of the full spectrum, of degree one polynomials.

Theorem 2.23 (Self-adjoint linearization).

Let 𝐗N=(X1N,,XrN)\bm{X}^{N}=(X_{1}^{N},\ldots,X_{r}^{N}) be a sequence of rr-tuples of self-adjoint random matrices, and let 𝐱=(x1,,xr)\bm{x}=(x_{1},\ldots,x_{r}) be an rr-tuple of self-adjoint elements of a CC^{*}-algebra 𝒜\mathcal{A}. The following are equivalent.

  1. a.

    For every DD\in\mathbb{N} and self-adjoint PMD()x1,,xrP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\dots,x_{r}\rangle of degree one,

    sp(P(𝑿N))sp(P(𝒙))+o(1)[1,1]with probability1o(1).\mathrm{sp}\big{(}P(\bm{X}^{N})\big{)}\subseteq\mathrm{sp}\big{(}P(\bm{x})\big{)}+o(1)[-1,1]\quad\text{with probability}\quad 1-o(1).
  2. b.

    For every DD\in\mathbb{N} and self-adjoint PMD()x1,,xrP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\dots,x_{r}\rangle

    P(𝑿N)P(𝒙)+o(1)with probability1o(1).\|P(\bm{X}^{N})\|\leq\|P(\bm{x})\|+o(1)\quad\text{with probability}\quad 1-o(1).

We omit the proof, which may be found in [60] or in [59] (see also [99, §10.3]). Let us note that while this theorem only gives an upper bound, the corresponding lower bound will often follow from Lemma 2.13.

Finally, while we have focused on strong convergence, linearization tricks for weak convergence can be found in the paper [45] of de la Salle. For example, we state the following result which follows readily from the proof of [45, Lemma 1.1].

Lemma 2.24 (Linearization and weak convergence).

Let (𝒜,τ)(\mathcal{A},\tau) be a CC^{*}-probability space. Then in the setting of Theorem 2.23, the following are equivalent.

  1. a.

    For every p,Dp,D\in\mathbb{N} and self-adjoint PMD()x1,,xrP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\dots,x_{r}\rangle of degree one,

    limNtr(P(𝑿N)2p)=(trτ)(P(𝒙)2p)in probability.\lim_{N\to\infty}\mathop{\mathrm{tr}}\big{(}P(\bm{X}^{N})^{2p}\big{)}=({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}P(\bm{x})^{2p}\big{)}\quad\text{in probability}.
  2. b.

    𝑿N\bm{X}^{N} converges weakly to 𝒙\bm{x}.

Why is linearization useful? It is often the case that one can perform computations more easily for polynomials of degree one than for general polynomials. For example, linearization played a key role in the Haagerup–Thorbjørnsen proof of strong convergence of GUE matrices [60] because the matrix Cauchy transform of polynomials of degree one can be computed by means of quadratic equations. Similarly, polynomials of degree one make the moment computations in the works of Bordenave and Collins [19, 21, 20] tractable. However, the interpolation and polynomial methods discussed in section 1.3 do not rely on linearization.

2.6. Positivization

The linearization trick of the previous section states that if we work with general matrix coefficients, it suffices to consider only polynomials of degree one. We now introduce (in the setting of group CC^{*}-algebras) a complementary principle: if we admit polynomials of any degree, it suffices to consider only polynomials with positive scalar coefficients. This positivization trick, due to Mikael de la Salle, appears in slightly different form in [90, §6.2]. (The form of the principle presented here was explained to the author by de la Salle.)

The positivization trick will rely on another nontrivial operator algebraic property that we introduce presently. Let us fix a finitely generated group 𝐆\mathbf{G} with generators g1,,grg_{1},\ldots,g_{r}, let λ:𝐆B(l2(𝐆))\lambda:\mathbf{G}\to B(l^{2}(\mathbf{G})) be its left-regular representation, and let τ\tau be the canonical trace on Cred(𝐆)C^{*}_{\rm red}(\mathbf{G}). For simplicity, we will denote ui=λ(gi)u_{i}=\lambda(g_{i}). Then for any Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, we can uniquely express

P(𝒖,𝒖)=g𝐆agλ(g)P(\bm{u},\bm{u}^{*})=\sum_{g\in\mathbf{G}}a_{g}\,\lambda(g) (2.3)

for some coefficients aga_{g}\in\mathbb{C} that vanish for all but a finite number of g𝐆g\in\mathbf{G}. Moreover, it is readily verified using the definition of the trace that

P(𝒖,𝒖)2=τ(|P(𝒖,𝒖)|2)12=(g𝐆|ag|2)12.\|P(\bm{u},\bm{u}^{*})\|_{2}=\tau(|P(\bm{u},\bm{u}^{*})|^{2})^{\frac{1}{2}}=\Bigg{(}\sum_{g\in\mathbf{G}}|a_{g}|^{2}\Bigg{)}^{\frac{1}{2}}.
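The coefficients in (2.3) and the resulting ℓ²-norm formula are entirely combinatorial for the free group: one only has to freely reduce the monomials of P and collect coefficients. A minimal sketch (the word encoding, with letters as nonzero integers and -i the inverse of generator i, is our own choice for illustration):

```python
from math import sqrt

def reduce_word(w):
    # freely reduce a word in F_r; letters are nonzero ints, -i the inverse of i
    out = []
    for a in w:
        if out and out[-1] == -a:
            out.pop()
        else:
            out.append(a)
    return tuple(out)

def coefficients(P):
    # collect the coefficients a_g of P(u, u*) = sum_g a_g lambda(g), where P is
    # a dict mapping monomials (tuples of letters) to complex coefficients
    a = {}
    for w, c in P.items():
        g = reduce_word(w)
        a[g] = a.get(g, 0) + c
    return a

def l2_norm(P):
    return sqrt(sum(abs(c) ** 2 for c in coefficients(P).values()))

# P = u1 u2 u1* + u1 u1* u2 - 2.1: the monomials reduce to g1 g2 g1^{-1}, g2, e
P = {(1, 2, -1): 1.0, (1, -1, 2): 1.0, (): -2.0}
assert abs(l2_norm(P) - sqrt(6)) < 1e-12
```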

We can now introduce the following property.

Definition 2.25 (Rapid decay property).

The group 𝐆\mathbf{G} is said to have the rapid decay property if there exist constants C,c>0C,c>0 so that

P(𝒖,𝒖)CqcP(𝒖,𝒖)2\|P(\bm{u},\bm{u}^{*})\|\leq Cq^{c}\|P(\bm{u},\bm{u}^{*})\|_{2}

for all qq\in\mathbb{N} and Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree qq.

The key feature of this property is the polynomial dependence on degree qq. This is a major improvement over the trivial bound obtained by applying the triangle inequality and Cauchy–Schwarz, which would yield such an inequality with an exponential constant |{g𝐆:ag0}|1/2(2r+1)q/2|\{g\in\mathbf{G}:a_{g}\neq 0\}|^{1/2}\leq(2r+1)^{q/2}.

While the rapid decay property appears to be very strong, it is widespread. It was first proved by Haagerup [57] for the free group 𝐆=𝐅r\mathbf{G}=\mathbf{F}_{r}, for which the rapid decay property is known as the Haagerup inequality. The rapid decay property is now known to hold for many other groups, cf. [31].

We are now ready to introduce the positivization trick. For simplicity, we formulate the result for the case of the free group 𝐆=𝐅r\mathbf{G}=\mathbf{F}_{r} (see Remark 2.27).

Lemma 2.26 (Positivization).

Let 𝐔N=(U1N,,UrN)\bm{U}^{N}=(U_{1}^{N},\ldots,U_{r}^{N}) be a sequence of rr-tuples of unitary random matrices, and let 𝐮=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) be defined as above for 𝐆=𝐅r\mathbf{G}=\mathbf{F}_{r} (that is, ui=λ(gi)Cred(𝐅r)u_{i}=\lambda(g_{i})\in C^{*}_{\rm red}(\mathbf{F}_{r})). Then the following are equivalent.

  1. a.

    For every self-adjoint P+x1,,x2rP\in\mathbb{R}_{+}\langle x_{1},\dots,x_{2r}\rangle

    P(𝑼N,𝑼N)P(𝒖,𝒖)+o(1)with probability1o(1).\|P(\bm{U}^{N},\bm{U}^{N*})\|\leq\|P(\bm{u},\bm{u}^{*})\|+o(1)\quad\text{with probability}\quad 1-o(1).
  2. b.

    𝑼N\bm{U}^{N} converges strongly to 𝒖\bm{u}.

Proof.

The implication bab\Rightarrow a is trivial. To prove aba\Rightarrow b, fix any Px1,,x2rP\in\mathbb{C}\langle x_{1},\dots,x_{2r}\rangle. We may clearly assume without loss of generality that all the monomials of PP are reduced (i.e., do not contain consecutive letters xixix_{i}x_{i}^{*} or xixix_{i}^{*}x_{i}), so that the coefficients of PP are precisely those that appear in the representation (2.3).

Let us write P=P1+iP2P=P_{1}+iP_{2} for P1,P2x1,,x2rP_{1},P_{2}\in\mathbb{R}\langle x_{1},\dots,x_{2r}\rangle defined by taking the real (imaginary) parts of the coefficients of PP. Since the polynomials PjPjP_{j}^{*}P_{j} are self-adjoint with real coefficients, we can write PjPj=QjRjP_{j}^{*}P_{j}=Q_{j}-R_{j} for self-adjoint Qj,Rj+x1,,x2rQ_{j},R_{j}\in\mathbb{R}_{+}\langle x_{1},\dots,x_{2r}\rangle defined by keeping only the positive (negative) coefficients of PjPjP_{j}^{*}P_{j}. Then we can estimate by the triangle inequality

P(𝑼N,𝑼N)22(P1(𝑼N,𝑼N)2+P2(𝑼N,𝑼N)2)\displaystyle\|P(\bm{U}^{N},\bm{U}^{N*})\|^{2}\leq 2(\|P_{1}(\bm{U}^{N},\bm{U}^{N*})\|^{2}+\|P_{2}(\bm{U}^{N},\bm{U}^{N*})\|^{2})
2(Q1(𝑼N,𝑼N)+R1(𝑼N,𝑼N)+Q2(𝑼N,𝑼N)+R2(𝑼N,𝑼N)).\displaystyle\leq 2(\|Q_{1}(\bm{U}^{N},\bm{U}^{N*})\|+\|R_{1}(\bm{U}^{N},\bm{U}^{N*})\|+\|Q_{2}(\bm{U}^{N},\bm{U}^{N*})\|+\|R_{2}(\bm{U}^{N},\bm{U}^{N*})\|).

On the other hand, note that

Qj(𝒖,𝒖)22+Rj(𝒖,𝒖)22\displaystyle\|Q_{j}(\bm{u},\bm{u}^{*})\|_{2}^{2}+\|R_{j}(\bm{u},\bm{u}^{*})\|_{2}^{2} =Pj(𝒖,𝒖)Pj(𝒖,𝒖)22,\displaystyle=\|P_{j}(\bm{u},\bm{u}^{*})^{*}P_{j}(\bm{u},\bm{u}^{*})\|_{2}^{2},
P1(𝒖,𝒖)22+P2(𝒖,𝒖)22\displaystyle\|P_{1}(\bm{u},\bm{u}^{*})\|_{2}^{2}+\|P_{2}(\bm{u},\bm{u}^{*})\|_{2}^{2} =P(𝒖,𝒖)22.\displaystyle=\|P(\bm{u},\bm{u}^{*})\|_{2}^{2}.

We can therefore estimate

Q1(𝒖,𝒖)+R1(𝒖,𝒖)+Q2(𝒖,𝒖)+R2(𝒖,𝒖)\displaystyle\|Q_{1}(\bm{u},\bm{u}^{*})\|+\|R_{1}(\bm{u},\bm{u}^{*})\|+\|Q_{2}(\bm{u},\bm{u}^{*})\|+\|R_{2}(\bm{u},\bm{u}^{*})\|
Cqc(P1(𝒖,𝒖)P1(𝒖,𝒖)2+P2(𝒖,𝒖)P2(𝒖,𝒖)2)\displaystyle\leq Cq^{c}(\|P_{1}(\bm{u},\bm{u}^{*})^{*}P_{1}(\bm{u},\bm{u}^{*})\|_{2}+\|P_{2}(\bm{u},\bm{u}^{*})^{*}P_{2}(\bm{u},\bm{u}^{*})\|_{2})
Cqc(P1(𝒖,𝒖)2+P2(𝒖,𝒖)2)\displaystyle\leq Cq^{c}(\|P_{1}(\bm{u},\bm{u}^{*})\|^{2}+\|P_{2}(\bm{u},\bm{u}^{*})\|^{2})
CqcP(𝒖,𝒖)2\displaystyle\leq C^{\prime}q^{c^{\prime}}\|P(\bm{u},\bm{u}^{*})\|^{2}

for some C,C,c,c>0C,C^{\prime},c,c^{\prime}>0, where qq is the degree of PP and we have applied the rapid decay property of 𝐅r\mathbf{F}_{r} in the first and last inequality. Thus aa implies that

P(𝑼N,𝑼N)CqcP(𝒖,𝒖)+o(1)with probability1o(1)\|P(\bm{U}^{N},\bm{U}^{N*})\|\leq Cq^{c}\|P(\bm{u},\bm{u}^{*})\|+o(1)\quad\text{with probability}\quad 1-o(1)

for every Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree at most qq and some constants C,c>0C,c>0.

Now note that, for every pp\in\mathbb{N}, applying the above to (PP)p(P^{*}P)^{p} yields

P(𝑼N,𝑼N)C12p(2pq)c2pP(𝒖,𝒖)+o(1)with probability1o(1).\|P(\bm{U}^{N},\bm{U}^{N*})\|\leq C^{\frac{1}{2p}}(2pq)^{\frac{c}{2p}}\|P(\bm{u},\bm{u}^{*})\|+o(1)\quad\text{with probability}\quad 1-o(1).

Taking pp\to\infty yields the strong convergence upper bound, and the lower bound now follows from Lemma 2.13 since Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) has the unique trace property. ∎

The positivization trick is very useful in the context of the polynomial method, as we will see in section 3. Let us however give a hint as to its significance.

For a self-adjoint polynomial PP with positive coefficients, we may interpret (2.3) as defining the adjacency matrix of a weighted graph with vertex set 𝐆\mathbf{G}, where we place an edge with weight aga_{g} between every pair of vertices (w,gw)(w,gw) with w𝐆w\in\mathbf{G} and ag>0a_{g}>0. Thus, for example, computing the moments of P(𝒖,𝒖)P(\bm{u},\bm{u}^{*}) is in essence a combinatorial problem of counting the number of closed walks in this graph. This greatly facilitates the analysis of such quantities; for example, we can obtain upper bounds by overcounting some of the walks.

For a general choice of PP, we may still view P(𝒖,𝒖)P(\bm{u},\bm{u}^{*}) as a kind of adjacency matrix of a graph with complex edge weights. This is a much more complicated object, however, since the moments of this operator may exhibit cancellations between different walks and can therefore no longer be treated as a counting problem. The surprising consequence of the positivization trick is that for the purposes of proving strong convergence, we can completely ignore these cancellations and restrict attention only to the combinatorial situation.
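As a concrete instance of the counting interpretation: for P = x1 + x1* + x2 + x2* and 𝐆 = 𝐅₂, the moments of P(u,u*) count closed walks on the Cayley graph of 𝐅₂, which is a 4-regular tree. A brute-force sketch (the word encoding is our own choice; small moments only, since the enumeration is exponential in q):

```python
from itertools import product

def reduce_word(w):
    # freely reduce; letters are nonzero ints, -i the inverse of i
    out = []
    for a in w:
        if out and out[-1] == -a:
            out.pop()
        else:
            out.append(a)
    return out

letters = [1, -1, 2, -2]   # u1, u1*, u2, u2*

def moment(q):
    # tau((u1 + u1* + u2 + u2*)^q) = number of length-q words reducing to e
    # = number of closed walks of length q at the root of the 4-regular tree
    return sum(1 for w in product(letters, repeat=q) if not reduce_word(w))

assert [moment(q) for q in range(1, 7)] == [0, 4, 0, 28, 0, 232]
```

By Kesten's theorem, moment(2p)^{1/(2p)} converges to the norm 2√3 of this operator, so upper bounds on such walk counts translate directly into norm bounds.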

Remark 2.27.

The only part of the proof of Lemma 2.26 where we used 𝐆=𝐅r\mathbf{G}=\mathbf{F}_{r} is in the very first step, where we argued that we may assume that the coefficients of PP agree with those in the representation (2.3). For other groups 𝐆\mathbf{G}, it is not clear that this is the case unless we assume that the matrices 𝑼N\bm{U}^{N} also satisfy the group relations, i.e., that UiN=πN(gi)U_{i}^{N}=\pi_{N}(g_{i}) where πN:𝐆MN()\pi_{N}:\mathbf{G}\to\mathrm{M}_{N}(\mathbb{C}) is a (random) unitary representation of 𝐆\mathbf{G}. Under the latter assumption, Lemma 2.26 extends directly to any 𝐆\mathbf{G} with the rapid decay and unique trace properties.

Alternatively, when the positivization trick is applied to the polynomial method, it is possible to apply a variant of the argument directly to the limiting object that appears in the proof, avoiding the need to invoke properties of the random matrices. This form of the positivization trick is developed in [90, §6.2] (cf. Remark 3.11).

3. The polynomial method

The polynomial method, which was introduced in the recent work of Chen, Garza-Vargas, Tropp, and the author [32], has enabled significantly simpler proofs of strong convergence and has opened the door to various new developments. The method was briefly introduced in section 1.3.2 above. In this section, we aim to provide a detailed illustration of this method by using it to prove strong convergence of random permutation matrices (Theorem 1.4).

We will follow a simplified form of the treatment in [32]. The simplifications arise for two reasons: we will make no attempt to get good quantitative bounds, enabling us to use crude estimates in various places; and we will take advantage of the idea of [90] to significantly simplify one part of the argument by exploiting positivization. Aside from the use of standard results on polynomials and Schwartz distributions, the proof given here is essentially self-contained.

Despite its simplicity, what makes the polynomial method work appears rather mysterious at first sight. We will conclude this section with a discussion of the new phenomenon that is captured by this method (section 3.6).

Significant refinements of the polynomial method may be found in [33, 92].

3.1. Outline

In the following, we fix independent random permutation matrices 𝑼N=(U1N,,UrN)\bm{U}^{N}=(U_{1}^{N},\ldots,U_{r}^{N}) and the limiting model 𝒖=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) as in Theorem 1.4. More precisely, recall that ui=λ(gi)u_{i}=\lambda(g_{i}), where g1,,grg_{1},\ldots,g_{r} and λ\lambda are the free generators and left-regular representation of 𝐅r\mathbf{F}_{r}. We will view 𝒖\bm{u} as living in the CC^{*}-probability space (Cred(𝐅r),τ)(C^{*}_{\rm red}(\mathbf{F}_{r}),\tau) where τ\tau denotes the canonical trace.

For notational purposes, it will be convenient to define g0=eg_{0}=e and gr+i=gi1g_{r+i}=g_{i}^{-1} for i=1,,ri=1,\ldots,r. We analogously define u0=𝟏u_{0}=\mathbf{1} and ur+i=uiu_{r+i}=u_{i}^{*}, and similarly U0N=𝟏U_{0}^{N}=\mathbf{1} and Ur+iN=UiNU_{r+i}^{N}=U_{i}^{N*}, for i=1,,ri=1,\ldots,r. We will think of rr as fixed, and all constants that appear in this section may depend on rr.

We begin by outlining the key ingredients that are needed to conclude the proof. These ingredients will then be developed in the remainder of this section.

3.1.1. Polynomial encoding

The first step of the analysis is to show that the expected traces of monomials of 𝑼N|1\bm{U}^{N}|_{1^{\perp}} are rational functions of 1N\frac{1}{N}.

Lemma 3.1.

For every qq\in\mathbb{N} and 𝐰=(w1,,wq){0,,2r}q\bm{w}=(w_{1},\ldots,w_{q})\in\{0,\ldots,2r\}^{q}, there exist real polynomials f𝐰f_{\bm{w}} and gqg_{q} of degree at most CqCq so that for all NqN\geq q

𝐄[trUw1NUwqN|1]=f𝒘(1N)gq(1N)=Φ𝒘(1N).\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\frac{f_{\bm{w}}(\frac{1}{N})}{g_{q}(\frac{1}{N})}=\Phi_{\bm{w}}(\tfrac{1}{N}).

Lemma 3.1 immediately implies that

𝐄[trUw1NUwqN|1]=μ0(𝒘)+μ1(𝒘)N+O(1N2)\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\mu_{0}(\bm{w})+\frac{\mu_{1}(\bm{w})}{N}+O\bigg{(}\frac{1}{N^{2}}\bigg{)}

as NN\to\infty. The values of μ0(𝒘)\mu_{0}(\bm{w}) and μ1(𝒘)\mu_{1}(\bm{w}) can be easily read off from the proof of Lemma 3.1. In particular, it will follow that

μ0(𝒘)=1gw1gwq=e=τ(uw1uwq),\mu_{0}(\bm{w})=1_{g_{w_{1}}\cdots g_{w_{q}}=e}=\tau(u_{w_{1}}\cdots u_{w_{q}}), (3.1)

which essentially establishes weak convergence of 𝑼N|1\bm{U}^{N}|_{1^{\perp}} to 𝒖\bm{u} (albeit in expectation rather than in probability; this will not be important in what follows).
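The limit (3.1) is easy to observe by Monte Carlo simulation. A sketch assuming numpy, for the word w = g1 g2 g1^{-1} g2^{-1} (nontrivial in 𝐅₂, so μ₀(w) = 0); permutation matrices are represented by index arrays, the trace of the product is the number of fixed points of the composed permutation, and restricting to 1⊥ subtracts the trivial eigenvalue; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 200, 2000
word = [1, 2, -1, -2]                 # g1 g2 g1^{-1} g2^{-1}

total = 0.0
for _ in range(trials):
    perm = {1: rng.permutation(N), 2: rng.permutation(N)}
    inv = {k: np.argsort(v) for k, v in perm.items()}   # inverse permutations
    x = np.arange(N)
    for a in reversed(word):          # rightmost matrix factor acts first
        x = perm[a][x] if a > 0 else inv[-a][x]
    fix = np.count_nonzero(x == np.arange(N))           # Tr of the product
    total += (fix - 1) / N            # normalized trace restricted to 1^perp

avg = total / trials
assert abs(avg) < 0.05                # consistent with tau(u1 u2 u1* u2*) = 0
```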

3.1.2. Asymptotic expansion

Now fix a self-adjoint noncommutative polynomial Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle. Then for every univariate real polynomial hh, since hPh\circ P is again a noncommutative polynomial, we immediately obtain

𝐄[trh(P(𝑼N,𝑼N))|1]=ν0(h)+ν1(h)N+O(1N2).\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}=\nu_{0}(h)+\frac{\nu_{1}(h)}{N}+O\bigg{(}\frac{1}{N^{2}}\bigg{)}. (3.2)

Here ν0\nu_{0} and ν1\nu_{1} are defined, a priori, as linear functionals on the space 𝒫\mathcal{P} of all univariate real polynomials (of course, ν0,ν1\nu_{0},\nu_{1} also depend on the choice of PP, but we will view PP as fixed throughout the argument).

The core of the proof is now to show that the expansion (3.2) is valid not only for polynomial test functions h𝒫h\in\mathcal{P}, but even for arbitrary smooth test functions hC()h\in C^{\infty}(\mathbb{R}). It is far from obvious why this should be the case; for example, it is conceivable that there could exist smooth test functions hh for which weak convergence takes place at a rate slower than the 1N\frac{1}{N} rate for polynomial hh. If that were to be the case, then ν1(h)\nu_{1}(h) would not even make sense for smooth hh. We will show, however, that this hypothetical scenario is not realized.

Recall that a linear functional ν\nu on C()C^{\infty}(\mathbb{R}) is called a compactly supported (Schwartz) distribution (see [72, Chapter II]) if

|ν(h)|ChCm[K,K]for all hC()|\nu(h)|\leq C\|h\|_{C^{m}[-K,K]}\quad\text{for all }h\in C^{\infty}(\mathbb{R})

holds for some constants C,K+C,K\in\mathbb{R}_{+} and m+m\in\mathbb{Z}_{+}.

Proposition 3.2.

For every self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, the corresponding linear functionals ν0,ν1\nu_{0},\nu_{1} in (3.2) extend to compactly supported Schwartz distributions, and the expansion (3.2) remains valid for any hC()h\in C^{\infty}(\mathbb{R}).

Note that it is immediate from (3.1) that

ν0(h)=τ(h(P(𝒖,𝒖)))\nu_{0}(h)=\tau\big{(}h(P(\bm{u},\bm{u}^{*}))\big{)}

for all hC()h\in C^{\infty}(\mathbb{R}). In other words, ν0=μP(𝒖,𝒖)\nu_{0}=\mu_{P(\bm{u},\bm{u}^{*})} is nothing other than the spectral distribution of P(𝒖,𝒖)P(\bm{u},\bm{u}^{*}). The nontrivial aspect of Proposition 3.2 is that ν1\nu_{1} and the expansion (3.2) make sense for smooth hh as well.

The proof of Proposition 3.2 is the key point of the polynomial method. We will exploit the Markov inequality to achieve a quantitative form of (3.2) for h𝒫h\in\mathcal{P}. The resulting bound is so strong that it can be extended to any hC()h\in C^{\infty}(\mathbb{R}) by means of a simple Fourier-analytic argument.

3.1.3. The infinitesimal distribution

As ν0=μP(𝒖,𝒖)\nu_{0}=\mu_{P(\bm{u},\bm{u}^{*})}, Lemma 2.6 yields

suppν0[P(𝒖,𝒖),P(𝒖,𝒖)].\mathop{\mathrm{supp}}\nu_{0}\subseteq[-\|P(\bm{u},\bm{u}^{*})\|,\|P(\bm{u},\bm{u}^{*})\|].

The final ingredient of the proof is to show that ν1\nu_{1} satisfies the same bound. By the positivization trick, it suffices to consider the case that PP has positive coefficients.

Lemma 3.3.

For every choice of self-adjoint P+x1,,x2rP\in\mathbb{R}_{+}\langle x_{1},\ldots,x_{2r}\rangle, we have

suppν1[P(𝒖,𝒖),P(𝒖,𝒖)].\mathop{\mathrm{supp}}\nu_{1}\subseteq[-\|P(\bm{u},\bm{u}^{*})\|,\|P(\bm{u},\bm{u}^{*})\|].

To prove Lemma 3.3 we face a conundrum: while we know abstractly that ν1\nu_{1} is a compactly supported distribution, we are only able to compute its value for polynomial test functions (as we have an explicit formula for μ1(𝒘)\mu_{1}(\bm{w}) in section 3.1.1). To surmount this issue, we will use the following general fact [32, Lemma 4.9]: for any compactly supported distribution ν\nu, we have

suppν[ρ,ρ]withρ=lim supp|ν(xp)|1p.\mathop{\mathrm{supp}}\nu\subseteq[-\rho,\rho]\qquad\text{with}\qquad\rho=\limsup_{p\to\infty}|\nu(x^{p})|^{\frac{1}{p}}.

Thus it suffices to show that

lim supp|ν1(xp)|1pP(𝒖,𝒖),\limsup_{p\to\infty}|\nu_{1}(x^{p})|^{\frac{1}{p}}\leq\|P(\bm{u},\bm{u}^{*})\|,

which is tractable as we have access to the moments of ν1\nu_{1}. It is this moment estimate that is greatly simplified by the assumption that PP has positive coefficients.
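As a sanity check of this moment-to-support principle, one can take ν to be the standard semicircle law, whose even moments are the Catalan numbers: the limsup of the moment roots then recovers the edge 2 of the support. A sketch using lgamma to evaluate the Catalan numbers on a log scale (to avoid huge integers):

```python
from math import exp, lgamma, log

def log_catalan(p):
    # log of the p-th Catalan number C_p = (2p choose p) / (p + 1)
    return lgamma(2 * p + 1) - 2 * lgamma(p + 1) - log(p + 1)

# nu(x^{2p}) = C_p for the semicircle law, and |nu(x^{2p})|^{1/(2p)} -> 2,
# the edge of supp nu = [-2, 2]
est = [exp(log_catalan(p) / (2 * p)) for p in (10, 100, 1000)]
assert est[0] < est[1] < est[2] < 2.0
assert abs(est[2] - 2.0) < 0.02
```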

3.1.4. Proof of Theorem 1.4

We now use these ingredients to conclude the proof.

Proof of Theorem 1.4.

Fix ε>0\varepsilon>0 and a self-adjoint PP with positive coefficients. Moreover, let hh be a nonnegative smooth function that vanishes in a neighborhood of [P(𝒖,𝒖),P(𝒖,𝒖)][-\|P(\bm{u},\bm{u}^{*})\|,\|P(\bm{u},\bm{u}^{*})\|] and such that h(x)=1h(x)=1 for |x|P(𝒖,𝒖)+ε|x|\geq\|P(\bm{u},\bm{u}^{*})\|+\varepsilon.

Note that ν0(h)=ν1(h)=0\nu_{0}(h)=\nu_{1}(h)=0 by Lemma 3.3. Thus Proposition 3.2 yields

𝐄[Trh(P(𝑼N,𝑼N))|1]=N𝐄[trh(P(𝑼N,𝑼N))|1]=o(1)\mathbf{E}\big{[}\mathop{\mathrm{Tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}=N\,\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}=o(1)

as NN\to\infty. But since Trh(X)1\mathop{\mathrm{Tr}}h(X)\geq 1 whenever XP(𝒖,𝒖)+ε\|X\|\geq\|P(\bm{u},\bm{u}^{*})\|+\varepsilon, this implies

𝐏[P(𝑼N,𝑼N)|1P(𝒖,𝒖)+ε]=o(1).\mathbf{P}\big{[}\|P(\bm{U}^{N},\bm{U}^{N*})|_{1^{\perp}}\|\geq\|P(\bm{u},\bm{u}^{*})\|+\varepsilon\big{]}=o(1).

As P,εP,\varepsilon are arbitrary, we verified condition aa of Lemma 2.26. ∎

We now turn to the proofs of the various ingredients described above.

3.2. Polynomial encoding

The aim of this section is to prove Lemma 3.1. We follow [85]; see also [105, 42]. We begin by noting that

N𝐄[trUw1NUwqN|1]=𝐄[TrUw1NUwqN|1]=𝐄[TrUw1NUwqN]1,N\,\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\mathbf{E}\big{[}\mathop{\mathrm{Tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\mathbf{E}\big{[}{\mathop{\mathrm{Tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}}\big{]}-1,

so that it suffices to compute the rightmost expectation. Clearly

𝐄[TrUw1NUwqN]=i1,,iq[N]𝐄[(Uw1N)i1i2(Uw2N)i2i3(UwqN)iqi1].\mathbf{E}\big{[}{\mathop{\mathrm{Tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}}\big{]}=\sum_{i_{1},\ldots,i_{q}\in[N]}\mathbf{E}\big{[}(U_{w_{1}}^{N})_{i_{1}i_{2}}(U_{w_{2}}^{N})_{i_{2}i_{3}}\cdots(U_{w_{q}}^{N})_{i_{q}i_{1}}\big{]}.

A tuple 𝒊=(i1,,iq)[N]q\bm{i}=(i_{1},\ldots,i_{q})\in[N]^{q} is realizable if the corresponding summand is nonzero. Denote by N(𝒘)\mathcal{I}_{N}(\bm{w}) the set of all realizable tuples.

To bring out the dependence on dimension NN, we note that by symmetry, the expectation inside the above sum only depends on how many distinct pairs of indices appear for each permutation matrix. To encode this information, we associate to each 𝒊N(𝒘)\bm{i}\in\mathcal{I}_{N}(\bm{w}) a directed edge-colored graph Γ\Gamma as follows. Number each distinct value among (i1,,iq)(i_{1},\ldots,i_{q}) by order of appearance, and assign to each a vertex. Now draw an edge colored w[r]w\in[r] from one vertex to another if (UNw)ii(U^{N}_{w})_{ii^{\prime}} or (UNr+w)ii(U^{N}_{r+w})_{i^{\prime}i} appears in the expectation, where i,i[N]i,i^{\prime}\in[N] are the values associated to the first and second vertex, respectively; see Figure 3.1.

Figure 3.1. Graph Γ\Gamma associated to the term 𝐄[(U1N)58(U2N)86(U1N)68(U2N)85]\mathbf{E}[(U_{1}^{N})_{58}(U_{2}^{N})_{86}(U_{1}^{N*})_{68}(U_{2}^{N*})_{85}]. The vertices labelled 1,2,31,2,3 correspond to the values 5,8,65,8,6, respectively.

Denote by 𝒢(𝒘)\mathcal{G}(\bm{w}) the set of graphs Γ\Gamma thus constructed, and note that this set is independent of NN. For each such graph with vΓv_{\Gamma} vertices, we can recover all associated 𝒊N(𝒘)\bm{i}\in\mathcal{I}_{N}(\bm{w}) uniquely by assigning distinct values of [N][N] to its vertices. There are N(N1)(NvΓ+1)N(N-1)\cdots(N-v_{\Gamma}+1) ways to do this. If the graph has eΓwe_{\Gamma}^{w} edges with color ww, then the corresponding expectation for each such 𝒊\bm{i} is

𝐄[(Uw1N)i1i2(Uw2N)i2i3(UwqN)iqi1]=w=1r1N(N1)(NeΓw+1),\mathbf{E}\big{[}(U_{w_{1}}^{N})_{i_{1}i_{2}}(U_{w_{2}}^{N})_{i_{2}i_{3}}\cdots(U_{w_{q}}^{N})_{i_{q}i_{1}}\big{]}=\prod_{w=1}^{r}\frac{1}{N(N-1)\cdots(N-e_{\Gamma}^{w}+1)},

since the random variable inside the expectation is the indicator of the event that, for each ww, the permutation matrix UwNU_{w}^{N} has eΓwe_{\Gamma}^{w} of its rows fixed as specified by the realizable tuple 𝒊\bm{i}. Here we assumed that NqN\geq q, which ensures that NvΓN\geq v_{\Gamma} and NeΓwN\geq e_{\Gamma}^{w}.
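The displayed expectation can be checked by brute force for small NN. The following sketch (an illustration added here, with ad hoc helper names) enumerates all permutations exactly and confirms that fixing ee entries in distinct rows and distinct columns has probability 1/(N(N1)(Ne+1))1/(N(N-1)\cdots(N-e+1)).

```python
from itertools import permutations
from fractions import Fraction

def prob_entries_fixed(N, constraints):
    # exact probability that a uniform permutation sigma of [N] satisfies
    # sigma(i) = j for all (i, j) in constraints, i.e. that the
    # permutation matrix has a 1 in each of those positions
    count = 0
    perms = list(permutations(range(N)))
    for sigma in perms:
        if all(sigma[i] == j for i, j in constraints):
            count += 1
    return Fraction(count, len(perms))

# with e constraints in distinct rows and distinct columns, the
# probability is 1 / (N (N-1) ... (N-e+1)), as used in the text
N = 5
p2 = prob_entries_fixed(N, [(0, 1), (2, 3)])          # e = 2
p3 = prob_entries_fixed(N, [(0, 1), (2, 3), (4, 0)])  # e = 3
```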

In summary, we have proved the following.

Lemma 3.4.

For every 𝐰=(w1,,wq)\bm{w}=(w_{1},\ldots,w_{q}) and NqN\geq q, we have

𝐄[TrUw1NUwqN]=Γ𝒢(𝒘)N(N1)(NvΓ+1)w=1rN(N1)(NeΓw+1).\mathbf{E}\big{[}{\mathop{\mathrm{Tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}}\big{]}=\sum_{\Gamma\in\mathcal{G}(\bm{w})}\frac{N(N-1)\cdots(N-v_{\Gamma}+1)}{\prod_{w=1}^{r}N(N-1)\cdots(N-e_{\Gamma}^{w}+1)}.

The proof of Lemma 3.1 is now straightforward.

Proof of Lemma 3.1.

We can rewrite the above lemma as

𝐄[trUw1NUwqN|1]=Γ𝒢(𝒘)(1N)eΓvΓ+1k=1vΓ1(1kN)w=1rk=1eΓw1(1kN)1N,\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\sum_{\Gamma\in\mathcal{G}(\bm{w})}\bigg{(}\frac{1}{N}\bigg{)}^{e_{\Gamma}-v_{\Gamma}+1}\frac{\prod_{k=1}^{v_{\Gamma}-1}(1-\frac{k}{N})}{\prod_{w=1}^{r}\prod_{k=1}^{e_{\Gamma}^{w}-1}(1-\frac{k}{N})}-\frac{1}{N},

where eΓe_{\Gamma} is the total number of edges in Γ\Gamma. As every Γ\Gamma is connected by construction, we have eΓvΓ+10e_{\Gamma}-v_{\Gamma}+1\geq 0 and thus the right-hand side is a rational function of 1N\frac{1}{N}.

Define a polynomial of degree r(q1)r(q-1) by

gq(x)=(1x)r(12x)r(1(q1)x)r.g_{q}(x)=(1-x)^{r}(1-2x)^{r}\cdots(1-(q-1)x)^{r}.

Since eΓweΓqe_{\Gamma}^{w}\leq e_{\Gamma}\leq q for all ww, it is clear that f𝒘(1N)=𝐄[trUw1NUwqN|1]gq(1N)f_{\bm{w}}(\frac{1}{N})=\mathbf{E}[\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}]\,g_{q}(\frac{1}{N}) is a polynomial of degree at most CqCq for some constant CC (which depends on rr). ∎

We can now read off the first terms in the 1N\frac{1}{N}-expansion. Recall that g𝐅r\{e}g\in\mathbf{F}_{r}\backslash\{e\} is called a proper power if g=vkg=v^{k} for some v𝐅rv\in\mathbf{F}_{r}, k2k\geq 2, and is called a non-power otherwise. Every geg\neq e can be written uniquely as g=vkg=v^{k} for a non-power vv.
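These notions are entirely algorithmic: for a cyclically reduced word, the non-power root vv is simply its smallest period. A minimal sketch (an illustration, with letters encoded ad hoc as nonzero integers, +i for gig_{i} and −i for its inverse):

```python
def reduce_word(word):
    # freely reduce a word in F_r; letters are nonzero ints,
    # +i for the generator g_i and -i for its inverse
    stack = []
    for letter in word:
        if stack and stack[-1] == -letter:
            stack.pop()
        else:
            stack.append(letter)
    return stack

def cyclic_reduce(word):
    # strip cancelling first/last letters of a freely reduced word
    w = reduce_word(word)
    while len(w) >= 2 and w[0] == -w[-1]:
        w = w[1:-1]
    return w

def power_decomposition(word):
    # write a nonempty cyclically reduced word as v^k with v a
    # non-power: v is the smallest period of the word
    w = cyclic_reduce(word)
    n = len(w)
    for d in range(1, n + 1):
        if n % d == 0 and w == w[:d] * (n // d):
            return w[:d], n // d
```

For example, the word g1g2g1g2g1g2g_{1}g_{2}g_{1}g_{2}g_{1}g_{2} decomposes as v3v^{3} with non-power root v=g1g2v=g_{1}g_{2}.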

Corollary 3.5.

We have

μ0(𝒘)=limN𝐄[trUw1NUwqN|1]=1gw1gwq=e.\mu_{0}(\bm{w})=\lim_{N\to\infty}\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=1_{g_{w_{1}}\cdots g_{w_{q}}=e}.

Moreover, if gw1gwq=vkg_{w_{1}}\cdots g_{w_{q}}=v^{k} for a non-power vv, then

μ1(𝒘)=limNN𝐄[trUw1NUwqN|1]=ω(k)1,\mu_{1}(\bm{w})=\lim_{N\to\infty}N\,\mathbf{E}\big{[}\mathop{\mathrm{tr}}U_{w_{1}}^{N}\cdots U_{w_{q}}^{N}|_{1^{\perp}}\big{]}=\omega(k)-1,

where ω(k)\omega(k) denotes the number of divisors of kk.

Proof.

If gw1gwq=eg_{w_{1}}\cdots g_{w_{q}}=e, it is obvious that μ0(𝒘)=1\mu_{0}(\bm{w})=1. We therefore assume this is not the case. We may further assume that gw1gwqg_{w_{1}}\cdots g_{w_{q}} is cyclically reduced, since the left-hand side of Lemma 3.1 is unchanged under cyclic reduction. Then every vertex of any Γ𝒢(𝒘)\Gamma\in\mathcal{G}(\bm{w}) must have degree at least two.

For the first identity, it now suffices to note that there cannot exist Γ𝒢(𝒘)\Gamma\in\mathcal{G}(\bm{w}) with eΓvΓ+1=0e_{\Gamma}-v_{\Gamma}+1=0: this would imply that Γ\Gamma is a tree, which must have a vertex of degree one. Thus the expression in the proof of Lemma 3.1 yields μ0(𝒘)=0\mu_{0}(\bm{w})=0.

We can similarly read off from the proof of Lemma 3.1 that

μ1(𝒘)=#{Γ𝒢(𝒘):eΓvΓ=0}1.\mu_{1}(\bm{w})=\#\{\Gamma\in\mathcal{G}(\bm{w}):e_{\Gamma}-v_{\Gamma}=0\}-1.

If eΓvΓ=0e_{\Gamma}-v_{\Gamma}=0, then (as each vertex has degree at least two) Γ\Gamma must be a cycle. As 𝒘\bm{w} defines a closed nonbacktracking walk in Γ\Gamma, it must go around the cycle an integer number of times, so the possible cycles correspond to the divisors of kk. ∎
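For the single-letter word g1kg_{1}^{k}, Corollary 3.5 can be checked by exhaustive enumeration over the symmetric group. In this illustrative sketch (not part of the proof; helper names are ad hoc), the limiting value ω(k)1\omega(k)-1 in fact already appears exactly at small NN.

```python
from itertools import permutations
from fractions import Fraction

def expected_trace_power(N, k):
    # exact value of E[Tr (U^N)^k] - 1 = N E[tr (U^N)^k |_{1 perp}]
    # for a single uniform N x N permutation matrix U^N
    perms = list(permutations(range(N)))
    total = Fraction(0)
    for sigma in perms:
        fixed = 0
        for i in range(N):        # Tr U^k = #{i : sigma^k(i) = i}
            j = i
            for _ in range(k):
                j = sigma[j]
            fixed += j == i
        total += fixed
    return total / len(perms) - 1

def omega(k):
    # number of divisors of k
    return sum(1 for d in range(1, k + 1) if k % d == 0)

# for the word g_1^k, the limiting value omega(k) - 1 of Corollary 3.5
# already appears exactly for small N
vals = {k: expected_trace_power(5, k) for k in (2, 3, 4)}
```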

3.3. The master inequality

We now proceed to the core of the polynomial method. Our main tool is the following inequality of A. Markov [34, p. 91].

Lemma 3.6 (Markov inequality).

For any real polynomial ff of degree qq and a>0a>0, we have

fL[0,a]2q2afL[0,a].\|f^{\prime}\|_{L^{\infty}[0,a]}\leq\frac{2q^{2}}{a}\|f\|_{L^{\infty}[0,a]}.

A well-known consequence of the Markov inequality is that a bound on a polynomial on a sufficiently fine grid extends to a uniform bound [34, p. 91]. For completeness, we spell out the argument in the form in which we will need it.

Corollary 3.7.

For any real polynomial ff of degree qq and M2q2M\geq 2q^{2}, we have

fL[0,1M]2supNM|f(1N)|.\|f\|_{L^{\infty}[0,\frac{1}{M}]}\leq 2\sup_{N\geq M}|f(\tfrac{1}{N})|.
Proof.

For any x[0,1M]x\in[0,\frac{1}{M}], its distance to the set {1N}NM\{\frac{1}{N}\}_{N\geq M} is at most 12M2\frac{1}{2M^{2}}. Thus

fL[0,1M]supNM|f(1N)|+12M2fL[0,1M]supNM|f(1N)|+q2MfL[0,1M]\|f\|_{L^{\infty}[0,\frac{1}{M}]}\leq\sup_{N\geq M}|f(\tfrac{1}{N})|+\frac{1}{2M^{2}}\|f^{\prime}\|_{L^{\infty}[0,\frac{1}{M}]}\leq\sup_{N\geq M}|f(\tfrac{1}{N})|+\frac{q^{2}}{M}\|f\|_{L^{\infty}[0,\frac{1}{M}]}

by the Markov inequality. The conclusion follows. ∎
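Corollary 3.7 is easy to test numerically. The sketch below (an illustration with ad hoc helpers, not part of the argument) verifies the bound for random polynomials with coefficients in [1,1][-1,1]; since only finitely many grid points 1N\frac{1}{N} can be evaluated, the truncated tail N50MN\geq 50M is accounted for by the crude estimate |f(1N)f(0)|q/(50M)|f(\frac{1}{N})-f(0)|\leq q/(50M) there.

```python
import random

def poly_eval(coeffs, x):
    # Horner evaluation of sum_k coeffs[k] x^k
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def check_grid_bound(coeffs, grid_points=2000):
    # check: sup of |f| on [0, 1/M] is at most twice the sup of
    # |f(1/N)| over integers N >= M, where M = 2 q^2
    q = len(coeffs) - 1
    M = 2 * q * q
    grid_sup = max(abs(poly_eval(coeffs, 1.0 / N)) for N in range(M, 50 * M))
    # for N >= 50 M, f(1/N) is within q/(50 M) of f(0) since the
    # coefficients lie in [-1, 1]; account for the truncated tail
    grid_sup = max(grid_sup, abs(poly_eval(coeffs, 0.0)) + q / (50.0 * M))
    interval_sup = max(
        abs(poly_eval(coeffs, t / (M * grid_points)))
        for t in range(grid_points + 1)
    )
    return interval_sup <= 2.0 * grid_sup + 1e-9

random.seed(0)
results = [
    check_grid_bound([random.uniform(-1, 1) for _ in range(q + 1)])
    for q in (3, 5, 8)
]
```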

In the following, we fix a self-adjoint Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle of degree q0q_{0}. For every polynomial test function h𝒫h\in\mathcal{P} of degree qq, Lemma 3.1 yields

𝐄[trh(P(𝑼N,𝑼N))|1]=fh(1N)gqq0(1N)=Φh(1N)\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}=\frac{f_{h}(\frac{1}{N})}{g_{qq_{0}}(\frac{1}{N})}=\Phi_{h}(\tfrac{1}{N})

where fh,gqq0f_{h},g_{qq_{0}} are real polynomials of degree at most CqCq and gqq0g_{qq_{0}} is defined in the proof of Lemma 3.1. We define ν0(h),ν1(h)\nu_{0}(h),\nu_{1}(h) for h𝒫h\in\mathcal{P} as in (3.2), and denote by KK the sum of the moduli of the coefficients of PP. Note that all the above objects depend on the choice of PP, which we consider fixed.

The key idea is to use the Markov inequality to bound the derivatives of Φh\Phi_{h}.

Lemma 3.8.

For any h𝒫h\in\mathcal{P} of degree qq, we have

ΦhL[0,1N]Cq4hL[K,K],ΦhL[0,1N]Cq8hL[K,K]\|\Phi_{h}^{\prime}\|_{L^{\infty}[0,\frac{1}{N}]}\leq Cq^{4}\|h\|_{L^{\infty}[-K,K]},\qquad\|\Phi_{h}^{\prime\prime}\|_{L^{\infty}[0,\frac{1}{N}]}\leq Cq^{8}\|h\|_{L^{\infty}[-K,K]}

for all NCq2N\geq Cq^{2}, where CC is a constant (which depends on PP).

Proof.

It is easily verified using the explicit expression for gqq0g_{qq_{0}} in the proof of Lemma 3.1 that there are constants C,c>0C,c>0 (which depend on PP) so that

cgqq0(x)1,|gqq0(x)gqq0(x)|Cq2,|gqq0(x)gqq0(x)|Cq4c\leq g_{qq_{0}}(x)\leq 1,\qquad\bigg{|}\frac{g_{qq_{0}}^{\prime}(x)}{g_{qq_{0}}(x)}\bigg{|}\leq Cq^{2},\qquad\bigg{|}\frac{g_{qq_{0}}^{\prime\prime}(x)}{g_{qq_{0}}(x)}\bigg{|}\leq Cq^{4}

for all x[0,1q2]x\in[0,\frac{1}{q^{2}}]. We now simply apply the chain rule. For the first derivative,

ΦhL[0,1Cq2]=fhgqq0fhgqq0gqq0gqq0L[0,1Cq2]3Ccq4fhL[0,1Cq2]\|\Phi_{h}^{\prime}\|_{L^{\infty}[0,\frac{1}{Cq^{2}}]}=\bigg{\|}\frac{f_{h}^{\prime}}{g_{qq_{0}}}-\frac{f_{h}}{g_{qq_{0}}}\frac{g_{qq_{0}}^{\prime}}{g_{qq_{0}}}\bigg{\|}_{L^{\infty}[0,\frac{1}{Cq^{2}}]}\leq\frac{3C}{c}q^{4}\|f_{h}\|_{L^{\infty}[0,\frac{1}{Cq^{2}}]}

using Lemma 3.6. But Corollary 3.7 yields

fhL[0,1Cq2]supNq2|fh(1N)|supNq2|Φh(1N)|hL[K,K],\|f_{h}\|_{L^{\infty}[0,\frac{1}{Cq^{2}}]}\lesssim\sup_{N\geq q^{2}}|f_{h}(\tfrac{1}{N})|\leq\sup_{N\geq q^{2}}|\Phi_{h}(\tfrac{1}{N})|\leq\|h\|_{L^{\infty}[-K,K]},

where we used gqq01g_{qq_{0}}\leq 1 in the second inequality and that P(𝑼N,𝑼N)K\|P(\bm{U}^{N},\bm{U}^{N*})\|\leq K in the last inequality. The bound on Φh\Phi_{h}^{\prime\prime} is obtained in a completely analogous manner. ∎

We now easily obtain a quantitative form of (3.2).

Corollary 3.9 (Master inequality).

For every h𝒫h\in\mathcal{P} of degree qq and N1N\geq 1,

|𝐄[trh(P(𝑼N,𝑼N))|1]ν0(h)ν1(h)N|Cq8N2hL[K,K],\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg{|}\leq\frac{Cq^{8}}{N^{2}}\|h\|_{L^{\infty}[-K,K]},

as well as |ν1(h)|Cq4hL[K,K]|\nu_{1}(h)|\leq Cq^{4}\|h\|_{L^{\infty}[-K,K]}.

Proof.

The bound on |ν1(h)||\nu_{1}(h)| follows immediately from Lemma 3.8 as ν1(h)=Φh(0)\nu_{1}(h)=\Phi_{h}^{\prime}(0). Now note that the left-hand side of the equation display in the statement equals

|Φh(1N)Φh(0)1NΦh(0)|12N2ΦhL[0,1N].\big{|}\Phi_{h}(\tfrac{1}{N})-\Phi_{h}(0)-\tfrac{1}{N}\Phi_{h}^{\prime}(0)\big{|}\leq\tfrac{1}{2N^{2}}\|\Phi_{h}^{\prime\prime}\|_{L^{\infty}[0,\frac{1}{N}]}.

Thus the bound in the statement follows for NCq2N\geq Cq^{2} from Lemma 3.8. On the other hand, when N<Cq2N<Cq^{2}, we can trivially bound

|𝐄[trh(P(𝑼N,𝑼N))|1]ν0(h)ν1(h)N|(2+Cq4N)hL[K,K]\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg{|}\leq\bigg{(}2+\frac{Cq^{4}}{N}\bigg{)}\|h\|_{L^{\infty}[-K,K]}

by the triangle inequality, as ν0=μP(𝒖,𝒖)\nu_{0}=\mu_{P(\bm{u},\bm{u}^{*})} is supported in [K,K][-K,K], and using the bound on ν1(h)\nu_{1}(h). The conclusion follows using 1<Cq2N1<\frac{Cq^{2}}{N}. ∎

3.4. Extension to smooth functions

We are now ready to prove Proposition 3.2. To this end, we will show that Corollary 3.9 can be extended to smooth test functions hh using a simple Fourier-analytic argument.

Recall that the Chebyshev polynomial (of the first kind) TnT_{n} is the polynomial of degree nn defined by Tn(cosθ)=cos(nθ)T_{n}(\cos\theta)=\cos(n\theta). Any h𝒫h\in\mathcal{P} of degree qq can be written as

h(x)=n=0qanTn(K1x)h(x)=\sum_{n=0}^{q}a_{n}\,T_{n}(K^{-1}x)

for some real coefficients a0,,aqa_{0},\ldots,a_{q}. Note that the latter are merely the Fourier coefficients of the function h~:S1\tilde{h}:S^{1}\to\mathbb{R} defined by h~(θ)=h(Kcosθ)\tilde{h}(\theta)=h(K\cos\theta).
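These coefficients are easy to compute numerically. The sketch below (an illustration with h(x)=exh(x)=e^{x} and K=1K=1; helper names are ad hoc) recovers ana_{n} as the Fourier cosine coefficients of h~(θ)=h(Kcosθ)\tilde{h}(\theta)=h(K\cos\theta), and exhibits the rapid coefficient decay for smooth hh that powers the argument below.

```python
import math

def cheb_coeffs(h, K, q, m=2048):
    # Chebyshev coefficients a_0, ..., a_q of h on [-K, K], computed as
    # Fourier cosine coefficients of h~(theta) = h(K cos theta):
    # a_n = (2/pi) int_0^pi h(K cos t) cos(n t) dt  (halved for n = 0)
    coeffs = []
    for n in range(q + 1):
        s = 0.0
        for j in range(m + 1):
            t = math.pi * j / m
            w = 0.5 if j in (0, m) else 1.0  # trapezoid weights
            s += w * h(K * math.cos(t)) * math.cos(n * t)
        a_n = (2.0 / m) * s
        coeffs.append(a_n / 2 if n == 0 else a_n)
    return coeffs

def cheb_eval(coeffs, K, x):
    # evaluate sum_n a_n T_n(x/K) via T_n(cos t) = cos(n t)
    t = math.acos(x / K)
    return sum(a * math.cos(n * t) for n, a in enumerate(coeffs))

# for a smooth (here analytic) h, the coefficients decay rapidly,
# and a low-degree expansion already reconstructs h to high accuracy
a = cheb_coeffs(math.exp, 1.0, 12)
err = abs(cheb_eval(a, 1.0, 0.3) - math.exp(0.3))
```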

Proof of Proposition 3.2.

Fix any h𝒫h\in\mathcal{P} and let a0,,aqa_{0},\ldots,a_{q} be its Chebyshev coefficients as above. As TnL[1,1]=1\|T_{n}\|_{L^{\infty}[-1,1]}=1 for all nn, we can estimate

|ν1(h)|n=0q|an||ν1(Tn(K1))|Cn=0qn4|an|,|\nu_{1}(h)|\leq\sum_{n=0}^{q}|a_{n}|\,|\nu_{1}(T_{n}(K^{-1}\cdot))|\leq C\sum_{n=0}^{q}n^{4}|a_{n}|,

using the estimate on ν1\nu_{1} in Corollary 3.9. Now note that nkann^{k}a_{n} is the nnth Fourier coefficient of the kkth derivative h~(k)\tilde{h}^{(k)} of h~\tilde{h}. We can therefore estimate

|ν1(h)|C(n=1q1n2)12(n=1qn10|an|2)12Ch~(5)L2(S1)ChC5[K,K]|\nu_{1}(h)|\leq C\Bigg{(}\sum_{n=1}^{q}\frac{1}{n^{2}}\Bigg{)}^{\frac{1}{2}}\Bigg{(}\sum_{n=1}^{q}n^{10}|a_{n}|^{2}\Bigg{)}^{\frac{1}{2}}\leq C^{\prime}\|\tilde{h}^{(5)}\|_{L^{2}(S^{1})}\leq C^{\prime\prime}\|h\|_{C^{5}[-K,K]}

by Cauchy–Schwarz and Parseval, where the last inequality is obtained by applying the chain rule to h~(θ)=h(Kcosθ)\tilde{h}(\theta)=h(K\cos\theta). Since this estimate holds for all h𝒫h\in\mathcal{P}, the definition of ν1\nu_{1} extends uniquely by continuity to any hC()h\in C^{\infty}(\mathbb{R}). In particular, ν1\nu_{1} extends to a compactly supported distribution.

Applying the identical argument to the first inequality of Corollary 3.9 yields

|𝐄[trh(P(𝑼N,𝑼N))|1]ν0(h)ν1(h)N|CN2hC9[K,K]\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))|_{1^{\perp}}\big{]}-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg{|}\leq\frac{C}{N^{2}}\|h\|_{C^{9}[-K,K]}

for all hC()h\in C^{\infty}(\mathbb{R}). In particular, (3.2) remains valid for any hC()h\in C^{\infty}(\mathbb{R}). ∎

3.5. The infinitesimal distribution

It remains to prove Lemma 3.3. As was explained in section 3.1.3, this result follows immediately from the following lemma, whose proof uses a spectral graph theory argument due to [49, Lemma 2.4].

Lemma 3.10.

Assume that PP has positive coefficients. Then

lim supp|ν1(xp)|1pP(𝒖,𝒖).\limsup_{p\to\infty}|\nu_{1}(x^{p})|^{\frac{1}{p}}\leq\|P(\bm{u},\bm{u}^{*})\|.

To set up the proof, let us fix a noncommutative polynomial PP of degree dd with positive coefficients. By homogeneity, we may assume without loss of generality that the coefficients sum to one. It will be convenient to express

P(𝒖,𝒖)=i1,,id=02rai1,,idui1uid=𝐄[uI1uId],P(\bm{u},\bm{u}^{*})=\sum_{i_{1},\ldots,i_{d}=0}^{2r}a_{i_{1},\ldots,i_{d}}\,u_{i_{1}}\cdots u_{i_{d}}=\mathbf{E}[u_{I_{1}}\cdots u_{I_{d}}],

where 𝑰=(I1,,Id)\bm{I}=(I_{1},\ldots,I_{d}) are random variables such that 𝐏[𝑰=(i1,,id)]=ai1,,id\mathbf{P}[\bm{I}=(i_{1},\ldots,i_{d})]=a_{i_{1},\ldots,i_{d}}. Now let (Isd+1,,I(s+1)d)(I_{sd+1},\ldots,I_{(s+1)d}) be independent copies of 𝑰\bm{I} for ss\in\mathbb{N}, so that we have P(𝒖,𝒖)p=𝐄[uI1uIpd]P(\bm{u},\bm{u}^{*})^{p}=\mathbf{E}[u_{I_{1}}\cdots u_{I_{pd}}]. Then we can apply Corollary 3.5 to compute

ν1(xp)=τ(P(𝒖,𝒖)p)+k=2pd(ω(k)1)v𝐅rnp𝐄[1gI1gIpd=vk],\nu_{1}(x^{p})=-\tau\big{(}P(\bm{u},\bm{u}^{*})^{p}\big{)}+\sum_{k=2}^{pd}(\omega(k)-1)\sum_{v\in\mathbf{F}_{r}^{\rm np}}\mathbf{E}\big{[}1_{g_{I_{1}}\cdots g_{I_{pd}}=v^{k}}\big{]},

where 𝐅rnp\mathbf{F}_{r}^{\rm np} denotes the set of non-powers in 𝐅r\mathbf{F}_{r}; here we used that μ1(𝒘)=1\mu_{1}(\bm{w})=-1 if gw1gwq=eg_{w_{1}}\cdots g_{w_{q}}=e, since tr𝟏|1=N1N\mathop{\mathrm{tr}}\mathbf{1}|_{1^{\perp}}=\frac{N-1}{N}.

Proof of Lemma 3.10.

We would like to argue that if a word gi1gipd=vkg_{i_{1}}\cdots g_{i_{pd}}=v^{k}, it must be a concatenation of kk words that reduce to vv. This is only true, however, if vv is cyclically reduced: otherwise the last letters of vv may cancel the first letters of the next repetition of vv, and the cancelled letters need not appear in our word. The correct version of this statement is that there exist g,w𝐅rg,w\in\mathbf{F}_{r} with v=gwg1v=gwg^{-1} (where ww is the cyclic reduction of vv) so that every word that reduces to vkv^{k} is a concatenation of words that reduce to g,w,w,wk2,g1g,w,w,w^{k-2},g^{-1}. Thus

v𝐅rnp1gi1gipd=vkg,w𝐅r0t1t4pd1gi1git1=g 1git1+1git2=w×1git2+1git3=w 1git3+1git4=wk2 1git4+1gipd=g1.\sum_{v\in\mathbf{F}_{r}^{\rm np}}1_{g_{i_{1}}\cdots g_{i_{pd}}=v^{k}}\leq\sum_{g,w\in\mathbf{F}_{r}}\sum_{0\leq t_{1}\leq\cdots\leq t_{4}\leq pd}1_{g_{i_{1}}\cdots g_{i_{t_{1}}}=g}\,1_{g_{i_{t_{1}+1}}\cdots g_{i_{t_{2}}}=w}\times\mbox{}\\ 1_{g_{i_{t_{2}+1}}\cdots g_{i_{t_{3}}}=w}\,1_{g_{i_{t_{3}+1}}\cdots g_{i_{t_{4}}}=w^{k-2}}\,1_{g_{i_{t_{4}+1}}\cdots g_{i_{pd}}=g^{-1}}.

To relate this bound to the spectral properties of P(𝒖,𝒖)P(\bm{u},\bm{u}^{*}), we make the simple observation that the indicators above can be expressed as matrix elements

1gw1gwq=v=δv,uw1uwqδe.1_{g_{w_{1}}\cdots g_{w_{q}}=v}=\langle\delta_{v},u_{w_{1}}\cdots u_{w_{q}}\,\delta_{e}\rangle.

If we substitute this formula into the above inequality, and then take the expectation with respect to each independent block of variables (Isd+1,,I(s+1)d)(I_{sd+1},\ldots,I_{(s+1)d}) that lies entirely inside one of the matrix elements, we obtain

v𝐅rnp𝐄[1gI1gIpd=vk]g,w𝐅r0t1t4pd𝐄[δg,X1,𝒕δeδw,X2,𝒕δe×δw,X3,𝒕δeδwk2,X4,𝒕δeδg1,X5,𝒕δe]\sum_{v\in\mathbf{F}_{r}^{\rm np}}\mathbf{E}\big{[}1_{g_{I_{1}}\cdots g_{I_{pd}}=v^{k}}\big{]}\leq\sum_{g,w\in\mathbf{F}_{r}}\sum_{0\leq t_{1}\leq\cdots\leq t_{4}\leq pd}\mathbf{E}\big{[}\langle\delta_{g},X_{1,\bm{t}}\,\delta_{e}\rangle\,\langle\delta_{w},X_{2,\bm{t}}\,\delta_{e}\rangle\times\mbox{}\\ \langle\delta_{w},X_{3,\bm{t}}\,\delta_{e}\rangle\,\langle\delta_{w^{k-2}},X_{4,\bm{t}}\,\delta_{e}\rangle\,\langle\delta_{g^{-1}},X_{5,\bm{t}}\,\delta_{e}\rangle\big{]}

with

Xj,𝒕=uItj1+1uIajP(𝒖,𝒖)mjuIbj+1uItj,X_{j,\bm{t}}=u_{I_{t_{j-1}+1}}\cdots u_{I_{a_{j}}}P(\bm{u},\bm{u}^{*})^{m_{j}}\,u_{I_{b_{j}+1}}\cdots u_{I_{t_{j}}},

where aj=min{sd:s+,tj1sd}tja_{j}=\min\{sd:s\in\mathbb{Z}_{+},~{}t_{j-1}\leq sd\}\wedge t_{j}, bj=max{sd:s+,sdtj}ajb_{j}=\max\{sd:s\in\mathbb{Z}_{+},~{}sd\leq t_{j}\}\vee a_{j}, mjd=bjajm_{j}d=b_{j}-a_{j}, and we write t0=0t_{0}=0 and t5=pdt_{5}=pd for simplicity.

The crux of the proof is now to note that as

v𝐅r|δv,Xj,𝒕δe|2=Xj,𝒕δe2P(𝒖,𝒖)2mj,\sum_{v\in\mathbf{F}_{r}}|\langle\delta_{v},X_{j,\bm{t}}\,\delta_{e}\rangle|^{2}=\|X_{j,\bm{t}}\,\delta_{e}\|^{2}\leq\|P(\bm{u},\bm{u}^{*})\|^{2m_{j}},

it follows readily using Cauchy–Schwarz that

v𝐅rnp𝐄[1gI1gIpd=vk]\displaystyle\sum_{v\in\mathbf{F}_{r}^{\rm np}}\mathbf{E}\big{[}1_{g_{I_{1}}\cdots g_{I_{pd}}=v^{k}}\big{]} 0t1t4pdP(𝒖,𝒖)m1++m5\displaystyle\leq\sum_{0\leq t_{1}\leq\cdots\leq t_{4}\leq pd}\|P(\bm{u},\bm{u}^{*})\|^{m_{1}+\cdots+m_{5}}
(pd+1)4P(𝒖,𝒖)p+O(1),\displaystyle\leq(pd+1)^{4}\|P(\bm{u},\bm{u}^{*})\|^{p+O(1)},

since each Xj,𝒕X_{j,\bm{t}} contains at most 2d=O(1)2d=O(1) variables other than P(𝒖,𝒖)mjP(\bm{u},\bm{u}^{*})^{m_{j}}. As k=2pd(ω(k)1)(pd)2\sum_{k=2}^{pd}(\omega(k)-1)\leq(pd)^{2} and |τ(P(𝒖,𝒖)p)|P(𝒖,𝒖)p|\tau(P(\bm{u},\bm{u}^{*})^{p})|\leq\|P(\bm{u},\bm{u}^{*})\|^{p}, the conclusion follows directly from the expression for ν1(xp)\nu_{1}(x^{p}) stated before the proof. ∎

Remark 3.11.

The proof of Lemma 3.10 relies on positivization: since all the terms in the proof are positive, we are able to obtain upper bounds by overcounting, as in the first equation display of the proof. While this argument applies in the first instance only to polynomials with positive coefficients, strong convergence for arbitrary polynomials then follows a posteriori by Lemma 2.26.

It is also possible, however, to apply a variant of the positivization trick directly to ν1\nu_{1}. This argument [90, §6.2] shows that the validity of Lemma 3.10 for polynomials with positive coefficients already implies its validity for all self-adjoint polynomials (even with matrix coefficients), so that the polynomial method can be applied directly to general polynomials. The advantage of this approach is that it yields much stronger quantitative bounds than can be achieved by applying Lemma 2.26. Since we have not emphasized the quantitative features of the polynomial method in our presentation, we do not develop this approach further here.

3.6. Discussion: on the role of cancellations

When encountered for the first time, the simplicity of proofs by the polynomial method may have the appearance of a magic trick. An explanation for the success of the method is that it uncovers a genuinely new phenomenon that is not captured by classical methods of random matrix theory. Now that we have provided a complete proof of Theorem 1.4 by the polynomial method, we aim to revisit the proof to highlight where this phenomenon arises. For simplicity, we place the following discussion in the context of random matrices XNX^{N} with limiting operator XFX_{\rm F}; the reader may keep in mind

XN=P(𝑼N,𝑼N)|1,XF=P(𝒖,𝒖)X^{N}=P(\bm{U}^{N},\bm{U}^{N*})|_{1^{\perp}},\qquad X_{\rm F}=P(\bm{u},\bm{u}^{*})

in the context of Theorem 1.4 and its proof.

3.6.1. The moment method

It is instructive to first recall the classical moment method that is traditionally used in random matrix theory. Let us take for granted that XNX^{N} converges weakly to XFX_{\rm F}, so that

𝐄[tr(XN)2p]12p=(1+o(1))τ(XF2p)12p(1+o(1))XF\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{2p}}]^{\frac{1}{2p}}=(1+o(1))\tau(X_{\rm F}^{2p})^{\frac{1}{2p}}\leq(1+o(1))\|X_{\rm F}\| (3.3)

as NN\to\infty with pp fixed. The premise of the moment method is that if it could be shown that this convergence remains valid when pp is allowed to grow with NN at rate plogNp\gg\log N, then a strong convergence upper bound would follow: indeed, since XN2pTr[(XN)2p]=Ntr[(XN)2p]\|X^{N}\|^{2p}\leq\mathop{\mathrm{Tr}}[(X^{N})^{2p}]=N\,\mathop{\mathrm{tr}}[(X^{N})^{2p}], we could then estimate

𝐄XNN12p𝐄[tr(XN)2p]12p(1+o(1))XF,\mathbf{E}\|X^{N}\|\leq N^{\frac{1}{2p}}\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{2p}}]^{\frac{1}{2p}}\leq(1+o(1))\|X_{\rm F}\|,

where we used that N12p=1+o(1)N^{\frac{1}{2p}}=1+o(1) for plogNp\gg\log N.
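The overhead factor is elementary to quantify; the following lines (a trivial numerical aside, not part of the argument) illustrate why the threshold plogNp\gg\log N is exactly what is needed.

```python
import math

def norm_overhead(N, p):
    # the factor N^{1/(2p)} lost when bounding the norm by the
    # normalized (2p)-th trace moment: equals exp(log(N) / (2p))
    return math.exp(math.log(N) / (2 * p))

N = 10**6
overhead_fixed_p = norm_overhead(N, 5)                        # p fixed
overhead_large_p = norm_overhead(N, 50 * round(math.log(N)))  # p >> log N
```

For N=106N=10^{6}, the overhead is about 3.983.98 at p=5p=5 but only about 1.011.01 once pp is a large multiple of logN\log N.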

There are two difficulties in implementing the above method. First, establishing (3.3) for plogNp\gg\log N can be technically challenging and often requires delicate combinatorial estimates. When pp is fixed, we can write an expansion

𝐄[tr(XN)2p]=α0(p)+α1(p)N+α2(p)N2+\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{2p}}]=\alpha_{0}(p)+\frac{\alpha_{1}(p)}{N}+\frac{\alpha_{2}(p)}{N^{2}}+\cdots

(this is immediate, for example, from (3.1)) and establishing (3.3) requires us to understand only the lowest-order term α0(p)\alpha_{0}(p). In contrast, when plogNp\gg\log N the coefficients αk(p)\alpha_{k}(p) themselves grow faster than polynomially in NN, so that it is necessary to understand the terms in the expansion to all orders.

In the setting of Theorem 1.4, however, there is a more serious problem: (3.3) is not just difficult to prove, but actually fails altogether.

Example 3.12.

Consider the permutation model of 2r2r-regular random graphs as in Theorem 1.3, so that XN=AN|1X^{N}=A^{N}|_{1^{\perp}} where ANA^{N} is the adjacency matrix. We claim that XN=2r\|X^{N}\|=2r with probability at least NrN^{-r}. As XF=22r1\|X_{\rm F}\|=2\sqrt{2r-1}, this implies

𝐄[Tr(XN)2p]Nr(2r)2pNr(43)pXF2p,\mathbf{E}[\mathop{\mathrm{Tr}}{(X^{N})^{2p}}]\geq N^{-r}(2r)^{2p}\geq N^{-r}\bigg{(}\frac{4}{3}\bigg{)}^{p}\|X_{\rm F}\|^{2p},

contradicting the validity of (3.3) for plogNp\gg\log N.

To prove the claim, note that any given point of [N][N] is simultaneously a fixed point of the random permutations U1N,,UrNU_{1}^{N},\ldots,U_{r}^{N} with probability NrN^{-r}. Thus with probability at least NrN^{-r}, the random graph has a vertex with 2r2r self-loops that is disconnected from the rest of the graph, so that ANA^{N} has eigenvalue 2r2r with multiplicity at least two. The latter clearly implies that XN=2r\|X^{N}\|=2r.

Example 3.12 shows that the appearance of outliers in the spectrum with polynomially small probability Nc\sim N^{-c} presents a fundamental obstruction to the moment method. In random graph models, this situation arises due to the appearance of “bad” subgraphs, called tangles. Previous proofs [50, 18, 19] of optimal spectral gaps in this setting had to overcome these difficulties by conditioning on the absence of tangles, which significantly complicates the analysis and has made it difficult to adapt these methods to more challenging models; a notable exception is the work of Anantharaman and Monk on random surfaces [4, 5].

3.6.2. A new phenomenon

The polynomial method is essentially based on the same input as the moment method: we consider the spectral statistics

𝐄[trh(XN)]=linear combination of 𝐄[tr(XN)p] for pq,\mathbf{E}[\mathop{\mathrm{tr}}h(X^{N})]=\text{linear combination of }\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{p}}]\text{ for }p\leq q,

where hh is any real polynomial of degree qq, and aim to compare these with the spectral statistics of XFX_{\rm F}. Since we have shown in Example 3.12 that each moment can be larger than its limiting value by a factor exponential in the degree, that is, 𝐄[tr(XN)2p]eCpτ((XF)2p)\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{2p}}]\geq e^{Cp}\tau((X_{\rm F})^{2p}) for plogNp\gg\log N, it seems inevitable that 𝐄[trh(XN)]\mathbf{E}[\mathop{\mathrm{tr}}h(X^{N})] must be poorly approximated by τ(h(XF))\tau(h(X_{\rm F})) for high degree polynomials hh. The surprising feature of the polynomial method is that it defies this expectation: for example, a trivial modification of the proof of Corollary 3.9 yields the bound

|𝐄[trh(XN)]τ(h(XF))|Cq4NhL[K,K]\big{|}\mathbf{E}[\mathop{\mathrm{tr}}h(X^{N})]-\tau(h(X_{\rm F}))\big{|}\leq\frac{Cq^{4}}{N}\|h\|_{L^{\infty}[-K,K]} (3.4)

which depends only polynomially on the degree qq.

There is of course no contradiction between these observations: if we choose h(x)=xph(x)=x^{p} in (3.4), then hL[K,K]=KpeCpXFp\|h\|_{L^{\infty}[-K,K]}=K^{p}\geq e^{Cp}\|X_{\rm F}\|^{p} and we recover the exponential dependence on degree that was observed in Example 3.12. On the other hand, (3.4) shows that the dependence on the degree becomes polynomial when hh is uniformly bounded on the interval [K,K][-K,K]. Thus the polynomial method reveals an unexpected cancellation phenomenon that happens when the moments are combined to form bounded test functions hh.
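A concrete instance of this cancellation (a numerical aside added here, not from the argument itself): the Chebyshev polynomial TqT_{q} mixes monomials with integer coefficients as large as 2q12^{q-1}, yet its supremum on [1,1][-1,1] is 11, so the exponentially large individual moments cancel almost entirely when combined into a bounded test function.

```python
def chebyshev_monomial_coeffs(n):
    # integer coefficients of T_n in the monomial basis, via the
    # recurrence T_{n+1}(x) = 2 x T_n(x) - T_{n-1}(x)
    T_prev, T_cur = [1], [0, 1]  # T_0 = 1, T_1 = x
    if n == 0:
        return T_prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in T_cur]            # 2 x T_n
        padded = T_prev + [0] * (len(shifted) - len(T_prev))
        T_prev, T_cur = T_cur, [a - b for a, b in zip(shifted, padded)]
    return T_cur

# T_20 mixes monomials with coefficients as large as 2^19, yet its
# supremum on [-1, 1] is 1
coeffs = chebyshev_monomial_coeffs(20)
leading = coeffs[-1]
sup_on_grid = max(
    abs(sum(c * (t / 1000.0) ** k for k, c in enumerate(coeffs)))
    for t in range(-1000, 1001)
)
```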

The idea that classical tools from the analytic theory of polynomials, such as the Markov inequality, make it possible to capture such cancellations lies at the heart of the polynomial method. These cancellations would be very difficult to realize by a direct combinatorial analysis of the moments. The reason that this phenomenon greatly simplifies proofs of strong convergence is twofold. First, it only requires us to understand the 1N\frac{1}{N}-expansion of the moments to first order, rather than to every order as would be required by the moment method. Second, this eliminates the need to deal with tangles, since tangles do not appear in the first-order term in the expansion. (The tangles are however visible in the higher order terms, which gives rise to the large deviations behavior in Figure 1.3.)

Remark 3.13.

We have contrasted the polynomial method with the moment method since both rely only on the ability to compute moments 𝐄[tr(XN)p]\mathbf{E}[\mathop{\mathrm{tr}}{(X^{N})^{p}}]. Besides the moment method, another classical method of random matrix theory is based on resolvent statistics such as tr(zXN)1\mathop{\mathrm{tr}}{(z-X^{N})^{-1}}. This approach was used by Haagerup–Thorbjørnsen [60] and Schultz [121] to establish strong convergence for Gaussian ensembles, where strong analytic tools are available. It is unclear, however, how such quantities can be computed or analyzed in the context of discrete models as in Theorem 1.4. Nonetheless, let us note that the recent works [76, 74, 75] have successfully used such an approach in the setting of random regular graphs.

4. Intrinsic freeness

The aim of this section is to explain the origin of the intrinsic freeness phenomenon that was introduced in section 1.2. Since Theorem 1.6 requires a number of technical ingredients whose details do not in themselves shed significant light on the underlying phenomenon, we defer to [10, 11] for a complete proof. Instead, we aim to give an informal discussion of the key ideas behind the proof: in particular, we aim to explain the underlying mechanism.

Before we can do so, however, we must first describe the limiting object XfreeX_{\rm free} and explain why it is useful in practice, which we will do in section 4.1. We subsequently sketch some key ideas behind the proof of Theorem 1.6 in section 4.2.

4.1. The free model

To work with Gaussian random matrices, we must recall how to compute moments of independent standard Gaussians 𝒈=(g1,,gr)\bm{g}=(g_{1},\ldots,g_{r}): given any k1,,kn[r]k_{1},\ldots,k_{n}\in[r], the Wick formula [106, Theorem 22.3] states that

𝐄[gk1gkn]=πP2[n]{i,j}π1ki=kj,\mathbf{E}[g_{k_{1}}\cdots g_{k_{n}}]=\sum_{\pi\in\mathrm{P}_{2}[n]}\prod_{\{i,j\}\in\pi}1_{k_{i}=k_{j}},

where P2[n]\mathrm{P}_{2}[n] denotes the set of pairings of [n][n] (that is, partitions into blocks of size two). This classical result is easily proved by induction on nn using integration by parts. A convenient way to rewrite the Wick formula is to introduce for every πP2[n]\pi\in\mathrm{P}_{2}[n] and j[n]j\in[n] random variables 𝒈j|π=(g1j|π,,grj|π)\bm{g}^{j|\pi}=(g_{1}^{j|\pi},\ldots,g_{r}^{j|\pi}) with the same law as 𝒈\bm{g} so that 𝒈j|π=𝒈l|π\bm{g}^{j|\pi}=\bm{g}^{l|\pi} for {j,l}π\{j,l\}\in\pi, and 𝒈j|π,𝒈l|π\bm{g}^{j|\pi},\bm{g}^{l|\pi} are independent otherwise. Then

𝐄[gk1gkn]=πP2[n]𝐄[gk11|πgknn|π],\mathbf{E}[g_{k_{1}}\cdots g_{k_{n}}]=\sum_{\pi\in\mathrm{P}_{2}[n]}\mathbf{E}\big{[}g_{k_{1}}^{1|\pi}\cdots g_{k_{n}}^{n|\pi}\big{]},

as the expectation in the sum factors as {i,j}πE[gkigkj]\prod_{\{i,j\}\in\pi}E[g_{k_{i}}g_{k_{j}}].
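The Wick formula is easy to test against classical Gaussian moment values; the sketch below (an illustration with ad hoc helpers) enumerates all pairings directly.

```python
def pairings(elems):
    # generate all pairings (perfect matchings) of a list of indices
    if not elems:
        yield []
        return
    first = elems[0]
    for j in range(1, len(elems)):
        rest = elems[1:j] + elems[j + 1:]
        for p in pairings(rest):
            yield [(first, elems[j])] + p

def wick_moment(ks):
    # E[g_{k_1} ... g_{k_n}] by the Wick formula: sum over pairings
    # pi of prod_{(i,j) in pi} 1_{k_i = k_j}
    if len(ks) % 2:
        return 0
    return sum(
        all(ks[i] == ks[j] for i, j in p)
        for p in pairings(list(range(len(ks))))
    )

# sanity checks against classical Gaussian moments
m4 = wick_moment([1, 1, 1, 1])        # E[g^4] = 3
m22 = wick_moment([1, 1, 2, 2])       # E[g_1^2 g_2^2] = 1
m6 = wick_moment([1, 1, 1, 1, 1, 1])  # E[g^6] = 15
```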

What happens if we replace the scalar Gaussians 𝒈=(g1,,gr)\bm{g}=(g_{1},\ldots,g_{r}) by independent GUE matrices 𝑮N=(G1N,,GrN)\bm{G}^{N}=(G_{1}^{N},\ldots,G_{r}^{N})? To explain this, we need the following notion: a pairing π\pi has a crossing if there exist pairs {i,j},{l,m}π\{i,j\},\{l,m\}\in\pi so that i<l<j<mi<l<j<m. If we represent π\pi by drawing each element of [n][n] as a vertex on a line, and drawing a semicircular arc between the vertices in each pair {i,j}π\{i,j\}\in\pi, the pairing has a crossing precisely when two of the arcs cross; see Figure 4.1.

Figure 4.1. Illustration of a noncrossing pairing π1\pi_{1} and a crossing pairing π2\pi_{2}.
Lemma 4.1.

We have

limN𝐄[trGNk1GNkn]=πNC2[n]{i,j}π1ki=kj,\lim_{N\to\infty}\mathbf{E}\big{[}{\mathop{\mathrm{tr}}G^{N}_{k_{1}}\cdots G^{N}_{k_{n}}}\big{]}=\sum_{\pi\in\mathrm{NC}_{2}[n]}\prod_{\{i,j\}\in\pi}1_{k_{i}=k_{j}},

where NC2[n]\mathrm{NC}_{2}[n] denotes the set of noncrossing pairings.

Proof.

Define 𝑮N,j|π=(GN,j|π1,,GN,j|πr)\bm{G}^{N,j|\pi}=(G^{N,j|\pi}_{1},\ldots,G^{N,j|\pi}_{r}) analogously to 𝒈j|π\bm{g}^{j|\pi} above. Then

𝐄[trGNk1GNkn]=πP2[n]𝐄[trGN,1|πk1GN,n|πkn]\mathbf{E}\big{[}{\mathop{\mathrm{tr}}G^{N}_{k_{1}}\cdots G^{N}_{k_{n}}}\big{]}=\sum_{\pi\in\mathrm{P}_{2}[n]}\mathbf{E}\big{[}{\mathop{\mathrm{tr}}G^{N,1|\pi}_{k_{1}}\cdots G^{N,n|\pi}_{k_{n}}}\big{]}

by the Wick formula. Consider first a noncrossing pairing π\pi. Since pairs cannot cross, there must be an adjacent pair {i,i+1}π\{i,i+1\}\in\pi, and if this pair is removed we obtain a noncrossing pairing of [n]\{i,i+1}[n]\backslash\{i,i+1\}. As 𝐄[GNkiGNkj]=1ki=kj𝟏\mathbf{E}[G^{N}_{k_{i}}G^{N}_{k_{j}}]=1_{k_{i}=k_{j}}\mathbf{1} (this follows from a simple explicit computation using the following characterization of GUE matrices: GiNG_{i}^{N} is a self-adjoint matrix whose entries above the diagonal are i.i.d. complex Gaussians and whose diagonal entries are i.i.d. real Gaussians, all with mean zero and variance 1N\frac{1}{N}), we obtain

𝐄[trGN,1|πk1GN,n|πkn]={i,j}π1ki=kj\mathbf{E}\big{[}{\mathop{\mathrm{tr}}G^{N,1|\pi}_{k_{1}}\cdots G^{N,n|\pi}_{k_{n}}}\big{]}=\prod_{\{i,j\}\in\pi}1_{k_{i}=k_{j}}

by repeatedly taking the expectation with respect to an adjacent pair.

On the other hand, if 𝑮~N\bm{\tilde{G}}^{N} is an independent copy of 𝑮N\bm{G}^{N}, a computation as in footnote 13 gives

𝐄[GkiNAG~klNBGkjNCG~kmN]=1N2CBA 1ki=kj 1kl=km\mathbf{E}\big{[}G_{k_{i}}^{N}\,A\,\tilde{G}_{k_{l}}^{N}\,B\,G_{k_{j}}^{N}\,C\,\tilde{G}_{k_{m}}^{N}\big{]}=\frac{1}{N^{2}}\,CBA\,1_{k_{i}=k_{j}}\,1_{k_{l}=k_{m}} (4.1)

for any matrices A,B,CA,B,C that are independent of 𝑮N,𝑮~N\bm{G}^{N},\bm{\tilde{G}}^{N}. Thus

𝐄[trGN,1|πk1GN,n|πkn]=o(1)\mathbf{E}\big{[}{\mathop{\mathrm{tr}}G^{N,1|\pi}_{k_{1}}\cdots G^{N,n|\pi}_{k_{n}}}\big{]}=o(1)

as NN\to\infty whenever π\pi is a crossing pairing. ∎
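To make the pairing combinatorics concrete, here is a minimal pure-Python sketch (our illustration, not part of the survey; all names are ours) that enumerates the pairings of [n][n], detects crossings exactly as in the definition above, and checks that the noncrossing pairings are counted by the Catalan numbers while the pairings in total number (2p1)!!(2p-1)!!.

```python
from itertools import combinations
from math import comb, factorial

def pairings(points):
    # recursively enumerate all perfect matchings of an even-size tuple
    if not points:
        yield []
        return
    a, rest = points[0], points[1:]
    for i in range(len(rest)):
        for rec in pairings(rest[:i] + rest[i + 1:]):
            yield [(a, rest[i])] + rec

def has_crossing(pairing):
    # {i,j} and {l,m} cross iff i < l < j < m (pairs are stored with i < j)
    for (i, j), (l, m) in combinations(pairing, 2):
        if i < l < j < m or l < i < m < j:
            return True
    return False

def catalan(p):
    return comb(2 * p, p) // (p + 1)

for p in range(1, 6):
    all_pairs = list(pairings(tuple(range(1, 2 * p + 1))))
    # (2p-1)!! pairings in total, of which Catalan(p) are noncrossing
    assert len(all_pairs) == factorial(2 * p) // (2 ** p * factorial(p))
    assert sum(1 for q in all_pairs if not has_crossing(q)) == catalan(p)
```

The same enumeration also shows that crossing pairings vastly outnumber noncrossing ones as pp grows, a point that will matter in section 4.3.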

In view of Lemma 4.1, the significance of the following definition of the limiting object associated to independent GUE matrices is self-evident.

Definition 4.2 (Free semicircular family).

A family 𝒔=(s1,,sr)𝒜\bm{s}=(s_{1},\ldots,s_{r})\in\mathcal{A} of self-adjoint elements of a CC^{*}-probability space (𝒜,τ)(\mathcal{A},\tau) such that

τ(sk1skn)=πNC2[n]{i,j}π1ki=kj\tau(s_{k_{1}}\cdots s_{k_{n}})=\sum_{\pi\in\mathrm{NC}_{2}[n]}\prod_{\{i,j\}\in\pi}1_{k_{i}=k_{j}}

for all nn\in\mathbb{N} and k1,,kn[r]k_{1},\ldots,k_{n}\in[r] is called a free semicircular family.

Free semicircular families can be constructed explicitly in various ways, which guarantees their existence; see, e.g., [106, pp. 102–108]. Lemma 4.1 states that a family 𝑮N\bm{G}^{N} of independent GUE matrices converges weakly to a free semicircular family 𝒔\bm{s}.

Remark 4.3.

The variables sis_{i} are called “semicircular” because their moments τ(sip)=|NC2[p]|=22xp12π4x2dx\tau(s_{i}^{p})=|\mathrm{NC}_{2}[p]|=\int_{-2}^{2}x^{p}\cdot\frac{1}{2\pi}\sqrt{4-x^{2}}\,dx are the moments of the semicircle distribution. Thus Lemma 4.1 recovers the classical fact that the empirical spectral distribution of a GUE matrix converges to the semicircle distribution.
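The moment identity in Remark 4.3 is easy to check numerically. The following sketch (our illustration, using an elementary midpoint-rule integrator) verifies that the even moments of the semicircle distribution are the Catalan numbers |NC2[2p]||\mathrm{NC}_{2}[2p]| and that the odd moments vanish.

```python
import math

def semicircle_moment(p, n=20000):
    # midpoint-rule approximation of ∫ x^p (1/2π)√(4−x²) dx over [−2, 2]
    h = 4.0 / n
    total = 0.0
    for k in range(n):
        x = -2.0 + (k + 0.5) * h
        total += x ** p * math.sqrt(4.0 - x * x) / (2.0 * math.pi)
    return total * h

def catalan(p):
    return math.comb(2 * p, p) // (p + 1)

for p in range(5):
    # even moments are Catalan numbers, odd moments vanish by symmetry
    assert abs(semicircle_moment(2 * p) - catalan(p)) < 1e-3
    assert abs(semicircle_moment(2 * p + 1)) < 1e-9
```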

The intrinsic freeness principle states that both the spectral distribution and spectral edges of a D×DD\times D self-adjoint Gaussian random matrix

X=A0+i=1rAigiX=A_{0}+\sum_{i=1}^{r}A_{i}g_{i}

are captured in a surprisingly general setting by those of the operator

Xfree=A0𝟏+i=1rAisi.X_{\rm free}=A_{0}\otimes\mathbf{1}+\sum_{i=1}^{r}A_{i}\otimes s_{i}.

This is unexpected: the phenomenon does not arise as a limit of the GUE-type matrices that motivated the definition of XfreeX_{\rm free}, so it is not a priori clear where the free behavior of XX comes from. The latter will be explained in section 4.2.

Besides its fundamental interest, this principle is of considerable practical utility because the spectral statistics of the operator XfreeX_{\rm free} can be explicitly computed by means of closed-form equations, as we will presently explain. Let us first show how to compute the spectral distribution μXfree\mu_{X_{\rm free}}.

Lemma 4.4 (Matrix Dyson equation).

For zz\in\mathbb{C} with Imz>0\mathrm{Im}\,z>0, we denote by

G(z)=(idτ)[(z𝟏Xfree)1]G(z)=(\mathrm{id}\otimes\tau)\big{[}(z\mathbf{1}-X_{\rm free})^{-1}\big{]}

the matrix Green’s function of XfreeX_{\rm free}. Then G(z)G(z) satisfies the matrix Dyson equation

G(z)1+A0+i=1rAiG(z)Ai=z𝟏,G(z)^{-1}+A_{0}+\sum_{i=1}^{r}A_{i}G(z)A_{i}=z\mathbf{1},

and fdμXfree=1πlimε0f(x)Im[trG(x+iε)]dx\int f\,d\mu_{X_{\rm free}}=-\frac{1}{\pi}\lim_{\varepsilon\downarrow 0}\int f(x)\,\mathrm{Im}[\mathop{\mathrm{tr}}G(x+i\varepsilon)]\,dx for all fCb()f\in C_{b}(\mathbb{R}).

Proof.

We can construct all πNC2[n]\pi\in\mathrm{NC}_{2}[n] as follows: first choose the pair {1,l}\{1,l\} containing the first point; and then pair the remaining points by choosing any noncrossing pairings of the sets {2,,l1}\{2,\ldots,l-1\} and {l+1,,n}\{l+1,\ldots,n\}. Thus

τ(sk1skn)=l=2n1k1=klτ(sk2skl1)τ(skl+1skn)\tau(s_{k_{1}}\cdots s_{k_{n}})=\sum_{l=2}^{n}1_{k_{1}=k_{l}}\,\tau(s_{k_{2}}\cdots s_{k_{l-1}})\,\tau(s_{k_{l+1}}\cdots s_{k_{n}})

for k1,,kn[r]k_{1},\ldots,k_{n}\in[r] by the definition of a free semicircular family. In the following, it will be convenient to allow also ki=0k_{i}=0, where we define s0=𝟏s_{0}=\mathbf{1}. In this case, the identity clearly remains valid provided that k1>0k_{1}>0.

Now define the matrix moments

Mn=(idτ)[Xfreen]=𝒌{0,,r}nAk1Aknτ(sk1skn).M_{n}=(\mathrm{id}\otimes\tau)[X_{\rm free}^{n}]=\sum_{\bm{k}\in\{0,\ldots,r\}^{n}}A_{k_{1}}\cdots A_{k_{n}}\,\tau(s_{k_{1}}\cdots s_{k_{n}}).

Applying the above identity yields for n2n\geq 2 the recursion (with M0=𝟏M_{0}=\mathbf{1}, M1=A0M_{1}=A_{0})

Mn=A0Mn1+l=2nk=1rAkMl2AkMnl.M_{n}=A_{0}M_{n-1}+\sum_{l=2}^{n}\sum_{k=1}^{r}A_{k}M_{l-2}A_{k}M_{n-l}.

When |z||z| is sufficiently large, we can write G(z)=n=0zn1MnG(z)=\sum_{n=0}^{\infty}z^{-n-1}M_{n}, and the matrix Dyson equation follows readily from the recursion for MnM_{n}. The equation remains valid for all zz\in\mathbb{C} with Imz>0\mathrm{Im}\,z>0 by analytic continuation.

The final claim follows as 1πIm(x+iε)1=1πεx2+ε2=ρε(x)-\frac{1}{\pi}\mathrm{Im}\,(x+i\varepsilon)^{-1}=\frac{1}{\pi}\frac{\varepsilon}{x^{2}+\varepsilon^{2}}=\rho_{\varepsilon}(x) is the density of the Cauchy distribution with scale ε\varepsilon, so that 1πIm[trG(x+iε)]-\frac{1}{\pi}\mathrm{Im}[\mathop{\mathrm{tr}}G(x+i\varepsilon)] is the density of the convolution μXfreeρε\mu_{X_{\rm free}}*\rho_{\varepsilon} which converges weakly to μXfree\mu_{X_{\rm free}} as ε0\varepsilon\to 0. ∎

Lemma 4.4 shows that the spectral distribution of XfreeX_{\rm free} can be computed by solving a system of quadratic equations for the entries of G(z)G(z). While these equations usually do not have a closed form solution, they are well behaved and are amenable to analysis and numerical computation [67, 2].
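As an illustration of such a numerical computation, consider the simplest case D=1D=1, r=1r=1, A0=0A_{0}=0, A1=1A_{1}=1, where Xfree=sX_{\rm free}=s is a single semicircular element and the matrix Dyson equation reduces to the scalar equation G(z)1+G(z)=zG(z)^{-1}+G(z)=z. The following sketch (ours, not from the survey) solves it by naive fixed-point iteration and checks the result against the explicit Cauchy transform of the semicircle law.

```python
import math

def solve_mde_scalar(z, iters=500):
    # fixed-point iteration G ← (z − G)^{-1} for the scalar Dyson equation
    # G(z)^{-1} + G(z) = z: this is the case D = 1, r = 1, A_0 = 0, A_1 = 1,
    # where X_free = s is semicircular and G is its Cauchy transform
    G = 0j
    for _ in range(iters):
        G = 1.0 / (z - G)
    return G

# sanity checks: the iterate solves the equation, and Im G < 0 in the
# upper half plane (as for any Cauchy transform of a probability measure)
for z in [complex(0.5, 0.2), complex(-1.2, 0.3), complex(3.0, 0.1)]:
    G = solve_mde_scalar(z)
    assert abs(1.0 / G + G - z) < 1e-10
    assert G.imag < 0

# at a real point outside the spectrum [−2, 2], the iteration converges to
# the explicit Cauchy transform G(z) = (z − √(z²−4))/2 of the semicircle law
assert abs(solve_mde_scalar(3.0) - (3.0 - math.sqrt(5.0)) / 2.0) < 1e-9
```

For genuinely matrix-valued coefficients the same fixed-point scheme applies with matrix inverses in place of scalar ones, though its convergence analysis is more delicate (cf. [67, 2]).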

The spectral edges of XfreeX_{\rm free} can in principle be obtained from its spectral distribution (cf. Lemma 2.6). However, the following formula of Lehner [84], which we state without proof,14The difficulty is to upper bound λmax(Xfree)\lambda_{\rm max}(X_{\rm free}): as M=G(z)>0M=G(z)>0 for any z>λmax(Xfree)z>\lambda_{\rm max}(X_{\rm free}), Lemma 4.4 shows that λmax(Xfree)\lambda_{\rm max}(X_{\rm free}) is lower bounded by the right-hand side of Lehner’s formula. provides an often more powerful tool: it expresses the outer edges of the spectrum of XfreeX_{\rm free} in terms of a variational principle.

Theorem 4.5 (Lehner).

We have

λmax(Xfree)=infM>0λmax(M1+A0+i=1rAiMAi),\lambda_{\rm max}(X_{\rm free})=\inf_{M>0}\lambda_{\rm max}\Bigg{(}M^{-1}+A_{0}+\sum_{i=1}^{r}A_{i}MA_{i}\Bigg{)},

where we denote λmax(X)=supsp(X)\lambda_{\rm max}(X)=\sup\mathrm{sp}(X) for any self-adjoint operator XX.

Various applications of this formula are illustrated in [11]. On the other hand, in applications where the exact location of the edge is not important, the following simple bounds often suffice and are easy to use:

A0i=1rAi21/2XfreeA0+2i=1rAi21/2.\|A_{0}\|\vee\Bigg{\|}\sum_{i=1}^{r}A_{i}^{2}\Bigg{\|}^{1/2}\leq\|X_{\rm free}\|\leq\|A_{0}\|+2\Bigg{\|}\sum_{i=1}^{r}A_{i}^{2}\Bigg{\|}^{1/2}.

These bounds admit a simple direct proof [115, p. 208].
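Both Lehner's formula and the simple bounds above can be checked in a toy commuting example where Xfree\|X_{\rm free}\| is exactly computable. The sketch below (our illustration; the particular coefficients are arbitrary) takes A0=0A_{0}=0 and diagonal AiA_{i}, so that XfreeX_{\rm free} is a direct sum of semicircular elements with variances vj=i(Ai)jj2v_{j}=\sum_{i}(A_{i})_{jj}^{2} and hence Xfree=2maxjvj\|X_{\rm free}\|=2\max_{j}\sqrt{v_{j}}.

```python
import math

# toy commuting example: A_0 = 0 and diagonal coefficients A_i, stored as
# lists of diagonal entries (r = 2 diagonal 3×3 coefficient matrices)
A = [[1.0, 0.5, 0.0], [0.3, 1.2, 0.7]]

v = [sum(Ai[j] ** 2 for Ai in A) for j in range(3)]  # diagonal of Σ A_i²
norm_exact = 2.0 * math.sqrt(max(v))  # ||X_free|| in this commuting case

# the simple bounds: ||Σ A_i²||^{1/2} ≤ ||X_free|| ≤ 2 ||Σ A_i²||^{1/2}
lower = math.sqrt(max(v))
assert lower <= norm_exact <= 2.0 * lower + 1e-12

# Lehner's formula restricted to scalar trial matrices M = m·1 (sufficient
# here, since everything commutes): λ_max(M^{-1} + Σ A_i M A_i) = 1/m + m·max_j v_j
lehner = min(1.0 / m + m * max(v) for m in [k / 1000.0 for k in range(1, 5000)])
assert abs(lehner - norm_exact) < 1e-3
```

Note that for this example the upper bound of the simple two-sided estimate is attained, as it must be whenever A0=0A_{0}=0 and the AiA_{i} commute.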

By connecting the spectral statistics of a random matrix XX to those of XfreeX_{\rm free}, the intrinsic freeness principle makes it possible to understand the spectra of complicated random matrix models that would be difficult to analyze directly. One may view the operator XfreeX_{\rm free} as a “platonic ideal”: a perfect object which captures the essence of the random matrices XX that exist in the real world.

4.2. Interpolation and crossings

We now aim to explain how the intrinsic freeness principle actually arises. In this section, we will roughly sketch the most basic ideas behind the proof of Theorem 1.6.

The most natural way to interpolate between XX and XfreeX_{\rm free} is to define

XN=A0𝟏+i=1rAiGiNX^{N}=A_{0}\otimes\mathbf{1}+\sum_{i=1}^{r}A_{i}\otimes G_{i}^{N}

as in section 1.2, where G1N,,GrNG_{1}^{N},\ldots,G_{r}^{N} are independent GUE matrices. Then XN=XX^{N}=X when N=1N=1, while XNXfreeX^{N}\to X_{\rm free} weakly as NN\to\infty by Lemma 4.1. One may thus be tempted to approach intrinsic freeness by applying the polynomial method to XNX^{N}. The problem with this approach, however, is that the small parameter that arises in the polynomial method is not v~(X)\tilde{v}(X) as in Theorem 1.6, but rather 1N\frac{1}{N}. This is useless for understanding what happens when N=1N=1.

The basic issue here is that unlike classical strong convergence, the intrinsic freeness phenomenon is truly nonasymptotic in nature: it aims to capture an intrinsic property of XX that causes it to behave as the corresponding free model. Thus we cannot hope to deduce such a property from the asymptotic behavior of the model XNX^{N} alone; the proof must explicitly explain where intrinsic freeness comes from, and why it is quantified by a parameter such as v~(X)\tilde{v}(X).

4.2.1. The interpolation method

Rather than using XNX^{N} as an interpolating family, the proof of intrinsic freeness is based on a continuous interpolating family parametrized by q[0,1]q\in[0,1]. Roughly speaking, we would like to define

“ Xq=qX+1qXfree ”,\text{`` }X_{q}=\sqrt{q}\,X+\sqrt{1-q}\,X_{\rm free}\text{ ''},

and apply the fundamental theorem of calculus as explained in section 1.3.1 to bound the discrepancy between the spectral statistics of X=X1X=X_{1} and Xfree=X0X_{\rm free}=X_{0}. The obvious problem with the above definition is that it makes no sense: XX is a random matrix and XfreeX_{\rm free} is a deterministic operator, so the two live in different spaces. To implement this program, we will construct proxies for XX and XfreeX_{\rm free} that are random matrices of the same (high) dimension.

To this end, we proceed as follows. Let G1N,,GrNG_{1}^{N},\ldots,G_{r}^{N} be independent GUE matrices, and let D1N,,DrND_{1}^{N},\ldots,D_{r}^{N} be independent diagonal matrices with i.i.d. standard Gaussian entries on the diagonal. Then we define the DN×DNDN\times DN random matrices

XqN=A0𝟏+i=1rAi(qDiN+1qGiN).X_{q}^{N}=A_{0}\otimes\mathbf{1}+\sum_{i=1}^{r}A_{i}\otimes\big{(}\sqrt{q}\,D_{i}^{N}+\sqrt{1-q}\,G_{i}^{N}\big{)}.

The significance of this definition is that

𝐄[tr(X1N)p]=𝐄[trXp],limN𝐄[tr(X0N)p]=(trτ)(Xfreep)\mathbf{E}[\mathop{\mathrm{tr}}{(X_{1}^{N})^{p}}]=\mathbf{E}[\mathop{\mathrm{tr}}X^{p}],\qquad\quad\lim_{N\to\infty}\mathbf{E}[\mathop{\mathrm{tr}}{(X_{0}^{N})^{p}}]=({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}X_{\rm free}^{p}\big{)}

for all pp\in\mathbb{N}; the first identity follows as X1NX_{1}^{N} is a block-diagonal matrix with i.i.d. copies of XX on the diagonal, while the second follows by Lemma 4.1 as X0N=XNX_{0}^{N}=X^{N}. Thus XqNX_{q}^{N} does indeed interpolate between XX and XfreeX_{\rm free} in the limit as NN\to\infty. (We emphasize that we now view qq as the interpolation parameter, as opposed to the interpolation parameter 1N\frac{1}{N} in the polynomial method.)

Now that we have defined a suitable interpolation, we aim to compute the rate of change ddq𝐄[trh(XqN)]\frac{d}{dq}\mathbf{E}[\mathop{\mathrm{tr}}h(X_{q}^{N})] of spectral statistics along the interpolation: if it is small, then the spectral statistics of XX and XfreeX_{\rm free} must be nearly the same. For simplicity, we will illustrate the method using moments h(x)=x2ph(x)=x^{2p}, which suffices to capture the operator norm by the moment method.15To achieve Theorem 1.6 in its full strength, one uses instead spectral statistics of the form h(x)=|zx|2ph(x)=|z-x|^{-2p} for zz\in\mathbb{C}, Imz>0\mathrm{Im}\,z>0. The computations involved are however very similar. We state the resulting expression informally; the computation is somewhat tedious (see the proof of [10, Lemma 5.4]) but uses only standard tools of Gaussian analysis.

Lemma 4.6 (Informal statement).

For any pp\in\mathbb{N}, we have

ddq𝐄[tr(XqN)2p]=sum of terms of the form𝐄[trHaN(XqN)m1H~bN(XqN)m2HaN(X~qN)m3H~bN(X~qN)m4]with m1+m2+m3+m4=2p4 and a,b{0,1},\frac{d}{dq}\mathbf{E}[\mathop{\mathrm{tr}}{(X_{q}^{N})^{2p}}]=\text{sum of terms of the form}\\ \mathbf{E}\big{[}{\mathop{\mathrm{tr}}H_{a}^{N}\,(X_{q}^{N})^{m_{1}}\,\tilde{H}_{b}^{N}\,(X_{q}^{N})^{m_{2}}\,H_{a}^{N}\,(\tilde{X}_{q}^{N})^{m_{3}}\,\tilde{H}_{b}^{N}\,(\tilde{X}_{q}^{N})^{m_{4}}}\big{]}\\ \phantom{\sum}\text{with }m_{1}+m_{2}+m_{3}+m_{4}=2p-4\text{ and }a,b\in\{0,1\},

where X~qN\tilde{X}_{q}^{N} is a suitably constructed (dependent) copy of XqNX_{q}^{N} and HaN,H~bNH_{a}^{N},\tilde{H}_{b}^{N} are independent copies of XaN𝐄[XaN]X_{a}^{N}-\mathbf{E}[X_{a}^{N}] and XbN𝐄[XbN]X_{b}^{N}-\mathbf{E}[X_{b}^{N}], respectively.

From a conceptual perspective, the expression of Lemma 4.6 should not be unexpected. Indeed, the explicit formulas for 𝐄[trX2p]\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}] and (trτ)(Xfree2p)({\mathop{\mathrm{tr}}}\otimes\tau)(X_{\rm free}^{2p}) that arise from the Wick formula and Definition 4.2, respectively, differ only in that the former has a sum over all pairings while the latter sums only over noncrossing pairings. Thus the difference between these two quantities is a sum over all pairings that contain at least one crossing. The point of the interpolation method, however, is that by changing qq infinitesimally we can isolate the effect of a single crossing—this is precisely what Lemma 4.6 shows. This key feature of the interpolation method is crucial for accessing the edges of the spectrum (see section 4.3).

4.2.2. The crossing inequality

By Lemma 4.6, it remains to control the effect of a single crossing. We can now finally explain the significance of the mysterious parameter v~(X)\tilde{v}(X): this parameter controls the contribution of crossings. The following result is a combination of [10, Lemma 4.5 and Proposition 4.6].

Lemma 4.7 (Crossing inequality).

Let H,H~H,\tilde{H} be any independent and centered self-adjoint random matrices, and 1p1,,p41\leq p_{1},\ldots,p_{4}\leq\infty with 1p1++1p4=1\frac{1}{p_{1}}+\cdots+\frac{1}{p_{4}}=1. Then

|𝐄[trHM1H~M2HM3H~M4]|v~(H)2v~(H~)2i=14(tr|Mi|pi)1pi\big{|}\mathbf{E}\big{[}{\mathop{\mathrm{tr}}H\,M_{1}\,\tilde{H}\,M_{2}\,H\,M_{3}\,\tilde{H}\,M_{4}}\big{]}\big{|}\leq\tilde{v}(H)^{2}\,\tilde{v}(\tilde{H})^{2}\prod_{i=1}^{4}\big{(}{\mathop{\mathrm{tr}}|M_{i}|^{p_{i}}}\big{)}^{\frac{1}{p_{i}}}

for any matrices M1,,M4M_{1},\ldots,M_{4} that are independent of H,H~H,\tilde{H}.

Rather than reproduce the details of the proof of this inequality here, we aim to explain the intuition behind the proof.

Idea behind the proof of Lemma 4.7.

We first observe that, by the Riesz–Thorin interpolation theorem [16, p. 202], it suffices to prove the lemma in the extreme case p4=1p_{4}=1 (so that p1=p2=p3=p_{1}=p_{2}=p_{3}=\infty). Thus the proof reduces to bounding the matrix alignment parameter

w(H,H~)4=supM1,M2,M31𝐄[HM1H~M2HM3H~]w(H,\tilde{H})^{4}=\sup_{\|M_{1}\|,\|M_{2}\|,\|M_{3}\|\leq 1}\big{\|}\mathbf{E}\big{[}H\,M_{1}\,\tilde{H}\,M_{2}\,H\,M_{3}\,\tilde{H}\big{]}\big{\|}

by

v~(H)2v~(H~)2=𝐄[H2]12Cov(H)12𝐄[H~2]12Cov(H~)12.\tilde{v}(H)^{2}\,\tilde{v}(\tilde{H})^{2}=\|\mathbf{E}[H^{2}]\|^{\frac{1}{2}}\,\|\mathrm{Cov}(H)\|^{\frac{1}{2}}\,\|\mathbf{E}[\tilde{H}^{2}]\|^{\frac{1}{2}}\,\|\mathrm{Cov}(\tilde{H})\|^{\frac{1}{2}}.

How can we do this? The basic intuition behind the proof is as follows. Note first that if GG is a GUE matrix, then Cov(G)=1N𝟏\mathrm{Cov}(G)=\frac{1}{N}\mathbf{1}. Thus

Cov(H)NCov(H)Cov(G)\mathrm{Cov}(H)\leq N\,\|\mathrm{Cov}(H)\|\,\mathrm{Cov}(G)

for any random matrix HH. If w(H,H~)w(H,\tilde{H}) were monotone as a function of Cov(H)\mathrm{Cov}(H) and Cov(H~)\mathrm{Cov}(\tilde{H}), then one could bound

w(H,H~)4?N2Cov(H)Cov(H~)w(G,G~)4=Cov(H)Cov(H~)w(H,\tilde{H})^{4}\mathop{\stackrel{{\scriptstyle?}}{{\leq}}}N^{2}\,\|\mathrm{Cov}(H)\|\,\|\mathrm{Cov}(\tilde{H})\|\,w(G,\tilde{G})^{4}=\|\mathrm{Cov}(H)\|\,\|\mathrm{Cov}(\tilde{H})\|

using that w(G,G~)4=1N2w(G,\tilde{G})^{4}=\frac{1}{N^{2}} for independent GUE matrices G,G~G,\tilde{G} by (4.1).

Unfortunately, w(H,H~)w(H,\tilde{H}) is not monotone as a function of Cov(H)\mathrm{Cov}(H) and Cov(H~)\mathrm{Cov}(\tilde{H}), so the above reasoning does not apply directly. However, we can use a trick to rescue the argument. The key observation is that the parameter w(H,H~)w(H,\tilde{H}) can be “symmetrized” by applying the Cauchy–Schwarz inequality as is illustrated informally in Figure 4.2. This results in two symmetric terms—a single pair which is readily bounded by 𝐄[H2]\|\mathbf{E}[H^{2}]\|, and a double crossing that is a positive functional of (and hence monotone in) Cov(H)\mathrm{Cov}(H). We can thus apply the above logic to the double crossing to replace HH by a GUE matrix, which yields a factor Cov(H)\|\mathrm{Cov}(H)\|. The term that remains can now be bounded using a similar argument.

Figure 4.2. Cauchy–Schwarz argument in the proof of Lemma 4.7.

We can now sketch how all the above ingredients fit together. Combining Lemmas 4.6 and 4.7 with pi=2p4mip_{i}=\frac{2p-4}{m_{i}} yields an inequality of the form

|ddq𝐄[tr(XqN)2p]|p4v~(X)4𝐄[tr(XqN)2p4].\bigg{|}\frac{d}{dq}\mathbf{E}[\mathop{\mathrm{tr}}{(X_{q}^{N})^{2p}}]\bigg{|}\lesssim p^{4}\,\tilde{v}(X)^{4}\,\mathbf{E}[\mathop{\mathrm{tr}}{(X_{q}^{N})^{2p-4}}].

Using 𝐄[tr(XqN)2p4]𝐄[tr(XqN)2p]12p\mathbf{E}[\mathop{\mathrm{tr}}{(X_{q}^{N})^{2p-4}}]\leq\mathbf{E}[\mathop{\mathrm{tr}}{(X_{q}^{N})^{2p}}]^{1-\frac{2}{p}} by Jensen’s inequality, we obtain a differential inequality that can be integrated by a straightforward change of variables. This yields (after taking NN\to\infty) the final inequality

|𝐄[trX2p]12p(trτ)(Xfree2p)12p|p34v~(X).\big{|}\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]^{\frac{1}{2p}}-({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}X_{\rm free}^{2p}\big{)}^{\frac{1}{2p}}\big{|}\lesssim p^{\frac{3}{4}}\tilde{v}(X).

This inequality captures the intrinsic freeness phenomenon for the moments of XX. Since the right-hand side depends only polynomially on the degree pp, however, one can apply the moment method as explained in section 3.6.1 to deduce also a bound on the operator norm X\|X\| by Xfree\|X_{\rm free}\|. In this manner, we achieve both weak convergence and norm convergence of XX to XfreeX_{\rm free} as v~(X)0\tilde{v}(X)\to 0.

Remark 4.8.

The matrix alignment parameter w(H,H~)w(H,\tilde{H}) that appears in the proof of Lemma 4.7 (as well as the use of the Riesz–Thorin theorem in this context) was first introduced in the work of Tropp [125], which predates the discovery of the intrinsic freeness principle. Let us briefly explain how it appears there.

The idea of [125] is to mimic the classical proof of the Schwinger–Dyson equation for GUE matrices, see, e.g., [55, Chapter 2], in the context of a general Gaussian random matrix. Tropp observed that the error term that arises from this argument can be naturally bounded by w(H,H~)w(H,\tilde{H}), and that this parameter is small in some examples (e.g., for matrices with independent entries).

The reason this argument cannot give rise to generally applicable bounds is that it fails to capture the intrinsic freeness phenomenon. Indeed, the validity of the Schwinger–Dyson equation for GUE matrices requires that H,H~H,\tilde{H} themselves behave as free semicircular variables; this is not at all the case in general, as the spectral distribution of XfreeX_{\rm free} need not look anything like a semicircle. To ensure this is the case, [125] has to impose strong symmetry assumptions on HH that are close in spirit to the classical setting of Voiculescu’s asymptotic freeness.16The paper [125] also develops another set of inequalities that are applicable to general Gaussian matrices, but are suboptimal by a dimension-dependent multiplicative factor. We do not discuss these inequalities as they are less closely connected to the topic of this survey.

In contrast, intrinsic freeness captures a more subtle property of random matrices: v~(H)\tilde{v}(H) does not quantify whether HH itself behaves freely, but rather how sensitive the model H=i=1nAigiH=\sum_{i=1}^{n}A_{i}g_{i} is to whether the scalar variables gig_{i} are taken to be commutative or free. Consequently, when v~(H)\tilde{v}(H) is small, the variables gig_{i} can be replaced by their free counterparts sis_{i} (i.e., “liberated”) with a negligible effect on the spectral statistics. This viewpoint paves the way to the development of the interpolation method which is key to subsequent developments.

The works of Haagerup–Thorbjørnsen [60] and Tropp [125] may nonetheless be viewed as precursors to the intrinsic freeness principle, and provided the motivation for the development of the theory that is described in this section.

4.3. Discussion: on the role of interpolation

To conclude this section, we aim to explain why the interpolation method plays an essential role in the development of intrinsic freeness. For simplicity we will assume in this section that A0=0A_{0}=0, so that X=i=1rAigiX=\sum_{i=1}^{r}A_{i}g_{i} is a centered Gaussian matrix.

Since the moments of XX and XfreeX_{\rm free} can be easily computed explicitly, it is tempting to reason directly using the resulting expressions. More precisely, note that

\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]=\sum_{\pi\in\mathrm{P}_{2}[2p]}\mathbf{E}[\mathop{\mathrm{tr}}X^{1|\pi}\cdots X^{2p|\pi}]

by the Wick formula, while

({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}X_{\rm free}^{2p}\big{)}=\sum_{\pi\in\mathrm{NC}_{2}[2p]}\mathbf{E}[\mathop{\mathrm{tr}}X^{1|\pi}\cdots X^{2p|\pi}]

by Definition 4.2. Thus clearly the difference between these two expressions involves only a sum over crossing pairings, and we can control each term in the sum directly using Lemma 4.7. This elementary approach yields the inequality

|𝐄[trX2p](trτ)(Xfree2p)|(Cp)pv~(X)4𝐄[trX2p4],\big{|}\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]-({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}X_{\rm free}^{2p}\big{)}\big{|}\leq(Cp)^{p}\,\tilde{v}(X)^{4}\,\mathbf{E}[\mathop{\mathrm{tr}}X^{2p-4}],

where we used that the number of crossing pairings of [2p][2p] is of order (Cp)p(Cp)^{p} for a universal constant CC. In particular, we obtain

|𝐄[trX2p]12p(trτ)(Xfree2p)12p|pv~(X)2p(𝐄[trX2p]12p)12p.\big{|}\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]^{\frac{1}{2p}}-({\mathop{\mathrm{tr}}}\otimes\tau)\big{(}X_{\rm free}^{2p}\big{)}^{\frac{1}{2p}}\big{|}\lesssim\sqrt{p}\,\tilde{v}(X)^{\frac{2}{p}}\,\big{(}\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]^{\frac{1}{2p}}\big{)}^{1-\frac{2}{p}}.

This inequality suffices to prove weak convergence of XX to XfreeX_{\rm free} as v~(X)0\tilde{v}(X)\to 0, but is far too weak to provide access to the edges of the spectrum. To see why, recall from section 3.6.1 that to bound the norm of the D×DD\times D matrix XX by the moment method, we must control 𝐄[trX2p]12p\mathbf{E}[\mathop{\mathrm{tr}}X^{2p}]^{\frac{1}{2p}} for plogDp\gg\log D. However, even when XX is a GUE matrix we only have v~(X)=D14\tilde{v}(X)=D^{-\frac{1}{4}}, so that the error term pv~(X)2p\sqrt{p}\,\tilde{v}(X)^{\frac{2}{p}} in the above inequality diverges as DD\to\infty when plogDp\gg\log D.

The reason for the inefficiency of this approach is readily understood. What we used is that the difference between the moments of XX and XfreeX_{\rm free} is a sum of terms with at least one crossing. However, most pairings of [2p][2p] contain not just one crossing, but many (typically of order pp) crossings at the same time. Unfortunately, Lemma 4.7 can only capture the effect of a single crossing: it cannot be iterated to obtain an improved bound in the presence of multiple crossings, as the Hölder type bound destroys the structure of the pairing. Thus we are forced to ignore the effect of multiple crossings, which results in a loss of information.

The key feature of the interpolation method that is captured by Lemma 4.6 is that when we move infinitesimally from XX to XfreeX_{\rm free}, the change of the moments is controlled by a single crossing rather than by many crossings at the same time. This is the reason why we are able to obtain an efficient bound using the somewhat crude crossing inequality provided by Lemma 4.7.

Remark 4.9.

In the special case that XX is a GUE matrix, we obtained a much better result than Lemma 4.7: the crossing identity (4.1) captures the effect of a crossing exactly. This identity can be iterated in the presence of multiple crossings, which results in the genus expansion for GUE matrices (see, e.g., [99, §1.7]). This is a rather special feature of classical random matrix models, however, and we do not know of any method that can meaningfully capture the effect of multiple crossings in the setting of arbitrarily structured random matrices.

5. Applications

In recent years, strong convergence has led to several striking applications to problems in different areas of mathematics, which has in turn motivated new developments surrounding the strong convergence phenomenon. The aim of this section is to briefly describe some of these applications. The discussion is necessarily at a high level, since the detailed background needed to understand each application is beyond the scope of this survey. Our primary aim is to give a hint as to why and how strong convergence enters in these different settings.

We will focus on applications where strong convergence enters in a non-obvious manner. In particular, we omit applications of the intrinsic freeness principle in applied mathematics, since it is generally applied in a direct manner to analyze complicated random matrices that arise in such applications.

5.1. Random lifts of graphs

We begin by recalling some basic notions that can be found, for example, in [71, §6].

Let G=(V,E)G=(V,E) be a connected graph. A connected graph G=(V,E)G^{\prime}=(V^{\prime},E^{\prime}) is said to cover GG if there is a surjective map f:VVf:V^{\prime}\to V that maps the local neighborhood of each vertex vv^{\prime} in GG^{\prime} bijectively to the local neighborhood of f(v)f(v^{\prime}) in GG (the local neighborhood consists of the given vertex and the edges incident to it).17This definition is slightly ambiguous if GG has a self-loop, which we gloss over for simplicity.

Every connected graph GG has a universal cover G~\tilde{G} which covers all other covers of GG. Given a base vertex v0v_{0} in GG, one can construct G~\tilde{G} by choosing its vertex set to be the set of all finite non-backtracking paths in GG starting at v0v_{0}, with two vertices being joined by an edge if one of the paths extends the other by one step; thus G~\tilde{G} is a tree (the construction does not depend on the choice of v0v_{0}).

It is clear that if GG is any dd-regular graph, then G~\tilde{G} is the infinite dd-regular tree. In particular, all dd-regular graphs have the same universal cover. In this setting, we have an optimal spectral gap phenomenon: for any sequence of dd-regular graphs with diverging number of vertices, the maximum nontrivial eigenvalue is asymptotically lower bounded by the spectral radius of the universal cover (Lemma 1.2), and this bound is attained by random dd-regular graphs (Theorem 1.3).

It is expected that the optimal spectral gap phenomenon is a very general one that is not specific to the setting of dd-regular graphs. Progress in this direction was achieved only recently, however, and makes crucial use of strong convergence. In this section, we will describe such a phenomenon in the setting of non-regular graphs; the setting of hyperbolic surfaces will be discussed in section 5.2 below.

5.1.1. Random lifts

From the perspective of the lower bound, there is nothing particularly special about dd-regular graphs besides the fact that they all have the same universal cover. Indeed, for any sequence of graphs with diverging number of vertices that have the same universal cover, the maximum nontrivial eigenvalue is asymptotically lower bounded by the spectral radius of the universal cover. This follows by a straightforward adaptation of Lemma 1.2, cf. [71, Theorem 6.6].

What may be less obvious, however, is how to construct a model of random graphs that share the same universal cover beyond the regular setting. The natural way to think about this problem, which dates back to Friedman [49] (see also [3]), is as follows. Fix any finite connected base graph GG; we will then construct random graphs with an increasing number of vertices by choosing a sequence of random finite covers of GG. By construction, the universal cover of all these graphs coincides with the universal cover G~\tilde{G} of the base graph.

To this end, let us explain how to construct finite covers of a finite connected graph G=(V,E)G=(V,E). Fix an arbitrary orientation (x,y)(x,y) for every edge {x,y}E\{x,y\}\in E, and denote by EorE_{\mathrm{or}} the set of oriented edges. Fix also NN\in\mathbb{N} and a permutation σe𝐒N\sigma_{e}\in\mathbf{S}_{N} for each eEore\in E_{\rm or}. Then we can construct a graph GN=(VN,EN)G^{N}=(V^{N},E^{N}) with

VN=V×[N]V^{N}=V\times[N]

and

EN={{(x,i),(y,σe(i))}:e=(x,y)Eor,i[N]}.E^{N}=\big{\{}\{(x,i),(y,\sigma_{e}(i))\}:e=(x,y)\in E_{\rm or},~{}i\in[N]\big{\}}.

In other words, GNG^{N} is obtained by taking NN copies of GG, and scrambling the endpoints of the NN copies of each edge ee according to the permutation σe\sigma_{e} (see Figure 5.1). Then GNG^{N} is a cover of GG with covering map f:(x,i)xf:(x,i)\mapsto x.

Conversely, it is not difficult to see that any finite cover of GG can be obtained in this manner by some choice of NN and σe\sigma_{e} (as all fibers f1(x)f^{-1}(x) of a covering map ff must have the same cardinality NN, called the degree of the cover), and that the set of graphs thus constructed is independent of the choice of orientation EorE_{\rm or}.

Figure 5.1. A finite cover GNG^{N} of degree N=3N=3 (right) of a base graph GG (left). The three copies of the vertices of GG in GNG^{N} are highlighted by the shaded regions.
Remark 5.1.

GNG^{N} need not be connected for every choice of σe\sigma_{e}; for example, if each σe\sigma_{e} is the identity permutation, then GNG^{N} consists of NN disjoint copies of GG. It is always the case, however, that each connected component of GNG^{N} is a cover of GG.

The above construction immediately gives rise to the natural model of random covers of graphs: given a finite connected base graph GG, a random cover GNG^{N} of degree NN is obtained by choosing the permutations σe\sigma_{e} in the above construction independently and uniformly at random from 𝐒N\mathbf{S}_{N}. This model is commonly referred to as the random lift model in graph theory (as a cover of degree NN of a finite graph is sometimes referred to in graph theory as an NN-lift).

5.1.2. Old and new eigenvalues

From now on, we fix the base graph G=(V,E)G=(V,E) and its random lifts GNG^{N} as above. Then it is clear from the construction that the adjacency matrix ANA^{N} of GNG^{N} can be expressed as

AN=e=(x,y)Eor(eyexUeN+exeyUeN),A^{N}=\sum_{e=(x,y)\in E_{\rm or}}\big{(}e_{y}e_{x}^{*}\otimes U_{e}^{N}+e_{x}e_{y}^{*}\otimes U_{e}^{N*}\big{)},

where {ex}xV\{e_{x}\}_{x\in V} is the coordinate basis of V\mathbb{C}^{V} and {UeN}eEor\{U_{e}^{N}\}_{e\in E_{\rm or}} are i.i.d. random permutation matrices of dimension NN. The significance of strong convergence for this model is now obvious: we have encoded the adjacency matrix of the random lift model as a polynomial of degree one with matrix coefficients of i.i.d. permutation matrices, to which Theorem 1.4 can be applied.

Before we can do so, however, we must clarify the nature of the optimal spectral gap phenomenon in the present setting. In the first instance, one might hope to establish the obvious converse to the lower bound, that is, that AN|1\|A^{N}|_{1^{\perp}}\| converges to the spectral radius ϱ\varrho of the universal cover G~\tilde{G}. Such a statement cannot be true in general, however, for the following reason. Note that for any vVv\in\mathbb{C}^{V}, we have

AN(v1)=Av1,A^{N}(v\otimes 1)=Av\otimes 1,

where AA denotes the adjacency matrix of GG. Thus any eigenvalue λ\lambda of GG is also an eigenvalue of GNG^{N}, since the corresponding eigenvector vv of AA lifts to an eigenvector v1v\otimes 1 of ANA^{N}: in other words, the eigenvalues of the base graph are always inherited by its covers. In particular, if the base graph GG happens to have an eigenvalue λ\lambda that is strictly larger than ϱ\varrho, then AN|1λ>ϱ\|A^{N}|_{1^{\perp}}\|\geq\lambda>\varrho for all NN.
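The inheritance of old eigenvalues is easy to confirm numerically. The following self-contained sketch (the 4-cycle base graph, seed, and permutation-matrix convention are illustrative choices of ours) checks that every eigenpair of the base lifts via vv1v\mapsto v\otimes 1:

```python
import numpy as np

rng = np.random.default_rng(2)
V, N = 4, 6
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # base graph: the 4-cycle
A_base = np.zeros((V, V))
A_lift = np.zeros((V * N, V * N))
for (x, y) in edges:
    A_base[x, y] = A_base[y, x] = 1.0
    U = np.eye(N)[:, rng.permutation(N)]   # U_e with U e_k = e_{sigma(k)}
    E = np.zeros((V, V)); E[y, x] = 1.0
    A_lift += np.kron(E, U) + np.kron(E.T, U.T)

# since U_e 1 = 1, we have A^N (v ⊗ 1) = Av ⊗ 1 for every v
vals, vecs = np.linalg.eigh(A_base)
ones = np.ones(N)
for lam, v in zip(vals, vecs.T):
    assert np.allclose(A_lift @ np.kron(v, ones), lam * np.kron(v, ones))
```

The key point is that every permutation matrix fixes the all-ones vector, so the lifted vector v1v\otimes 1 is an eigenvector of ANA^{N} with the old eigenvalue.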

For this reason, the best we can hope for is to show that the new eigenvalues of GNG^{N}, that is, those eigenvalues that are not inherited from GG, are asymptotically bounded by the spectral radius of G~\tilde{G}. More precisely, denote by

ANnew=AN|(V1)=e=(x,y)Eor(eyexUeN|1+exeyUeN|1)A^{N}_{\rm new}=A^{N}|_{(\mathbb{C}^{V}\otimes 1)^{\perp}}=\sum_{e=(x,y)\in E_{\rm or}}\big{(}e_{y}e_{x}^{*}\otimes U_{e}^{N}|_{1^{\perp}}+e_{x}e_{y}^{*}\otimes U_{e}^{N*}|_{1^{\perp}}\big{)}

the restriction of ANA^{N} to the space spanned by the new eigenvectors. Then we aim to show that ANnew\|A^{N}_{\rm new}\| converges to the spectral radius of G~\tilde{G}. This is the correct formulation of the optimal spectral gap phenomenon for the random lift model: indeed, a variant of the lower bound shows that for any sequence of covers of GG with diverging number of vertices, the maximum new eigenvalue is asymptotically lower bounded by the spectral radius of G~\tilde{G} [49, §4].

As was noted by Bordenave and Collins [19], the validity of the optimal spectral gap phenomenon for random lifts, conjectured by Friedman [49], is now a simple corollary of strong convergence of random permutation matrices.

Corollary 5.2 (Optimal spectral gap of random lifts).

Fix any finite connected graph GG, and denote by ϱ\varrho the spectral radius of its universal cover G~\tilde{G}. Then

limNANnew=ϱin probability.\lim_{N\to\infty}\|A^{N}_{\rm new}\|=\varrho\quad\text{in probability}.
Proof.

It follows immediately from Theorem 1.4 that ANnewa\|A^{N}_{\rm new}\|\to\|a\| with

a=e=(x,y)Eor(eyexλ(ge)+exeyλ(ge1)),a=\sum_{e=(x,y)\in E_{\rm or}}\big{(}e_{y}e_{x}^{*}\otimes\lambda(g_{e})+e_{x}e_{y}^{*}\otimes\lambda(g_{e}^{-1})\big{)},

where geg_{e} are the generators of a free group 𝐅\mathbf{F} and λ\lambda is the left-regular representation of 𝐅\mathbf{F}. It remains to show that in fact a=ϱ\|a\|=\varrho.

To see this, note that by construction, aa is an adjacency matrix of an infinite graph with vertex set V×𝐅V\times\mathbf{F}. Moreover, all vertices reachable from an initial vertex (v0,g)(v_{0},g) have the form (vk,g(vk1,vk)g(v0,v1)g)(v_{k},g_{(v_{k-1},v_{k})}\cdots g_{(v_{0},v_{1})}g) where (v0,,vk)(v_{0},\ldots,v_{k}) is a path in GG and we define g(y,x)=g(x,y)1g_{(y,x)}=g_{(x,y)}^{-1} for (x,y)Eor(x,y)\in E_{\rm or}. Note that this description is not unique: two paths define the same vertex if g(vk1,vk)g(v0,v1)g_{(v_{k-1},v_{k})}\cdots g_{(v_{0},v_{1})} reduces to the same element of 𝐅\mathbf{F}. Thus the vertices reachable from (v0,g)(v_{0},g) are uniquely indexed by paths (v1,,vk)(v_{1},\ldots,v_{k}) so that g(vk1,vk)g(v0,v1)g_{(v_{k-1},v_{k})}\cdots g_{(v_{0},v_{1})} is reduced, i.e., by nonbacktracking paths. We have therefore shown that aa is the adjacency matrix of an infinite graph, each of whose connected components is isomorphic to G~\tilde{G}. ∎

Corollary 5.2 may be viewed as a far-reaching generalization of Theorem 1.3. Indeed, the permutation model of random 2r2r-regular graphs is a special case of the random lift model, obtained by choosing the base graph GG to consist of a single vertex with rr self-loops (often called a “bouquet”).
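As a sanity check of this reduction, lifting the bouquet recovers the permutation model exactly: the adjacency matrix is AN=U1+U1++Ur+UrA^{N}=U_{1}+U_{1}^{*}+\cdots+U_{r}+U_{r}^{*}. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
r, N = 3, 8
# base graph: a single vertex with r self-loops (the "bouquet");
# its N-lift is the permutation model of random 2r-regular graphs
A = np.zeros((N, N))
for _ in range(r):
    U = np.eye(N)[:, rng.permutation(N)]  # i.i.d. uniform permutation matrix
    A += U + U.T
# A is the adjacency matrix of a 2r-regular multigraph on N vertices
```

Self-loops and multiple edges are allowed, exactly as in the permutation model of section 1.1.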

Even though Corollary 5.2 is only concerned with the new eigenvalues of GNG^{N}, it implies the classical spectral gap property AN|1ϱ\|A^{N}|_{1^{\perp}}\|\to\varrho whenever the base graph satisfies A|1ϱ\|A|_{1^{\perp}}\|\leq\varrho. Another simple consequence is that whenever the base graph satisfies A>ϱ\|A\|>\varrho, the random lift GNG^{N} is connected with probability 1o(1)1-o(1); this holds if and only if GG has at least two cycles [73, Theorem 2].

5.2. Buser’s conjecture

Let XX be a hyperbolic surface, that is, a connected Riemannian surface of constant curvature 1-1. Then XX has the hyperbolic plane \mathbb{H} as its universal cover, and we can in fact obtain X=Γ\X=\Gamma\backslash\mathbb{H} as a quotient of the hyperbolic plane by a Fuchsian group Γ\Gamma (i.e., a discrete subgroup of PSL2()\mathrm{PSL}_{2}(\mathbb{R})) which is isomorphic to the fundamental group Γπ1(X)\Gamma\simeq\pi_{1}(X).

If XX is a closed hyperbolic surface, its Laplacian ΔX\Delta_{X} has discrete eigenvalues

0=λ0(X)<λ1(X)λ2(X)0=\lambda_{0}(X)<\lambda_{1}(X)\leq\lambda_{2}(X)\leq\cdots

The following is the direct analogue in this setting of Lemma 1.2.

Lemma 5.3 (Huber [77], Cheng [35]).

For any sequence XNX^{N} of closed hyperbolic surfaces with diverging diameter, we have

λ1(XN)14+o(1)asN.\lambda_{1}(X^{N})\leq\frac{1}{4}+o(1)\quad\text{as}\quad N\to\infty.

The significance of the value λ1()=14\lambda_{1}(\mathbb{H})=\frac{1}{4} is that it is the bottom of the spectrum of the Laplacian Δ\Delta_{\mathbb{H}} on the hyperbolic plane.

It is therefore natural to ask whether there exist closed hyperbolic surfaces with arbitrarily large diameter (or, equivalently in this setting, arbitrarily large genus) that attain this bound. The existence of such surfaces with optimal spectral gap, a long-standing conjecture of Buser [29], was resolved by Hide and Magee [69] by means of a striking application of strong convergence. (Curiously, Buser has at different times conjectured both existence [29] and nonexistence [28] of such surfaces. On the other hand, the (very much open) Selberg eigenvalue conjecture in number theory [120] predicts that a specific class of noncompact hyperbolic surfaces have this property.)

5.2.1. Random covers

The basic approach of the work of Hide and Magee is to prove an optimal spectral gap phenomenon for random covers XNX^{N} of a given base surface XX, in direct analogy with the random lift model for graphs. To explain how such covers are constructed, we must first sketch the analogue in the present setting of the covering construction described in section 5.1.1.

Let us begin with an informal discussion. The action of a Fuchsian group Γ\Gamma on \mathbb{H} defines a Dirichlet fundamental domain FF whose translates {γF:γΓ}\{\gamma F:\gamma\in\Gamma\} tile \mathbb{H}; FF is a polygon whose sides are given by FγFF\cap\gamma F and Fγ1FF\cap\gamma^{-1}F for some generating set γ{γ1,,γs}\gamma\in\{\gamma_{1},\ldots,\gamma_{s}\} of Γ\Gamma. Then X=Γ\X=\Gamma\backslash\mathbb{H} is obtained from FF by gluing each pair of sides FγiFF\cap\gamma_{i}F and Fγi1FF\cap\gamma_{i}^{-1}F. See Figure 5.2 and [12, Chapter 9].

Figure 5.2. Illustration of a tiling of \mathbb{H} (in the Poincaré disc model) by hyperbolic octagons; gluing the sides of the fundamental domain FF yields a genus 22 surface.

To construct a candidate NN-fold cover XNX^{N} of XX, we fix NN copies F×[N]F\times[N] of the fundamental domain and permutations σ1,,σs𝐒N\sigma_{1},\ldots,\sigma_{s}\in\mathbf{S}_{N}. We then glue the side (FγiF)×{k}(F\cap\gamma_{i}F)\times\{k\} to the corresponding side (Fγi1F)×{σi(k)}(F\cap\gamma_{i}^{-1}F)\times\{\sigma_{i}(k)\}, that is, we scramble the gluing of the sides between the copies of FF. Unlike in the case of graphs, however, it need not be the case that every choice of σi\sigma_{i} yields a valid covering: if we glue the sides without regard for the corners of FF, the resulting surface may develop singularities. The additional condition that is needed to obtain a valid covering is that σ1,,σs\sigma_{1},\ldots,\sigma_{s} must satisfy the same relations as γ1,,γs\gamma_{1},\ldots,\gamma_{s}; that is, we must choose σi=πN(γi)\sigma_{i}=\pi_{N}(\gamma_{i}) for some πNHom(Γ,𝐒N)\pi_{N}\in\mathrm{Hom}(\Gamma,\mathbf{S}_{N}).

More formally, this construction can be implemented as follows. Fix a base surface X=Γ\X=\Gamma\backslash\mathbb{H} and a homomorphism πNHom(Γ,𝐒N)\pi_{N}\in\mathrm{Hom}(\Gamma,\mathbf{S}_{N}). Define

XN=Γ\(×[N]),X^{N}=\Gamma\backslash(\mathbb{H}\times[N]),

where we let γΓ\gamma\in\Gamma act on ×[N]\mathbb{H}\times[N] as γ(z,i)=(γz,πN(γ)i)\gamma(z,i)=(\gamma z,\pi_{N}(\gamma)i). Then XNX^{N} is an NN-fold cover of XX, and every NN-fold cover of XX arises in this manner for some choice of πN\pi_{N}; cf. [64, pp. 68–70] or [53, §14a and §16d].

To define a random cover of XX we may now simply choose a random homomorphism πN\pi_{N}, or equivalently, choose σi=πN(γi)\sigma_{i}=\pi_{N}(\gamma_{i}) to be random permutations. The major complication that arises here is that these permutations cannot in general be chosen independently, since they must satisfy the relations of Γ\Gamma. For example, if XX is a closed orientable surface of genus gg, then Γ\Gamma is the surface group

ΓΓg=γ1,,γ2g|[γ1,γ2][γ2g1,γ2g]=1\Gamma\simeq\Gamma_{g}=\big{\langle}\gamma_{1},\ldots,\gamma_{2g}~{}\big{|}~{}[\gamma_{1},\gamma_{2}]\cdots[\gamma_{2g-1},\gamma_{2g}]=1\big{\rangle}

where [g,h]=ghg1h1[g,h]=ghg^{-1}h^{-1}. In this case, the random permutations σi\sigma_{i} must be chosen to satisfy [σ1,σ2][σ2g1,σ2g]=1[\sigma_{1},\sigma_{2}]\cdots[\sigma_{2g-1},\sigma_{2g}]=1, which precludes them from being independent. The reason this issue does not arise for graphs is that the fundamental group of every graph is free, and thus there are no relations to be satisfied.
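The obstruction can be seen concretely in a quick numerical experiment (an illustrative sketch; the composition convention and helper names are ours): i.i.d. uniform permutations will generically violate the genus-22 surface relation.

```python
import numpy as np

def compose(p, q):
    """(p ∘ q)(i) = p(q(i)) for permutations stored as index arrays."""
    return p[q]

def comm(a, b):
    """Group commutator [a, b] = a b a^{-1} b^{-1}."""
    a_inv, b_inv = np.argsort(a), np.argsort(b)  # argsort inverts a permutation
    return compose(a, compose(b, compose(a_inv, b_inv)))

rng = np.random.default_rng(4)
N = 30
s1, s2, s3, s4 = (rng.permutation(N) for _ in range(4))
# the genus-2 relation [s1, s2][s3, s4] = 1 fails for generic i.i.d. choices
rel = compose(comm(s1, s2), comm(s3, s4))
```

For i.i.d. uniform permutations the probability that the relation holds is vanishingly small, which is precisely why πN\pi_{N} cannot be built from independent permutations when Γ\Gamma is a surface group.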

The above obstacle has been addressed in three distinct ways.

  1.

    While the fundamental group of a closed hyperbolic surface is never free, there are finite volume noncompact hyperbolic surfaces with a free fundamental group; e.g., the thrice punctured sphere has π1(X)=𝐅2\pi_{1}(X)=\mathbf{F}_{2} and admits a finite volume hyperbolic metric with three cusps. Thus random covers of such surfaces can be defined using independent random permutation matrices. Hide and Magee [69] proved an optimal spectral gap phenomenon for this model; this leads indirectly to a solution to Buser’s conjecture by compactifying the resulting surfaces.

  2.

    Louder and Magee [86] showed that surface groups can be approximately embedded in free groups by mapping each generator of Γ\Gamma to a suitable word in the free group. This gives rise to a non-uniform random model of covers of closed hyperbolic surfaces by choosing πN\pi_{N} that maps each generator of Γ\Gamma to the corresponding word in independent random permutation matrices.

  3.

    Finally, the most natural model of random covers of closed surfaces is to choose πNHom(Γ,𝐒N)\pi_{N}\in\mathrm{Hom}(\Gamma,\mathbf{S}_{N}) uniformly at random, that is, choose σi=πN(γi)\sigma_{i}=\pi_{N}(\gamma_{i}) uniformly at random among the set of tuples σ1,,σs𝐒N\sigma_{1},\ldots,\sigma_{s}\in\mathbf{S}_{N} that satisfy the relations of Γ\Gamma. This corresponds to choosing an NN-fold cover of XX uniformly at random [91]. The challenge in analyzing this model is that the σi\sigma_{i} have a complicated dependence structure that cannot be reduced to independent random permutations.

These three approaches give rise to distinct models of random covers. The advantage of the first two approaches is that their analysis is based on strong convergence of independent random permutations (Theorem 1.4). This suffices for proving the existence of covers with optimal spectral gaps, i.e., to resolve Buser’s conjecture, but leaves unclear whether optimal spectral gaps are rare or common. That typical covers of closed surfaces have an optimal spectral gap was recently proved by Magee, Puder, and the author [92] by resolving the strong convergence problem for uniformly random πNHom(Γg,𝐒N)\pi_{N}\in\mathrm{Hom}(\Gamma_{g},\mathbf{S}_{N}) (cf. section 6.1).

The aim of the remainder of this section is to sketch how the optimal spectral gap problem for the Laplacian ΔXN\Delta_{X^{N}} of a random cover is encoded as a strong convergence problem. This reduction proceeds in an analogous manner for the three models described above. We therefore fix in the following a base surface X=Γ\X=\Gamma\backslash\mathbb{H} and a sequence of random homomorphisms πNHom(Γ,𝐒N)\pi_{N}\in\mathrm{Hom}(\Gamma,\mathbf{S}_{N}) as in any of the above models. The key assumption that will be needed, which holds in all three models, is that the random matrices (U1N,,UsN)|1(U_{1}^{N},\ldots,U_{s}^{N})|_{1^{\perp}} defined by

UiN\displaystyle U_{i}^{N} =πN(γi)\displaystyle=\pi_{N}(\gamma_{i})
converge strongly to the operators (u1,,us)(u_{1},\ldots,u_{s}) defined by
ui\displaystyle u_{i} =λΓ(γi).\displaystyle=\lambda_{\Gamma}(\gamma_{i}).

Here we implicitly identify πN(γi)𝐒N\pi_{N}(\gamma_{i})\in\mathbf{S}_{N} with the corresponding N×NN\times N permutation matrix, and λΓ\lambda_{\Gamma} denotes the left-regular representation of Γ\Gamma.

Remark 5.4.

Besides models of random covers of hyperbolic surfaces, another important model of random surfaces is obtained by sampling from the Weil–Petersson measure on the moduli space of hyperbolic surfaces of genus gg; this may be viewed as the natural notion of a typical surface of genus gg. In a recent tour-de-force, Anantharaman and Monk [4, 5] proved that the Weil–Petersson model also exhibits an optimal spectral gap phenomenon by using methods inspired by Friedman’s original proof of Theorem 1.3. In contrast to random cover models, it does not appear that this problem can be reduced to a strong convergence problem. However, it is an interesting question whether a form of the polynomial method, which plays a key role in [92], could be used to obtain a new proof of this result.

5.2.2. Exploiting strong convergence

In contrast to the setting of random lifts of graphs, it is not immediately clear how the Laplacian spectrum of random surface covers relates to strong convergence. This connection is due to Hide and Magee [69]; for expository purposes, we sketch a variant of their argument [70].

We begin with some basic observations. Any fL2(X)f\in L^{2}(X) lifts to a function in L2(XN)L^{2}(X^{N}) by composing it with the covering map ι:XNX\iota:X^{N}\to X. As

ΔXN(fι)=ΔXfι,\Delta_{X^{N}}(f\circ\iota)=\Delta_{X}f\circ\iota,

it follows precisely as for random lifts of graphs that the spectrum of the base surface XX is a subset of that of any of its covers XNX^{N}. What we aim to show is that the smallest new eigenvalue of ΔXN\Delta_{X^{N}}, that is, the smallest eigenvalue of its restriction ΔnewXN\Delta^{\rm new}_{X^{N}} to the orthogonal complement of functions lifted from XX, converges to the bottom of the spectrum of Δ\Delta_{\mathbb{H}}. In other words, we aim to prove that

limNeΔXNnew=eΔ=e14.\lim_{N\to\infty}\big{\|}e^{-\Delta_{X^{N}}^{\rm new}}\big{\|}=\big{\|}e^{-\Delta_{\mathbb{H}}}\big{\|}=e^{-\frac{1}{4}}.

This leads us to consider the heat operators eΔe^{-\Delta_{\mathbb{H}}} and eΔXNe^{-\Delta_{X^{N}}}.

Recall that eΔe^{-\Delta_{\mathbb{H}}} is an integral operator on L2()L^{2}(\mathbb{H}) with a smooth kernel p(x,y)p_{\mathbb{H}}(x,y). The Laplacian ΔXN\Delta_{X^{N}} on XN=Γ\(×[N])X^{N}=\Gamma\backslash(\mathbb{H}\times[N]) is obtained by restricting the Laplacian on ×[N]\mathbb{H}\times[N] to functions that are invariant under Γ\Gamma. In particular, this implies that eΔXNe^{-\Delta_{X^{N}}} may be viewed as an integral operator on L2(F×[N])L^{2}(F\times[N]) with kernel

pXN((x,i),(y,j))=γΓp(x,γy) 1i=πN(γ)jp_{X^{N}}((x,i),(y,j))=\sum_{\gamma\in\Gamma}p_{\mathbb{H}}(x,\gamma y)\,1_{i=\pi_{N}(\gamma)j}

by parameterizing XNX^{N} as F×[N]F\times[N], where FF is the fundamental domain of the action of Γ\Gamma on \mathbb{H}. See, for example, [17, §3.7] or [70, §2].

In the following, we identify L2(F×[N])L2(F)NL^{2}(F\times[N])\simeq L^{2}(F)\otimes\mathbb{C}^{N}, and denote by aγa_{\gamma} the integral operator on L2(F)L^{2}(F) with kernel p(x,γy)p_{\mathbb{H}}(x,\gamma y). In this notation, the above expression can be rewritten in the more suggestive form

eΔXN\displaystyle e^{-\Delta_{X^{N}}} =γΓaγπN(γ).\displaystyle=\sum_{\gamma\in\Gamma}a_{\gamma}\otimes\pi_{N}(\gamma).
In particular, we have
eΔXNnew\displaystyle e^{-\Delta_{X^{N}}^{\rm new}} =γΓaγπN(γ)|1.\displaystyle=\sum_{\gamma\in\Gamma}a_{\gamma}\otimes\pi_{N}(\gamma)|_{1^{\perp}}.

Since πN\pi_{N} is a homomorphism, each πN(γ)=πN(γi1γik)=Ui1NUikN\pi_{N}(\gamma)=\pi_{N}(\gamma_{i_{1}}\cdots\gamma_{i_{k}})=U_{i_{1}}^{N}\cdots U_{i_{k}}^{N} can be written as a word in the random permutation matrices UiN=πN(γi)U_{i}^{N}=\pi_{N}(\gamma_{i}) associated to the generators γi\gamma_{i} of Γ\Gamma. Thus eΔXNnewe^{-\Delta_{X^{N}}^{\rm new}} is nearly, but not exactly, a noncommutative polynomial of (U1N,,UsN)|1(U_{1}^{N},\ldots,U_{s}^{N})|_{1^{\perp}} with matrix coefficients:

  •

    The above sum is over all γΓ\gamma\in\Gamma with no bound on the word length |γ||\gamma|. However, as p(x,y)p_{\mathbb{H}}(x,y) decays rapidly as a function of dist(x,y)\mathrm{dist}_{\mathbb{H}}(x,y) (this can be read off from the explicit expression for p(x,y)p_{\mathbb{H}}(x,y) [69] or from general heat kernel estimates [70]), the size of the coefficients aγ\|a_{\gamma}\| decays rapidly as a function of |γ||\gamma|. The infinite sum is therefore well approximated by a finite sum.

  •

    The coefficients aγa_{\gamma} are operators rather than matrices. However, when XX is a closed surface, aγa_{\gamma} are compact operators and are therefore well approximated by matrices. (The argument in the case that XX is noncompact requires an additional truncation to remove the cusps; see [69, 103] for details.)

We therefore conclude that eΔXNnewe^{-\Delta_{X^{N}}^{\rm new}} is well approximated in operator norm by a noncommutative polynomial in (U1N,,UsN)|1(U_{1}^{N},\ldots,U_{s}^{N})|_{1^{\perp}} with matrix coefficients. In particular, we can apply strong convergence to conclude that

limNeΔXNnew=b\lim_{N\to\infty}\big{\|}e^{-\Delta_{X^{N}}^{\rm new}}\big{\|}=\|b\|

with

b=γΓaγλΓ(γ).b=\sum_{\gamma\in\Gamma}a_{\gamma}\otimes\lambda_{\Gamma}(\gamma).

It remains to observe that the operator bb is eΔe^{-\Delta_{\mathbb{H}}} in disguise. To see this, note that the map η:F×Γ\eta:F\times\Gamma\to\mathbb{H} defined by η(x,g)=g1x\eta(x,g)=g^{-1}x is a.e. invertible, as the translates of the fundamental domain tile \mathbb{H}. Thus ff~=fηf\mapsto\tilde{f}=f\circ\eta defines an isomorphism L2()L2(F×Γ)L^{2}(\mathbb{H})\simeq L^{2}(F\times\Gamma). We can now readily compute for any fL2()f\in L^{2}(\mathbb{H})

bf~(x,g)=γΓFp(x,γy)f(g1γy)dy=p(g1x,y)f(y)dy=eΔf~(x,g),b\tilde{f}(x,g)=\sum_{\gamma\in\Gamma}\int_{F}p_{\mathbb{H}}(x,\gamma y)\,f(g^{-1}\gamma y)\,dy=\int_{\mathbb{H}}p_{\mathbb{H}}(g^{-1}x,y)\,f(y)\,dy=\widetilde{e^{-\Delta_{\mathbb{H}}}f}(x,g),

where we used that p(g1x,y)=p(x,gy)p_{\mathbb{H}}(g^{-1}x,y)=p_{\mathbb{H}}(x,gy).

Remark 5.5.

There are several variants of the above argument. The original work of Hide and Magee [69] used the resolvent (zΔXN)1(z-\Delta_{X^{N}})^{-1} instead of eΔXNe^{-\Delta_{X^{N}}}. The heat operator approach of Hide–Moy–Naud [70, 103] has the advantage that it extends to surfaces with variable negative curvature by using heat kernel estimates. For hyperbolic surfaces, another variant due to Hide–Macera–Thomas [68] uses a specially designed function hh with the property that h(ΔXN)h(\Delta_{X^{N}}) is already a noncommutative polynomial of U1N,,UsNU_{1}^{N},\ldots,U_{s}^{N} with operator coefficients, avoiding the need to truncate the sum over γ\gamma. The advantage of this approach is that it leads to much better quantitative estimates, since the truncation of the sum is the main source of loss in the previous arguments. Finally, Magee [88] presents a more general perspective that uses the continuity of induced representations under strong convergence.

5.3. Random Schreier graphs

In this section, we take a different perspective on random regular graphs that will lead us in a new direction.

Definition 5.6.

Given σ1,,σr𝐒N\sigma_{1},\ldots,\sigma_{r}\in\mathbf{S}_{N} and an action 𝐒NV\mathbf{S}_{N}\curvearrowright V of the symmetric group on a finite set VV, the Schreier graph Sch(𝐒NV;σ1,,σr)\mathrm{Sch}(\mathbf{S}_{N}\curvearrowright V;\sigma_{1},\ldots,\sigma_{r}) is the 2r2r-regular graph with vertex set VV, where each vertex vVv\in V has neighbors σi(v),σi1(v)\sigma_{i}(v),\sigma_{i}^{-1}(v) for i=1,,ri=1,\ldots,r (allowing for multiple edges and self-loops).

The permutation model of random 2r2r-regular graphs that was introduced in section 1.1 is merely the special case Sch(𝐒N[N];σ1,,σr)\mathrm{Sch}(\mathbf{S}_{N}\curvearrowright[N];\sigma_{1},\ldots,\sigma_{r}) where 𝐒N[N]\mathbf{S}_{N}\curvearrowright[N] is the natural action of permutations of [N][N] on the points of [N][N], and σ1,,σr\sigma_{1},\ldots,\sigma_{r} are independent and uniformly distributed random elements of 𝐒N\mathbf{S}_{N}.

We may however ask what happens if we consider other actions of the symmetric group. Following [51, 30], denote by [N]k[N]_{k} the set of all kk-tuples of distinct elements of [N][N]. Then we obtain the natural action 𝐒N[N]k\mathbf{S}_{N}\curvearrowright[N]_{k} by letting σ\sigma act on each element of the tuple, that is, σ(i1,,ik)=(σ(i1),,σ(ik))\sigma(i_{1},\ldots,i_{k})=(\sigma(i_{1}),\ldots,\sigma(i_{k})). If we again choose σ1,,σr\sigma_{1},\ldots,\sigma_{r} to be i.i.d. uniform random elements of 𝐒N\mathbf{S}_{N}, then

Sch(𝐒N[N]k;σ1,,σr)\mathrm{Sch}(\mathbf{S}_{N}\curvearrowright[N]_{k};\sigma_{1},\ldots,\sigma_{r})

yields a new model of random 2r2r-regular graphs that generalizes the permutation model. The interesting aspect of these graphs is that even though the number of vertices Nk\sim N^{k} grows rapidly as we increase kk, the number of random bits rNlogN\sim rN\log N that generate the graph is fixed independently of kk. We may therefore think of the model as becoming increasingly less random as kk is increased. (A different, much less explicit approach to derandomization of random graphs from a theoretical computer science perspective may be found in [101, 107].)
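The diagonal action on tuples is straightforward to realize explicitly. The following sketch (the helper name `schreier_adjacency` is ours, and the dense construction is feasible only for tiny NN and kk) builds the Schreier graph of 𝐒N[N]k\mathbf{S}_{N}\curvearrowright[N]_{k} and checks 2r2r-regularity:

```python
import itertools
import numpy as np

def schreier_adjacency(perms, N, k):
    """Adjacency matrix of Sch(S_N acting on [N]_k; sigma_1, ..., sigma_r)."""
    tuples = list(itertools.permutations(range(N), k))  # [N]_k: distinct k-tuples
    index = {t: i for i, t in enumerate(tuples)}
    A = np.zeros((len(tuples),) * 2)
    for sigma in perms:
        for t in tuples:
            s = tuple(int(sigma[i]) for i in t)         # diagonal action on t
            A[index[s], index[t]] += 1                  # edge t -> sigma(t)
    return A + A.T                                      # add the sigma^{-1} edges

rng = np.random.default_rng(5)
r, N, k = 2, 5, 2
perms = [rng.permutation(N) for _ in range(r)]
A = schreier_adjacency(perms, N, k)
# |[5]_2| = 5 * 4 = 20 vertices, each of degree 2r = 4
```

Taking k=1k=1 in this sketch recovers the permutation model on the vertex set [N][N].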

What is far from obvious is whether the optimal spectral gap of the random graph persists as we increase kk. Let us consider the two extremes.

  •

    The case k=1k=1 is the permutation model of random regular graphs, which have an optimal spectral gap by Theorem 1.3.

  •

    The case k=Nk=N corresponds to the Cayley graph of 𝐒N\mathbf{S}_{N} with the random generators σ1,,σr\sigma_{1},\ldots,\sigma_{r}, since [N]N𝐒N[N]_{N}\simeq\mathbf{S}_{N}. Whether random Cayley graphs of 𝐒N\mathbf{S}_{N} have an optimal spectral gap is a long-standing question (see section 6.2) that remains wide open: it has not even been shown that the maximum nontrivial eigenvalue is bounded away from the trivial eigenvalue in this setting.

The intermediate values of kk interpolate between these two extremes. In a major improvement over previously known results, Cassidy [30] recently proved that the optimal spectral gap persists in the range kNαk\leq N^{\alpha} for some α<1\alpha<1.

Theorem 5.7.

Denote by AN,kA^{N,k} the adjacency matrix of Sch(𝐒N[N]k;σ1,,σr)\mathrm{Sch}(\mathbf{S}_{N}\curvearrowright[N]_{k};\sigma_{1},\ldots,\sigma_{r}) where σ1,,σr\sigma_{1},\ldots,\sigma_{r} are i.i.d. uniform random elements of 𝐒N\mathbf{S}_{N}. Then

AN,kN|1=22r1+o(1)with probability1o(1)\|A^{N,k_{N}}|_{1^{\perp}}\|=2\sqrt{2r-1}+o(1)\quad\text{with probability}\quad 1-o(1)

as NN\to\infty whenever kNN112δk_{N}\leq N^{\frac{1}{12}-\delta}, for any δ>0\delta>0.

This yields a natural model of random 2r2r-regular graphs with |V||V| vertices that has an optimal spectral gap using only (log|V|)12+δ\sim(\log|V|)^{12+\delta} bits of randomness, as compared to |V|log|V|\sim|V|\log|V| bits for ordinary random regular graphs.

Theorem 5.7 arises from a much more general result about strong convergence of representations of 𝐒N\mathbf{S}_{N}. To motivate this result, note that we can write

AN,k=πN,k(σ1)+πN,k(σ1)++πN,k(σr)+πN,k(σr),A^{N,k}=\pi_{N,k}(\sigma_{1})+\pi_{N,k}(\sigma_{1})^{*}+\cdots+\pi_{N,k}(\sigma_{r})+\pi_{N,k}(\sigma_{r})^{*},

where πN,k:𝐒NM[N]k()\pi_{N,k}:\mathbf{S}_{N}\to\mathrm{M}_{[N]_{k}}(\mathbb{C}) maps σ𝐒N\sigma\in\mathbf{S}_{N} to the permutation matrix defined by its action on [N]k[N]_{k}. Then πN,k\pi_{N,k} is clearly a group representation of 𝐒N\mathbf{S}_{N}, so it decomposes as a direct sum of irreducible representations πNλ\pi_{N}^{\lambda}. Theorem 5.7 now follows from the following result about strong convergence of irreducible representations of 𝐒N\mathbf{S}_{N} that vastly generalizes Theorem 1.4 (which is the special case where πNλ=stdN\pi_{N}^{\lambda}=\mathrm{std}_{N} is the standard representation, so that dim(stdN)=N1\dim(\mathrm{std}_{N})=N-1). For expository purposes, we state the result in a slightly more general form than is given in [30].
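That πN,k\pi_{N,k} is a genuine representation, the property underlying its decomposition into irreducibles, can be sanity-checked numerically in small cases. The sketch below is illustrative (the helper name `pi_Nk` and the composition convention are ours):

```python
import itertools
import numpy as np

def pi_Nk(sigma, N, k):
    """Permutation matrix of sigma in S_N acting diagonally on [N]_k."""
    tuples = list(itertools.permutations(range(N), k))
    index = {t: i for i, t in enumerate(tuples)}
    P = np.zeros((len(tuples),) * 2)
    for t in tuples:
        P[index[tuple(int(sigma[i]) for i in t)], index[t]] = 1.0
    return P

rng = np.random.default_rng(6)
N, k = 5, 2
sigma, tau = rng.permutation(N), rng.permutation(N)
prod = sigma[tau]   # composition convention: (sigma tau)(i) = sigma(tau(i))
# homomorphism property: pi_Nk(sigma tau) = pi_Nk(sigma) pi_Nk(tau)
```

In particular the identity permutation maps to the identity matrix on the |[N]k||[N]_{k}|-dimensional space, and multiplicativity holds for every pair of permutations.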

Theorem 5.8.

Let 𝛔=(σ1,,σr)\bm{\sigma}=(\sigma_{1},\ldots,\sigma_{r}) be i.i.d. uniform random elements of 𝐒N\mathbf{S}_{N}, and let 𝐮=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) be defined as in Theorem 1.4. Then

limNmax1<dim(πNλ)exp(N112δ)|P(πNλ(𝝈),πNλ(𝝈))P(𝒖,𝒖)|=0\lim_{N\to\infty}\max_{1<\dim(\pi_{N}^{\lambda})\leq\exp(N^{\frac{1}{12}-\delta})}\big{|}\|P(\pi_{N}^{\lambda}(\bm{\sigma}),\pi_{N}^{\lambda}(\bm{\sigma}^{*}))\|-\|P(\bm{u},\bm{u}^{*})\|\big{|}=0

in probability for every δ>0\delta>0, DD\in\mathbb{N}, and PMD()x1,,x2rP\in\mathrm{M}_{D}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle, where the maximum is taken over irreducible representations πNλ\pi_{N}^{\lambda} of 𝐒N\mathbf{S}_{N}.

Proof.

The irreducible representations πNλ\pi_{N}^{\lambda} are indexed by Young diagrams λN\lambda\vdash N. The argument in [47, §6] shows that for any 0<δ<δ0<\delta^{\prime}<\delta and sufficiently large NN, every irreducible representation with dim(πNλ)exp(N112δ)\dim(\pi_{N}^{\lambda})\leq\exp(N^{\frac{1}{12}-\delta}) has the property that the first row of either λ\lambda or of the conjugate diagram λ\lambda^{\prime} has length at least NN112δN-N^{\frac{1}{12}-\delta^{\prime}}. In the first case, the conclusion

limNmaxλ1NN112δ|P(πNλ(𝝈),πNλ(𝝈))P(𝒖,𝒖)|=0\lim_{N\to\infty}\max_{\lambda_{1}\geq N-N^{\frac{1}{12}-\delta^{\prime}}}\big{|}\|P(\pi_{N}^{\lambda}(\bm{\sigma}),\pi_{N}^{\lambda}(\bm{\sigma}^{*}))\|-\|P(\bm{u},\bm{u}^{*})\|\big{|}=0 (5.1)

follows from the proof of [30, Theorem 1.9].

On the other hand, as πNλ(σ)=sgn(σ)πNλ(σ)\pi_{N}^{\lambda^{\prime}}(\sigma)=\mathrm{sgn}(\sigma)\pi_{N}^{\lambda}(\sigma) [78, Theorem 6.7], we obtain

limNmaxλ1NN112δ|P(πNλ(𝝈),πNλ(𝝈))P(sgn(𝝈)𝒖,sgn(𝝈)𝒖)|=0\lim_{N\to\infty}\max_{\lambda_{1}\geq N-N^{\frac{1}{12}-\delta^{\prime}}}\big{|}\|P(\pi_{N}^{\lambda^{\prime}}(\bm{\sigma}),\pi_{N}^{\lambda^{\prime}}(\bm{\sigma}^{*}))\|-\|P(\mathrm{sgn}(\bm{\sigma})\bm{u},\mathrm{sgn}(\bm{\sigma})\bm{u}^{*})\|\big{|}=0

using that (5.1) holds uniformly over any finite set of polynomials PP, and thus in particular over the polynomials (𝒖,𝒖)P(𝜺𝒖,𝜺𝒖)(\bm{u},\bm{u}^{*})\mapsto P(\bm{\varepsilon u},\bm{\varepsilon u}^{*}) for all choices of signs 𝜺{1,1}r\bm{\varepsilon}\in\{-1,1\}^{r}. It remains to note that P(sgn(𝝈)𝒖,sgn(𝝈)𝒖)=P(𝒖,𝒖)\|P(\mathrm{sgn}(\bm{\sigma})\bm{u},\mathrm{sgn}(\bm{\sigma})\bm{u}^{*})\|=\|P(\bm{u},\bm{u}^{*})\| by the Fell absorption principle [115, Proposition 8.1]. ∎

The above results are made possible by a marriage of two complementary developments: new representation-theoretic ideas due to Cassidy, and the polynomial method for proving strong convergence. Here, we merely give a hint of the underlying phenomenon, and refer to [30] for the details.

Fix λk\lambda\vdash k, and consider the sequence of Young diagrams λ(N)N\lambda(N)\vdash N (for N2kN\geq 2k) so that removing the first row of λ(N)\lambda(N) yields λ\lambda; in particular, the first row has length NkN-k. Then the sequence of representations πNλ(N)\pi_{N}^{\lambda(N)} is called stable [48]. As was the case in section 3, any stable representation has the property that

𝐄[TrπNλ(N)(σw1σwk)]=Ψ𝒘λ(1N)\mathbf{E}\big{[}\mathop{\mathrm{Tr}}\pi_{N}^{\lambda(N)}(\sigma_{w_{1}}\cdots\sigma_{w_{k}})\,\big{]}=\Psi_{\bm{w}}^{\lambda}(\tfrac{1}{N})

is a rational function of 1N\frac{1}{N}. Moreover, as in Corollary 3.5,

𝐄[TrπNλ(N)(σw1σwk)]=O(1N)\mathbf{E}\big{[}\mathop{\mathrm{Tr}}\pi_{N}^{\lambda(N)}(\sigma_{w_{1}}\cdots\sigma_{w_{k}})\,\big{]}=O\bigg{(}\frac{1}{N}\bigg{)} (5.2)

if gw1gwkg_{w_{1}}\cdots g_{w_{k}} is a non-power, where g1,,grg_{1},\ldots,g_{r} are free generators of 𝐅r\mathbf{F}_{r}; see [63]. These facts already suffice, by the polynomial method, for proving a form of Theorem 5.8 that applies to representations of polynomial dimension dim(πNλ)Nk\dim(\pi_{N}^{\lambda})\leq N^{k} for any fixed kk [32]. This falls far short, however, of Theorem 5.8.
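The vanishing expected trace (5.2) can be probed by simulation in the simplest case of the standard representation, where TrstdN(σ)\mathop{\mathrm{Tr}}\mathrm{std}_{N}(\sigma) is the number of fixed points of σ\sigma minus one. For the word g1g2g_{1}g_{2} (which is primitive, hence a non-power) the expectation even vanishes exactly, since σ1σ2\sigma_{1}\sigma_{2} is itself uniform. A minimal Monte Carlo illustration (sample sizes and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(7)
N, samples = 50, 20000
total = 0.0
for _ in range(samples):
    s1, s2 = rng.permutation(N), rng.permutation(N)
    w = s1[s2]                               # evaluate the word g1 g2 at (s1, s2)
    total += np.sum(w == np.arange(N)) - 1   # Tr std_N(w) = #fixed points - 1
est = total / samples                        # Monte Carlo estimate of E[Tr std_N(w)]
# the estimate concentrates around the exact value 0
```

For genuinely non-power words that are not primitive, the expectation is of order 1/N1/N rather than exactly zero, which is the content of (5.2).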

The key new ingredient that is developed in [30] is a major improvement of (5.2): when gw1gwkg_{w_{1}}\cdots g_{w_{k}} is a non-power, it turns out that in fact

𝐄[TrπNλ(N)(σw1σwk)]=O(1dim(πNλ(N))).\mathbf{E}\big{[}\mathop{\mathrm{Tr}}\pi_{N}^{\lambda(N)}(\sigma_{w_{1}}\cdots\sigma_{w_{k}})\,\big{]}=O\bigg{(}\frac{1}{\dim\big{(}\pi_{N}^{\lambda(N)}\big{)}}\bigg{)}.

The surprising aspect of this bound is that it exhibits more cancellation as the dimension of the representation increases—contrary to what one may expect, since the model becomes “less random”. This phenomenon therefore captures a kind of pseudorandomness in high-dimensional representations. This is achieved in [30] by combining a new representation of the stable characters of 𝐒N\mathbf{S}_{N} with ideas from low-dimensional topology. The improved estimate makes it possible to Taylor expand the rational function Ψ𝒘λ\Psi_{\bm{w}}^{\lambda} to much higher order in the polynomial method, enabling it to reach representations of quasi-exponential dimension.

Taken more broadly, high dimensional representations of finite and matrix groups form a natural setting for the study of strong convergence and give rise to many interesting questions. For the unitary group U(N)U(N), strong convergence was established earlier by Bordenave and Collins [21] for representations of polynomial dimension NkN^{k}, and by Magee and de la Salle [90] for representations of quasi-exponential dimension exp(Nα)\exp(N^{\alpha}) (further improved in [33] using complementary ideas). On the other hand it is a folklore conjecture (see, e.g., [119, Conjecture 1.6]) that any sequence of representations of 𝐒N\mathbf{S}_{N} of diverging dimension should give rise to optimal spectral gaps; Theorem 5.8 is at present the best known result in this direction. Analogous questions for finite simple groups of Lie type remain entirely open.

5.4. The Peterson–Thom conjecture

In this section, we discuss a very different application of strong convergence to the theory of von Neumann algebras, which has motivated many recent works in this area.

Recall that a von Neumann algebra is defined in the same way as a unital CC^{*}-algebra, except that it is required to be closed in the strong operator topology rather than the operator norm topology; see Remark 2.5. An important example is the free group factor

L(𝐅r)=clSOT(span{λ(g):g𝐅r}),L(\mathbf{F}_{r})=\mathrm{cl}_{\rm SOT}\big{(}\mathop{\mathrm{span}}\{\lambda(g):g\in\mathbf{F}_{r}\}\big{)},

i.e., the closure of Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) in the strong operator topology. Von Neumann algebras are much “bigger” than CC^{*}-algebras and thus much less well understood; for example, it is not even known whether or not L(𝐅r)L(\mathbf{F}_{r}) and L(𝐅s)L(\mathbf{F}_{s}) are isomorphic for rsr\neq s, which is one of the major open problems in this area.

However, the subclass of amenable von Neumann algebras—the counterpart in this context of the notion of an amenable group—is very well understood due to the work of Connes [43]. For example, amenable von Neumann algebras can be characterized as those that are approximately finite dimensional, i.e., the closure in the strong operator topology of an increasing net of matrix algebras. It is therefore natural to try to gain a better understanding of a non-amenable von Neumann algebra such as L(𝐅r)L(\mathbf{F}_{r}) by studying the collection of its amenable subalgebras. The following conjecture of Peterson and Thom [113]—now a theorem due to the works to be discussed below—is in this spirit: it states that two distinct maximal amenable subalgebras of L(𝐅r)L(\mathbf{F}_{r}) cannot have too large an overlap.

Theorem 5.9 (Peterson–Thom conjecture).

Let r2r\geq 2. If M1M_{1} and M2M_{2} are distinct maximal amenable von Neumann subalgebras of L(𝐅r)L(\mathbf{F}_{r}), then M1M2M_{1}\cap M_{2} is not diffuse.

A von Neumann algebra is called diffuse if it has no minimal projection. Being non-diffuse is a strong constraint: if MM is not diffuse, then the spectral distribution μa\mu_{a} of every self-adjoint aMa\in M must have an atom. (Here and below, we always compute laws with respect to the canonical trace τ\tau on L(𝐅r)L(\mathbf{F}_{r}).)

Example 5.10.

Let MiM_{i} be the von Neumann subalgebra of L(𝐅2)L(\mathbf{F}_{2}) generated by λ(gi)\lambda(g_{i}), where g1,g2g_{1},g_{2} are free generators of 𝐅2\mathbf{F}_{2}. Then M1,M2L()M_{1},M_{2}\simeq L(\mathbb{Z}) are maximal amenable, but M1M2M_{1}\cap M_{2} is trivial and thus certainly not diffuse.

The affirmative solution of the Peterson–Thom conjecture was made possible by the work of Hayes [65], who in fact proved a much stronger result. For every von Neumann subalgebra ML(𝐅r)M\leq L(\mathbf{F}_{r}), Hayes defines a quantity h(M:L(𝐅r))h(M:L(\mathbf{F}_{r})) called the 11-bounded entropy in the presence of L(𝐅r)L(\mathbf{F}_{r}) (see [66, §2.2 and Appendix]), which satisfies h(M:L(𝐅r))0h(M:L(\mathbf{F}_{r}))\geq 0 for every MM and h(M:L(𝐅r))=0h(M:L(\mathbf{F}_{r}))=0 if MM is amenable. Hayes’ main result is that the converse of this property also holds—thus providing an entropic characterization of amenable subalgebras of L(𝐅r)L(\mathbf{F}_{r}).

Theorem 5.11 (Hayes).

ML(𝐅r)M\leq L(\mathbf{F}_{r}) is amenable if and only if h(M:L(𝐅r))=0h(M:L(\mathbf{F}_{r}))=0.

Theorem 5.9 follows immediately from Theorem 5.11 using the following subadditivity property of the 11-bounded entropy [66, §2.2]:

h(M1M2:L(𝐅r))h(M1:L(𝐅r))+h(M2:L(𝐅r))h(M_{1}\vee M_{2}:L(\mathbf{F}_{r}))\leq h(M_{1}:L(\mathbf{F}_{r}))+h(M_{2}:L(\mathbf{F}_{r}))

whenever M1M2M_{1}\cap M_{2} is diffuse, where M1M2M_{1}\vee M_{2} is the von Neumann algebra generated by M1,M2M_{1},M_{2}. Indeed, it follows that if M1M2M_{1}\neq M_{2} are amenable and M1M2M_{1}\cap M_{2} is diffuse then M1M2M_{1}\vee M_{2} is amenable, so M1,M2M_{1},M_{2} cannot be maximal amenable.

Theorem 5.11 is not stated as such in [65]. The key insight of Hayes was that the validity of Theorem 5.11 can be reduced (in a highly nontrivial fashion) to proving strong convergence of a certain random matrix model. This problem was outside the reach of the methods that were available when [65] was written, and thus Theorem 5.11 was given there as a conditional statement. Hayes’ work strongly influenced new developments on the random matrix side, and the requisite strong convergence has now been proved by several approaches [15, 20, 90, 111, 33]. This has in turn not only completed the proofs of Theorems 5.9 and 5.11, but also led to new developments on the operator algebras side [66].

In the remainder of this section, we aim to discuss the relevant strong convergence problem, and to give a hint as to how it gives rise to Theorem 5.11.

5.4.1. Tensor models

Let 𝑼N=(U1N,,UrN)\bm{U}^{N}=(U_{1}^{N},\ldots,U_{r}^{N}) be independent Haar-distributed random unitary matrices of dimension NN, and let 𝒖=(u1,,ur)\bm{u}=(u_{1},\ldots,u_{r}) be the standard generators of L(𝐅r)L(\mathbf{F}_{r}) as defined in section 1.1. That 𝑼N\bm{U}^{N} strongly converges to 𝒖\bm{u} is a consequence of the Haagerup–Thorbjørnsen theorem for GUE matrices, as was shown by Collins and Male [41]. The basic question posed by Hayes is whether strong convergence continues to hold if we consider the tensor product of two independent copies of this model. More precisely:

Question.

Let 𝑼~N\bm{\tilde{U}}^{N} be an independent copy of 𝑼N\bm{U}^{N}. Is it true that the family

(𝑼N𝟏,𝟏𝑼~N)\displaystyle(\bm{U}^{N}\otimes\mathbf{1},~{}\mathbf{1}\otimes\bm{\tilde{U}}^{N}) =(U1N𝟏,,UrN𝟏,𝟏U~1N,,𝟏U~rN)\displaystyle=(U_{1}^{N}\otimes\mathbf{1},~{}\ldots,~{}U_{r}^{N}\otimes\mathbf{1},~{}\mathbf{1}\otimes\tilde{U}_{1}^{N},~{}\ldots,~{}\mathbf{1}\otimes\tilde{U}_{r}^{N})
of random unitaries of dimension N2N^{2} converges strongly to
(𝒖𝟏,𝟏𝒖)\displaystyle(\bm{u}\otimes\mathbf{1},~{}\mathbf{1}\otimes\bm{u}) =(u1𝟏,,ur𝟏,𝟏u1,,𝟏ur)\displaystyle=(u_{1}\otimes\mathbf{1},~{}\ldots,~{}u_{r}\otimes\mathbf{1},~{}\mathbf{1}\otimes u_{1},~{}\ldots,~{}\mathbf{1}\otimes u_{r})

as NN\to\infty? (Recall that we always denote by =min\otimes=\otimes_{\rm min} the minimal tensor product; see section 2.4.) (Alternatively, one may replace 𝑼N,𝑼~N\bm{U}^{N},\bm{\tilde{U}}^{N} by independent GUE matrices and 𝒖\bm{u} by a free semicircular family.)

The main result of Hayes [65, Theorem 1.1] states that an affirmative answer to this question implies the validity of Theorem 5.11.

Because 𝑼N\bm{U}^{N} and 𝑼~N\bm{\tilde{U}}^{N} are independent, it is natural to attempt to apply strong convergence of each copy separately. To this end, note that for any noncommutative polynomial Px1,,x4rP\in\mathbb{C}\langle x_{1},\ldots,x_{4r}\rangle, we can write

P(𝑼N𝟏,𝑼N𝟏,𝟏𝑼~N,𝟏𝑼~N)=PN(𝑼N,𝑼N)P\big{(}\bm{U}^{N}\otimes\mathbf{1},~{}\bm{U}^{N*}\otimes\mathbf{1},~{}\mathbf{1}\otimes\bm{\tilde{U}}^{N},~{}\mathbf{1}\otimes\bm{\tilde{U}}^{N*}\big{)}=P_{N}\big{(}\bm{U}^{N},\bm{U}^{N*}\big{)}

where PNMN()x1,,x2rP_{N}\in\mathrm{M}_{N}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle is a noncommutative polynomial with matrix coefficients of dimension NN that depend only on 𝑼~N\bm{\tilde{U}}^{N}. We can now condition on 𝑼~N\bm{\tilde{U}}^{N} and think of PNP_{N} as a deterministic polynomial with matrix coefficients. In particular, one may hope to use strong convergence of 𝑼N\bm{U}^{N} to 𝒖\bm{u} to show that

PN(𝑼N,𝑼N)=?(1+o(1))PN(𝒖,𝒖)\big{\|}P_{N}\big{(}\bm{U}^{N},\bm{U}^{N*}\big{)}\big{\|}\stackrel{{\scriptstyle?}}{{=}}(1+o(1))\|P_{N}(\bm{u},\bm{u}^{*})\| (5.3)

as NN\to\infty. If (5.3) holds, then the proof of strong convergence of the tensor model is readily completed. Indeed, we may now write PN(𝒖,𝒖)=Q(𝑼~N,𝑼~N)P_{N}(\bm{u},\bm{u}^{*})=Q(\bm{\tilde{U}}^{N},\bm{\tilde{U}}^{N*}) where QCred(𝐅r)x1,,x2rQ\in C^{*}_{\rm red}(\mathbf{F}_{r})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle is a polynomial with operator coefficients that depend only on 𝒖\bm{u}. Since Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) is exact, Lemma 2.18 yields

Q(𝑼~N,𝑼~N)=(1+o(1))Q(𝒖,𝒖)\big{\|}Q\big{(}\bm{\tilde{U}}^{N},\bm{\tilde{U}}^{N*}\big{)}\big{\|}=(1+o(1))\|Q(\bm{u},\bm{u}^{*})\|

as NN\to\infty. Finally, as

Q(𝒖,𝒖)=P(𝒖𝟏,𝒖𝟏,𝟏𝒖,𝟏𝒖),Q(\bm{u},\bm{u}^{*})=P(\bm{u}\otimes\mathbf{1},~{}\bm{u}^{*}\otimes\mathbf{1},~{}\mathbf{1}\otimes\bm{u},~{}\mathbf{1}\otimes\bm{u}^{*}),

the desired strong convergence property is established.

This argument reduces the question of strong convergence of the tensor product of two independent families of random unitaries to a question about strong convergence (5.3) of a single family of random unitaries for polynomials with matrix coefficients. The latter is far from obvious, however. While norm convergence of any fixed polynomial PP with matrix coefficients is an automatic consequence of strong convergence of 𝑼N\bm{U}^{N} (Lemma 2.16), here the polynomial PNMDN()x1,,x2rP_{N}\in\mathrm{M}_{D_{N}}(\mathbb{C})\otimes\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle and the dimension DND_{N} of the matrix coefficients changes with NN. This cannot follow from strong convergence alone, but may be obtained if the proof of strong convergence provides sufficiently strong quantitative estimates.
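To make the conditioning step concrete, here is a minimal numerical sketch (using NumPy, with a single pair of unitaries and arbitrary small dimensions; the names and sizes are illustrative only, not part of the actual argument). After fixing the second family, a mixed polynomial in (U ⊗ 1, 1 ⊗ Ũ) is a polynomial in U alone whose matrix coefficients are built from Ũ, and moving those coefficients to the other tensor leg is conjugation by the swap unitary, which preserves the norm:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(n):
    # Haar-distributed unitary via QR with phase correction
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

N = 4
U, Ut = haar_unitary(N), haar_unitary(N)  # U^N and an independent copy
I = np.eye(N)

# the mixed polynomial P = (U ⊗ 1)(1 ⊗ Ut) + (U* ⊗ 1)(1 ⊗ Ut*)
P = np.kron(U, I) @ np.kron(I, Ut) + np.kron(U.conj().T, I) @ np.kron(I, Ut.conj().T)

# after conditioning on Ut, the same operator is a polynomial in U alone
# with N-dimensional matrix coefficients built from Ut
P_N = np.kron(U, Ut) + np.kron(U.conj().T, Ut.conj().T)
assert np.allclose(P, P_N)

# moving the coefficients to the other tensor leg is conjugation by the
# swap unitary, so the operator norms agree
Q_N = np.kron(Ut, U) + np.kron(Ut.conj().T, U.conj().T)
assert np.isclose(np.linalg.norm(P_N, 2), np.linalg.norm(Q_N, 2))
```

The point of the sketch is only that, once Ũ is frozen, the tensor model is a polynomial of a single family of random unitaries with matrix coefficients whose dimension grows with N.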

The question of strong convergence of polynomials PNP_{N} with matrix coefficients of increasing dimension DND_{N} was first raised by Pisier [116] in his study of subexponential operator spaces. Pisier noted that (5.3) can fail for matrix coefficients of dimension DNeCN2D_{N}\geq e^{CN^{2}} (see [33, Appendix A]). On the other hand, a careful inspection of the quantitative estimates in the strong convergence proof of Haagerup–Thorbjørnsen shows that (5.3) holds for matrix coefficients of dimension DN=o(N1/4)D_{N}=o(N^{1/4}). This leaves a huge gap between the upper and lower bounds, and in particular excludes the case DN=ND_{N}=N that is required to prove Theorem 5.11.

Recent advances in strong convergence have led to a greatly improved understanding of this problem by means of several independent methods [20, 90, 111, 33], all of which suffice to complete the proof of Theorem 5.11. The best result to date, obtained by the polynomial method [33], is that strong convergence in the GUE and Haar unitary models remains valid for matrix coefficients of dimension DN=eo(N)D_{N}=e^{o(N)}. Let us briefly sketch how this is achieved.

The arguments that we developed in section 3 for random permutation matrices can be applied in a very similar manner to random unitary matrices. In particular, one obtains as in the proof of Proposition 3.2 an estimate of the form

|𝐄[trh(P(𝑼N,𝑼N))]ν0(h)ν1(h)N|CN2hC[K,K].\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))\,\big{]}-\nu_{0}(h)-\frac{\nu_{1}(h)}{N}\bigg{|}\leq\frac{C}{N^{2}}\|h\|_{C^{\ell}[-K,K]}.

Here PP is any noncommutative polynomial with matrix coefficients of dimension DD, \ell is an absolute constant, CC is a constant that depends only on the degree of PP, and ν0,ν1\nu_{0},\nu_{1} are Schwartz distributions that are supported in [P(𝒖,𝒖),P(𝒖,𝒖)][-\|P(\bm{u},\bm{u}^{*})\|,\|P(\bm{u},\bm{u}^{*})\|]. If we choose a test function hh that vanishes in the latter interval, we obtain

|𝐄[Trh(P(𝑼N,𝑼N))]|CDNhC[K,K]\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{Tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))\,\big{]}\bigg{|}\leq\frac{CD}{N}\|h\|_{C^{\ell}[-K,K]}

as P(𝑼N,𝑼N)P(\bm{U}^{N},\bm{U}^{N*}) has dimension DNDN. Repeating the proof of Theorem 1.4 now yields strong convergence whenever the right-hand side is o(1)o(1), that is, for D=o(N)D=o(N). This does not suffice to prove the Peterson–Thom conjecture.
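To spell out the step between the last two displays: a test function h that vanishes on the interval [-‖P(u,u*)‖, ‖P(u,u*)‖] annihilates the distributions ν₀ and ν₁ supported there, so ν₀(h) = ν₁(h) = 0, while Tr = DN · tr on matrices of dimension DN. The first bound therefore gives

```latex
\big|\mathbf{E}\big[\mathrm{Tr}\, h(P(\bm{U}^N,\bm{U}^{N*}))\big]\big|
  = DN\,\big|\mathbf{E}\big[\mathrm{tr}\, h(P(\bm{U}^N,\bm{U}^{N*}))\big]\big|
  \le DN\cdot\frac{C}{N^{2}}\,\|h\|_{C^{\ell}[-K,K]}
  = \frac{CD}{N}\,\|h\|_{C^{\ell}[-K,K]},
```

which is the stated estimate.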

The above estimate was obtained by Taylor expanding the rational function in the polynomial method to first order. Nothing prevents us, however, from expanding to higher order mm; then a very similar argument yields

|𝐄[trh(P(𝑼N,𝑼N))]k=0mνk(h)Nk|C(m)Nm+1hC(m)[K,K]\bigg{|}\mathbf{E}\big{[}\mathop{\mathrm{tr}}h(P(\bm{U}^{N},\bm{U}^{N*}))\,\big{]}-\sum_{k=0}^{m}\frac{\nu_{k}(h)}{N^{k}}\bigg{|}\leq\frac{C(m)}{N^{m+1}}\|h\|_{C^{\ell(m)}[-K,K]}

where all νk\nu_{k} are Schwartz distributions. The new ingredient that now arises is that we must show that the support of each νk\nu_{k} is included in [P(𝒖,𝒖),P(𝒖,𝒖)][-\|P(\bm{u},\bm{u}^{*})\|,\|P(\bm{u},\bm{u}^{*})\|]. Surprisingly, a very simple technique that is developed in [33] (see also [110, 109]) shows that this property follows automatically in the present setting from concentration of measure. This yields strong convergence for D=o(Nm)D=o(N^{m}) for any mm\in\mathbb{N}. Reaching D=eo(N)D=e^{o(N)} is harder and requires several additional ideas.

5.4.2. Some ideas behind the reduction

In the remainder of this section, we aim to give a hint as to how the purely operator-algebraic statement of Theorem 5.11 is reduced to a strong convergence problem. Since we cannot do justice to the details of the argument within the scope of this survey, we must content ourselves with an impressionistic sketch. From now on, we fix a nonamenable ML(𝐅r)M\leq L(\mathbf{F}_{r}) with h(M:L(𝐅r))=0h(M:L(\mathbf{F}_{r}))=0, and aim to prove a contradiction.

The starting point for the proof is the following theorem of Haagerup and Connes [58, Lemma 2.2] that provides a spectral characterization of amenability.

Theorem 5.12 (Haagerup–Connes).

A tracial von Neumann algebra (M,τ)(M,\tau) is nonamenable if and only if there is a nontrivial projection qMq\in M that commutes with every element of MM, and unitaries v1,,vrMv_{1},\ldots,v_{r}\in M, so that hi=qvih_{i}=qv_{i} satisfy

1ri=1rhihi¯<1.\Bigg{\|}\frac{1}{r}\sum_{i=1}^{r}h_{i}\otimes\overline{h_{i}}\Bigg{\|}<1.

Here x¯B(H¯)\bar{x}\in B(\bar{H}) denotes the complex conjugate of an operator xB(H)x\in B(H). (More concretely, if xMN()x\in\mathrm{M}_{N}(\mathbb{C}) is a matrix, then its conjugate x¯\bar{x} may be identified with the elementwise complex conjugate of xx; while if xx is a polynomial P(𝐮,𝐮)P(\bm{u},\bm{u}^{*}) in the standard generators ui=λ(gi)u_{i}=\lambda(g_{i}) of L(𝐅r)L(\mathbf{F}_{r}), then its conjugate x¯\bar{x} may be identified with the polynomial P¯(𝐮,𝐮)\bar{P}(\bm{u},\bm{u}^{*}) where the coefficients of P¯\bar{P} are the complex conjugates of the coefficients of PP.)

The above spectral property is very much false for matrices: if Hi=QViH_{i}=QV_{i} where V1,,VrV_{1},\ldots,V_{r} are unitary matrices and QQ is a nontrivial projection that commutes with them, and we define the unit norm vector z=(TrQ)1/2k,lQklekelz=(\mathop{\mathrm{Tr}}Q)^{-1/2}\sum_{k,l}Q_{kl}\,e_{k}\otimes e_{l}, then

z,(1ri=1rHiHi¯)z=1ri=1rTrQHiQHiTrQ=1.\Bigg{\langle}z,\Bigg{(}\frac{1}{r}\sum_{i=1}^{r}H_{i}\otimes\overline{H_{i}}\Bigg{)}z\bigg{\rangle}=\frac{1}{r}\sum_{i=1}^{r}\frac{\mathop{\mathrm{Tr}}QH_{i}QH_{i}^{*}}{\mathop{\mathrm{Tr}}Q}=1. (5.4)

Of course, this just shows that MN()\mathrm{M}_{N}(\mathbb{C}) is amenable.
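The computation (5.4) is easy to verify numerically. In the toy sketch below (block sizes chosen arbitrarily), Q is the projection onto the first k coordinates and each V_i is block diagonal, so that Q commutes with the V_i; the averaged operator then has norm exactly 1, in contrast with the strict inequality of Theorem 5.12:

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_unitary(n):
    # Haar-distributed unitary via QR with phase correction
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def block_diag(a, b):
    n, m = a.shape[0], b.shape[0]
    out = np.zeros((n + m, n + m), dtype=complex)
    out[:n, :n], out[n:, n:] = a, b
    return out

N, k, r = 6, 3, 3
Q = np.diag([1.0] * k + [0.0] * (N - k)).astype(complex)   # nontrivial projection
V = [block_diag(haar_unitary(k), haar_unitary(N - k)) for _ in range(r)]
H = [Q @ v for v in V]                       # H_i = Q V_i, and Q commutes with V_i

# z = (Tr Q)^{-1/2} sum_{k,l} Q_{kl} e_k ⊗ e_l is the flattening of Q
z = Q.reshape(-1) / np.sqrt(np.trace(Q).real)
T = sum(np.kron(h, h.conj()) for h in H) / r  # (1/r) sum_i H_i ⊗ conj(H_i)

val = (z.conj() @ T @ z).real
assert abs(val - 1) < 1e-10                   # the quadratic form in (5.4) equals 1
assert abs(np.linalg.norm(T, 2) - 1) < 1e-8   # hence the norm is exactly 1
```

As in the text, the quadratic form evaluates to Tr(QH_iQH_i*)/Tr Q = 1 for each i, so no matrix model can exhibit the spectral gap of Theorem 5.12 in this naive way.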

Since we assumed that MM is nonamenable, we can choose h1,,hrMh_{1},\ldots,h_{r}\in M as in Theorem 5.12. To simplify the discussion, let us suppose that hi=Pi(𝒖,𝒖)h_{i}=P_{i}(\bm{u},\bm{u}^{*}) are polynomials of the standard generators 𝒖\bm{u} of L(𝐅r)L(\mathbf{F}_{r}): this clearly need not be true in general, and we will return to this issue at the end of this section. Let

HiN=Pi(𝑼N,𝑼N),H~iN=Pi(𝑼~N,𝑼~N).H_{i}^{N}=P_{i}(\bm{U}^{N},\bm{U}^{N*}),\qquad\quad\tilde{H}_{i}^{N}=P_{i}(\bm{\tilde{U}}^{N},\bm{\tilde{U}}^{N*}).

Then strong convergence of (𝑼N𝟏,𝟏𝑼~N)(\bm{U}^{N}\otimes\mathbf{1},~{}\mathbf{1}\otimes\bm{\tilde{U}}^{N}) implies that there exists δ>0\delta>0 with

1ri=1rHiNH~iN¯1δ\Bigg{\|}\frac{1}{r}\sum_{i=1}^{r}H_{i}^{N}\otimes\overline{\tilde{H}_{i}^{N}}\Bigg{\|}\leq 1-\delta (5.5)

with probability 1o(1)1-o(1) as NN\to\infty. The crux of the proof is now to show that h(M:L(𝐅r))=0h(M:L(\mathbf{F}_{r}))=0 implies “microstate collapse”: with high probability, there is a unitary matrix VV so that HiNVH~iNVH_{i}^{N}\approx V\tilde{H}_{i}^{N}V^{*} for all ii. Thus (5.5) contradicts (5.4), and we have achieved the desired conclusion.

We now aim to explain the origin of microstate collapse without giving a precise definition of h(M:L(𝐅r))h(M:L(\mathbf{F}_{r})). Roughly speaking, h(M:L(𝐅r))h(M:L(\mathbf{F}_{r})) measures the growth rate as NN\to\infty of the metric entropy with respect to the metric

dorb(𝑨N,𝑩N)=infVU(N)(i=1rtr|AiNVBiNV|2)1/2d^{\mathrm{orb}}(\bm{A}^{N},\bm{B}^{N})=\inf_{V\in U(N)}\Bigg{(}\sum_{i=1}^{r}\mathop{\mathrm{tr}}|A_{i}^{N}-VB_{i}^{N}V^{*}|^{2}\Bigg{)}^{1/2}

of the set of families 𝑨N=(A1N,,ArN)\bm{A}^{N}=(A_{1}^{N},\ldots,A_{r}^{N}) of NN-dimensional matrices whose law lies in a weak neighborhood of the law of 𝒉=(h1,,hr)\bm{h}=(h_{1},\ldots,h_{r}) (recall that the notion of a law was defined in the proof of Lemma 2.13; in particular, weak convergence of laws is equivalent to weak convergence of matrices). As 𝑯N\bm{H}^{N} converges weakly to 𝒉\bm{h}, the following is essentially a consequence of the definition: if h(M:L(𝐅r))=0h(M:L(\mathbf{F}_{r}))=0, then for all NN sufficiently large, there is a set ΩN(MN())r\Omega^{N}\subset(\mathrm{M}_{N}(\mathbb{C}))^{r} so that

𝐏[𝑯NΩN]=1o(1)\mathbf{P}\big{[}\bm{H}^{N}\in\Omega^{N}\big{]}=1-o(1)

and ΩN\Omega^{N} can be covered by eo(N2)e^{o(N^{2})} balls of radius o(1)o(1) in the metric dorbd^{\rm orb}. In particular, this implies that at least one of these balls must have probability at least eo(N2)e^{-o(N^{2})}; in other words, there exist nonrandom 𝑨N\bm{A}^{N} so that

𝐏[dorb(𝑯N,𝑨N)=o(1)]eo(N2).\mathbf{P}\big{[}d^{\rm orb}(\bm{H}^{N},\bm{A}^{N})=o(1)\big{]}\geq e^{-o(N^{2})}.

We now conclude by a beautiful application of the concentration of measure phenomenon [83], which states in the present context that for any set Ω\Omega such that 𝐏[𝑯NΩ]eCε2N2\mathbf{P}[\bm{H}^{N}\in\Omega]\geq e^{-C\varepsilon^{2}N^{2}}, taking an ε\varepsilon-neighborhood Ωε\Omega_{\varepsilon} of Ω\Omega with respect to the metric dorbd^{\rm orb} yields 𝐏[𝑯NΩε]1eCε2N2\mathbf{P}[\bm{H}^{N}\in\Omega_{\varepsilon}]\geq 1-e^{-C\varepsilon^{2}N^{2}}. Thus we finally obtain

𝐏[dorb(𝑯N,𝑨N)=o(1)]=1o(1).\mathbf{P}\big{[}d^{\rm orb}(\bm{H}^{N},\bm{A}^{N})=o(1)\big{]}=1-o(1).

Since 𝑯~N\bm{\tilde{H}}^{N} is an independent copy of 𝑯N\bm{H}^{N} and thus satisfies the same property, it follows that dorb(𝑯N,𝑯~N)=o(1)d^{\rm orb}(\bm{H}^{N},\bm{\tilde{H}}^{N})=o(1) with probability 1o(1)1-o(1).
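To get a feel for the metric d^orb, note that in the simplest case of a single pair (r = 1) of self-adjoint matrices the infimum over U(N) can be computed exactly: by the Hoffman–Wielandt inequality it equals the normalized ℓ²-distance between the sorted eigenvalue sequences, attained by aligning the eigenbases. The sketch below checks this numerically; it is only an illustration, as no such closed form exists for the tuples appearing above:

```python
import numpy as np

rng = np.random.default_rng(2)

def haar_unitary(n):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def d_orb(A, B):
    # closed form of inf_V (tr|A - V B V*|^2)^{1/2} for one Hermitian pair:
    # align the sorted spectra (tr denotes the normalized trace)
    N = A.shape[0]
    a, b = np.sort(np.linalg.eigvalsh(A)), np.sort(np.linalg.eigvalsh(B))
    return np.sqrt(np.sum((a - b) ** 2) / N)

N = 8
A = rng.standard_normal((N, N)); A = (A + A.T) / 2
B = rng.standard_normal((N, N)); B = (B + B.T) / 2
d = d_orb(A, B)

# every unitary gives an upper bound on the infimum (Hoffman-Wielandt)...
for _ in range(50):
    V = haar_unitary(N)
    val = np.sqrt(np.sum(np.abs(A - V @ B @ V.conj().T) ** 2) / N)
    assert val >= d - 1e-9

# ...and conjugating by the aligned eigenbases attains it
wa, Pa = np.linalg.eigh(A)
wb, Pb = np.linalg.eigh(B)
V = Pa @ Pb.conj().T
attained = np.sqrt(np.sum(np.abs(A - V @ B @ V.conj().T) ** 2) / N)
assert abs(attained - d) < 1e-9
```

For r > 1 the infimum couples all the matrices through a single unitary, which is what makes the metric entropy with respect to d^orb a genuinely noncommutative quantity.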

While we have overlooked many details in the above sketch of the proof, we made one simplification that is especially problematic: we assumed that hih_{i} are polynomials of the standard generators 𝒖\bm{u}. In general, however, all we know is that hih_{i} can be approximated by such polynomials in the strong operator topology. This does not suffice for our purposes, since such an approximation need not preserve the conclusion of Theorem 5.12 on the norm of (tensor products of) hih_{i}. Indeed, from a broader perspective, it seems surprising that strong convergence has anything meaningful to say about the von Neumann algebra L(𝐅r)L(\mathbf{F}_{r}): strong convergence is a statement about norms of polynomials, so it would appear that it should not provide any meaningful information on objects that live outside the norm-closure Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) of the set of polynomials of the standard generators.

This issue is surmounted in [65] by using the fact that any given h1,,hrL(𝐅r)h_{1},\ldots,h_{r}\in L(\mathbf{F}_{r}) can be approximated by φk(h1),,φk(hr)Cred(𝐅r)\varphi_{k}(h_{1}),\ldots,\varphi_{k}(h_{r})\in C^{*}_{\rm red}(\mathbf{F}_{r}) in a special way: not only do φk(hi)hi\varphi_{k}(h_{i})\to h_{i} in the strong operator topology, but in addition the φk\varphi_{k} are contractive completely positive maps (this uses exactness of Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r})). Consequently, even though the approximation does not preserve the norm, it preserves the upper bound on the norm that appears in Theorem 5.12. Since only the upper bound is needed in the proof, this suffices to make the rest of the argument work.

5.5. Minimal surfaces

We finally discuss yet another unexpected application of strong convergence to the theory of minimal surfaces.

An immersed surface XX in a Riemannian manifold MM is called a minimal surface if it is a critical point (or, what is equivalent in this case, a local minimizer) of the area under compact perturbations; think of a soap film. Minimal surfaces have fascinated mathematicians since the 18th century and are a major research topic in geometric analysis; see [98, 38] for an introduction.

We will use a slightly more general notion of a minimal surface that need only be immersed outside a set of isolated branch points (at which the surface can self-intersect locally), cf. [56]. These objects, called branched minimal surfaces, arise naturally when taking limits of immersed minimal surfaces. For simplicity we will take “minimal surface” to mean a branched minimal surface.

A basic question one may ask is how the geometry of a minimal surface is constrained by that of the manifold it sits in. For example, a question in this spirit is: can an NN-dimensional sphere—a manifold with constant positive curvature—contain a minimal surface that has constant negative curvature? It was shown by Bryant [26] that the answer is no. Thus the following result of Song [122], which shows the answer is “almost” yes in high dimension, appears rather surprising.

Theorem 5.13 (Song).

There exist closed minimal surfaces XjX_{j} in Euclidean unit spheres 𝕊Nj\mathbb{S}^{N_{j}} so that the Gaussian curvature KjK_{j} of XjX_{j} satisfies

limj1Area(Xj)Xj|Kj+8|=0.\lim_{j\to\infty}\frac{1}{\mathrm{Area}(X_{j})}\int_{X_{j}}|K_{j}+8|=0.

The minimal surfaces in this theorem arise from a random construction: one finds, by a variational argument, a sequence of minimal surfaces in finite-dimensional spheres that are symmetric under the action of a set of random rotations. Strong convergence is applied in the analysis in a non-obvious manner to understand the limiting behavior of these surfaces.

In the remainder of this section, we aim to give an impressionistic sketch of some of the ingredients of the proof of Theorem 5.13. Our primary aim is to give a hint of the role that strong convergence plays in the proof.

5.5.1. Harmonic maps

We must first recall the connection between minimal surfaces and harmonic maps. If f:XMf:X\to M is a map from a Riemann surface XX to a Riemannian manifold MM, its Dirichlet energy is defined by

E(f)=12X|df|2.\mathrm{E}(f)=\frac{1}{2}\int_{X}|df|^{2}.

A critical point of the energy is called a harmonic map. If ff is weakly conformal (i.e., conformal away from branch points), then E(f)\mathrm{E}(f) coincides with the area of the surface f(X)f(X) in MM. Thus a weakly conformal map ff is harmonic if and only if f(X)f(X) is a minimal surface in MM. See, e.g., [102, §4.2.1].

This viewpoint yields a variational method for constructing minimal surfaces. Clearly any minimizer of the energy is, by definition, a harmonic map. In general, such a map is not guaranteed to be weakly conformal. However, this is the case if we take XX to be a surface with a unique conformal class—the thrice punctured sphere—in which case a minimizer of E(f)\mathrm{E}(f) automatically defines a minimal surface f(X)f(X) in MM. We will make this choice of XX from now on. (More generally, one obtains minimal surfaces by minimizing the energy both with respect to the map ff and with respect to the conformal class of XX; see [102, Theorem 4.8.6].)

The construction in [122] uses a variant of the variational method which produces minimal surfaces that have many symmetries. Let us write X=Γ\X=\Gamma\backslash\mathbb{H}, and consider a unitary representation πN:ΓU(N)\pi_{N}:\Gamma\to U(N) with finite range |πN(Γ)|<|\pi_{N}(\Gamma)|<\infty which we view as acting on the unit sphere 𝕊2N1\mathbb{S}^{2N-1} of N\mathbb{C}^{N} with its standard Euclidean metric. The following variational problem is considered in [122]:

E(X,πN)=inf{12F|df|2;f:𝕊2N1 is πN-equivariant},\mathrm{E}(X,\pi_{N})=\inf\bigg{\{}\frac{1}{2}\int_{F}|df|^{2};~{}f:\mathbb{H}\to\mathbb{S}^{2N-1}\text{ is }\pi_{N}\text{-equivariant}\bigg{\}},

where FF is a fundamental domain of the action of Γ\Gamma on \mathbb{H}. To interpret this variational problem, note that a πN\pi_{N}-equivariant map f:𝕊2N1f:\mathbb{H}\to\mathbb{S}^{2N-1} can be identified with a map f:XN𝕊2N1f:X^{N}\to\mathbb{S}^{2N-1} on the surface XN=ΓN\X^{N}=\Gamma_{N}\backslash\mathbb{H}, where (as πN\pi_{N} has finite range, ΓN\Gamma_{N} is a finite index subgroup of Γ\Gamma and thus XNX^{N} is a finite cover of XX; this construction of covering spaces is different from the one considered in section 5.2)

ΓN=kerπN={γΓ:πN(γ)=1}.\Gamma_{N}=\ker\pi_{N}=\{\gamma\in\Gamma:\pi_{N}(\gamma)=1\}.

Since a minimizer fNf_{N} in E(X,πN)\mathrm{E}(X,\pi_{N}) minimizes the Dirichlet energy, it defines a minimal surface fN(XN)f_{N}(X^{N}) in 𝕊2N1\mathbb{S}^{2N-1} that has many symmetries (it contains many rotated copies of the image fN(F)f_{N}(F) of the fundamental domain). (Even though XNX^{N} has punctures, taking the closure of fN(XN)f_{N}(X^{N}) yields a closed surface; this is a nontrivial property of harmonic maps, see [102, §4.6.4].)

Once a minimizer fNf_{N} has been chosen for every NN, we can take NN\to\infty to obtain a limiting object. Indeed, if we embed each 𝕊2N1\mathbb{S}^{2N-1} in the unit sphere 𝕊\mathbb{S}^{\infty} of an infinite-dimensional Hilbert space HH, we can view all fN:𝕊f_{N}:\mathbb{H}\to\mathbb{S}^{\infty} on the same footing. Then the properties of harmonic maps furnish enough compactness to ensure that fNf_{N} converges along a subsequence to a limiting map f:𝕊f_{\infty}:\mathbb{H}\to\mathbb{S}^{\infty}, which is π\pi_{\infty}-equivariant for some unitary representation π:ΓB(H)\pi_{\infty}:\Gamma\to B(H).

5.5.2. An infinite-dimensional model

So far, it is not at all clear why choosing our energy-minimizing maps to have many symmetries helps our cause. The reason is that certain equivariant maps into the infinite-dimensional sphere 𝕊\mathbb{S}^{\infty} turn out to have remarkable properties, which will make it possible to realize them as the limit ff_{\infty} of the finite-dimensional minimal surfaces constructed above.

Recall that minimal surfaces in the spheres 𝕊2N1\mathbb{S}^{2N-1} cannot have constant negative curvature. The situation is very different, however, in infinite dimension: one can isometrically embed the hyperbolic plane \mathbb{H} in the Hilbert sphere 𝕊\mathbb{S}^{\infty} by means of an energy-minimizing map. What is more surprising is that this phenomenon is very rigid: any energy-minimizing map φ:𝕊\varphi:\mathbb{H}\to\mathbb{S}^{\infty} that is equivariant with respect to a certain class of representations is necessarily an isometry.

More precisely, we have the following [122, Corollary 2.4]. Here two unitary representations ρ1:ΓB(H1)\rho_{1}:\Gamma\to B(H_{1}) and ρ2:ΓB(H2)\rho_{2}:\Gamma\to B(H_{2}) are said to be weakly equivalent if any matrix element of ρ1\rho_{1} can be approximated uniformly on compacts by finite linear combinations of matrix elements of ρ2\rho_{2}, and vice versa.

Theorem 5.14.

Let ρ:ΓB(H)\rho:\Gamma\to B(H) be a unitary representation of Γ\Gamma that is weakly equivalent to the regular representation λΓ\lambda_{\Gamma}. Then any ρ\rho-equivariant energy-minimizing map φ:𝕊\varphi:\mathbb{H}\to\mathbb{S}^{\infty} must satisfy φg𝕊=18ghyp\varphi^{*}g_{\mathbb{S}^{\infty}}=\frac{1}{8}g_{\rm hyp}, where ghypg_{\rm hyp} denotes the hyperbolic metric on \mathbb{H} (so 18ghyp\frac{1}{8}g_{\rm hyp} is the metric on \mathbb{H} with constant curvature 8-8).

The proof of this result is one of the main ingredients of [122]. Very roughly speaking, one first produces a single ρ\rho and φ\varphi that satisfy the conclusion of the theorem by an explicit construction; weak equivalence is then used to transfer the conclusion to other ρ\rho and φ\varphi as in the theorem.

Theorem 5.14 explains the utility of constructing equivariant minimal surfaces: if we choose the sequence of representations πN\pi_{N} in such a way that the limiting representation π\pi_{\infty} is weakly equivalent to the regular representation, then this will automatically imply that the metrics fNg𝕊2N1f_{N}^{*}g_{\mathbb{S}^{2N-1}} on the minimal surfaces converge to the metric fg𝕊=18ghypf_{\infty}^{*}g_{\mathbb{S}^{\infty}}=\frac{1}{8}g_{\rm hyp} with constant curvature 8-8.

5.5.3. Weak containment and strong convergence

At first sight, none of the above appears to be related to strong convergence. However, the following classical result [13, Theorem F.4.4] makes the connection immediately obvious.

Proposition 5.15.

Let Γ\Gamma be a finitely generated group with generating set 𝐠=(g1,,gr)\bm{g}=(g_{1},\ldots,g_{r}), and let ρ1:ΓB(H1)\rho_{1}:\Gamma\to B(H_{1}) and ρ2:ΓB(H2)\rho_{2}:\Gamma\to B(H_{2}) be unitary representations. Then the following are equivalent:

  1.

    ρ1\rho_{1} and ρ2\rho_{2} are weakly equivalent.

  2.

    P(ρ1(𝒈),ρ1(𝒈))=P(ρ2(𝒈),ρ2(𝒈))\|P(\rho_{1}(\bm{g}),\rho_{1}(\bm{g})^{*})\|=\|P(\rho_{2}(\bm{g}),\rho_{2}(\bm{g})^{*})\| for all Px1,,x2rP\in\mathbb{C}\langle x_{1},\ldots,x_{2r}\rangle.

In the present setting, XX is the thrice punctured sphere whose fundamental group is Γ𝐅2\Gamma\simeq\mathbf{F}_{2}. Thus we can define a random representation πN:ΓU(N)\pi_{N}:\Gamma\to U(N) with finite range by choosing πN(g1)=U1N|1\pi_{N}(g_{1})=U_{1}^{N}|_{1^{\perp}} and πN(g2)=U2N|1\pi_{N}(g_{2})=U_{2}^{N}|_{1^{\perp}}, where U1N,U2NU_{1}^{N},U_{2}^{N} are independent random permutation matrices of dimension N+1N+1 and we identified NN+11\mathbb{C}^{N}\simeq\mathbb{C}^{N+1}\cap 1^{\perp}. Since Theorem 1.4 yields

limNP(πN(𝒈),πN(𝒈))=P(λΓ(𝒈),λΓ(𝒈)),\lim_{N\to\infty}\|P(\pi_{N}(\bm{g}),\pi_{N}(\bm{g})^{*})\|=\|P(\lambda_{\Gamma}(\bm{g}),\lambda_{\Gamma}(\bm{g})^{*})\|,

it follows from Proposition 5.15 that the limiting representation π\pi_{\infty} must be weakly equivalent to the regular representation. Thus we obtain a sequence of random minimal surfaces fN(XN)f_{N}(X^{N}) in 𝕊2N1\mathbb{S}^{2N-1} with the desired property.
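The restriction to 1^⊥ in this construction is straightforward to realize in coordinates. The sketch below (with an arbitrary small dimension) builds a random permutation matrix of dimension N + 1, restricts it to the orthogonal complement of the all-ones vector, which every permutation matrix preserves, and checks that the restriction is an N-dimensional unitary (here real orthogonal):

```python
import numpy as np

rng = np.random.default_rng(4)

Np1 = 7                      # dimension N + 1 (arbitrary)
N = Np1 - 1

perm = rng.permutation(Np1)
P = np.eye(Np1)[perm]        # random permutation matrix of dimension N + 1
ones = np.ones(Np1)
assert np.allclose(P @ ones, ones)   # the all-ones vector is always fixed

# orthonormal basis of 1^⊥ via QR: the first column spans the ones vector,
# the remaining N columns span its orthogonal complement
M = np.column_stack([ones / np.sqrt(Np1), rng.standard_normal((Np1, N))])
B = np.linalg.qr(M)[0][:, 1:]

pi = B.T @ P @ B             # the restriction U|_{1^⊥} in the basis B
assert np.allclose(pi @ pi.T, np.eye(N))   # an N-dimensional orthogonal matrix
```

Removing the invariant ones direction discards the trivial eigenvalue 1 common to all permutation matrices, which is what makes the restricted representation a sensible candidate for strong convergence to the regular representation.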

6. Open problems

Despite rapid developments on the topic of strong convergence in recent years, many challenging questions remain poorly understood. We therefore conclude this survey by highlighting a number of open problems and research directions.

6.1. Strong convergence without freeness

Until recently, nearly all known strong convergence results were concerned with polynomials of independent random matrices, and thus with limiting objects that are free. As we have seen in section 5.2, however, it is of considerable interest in applications to achieve strong convergence in non-free settings; for example, to establish optimal spectral gaps for random covers of hyperbolic manifolds, one needs models of random permutation matrices that converge strongly to the regular representation of the fundamental group of the base manifold. Such questions are challenging, in part, because they give rise to complicated dependent models of random matrices.

The systematic study of strong convergence to the regular representation of non-free groups was pioneered by Magee; see the survey [88]. To date, a small number of positive results are known in this direction:

  •

    Louder and Magee [86] show that there are models of random permutation matrices that strongly converge to the regular representation of any fully residually free group, that is, a group that locally embeds in a free group. The prime examples of fully residually free groups are surface groups.

  •

    Magee and Thomas [93] show that there are models of random unitary (but not permutation!) matrices that strongly converge to the regular representation of any right-angled Artin group; these are obtained from GUE matrices that act on overlapping factors of a tensor product (see also [33, §9.4]). This also implies a strong convergence result for any group that virtually embeds in a right-angled Artin group, such as fundamental groups of closed hyperbolic 33-manifolds.

  •

    Magee, Puder, and the author [92] show that uniformly random permutation representations of the fundamental groups of orientable closed hyperbolic surfaces strongly converge to the regular representation.

On the other hand, not every discrete group admits a strongly convergent model: there cannot be a model of random permutation matrices that strongly converges to the regular representation of 𝐅2×𝐅2×𝐅2\mathbf{F}_{2}\times\mathbf{F}_{2}\times\mathbf{F}_{2} (a very special case of a right-angled Artin group) [88, Proposition 2.7], or a model of random unitary matrices that converges strongly to the regular representation of SLd()\mathrm{SL}_{d}(\mathbb{Z}) with d4d\geq 4 [89]. Thus existence of strongly convergent models cannot be taken for granted.

To give a hint of the difficulties that arise in non-free settings, recall that the fundamental group of a closed orientable surface of genus 22 is

Γ=g1,g2,g3,g4|[g1,g2][g3,g4]=1.\Gamma=\big{\langle}g_{1},g_{2},g_{3},g_{4}~{}\big{|}~{}[g_{1},g_{2}][g_{3},g_{4}]=1\big{\rangle}.

The most natural random matrix model of this group is obtained by sampling 44-tuples of random permutation matrices U1N,U2N,U3N,U4NU_{1}^{N},U_{2}^{N},U_{3}^{N},U_{4}^{N} uniformly at random from the set of such matrices that satisfy [U1N,U2N][U3N,U4N]=𝟏[U_{1}^{N},U_{2}^{N}][U_{3}^{N},U_{4}^{N}]=\mathbf{1}. This constraint introduces complicated dependencies, which cause the model to behave very differently from independent random permutation matrices. For example, unlike in the setting of section 3, the expected traces of monomials of these matrices are not even analytic, let alone rational, as a function of 1N\frac{1}{N}.
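To get a concrete feeling for the constrained model, one can carry out a brute-force census at tiny size. The sketch below (an illustration only; N=3 is of course far from the asymptotic regime) enumerates all 4-tuples of permutations in S_3 satisfying the surface relation; Frobenius' character formula predicts |G|^3 Σ_χ χ(1)^{-2} = 486 solutions for G = S_3.

```python
# Brute-force census of the surface-group relation [U1,U2][U3,U4] = 1
# over S_3, illustrating the constrained model at tiny size.
from itertools import permutations, product

n = 3
perms = list(permutations(range(n)))          # all 6 elements of S_3
e = tuple(range(n))                           # identity permutation

def comp(p, q):                               # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(n))

def inv(p):
    out = [0] * n
    for i, pi in enumerate(p):
        out[pi] = i
    return tuple(out)

def comm(a, b):                               # commutator [a,b] = a b a^{-1} b^{-1}
    return comp(comp(a, b), comp(inv(a), inv(b)))

solutions = [t for t in product(perms, repeat=4)
             if comp(comm(t[0], t[1]), comm(t[2], t[3])) == e]
print(len(solutions), "of", len(perms) ** 4)  # Frobenius' formula predicts 486 of 1296
```

Sampling uniformly from `solutions` is exactly the constrained model at N=3; the dependencies introduced by the relation are already visible in the fact that the solution count is not a simple power of |S_3|.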

For surface groups, one can use the representation theory of 𝐒N\mathbf{S}_{N} to analyze this model; in particular, this enabled Magee–Naud–Puder [91] to show that its spectral statistics admit an asymptotic expansion in 1N\frac{1}{N}. The proof of strong convergence of this model in [92] is made possible by an extension of the polynomial method to models that admit “good” asymptotic expansions.

However, even for models that look superficially similar to surface groups, essentially nothing is known. For example, perhaps the simplest fundamental group of a (non-orientable, finite volume) hyperbolic 33-manifold is

Γ=g1,g2|g12g22=g2g1.\Gamma=\big{\langle}g_{1},g_{2}~{}\big{|}~{}g_{1}^{2}g_{2}^{2}=g_{2}g_{1}\big{\rangle}.

This is the fundamental group of the Gieseking manifold, which is obtained by gluing the sides of a tetrahedron [94, §V.2]. Whether sampling uniformly from the set of permutation matrices U1N,U2NU_{1}^{N},U_{2}^{N} with (U1N)2(U2N)2=U2NU1N(U_{1}^{N})^{2}(U_{2}^{N})^{2}=U_{2}^{N}U_{1}^{N} yields a strongly convergent model is not known. Such questions are of considerable interest, since they provide a route to extending Buser’s conjecture to higher dimensions.
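The same brute-force census can be performed for the Gieseking relation at tiny size (again an illustration only; nothing asymptotic can be read off from it). Note that every pair (a,a) with a an involution or the identity solves the relation, so the count is at least 10 in S_4.

```python
# Census of the Gieseking relation U1^2 U2^2 = U2 U1 over S_4:
# brute force over all pairs of permutations (illustration only).
from itertools import permutations, product

n = 4
perms = list(permutations(range(n)))

def comp(p, q):                               # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(n))

count = sum(1 for a, b in product(perms, repeat=2)
            if comp(comp(a, a), comp(b, b)) == comp(b, a))
print(count, "of", len(perms) ** 2)
```

Sampling uniformly from these solution pairs is the model whose strong convergence is asked about above.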

6.2. Random Cayley graphs

Let 𝝈=(σ1,,σr)\bm{\sigma}=(\sigma_{1},\ldots,\sigma_{r}) be i.i.d. uniform random elements of 𝐒N\mathbf{S}_{N}, and let πN\pi_{N} be an irreducible representation of 𝐒N\mathbf{S}_{N}. The results in section 5.3 show that the random matrices πN(𝝈)\pi_{N}(\bm{\sigma}) strongly converge to the regular representation λ(𝒈)\lambda(\bm{g}) of the free generators 𝒈=(g1,,gr)\bm{g}=(g_{1},\ldots,g_{r}) of 𝐅r\mathbf{F}_{r} for any sequence of irreducible representations with 1<dim(πN)exp(N112δ)1<\dim(\pi_{N})\leq\exp(N^{\frac{1}{12}-\delta}). What happens beyond this regime is a mystery: it may even be the case that strong convergence holds for any irreducible representations with dim(πN)\dim(\pi_{N})\to\infty.

Such questions are of particular interest since they are closely connected to the expansion of random Cayley graphs of finite groups. Let us recall that the Cayley graph Cay(𝐒N;σ1,,σr)\mathrm{Cay}(\mathbf{S}_{N};\sigma_{1},\ldots,\sigma_{r}) is the graph whose vertex set is 𝐒N\mathbf{S}_{N}, and whose edges are defined by connecting each vertex τ\tau to its neighbors σiτ\sigma_{i}\tau and σi1τ\sigma_{i}^{-1}\tau for i=1,,ri=1,\ldots,r. Its adjacency matrix is therefore given by

AN=λ𝐒N(σ1)+λ𝐒N(σ1)++λ𝐒N(σr)+λ𝐒N(σr),A^{N}=\lambda_{\mathbf{S}_{N}}(\sigma_{1})+\lambda_{\mathbf{S}_{N}}(\sigma_{1})^{*}+\cdots+\lambda_{\mathbf{S}_{N}}(\sigma_{r})+\lambda_{\mathbf{S}_{N}}(\sigma_{r})^{*},

where λ𝐒N\lambda_{\mathbf{S}_{N}} is the left-regular representation of 𝐒N\mathbf{S}_{N}. It is a folklore question whether there are sequences of finite groups so that, if generators are chosen independently and uniformly at random, the associated Cayley graph has an optimal spectral gap. This question is open for any sequence of finite groups.
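As an illustration of these definitions (not of the asymptotic question, which concerns N→∞), the following sketch builds the adjacency matrix of a random Cayley graph of S_4 from the left-regular representation and compares its largest nontrivial eigenvalue with the conjectural limiting bound 2√(2r−1); the group size and random seed are arbitrary choices.

```python
# Spectrum of a random Cayley graph Cay(S_N; s1,...,sr) built from the
# left-regular representation, for tiny N (numerical illustration only).
import itertools, random
import numpy as np

N, r = 4, 2
G = list(itertools.permutations(range(N)))    # S_4, 24 elements
index = {g: i for i, g in enumerate(G)}

def comp(p, q):                               # (p o q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(N))

random.seed(0)
gens = [random.choice(G) for _ in range(r)]

# adjacency matrix A = sum_i lambda(s_i) + lambda(s_i)^* on C^{|G|}
A = np.zeros((len(G), len(G)))
for s in gens:
    for g in G:
        A[index[comp(s, g)], index[g]] += 1   # edge g -> s g
A = A + A.T

eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
print("trivial eigenvalue:", eigs[0])         # always 2r (the constant vector)
print("largest nontrivial:", eigs[1], "vs 2*sqrt(2r-1) =", 2 * (2 * r - 1) ** 0.5)
```

The trivial eigenvalue 2r comes from the constant vector; the optimal spectral gap question asks whether, as N→∞, all remaining eigenvalues (apart from the one contributed by the sign representation) approach the bound 2√(2r−1).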

Now recall that every irreducible representation of a finite group is contained in its regular representation with multiplicity equal to its dimension. Thus

AN|1=supπNtrivπN(σ1)+πN(σ1)++πN(σr)+πN(σr),\|A^{N}|_{1^{\perp}}\|=\sup_{\pi_{N}\neq\mathrm{triv}}\|\pi_{N}(\sigma_{1})+\pi_{N}(\sigma_{1})^{*}+\cdots+\pi_{N}(\sigma_{r})+\pi_{N}(\sigma_{r})^{*}\|,

where the supremum is over all nontrivial irreducible representations πN\pi_{N} (the trivial representation is removed by restricting to 11^{\perp}). Thus in order to establish optimal spectral gaps for Cayley graphs, we must understand the random matrices πN(𝝈)\pi_{N}(\bm{\sigma}) defined by all irreducible representations πN\pi_{N}.

Note that random Cayley graphs of 𝐒N\mathbf{S}_{N} cannot have an optimal spectral gap with probability 1o(1)1-o(1), as there is a nontrivial 11-dimensional representation (the sign representation). The latter produces a single eigenvalue that is distributed as twice the sum of rr independent uniform random signs, which exceeds the lower bound of Lemma 1.2 with constant probability. Thus we interpret the optimal spectral gap question to mean whether all eigenvalues, except those coming from the trivial and sign representations, meet the lower bound with probability 1o(1)1-o(1). That this is the case is a well-known conjecture; see, e.g., [119, Conjecture 1.6]. However, to date it has not even been shown that such graphs are expanders, i.e., that their nontrivial eigenvalues are bounded away from the trivial eigenvalue as NN\to\infty; nor is there any known construction of Cayley graphs of 𝐒N\mathbf{S}_{N} that achieves an optimal spectral gap. The only known result, due to Kassabov [79], is that there exists a choice of generators for which the Cayley graph is an expander.

The analogous question is of significant interest for other finite groups, such as SLN(𝔽q)\mathrm{SL}_{N}(\mathbb{F}_{q}) (here we may take either qq\to\infty or NN\to\infty). In some ways, these groups are considerably better understood than the symmetric group: in this setting, Bourgain and Gamburd [22] (see also [123]) show that random Cayley graphs are expanders, while Lubotzky–Phillips–Sarnak [87] and Margulis [97] provide a deterministic choice of generators for which the Cayley graph has an optimal spectral gap. That random Cayley graphs of these groups have an optimal spectral gap is suggested by numerical evidence [82, 81, 119]. However, the study of strong convergence in the context of such groups has so far remained out of reach.

Remark 6.1.

The above questions are concerned with random Cayley graphs with a bounded number of generators. If the number of generators is allowed to diverge with the size of the group, rather general results are known: expansion follows from a classical result of Alon and Roichman [1], while optimal spectral gaps were obtained by Brailovskaya and the author in [23, §3.2.3].

6.3. Representations of a fixed group

In section 5.3 and above, we always considered strong convergence in the context of a sequence of groups 𝐆N\mathbf{G}_{N} of increasing size and a representation πN\pi_{N} of each 𝐆N\mathbf{G}_{N}. It is a tantalizing question [21] whether strong convergence might even arise when the group 𝐆\mathbf{G} is fixed, and we take a sequence of irreducible representations πN\pi_{N} of 𝐆\mathbf{G} of dimension tending to infinity. Since the entropy of the random generators that are sampled from the group is now fixed, strong convergence would have to arise in this setting entirely from the pseudorandom behavior of high-dimensional representations.

This situation cannot arise, of course, for a finite group 𝐆\mathbf{G}, since it has only finitely many irreducible representations. The question makes sense, however, when 𝐆\mathbf{G} is a compact Lie group. The simplest model of this kind arises when 𝐆=SU(2)\mathbf{G}=\mathrm{SU}(2), which has a single sequence of irreducible representations πN=symNV\pi_{N}=\mathrm{sym}^{N}V where VV is the standard representation. The question is then, if 𝑼=(U1,,Ur)\bm{U}=(U_{1},\ldots,U_{r}) are sampled independently from the Haar measure on SU(2)\mathrm{SU}(2), whether πN(𝑼)\pi_{N}(\bm{U}) strongly converges to the regular representation λ(𝒈)\lambda(\bm{g}) of the free generators 𝒈=(g1,,gr)\bm{g}=(g_{1},\ldots,g_{r}) of 𝐅r\mathbf{F}_{r}. A special case of this question is discussed in detail by Gamburd–Jakobson–Sarnak [54], who present numerical evidence in its favor.

Let us note that while strong convergence of representations of a fixed group is poorly understood, the corresponding weak convergence property is known to hold in great generality. For example, if 𝑼=(U1,,Ur)\bm{U}=(U_{1},\ldots,U_{r}) are sampled independently from the Haar measure on any compact connected semisimple Lie group 𝐆\mathbf{G}, and if πN\pi_{N} is any sequence of irreducible representations of 𝐆\mathbf{G} with dim(πN)\dim(\pi_{N})\to\infty, then πN(𝑼)\pi_{N}(\bm{U}) converges weakly to λ(𝒈)\lambda(\bm{g}); see [9, Proposition 7.2(1)].

6.4. Deterministic constructions

To date, all known instances of the strong convergence phenomenon require random constructions (except in amenable situations, cf. [88, §2.1]). This is in contrast to the setting of graphs with an optimal spectral gap, for which explicit deterministic constructions exist and even predate the understanding of random graphs [87, 97]. It remains a major challenge to achieve strong convergence by a deterministic construction.

A potential candidate arises from the celebrated works of Lubotzky–Phillips–Sarnak [87] and Margulis [97], who show that the Cayley graph of PSL2(𝔽q)\mathrm{PSL}_{2}(\mathbb{F}_{q}) defined by a certain explicit deterministic choice of generators has an optimal spectral gap. We may therefore ask, by extension, whether the matrices obtained by applying the regular representation of PSL2(𝔽q)\mathrm{PSL}_{2}(\mathbb{F}_{q}) to these generators converge strongly to the regular representation of the free generators of 𝐅r\mathbf{F}_{r} (cf. section 6.2). This question was raised by Voiculescu [127, p. 146] in an early paper that motivated the development of strong convergence of random matrices by Haagerup and Thorbjørnsen. However, the deterministic question remains open, and the methods of [87, 97] appear powerless to address it.

Another tantalizing candidate is the following simple model. Let qq be a prime and 𝐏1(𝔽q)\mathbf{P}^{1}(\mathbb{F}_{q}) be the projective line over 𝔽q\mathbb{F}_{q}; thus z𝐏1(𝔽q)z\in\mathbf{P}^{1}(\mathbb{F}_{q}) may take the values 0,1,2,,q1,0,1,2,\ldots,q-1,\infty. PSL2()\mathrm{PSL}_{2}(\mathbb{Z}) acts on 𝐏1(𝔽q)\mathbf{P}^{1}(\mathbb{F}_{q}) by Möbius transformations

[abcd]z=az+bcz+d;\begin{bmatrix}a&b\\ c&d\end{bmatrix}z=\frac{az+b}{cz+d};

this is just the linear action of PSL2()\mathrm{PSL}_{2}(\mathbb{Z}) if we parametrize 𝐏1(𝔽q)\mathbf{P}^{1}(\mathbb{F}_{q}) by homogeneous coordinates z=[z1:z2]z=[z_{1}:z_{2}]. Let πqHom(PSL2();𝐒q+1)\pi_{q}\in\mathrm{Hom}(\mathrm{PSL}_{2}(\mathbb{Z});\mathbf{S}_{q+1}) be the homomorphism defined by this action, that is, πq(X)\pi_{q}(X) is the permutation of the elements of 𝐏1(𝔽q)\mathbf{P}^{1}(\mathbb{F}_{q}) that maps zz to XzXz. The question is whether the permutation matrices

(πq(X1),,πq(Xr))|1(\pi_{q}(X_{1}),\ldots,\pi_{q}(X_{r}))|_{1^{\perp}}

converge strongly to the regular representation

(λPSL2()(X1),,λPSL2()(Xr))(\lambda_{\mathrm{PSL}_{2}(\mathbb{Z})}(X_{1}),\ldots,\lambda_{\mathrm{PSL}_{2}(\mathbb{Z})}(X_{r}))

for any X1,,XrPSL2()X_{1},\ldots,X_{r}\in\mathrm{PSL}_{2}(\mathbb{Z}). Numerical evidence [27, 82, 81, 119] supports this phenomenon, but a mathematical understanding remains elusive.
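The action described above is easy to realize concretely. The following sketch (with q = 11 an arbitrary choice of prime) computes the permutation πq(X) in homogeneous coordinates and verifies, as a sanity check, the defining relations S² = (ST)³ = 1 of PSL2(ℤ) on its standard generators S and T.

```python
# The permutation pi_q(X) of P^1(F_q) induced by X in PSL_2(Z) acting by
# Moebius transformations, in homogeneous coordinates (illustration only).
q = 11                                          # any prime
INF = (1, 0)                                    # the point at infinity
points = [(z, 1) for z in range(q)] + [INF]

def act(X, p):
    (a, b), (c, d) = X
    z1, z2 = (a * p[0] + b * p[1]) % q, (c * p[0] + d * p[1]) % q
    return (z1 * pow(z2, -1, q) % q, 1) if z2 else INF

def pi(X):                                      # pi_q(X) as a map on P^1(F_q)
    return {p: act(X, p) for p in points}

S = ((0, -1), (1, 0))                           # standard generators of PSL_2(Z)
T = ((1, 1), (0, 1))

# sanity checks of the relations S^2 = (ST)^3 = 1 in PSL_2(Z):
pS, pT = pi(S), pi(T)
assert all(pS[pS[p]] == p for p in points)
pST = {p: pS[pT[p]] for p in points}             # pi(ST) = pi(S) o pi(T)
assert all(pST[pST[pST[p]]] == p for p in points)
print("pi_q defined on", len(points), "points of P^1(F_q)")
```

Encoding πq(X) as a (q+1)×(q+1) permutation matrix and restricting to 1^⊥ then gives the matrices whose strong convergence is asked about above.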

The above convergence was conjectured by Buck [27] for diffusion operators—that is, for polynomials with positive coefficients—and by Magee (personal communication) for arbitrary polynomials. Note, however, that these two conjectures are actually equivalent by the positivization trick (cf. Lemma 2.26 and Remark 2.27) since PSL2()\mathrm{PSL}_{2}(\mathbb{Z}) satisfies the rapid decay and unique trace properties [31, 14].

6.5. Ramanujan constructions

Recall that if ANA^{N} is the adjacency matrix of a dd-regular graph with NN vertices, the lower bound of Lemma 1.2 states that

AN|12d1o(1)\|A^{N}|_{1^{\perp}}\|\geq 2\sqrt{d-1}-o(1)

as NN\to\infty. In this survey, we have said that a sequence of graphs has an optimal spectral gap if it satisfies this bound in reverse, that is, if

AN|12d1+o(1).\|A^{N}|_{1^{\perp}}\|\leq 2\sqrt{d-1}+o(1).

However, a more precise question that has attracted significant attention in the literature is whether the nontrivial eigenvalues of a graph can be bounded by the spectral radius of its universal cover without any error term: that is, can there be dd-regular graphs with NN vertices such that

AN|12d1\|A^{N}|_{1^{\perp}}\|\leq 2\sqrt{d-1}

for arbitrarily large NN? Graphs satisfying this property are called Ramanujan graphs. Ramanujan graphs do indeed exist and can be obtained by several remarkable constructions; we refer to the breakthrough papers [87, 97, 96, 75].

Whether there can be a stronger notion of strong convergence that generalizes Ramanujan graphs is unclear. For example, one may ask whether there exist N×NN\times N permutation matrices 𝑼N=(U1N,,UrN)\bm{U}^{N}=(U_{1}^{N},\ldots,U_{r}^{N}) so that

P(𝑼N,𝑼N)|1P(𝒖,𝒖)\|P(\bm{U}^{N},\bm{U}^{N*})|_{1^{\perp}}\|\leq\|P(\bm{u},\bm{u}^{*})\| (6.1)

for every polynomial PP, where 𝒖\bm{u} are as defined in Theorem 1.4. It cannot be the case that (6.1) holds simultaneously for all PP in fixed dimension NN, since that would imply by Lemma 2.13 that Cred(𝐅r)C^{*}_{\rm red}(\mathbf{F}_{r}) embeds in MN()\mathrm{M}_{N}(\mathbb{C}). However, we are not aware of an obstruction to the existence of 𝑼N\bm{U}^{N} so that (6.1) holds for all polynomials PP with a bound on the degree deg(P)q(N)\mathrm{deg}(P)\leq q(N) that diverges sufficiently slowly with NN. A weaker form of this question is whether for each PP, there exist 𝑼N\bm{U}^{N} (which may now depend on PP) for arbitrarily large NN so that (6.1) holds.

The interest in Ramanujan graphs stems in part from an analogy with number theory: the Ramanujan property of a graph is equivalent to the validity of the Riemann hypothesis for its Ihara zeta function [124, Theorem 7.4]. In the setting of hyperbolic surfaces, the analogous “Ramanujan property” that a hyperbolic surface XX has λ1(X)14\lambda_{1}(X)\geq\frac{1}{4} is equivalent to the validity of the Riemann hypothesis for its Selberg zeta function [104, §6]. An important conjecture of Selberg [120] predicts that a specific family of hyperbolic surfaces has this property. However, no such surfaces have yet been proved to exist. The results in section 5.2 therefore provide additional motivation for studying “Ramanujan forms” of strong convergence.

6.6. The optimal dimension of matrix coefficients

The strong convergence problem for polynomials of NN-dimensional random matrices with matrix coefficients of dimension DND_{N}\to\infty was discussed in section 5.4.1 in the context of the Peterson-Thom conjecture. While only the case DN=ND_{N}=N is needed for that purpose, the optimal range of DND_{N} for which strong convergence holds remains open: for both Gaussian and Haar distributed matrices, it is known that strong convergence holds when DN=eo(N)D_{N}=e^{o(N)} and can fail when DNeCN2D_{N}\geq e^{CN^{2}} [33]. Understanding what lies in between is related to questions in operator space theory [116, §4].

From the random matrix perspective, an interesting feature of this question is that there is a basic obstacle to going beyond subexponential dimension that is explained in [33, §1.3.1]. While strong convergence is concerned with understanding extreme eigenvalues of a random matrix XNX^{N}, essentially all known proofs of strong convergence are based on spectral statistics such as 𝐄[trh(XN)]\mathbf{E}[\mathop{\mathrm{tr}}h(X^{N})] which count eigenvalues. However, when DNeCND_{N}\geq e^{CN} the expected number of eigenvalues of P(𝑼N,𝑼N)P(\bm{U}^{N},\bm{U}^{N*}) away from the support of the spectrum of P(𝒖,𝒖)P(\bm{u},\bm{u}^{*}) may not go to zero even in situations where strong convergence holds, because polynomials with matrix coefficients can have outlier eigenvalues with very large multiplicity. Thus going beyond coefficients of subexponential dimension appears to present a basic obstacle to any method of proof that uses trace statistics.

6.7. The optimal scale of fluctuations

The largest eigenvalue of a GUE matrix has fluctuations of order N2/3N^{-2/3}, and the exact (Tracy-Widom) limit distribution is known. The universality of this phenomenon has been the subject of a major research program in mathematical physics [46], and corresponding results are known for many classical models of random matrix theory. In a major breakthrough, Huang–McKenzie–Yau [75] recently showed that the largest nontrivial eigenvalue of a random regular graph has the same behavior, which implies the remarkable result that about 69%69\% of random regular graphs are Ramanujan.

It is natural to expect that except in degenerate situations, the same scale and edge statistics should arise in strong convergence problems. However, to date the optimal scale of fluctuations N2/3N^{-2/3} has only been established for the norm of quadratic polynomials of Wigner matrices [52]. For polynomials of arbitrary degree and for a broader class of models, the best known rate N1/2N^{-1/2} is achieved both by the interpolation [110, 109] and polynomial [33] methods.

There is, in fact, a good reason why this is the case. The model considered by Parraud in [110, 109] is somewhat more general in that it considers polynomials of both random and deterministic matrices (see section 6.8 below). In this setting, however, one can readily construct examples where N1/2N^{-1/2} is the true order of the fluctuations: for example, one may take the sum of a GUE matrix and a deterministic matrix of rank one [112]. The random matrix scale N2/3N^{-2/3} can therefore only be expected to appear for polynomials of random matrices alone.
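The rank-one example mentioned above is easy to simulate. The sketch below (parameters and seed are arbitrary choices) adds a deterministic rank-one matrix θvv* with θ = 2 to a GUE matrix: the top eigenvalue detaches from the bulk edge 2 and concentrates around θ + 1/θ = 5/2, a BBP-type phenomenon whose fluctuations are of order N^{-1/2} rather than N^{-2/3}.

```python
# A GUE matrix plus a deterministic rank-one matrix: the top eigenvalue
# separates from the bulk edge 2 and sits near theta + 1/theta = 2.5
# for theta = 2 (numerical illustration only).
import numpy as np

rng = np.random.default_rng(0)
N, theta = 1000, 2.0

H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
G = (H + H.conj().T) / (2 * np.sqrt(N))        # GUE, spectrum roughly [-2, 2]
v = np.zeros(N)
v[0] = 1.0
A = G + theta * np.outer(v, v)                 # deterministic rank-one shift

top = np.linalg.eigvalsh(A)[-1]
print("top eigenvalue:", top, "vs theta + 1/theta =", theta + 1 / theta)
```

The Gaussian fluctuations of the detached eigenvalue around θ + 1/θ are of order N^{-1/2}, which is why the N^{-2/3} random matrix scale cannot hold once deterministic matrices are allowed into the polynomial.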

6.8. Random and deterministic matrices

Let 𝑮N=(G1N,,GrN)\bm{G}^{N}=(G_{1}^{N},\ldots,G_{r}^{N}) be i.i.d. GUE matrices, and let 𝑩N=(B1N,,BsN)\bm{B}^{N}=(B_{1}^{N},\ldots,B_{s}^{N}) be deterministic matrices of the same dimension. It was realized by Male [95] that the Haagerup–Thorbjørnsen theorem admits the following extension: if it is assumed that 𝑩N\bm{B}^{N} converges strongly to some limiting family of operators 𝒃\bm{b}, then (𝑮N,𝑩N)(\bm{G}^{N},\bm{B}^{N}) converges strongly to (𝒔,𝒃)(\bm{s},\bm{b}) where the free semicircular family 𝒔\bm{s} is taken to be freely independent of 𝒃\bm{b} in the sense of Voiculescu. This joint strong convergence property of random and deterministic matrices was extended to Haar unitaries by Collins and Male [41], and was developed in a nonasymptotic form by Collins, Guionnet, and Parraud [40, 108, 110, 109]. The advantage of this formulation is that it encodes a variety of applications that cannot be achieved without the inclusion of deterministic matrices.

To date, however, strong convergence of random and deterministic matrices has only been amenable to analytic methods, such as those of Haagerup–Thorbjørnsen [60] or the interpolation methods of [40, 10]. Thus a counterpart of this form of strong convergence for random permutation matrices remains open. The development of such a result is motivated by various applications [8, 36, 37].

6.9. Complex eigenvalues

In contrast to the real eigenvalues of self-adjoint polynomials, complex eigenvalues of non-self-adjoint polynomials are much more poorly understood. While an upper bound on the spectral radius follows directly from strong convergence by Lemma 2.12, a lower bound on the spectral radius and convergence of the empirical distribution of the complex eigenvalues remain largely open. The difficulty here is reversed from the study of strong convergence, where an upper bound on the norm is typically the main difficulty and both a lower bound on the norm and weak convergence follow automatically by Lemma 2.13.

It is not even entirely obvious at first sight how the complex eigenvalue distribution of a non-self-adjoint operator in a CC^{*}-probability space should be defined. The natural object of this kind, whose definition reduces to the complex eigenvalue distribution in the case of matrices, is called the Brown measure [99, Chapter 11]. It is tempting to conjecture that if a family of random matrices 𝑿N\bm{X}^{N} strongly converges to a family of limiting operators 𝒙\bm{x}, then the empirical distribution of the complex eigenvalues of any noncommutative polynomial P(𝑿N,𝑿N)P(\bm{X}^{N},\bm{X}^{N*}) should converge to the Brown measure of P(𝒙,𝒙)P(\bm{x},\bm{x}^{*}). To date, this has only been proved in the special case of quadratic polynomials of independent complex Ginibre matrices [44].

One may similarly ask whether the intrinsic freeness principle extends to complex eigenvalues of non-self-adjoint random matrices. For example, is there a counterpart of Theorem 1.6 for complex eigenvalues, and if so what are the objects that should appear in it? No results of this kind have been obtained to date.

Acknowledgments

I first learned about strong convergence a decade or so ago from Gilles Pisier, who asked me about its connection with the study of nonhomogeneous random matrices. Only many years later did I come to fully appreciate the significance of this question. An informal CC^{*}-seminar organized by Peter Sarnak at Princeton during Fall 2023 further led to many fruitful interactions.

I am grateful to Michael Magee and Mikael de la Salle who taught me various things about this topic that could not easily be found in the literature, and to Ben Hayes and Antoine Song for explaining the material in sections 5.4 and 5.5 to me. It is a great pleasure to thank all my collaborators, acknowledged throughout this survey, with whom I have thought about these problems.

Last but not least, I thank the organizers of Current Developments in Mathematics for the invitation to present this survey.

The author was supported in part by NSF grant DMS-2347954. This survey was written while the author was at the Institute for Advanced Study in Princeton, NJ, which is thanked for providing a fantastic mathematical environment.

References

  • [1] N. Alon and Y. Roichman. Random Cayley graphs and expanders. Random Structures Algorithms, 5(2):271–284, 1994.
  • [2] J. Alt, L. Erdős, and T. Krüger. The Dyson equation with linear self-energy: spectral bands, edges and cusps. Doc. Math., 25:1421–1539, 2020.
  • [3] A. Amit, N. Linial, J. Matoušek, and E. Rozenman. Random lifts of graphs. In Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’01, pages 883–894, 2001.
  • [4] N. Anantharaman and L. Monk. Friedman-Ramanujan functions in random hyperbolic geometry and application to spectral gaps, 2023. arXiv:2304.02678.
  • [5] N. Anantharaman and L. Monk. Friedman-Ramanujan functions in random hyperbolic geometry and application to spectral gaps II, 2025. arXiv:2502.12268.
  • [6] C. Anantharaman-Delaroche. Amenability and exactness for groups, group actions and operator algebras, 2007. ESI lecture notes, HAL:cel-00360390.
  • [7] G. W. Anderson. Convergence of the largest singular value of a polynomial in independent Wigner matrices. Ann. Probab., 41(3B):2103–2181, 2013.
  • [8] B. Au, G. Cébron, A. Dahlqvist, F. Gabriel, and C. Male. Freeness over the diagonal for large random matrices. Ann. Probab., 49(1):157–179, 2021.
  • [9] N. Avni and I. Glazer. On the Fourier coefficients of word maps on unitary groups. Compos. Math., 2025. To appear.
  • [10] A. S. Bandeira, M. T. Boedihardjo, and R. van Handel. Matrix concentration inequalities and free probability. Invent. Math., 234(1):419–487, 2023.
  • [11] A. S. Bandeira, G. Cipolloni, D. Schröder, and R. van Handel. Matrix concentration inequalities and free probability II. Two-sided bounds and applications, 2024. Preprint arxiv:2406.11453.
  • [12] A. F. Beardon. The geometry of discrete groups, volume 91 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. Corrected reprint of the 1983 original.
  • [13] B. Bekka, P. de la Harpe, and A. Valette. Kazhdan’s property (T), volume 11 of New Mathematical Monographs. Cambridge University Press, Cambridge, 2008.
  • [14] M. Bekka, M. Cowling, and P. de la Harpe. Some groups whose reduced CC^{*}-algebra is simple. Inst. Hautes Études Sci. Publ. Math., (80):117–134, 1994.
  • [15] S. Belinschi and M. Capitaine. Strong convergence of tensor products of independent GUE matrices, 2022. Preprint arXiv:2205.07695.
  • [16] C. Bennett and R. Sharpley. Interpolation of operators, volume 129 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA, 1988.
  • [17] N. Bergeron. The spectrum of hyperbolic surfaces. Universitext. Springer, Cham; EDP Sciences, Les Ulis, 2016. Appendix C by Valentin Blomer and Farrell Brumley, translated from the 2011 French original by Brumley.
  • [18] C. Bordenave. A new proof of Friedman’s second eigenvalue theorem and its extension to random lifts. Ann. Sci. Éc. Norm. Supér. (4), 53(6):1393–1439, 2020.
  • [19] C. Bordenave and B. Collins. Eigenvalues of random lifts and polynomials of random permutation matrices. Ann. of Math. (2), 190(3):811–875, 2019.
  • [20] C. Bordenave and B. Collins. Norm of matrix-valued polynomials in random unitaries and permutations, 2024. Preprint arxiv:2304.05714v2.
  • [21] C. Bordenave and B. Collins. Strong asymptotic freeness for independent uniform variables on compact groups associated to nontrivial representations. Invent. Math., 237(1):221–273, 2024.
  • [22] J. Bourgain and A. Gamburd. Uniform expansion bounds for Cayley graphs of SL2(𝔽p){\rm SL}_{2}(\mathbb{F}_{p}). Ann. of Math. (2), 167(2):625–642, 2008.
  • [23] T. Brailovskaya and R. van Handel. Universality and sharp matrix concentration inequalities. Geom. Funct. Anal., 34(6):1734–1838, 2024.
  • [24] E. Breuillard, M. Kalantar, M. Kennedy, and N. Ozawa. CC^{*}-simplicity and the unique trace property for discrete groups. Publ. Math. Inst. Hautes Études Sci., 126:35–71, 2017.
  • [25] N. P. Brown and N. Ozawa. CC^{*}-algebras and finite-dimensional approximations, volume 88 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2008.
  • [26] R. L. Bryant. Minimal surfaces of constant curvature in SnS^{n}. Trans. Amer. Math. Soc., 290(1):259–271, 1985.
  • [27] M. W. Buck. Expanders and diffusers. SIAM J. Algebraic Discrete Methods, 7(2):282–304, 1986.
  • [28] P. Buser. Cubic graphs and the first eigenvalue of a Riemann surface. Math. Z., 162(1):87–99, 1978.
  • [29] P. Buser. On the bipartition of graphs. Discrete Appl. Math., 9(1):105–109, 1984.
  • [30] E. Cassidy. Random permutations acting on kk-tuples have near-optimal spectral gap for k=poly(n)k=\mathrm{poly}(n), 2024. Preprint arxiv:2412.13941v2.
  • [31] I. Chatterji. Introduction to the rapid decay property. In Around Langlands correspondences, volume 691 of Contemp. Math., pages 53–72. Amer. Math. Soc., Providence, RI, 2017.
  • [32] C.-F. Chen, J. Garza-Vargas, J. A. Tropp, and R. van Handel. A new approach to strong convergence. Ann. of Math., 2025. To appear.
  • [33] C.-F. Chen, J. Garza-Vargas, and R. van Handel. A new approach to strong convergence II. The classical ensembles, 2025. Preprint arxiv:2412.00593.
  • [34] E. W. Cheney. Introduction to approximation theory. AMS, Providence, RI, 1998.
  • [35] S. Y. Cheng. Eigenvalue comparison theorems and its geometric applications. Math. Z., 143(3):289–297, 1975.
  • [36] G. Cohen, I. Cohen, and G. Maor. Tight bounds for the Zig-Zag product. In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science—FOCS 2024, pages 1470–1499. IEEE Computer Soc., Los Alamitos, CA, [2024] ©2024.
  • [37] G. Cohen, I. Cohen, G. Maor, and Y. Peled. Derandomized squaring: an analytical insight into its true behavior. In 16th Innovations in Theoretical Computer Science Conference, volume 325 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 40, 24. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern, 2025.
  • [38] T. H. Colding and W. P. Minicozzi, II. A course in minimal surfaces, volume 121 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2011.
  • [39] B. Collins. Moment methods on compact groups: Weingarten calculus and its applications. In ICM—International Congress of Mathematicians. Vol. 4. Sections 5–8, pages 3142–3164. EMS Press, Berlin, [2023] ©2023.
  • [40] B. Collins, A. Guionnet, and F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid GUE matrices. Camb. J. Math., 10(1):195–260, 2022.
  • [41] B. Collins and C. Male. The strong asymptotic freeness of Haar and deterministic matrices. Ann. Sci. Éc. Norm. Supér. (4), 47(1):147–163, 2014.
  • [42] B. Collins, S. Matsumoto, and J. Novak. The Weingarten calculus. Notices Amer. Math. Soc., 69(5):734–745, 2022.
  • [43] A. Connes. Classification of injective factors. Cases II1,II_{1}, II,II_{\infty}, IIIλ,III_{\lambda}, λ1\lambda\not=1. Ann. of Math. (2), 104(1):73–115, 1976.
  • [44] N. A. Cook, A. Guionnet, and J. Husson. Spectrum and pseudospectrum for quadratic polynomials in Ginibre matrices. Ann. Inst. Henri Poincaré Probab. Stat., 58(4):2284–2320, 2022.
  • [45] M. de la Salle. Complete isometries between subspaces of noncommutative LpL_{p}-spaces. J. Operator Theory, 64(2):265–298, 2010.
  • [46] L. Erdős and H.-T. Yau. A dynamical approach to random matrix theory, volume 28 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2017.
  • [47] P. Etingof. Representation theory in complex rank, I. Transform. Groups, 19(2):359–381, 2014.
  • [48] B. Farb. Representation stability. In Proceedings of the International Congress of Mathematicians—Seoul 2014. Vol. II, pages 1173–1196. Kyung Moon Sa, Seoul, 2014.
  • [49] J. Friedman. Relative expanders or weakly relatively Ramanujan graphs. Duke Math. J., 118(1):19–35, 2003.
  • [50] J. Friedman. A proof of Alon’s second eigenvalue conjecture and related problems. Mem. Amer. Math. Soc., 195(910):viii+100, 2008.
  • [51] J. Friedman, A. Joux, Y. Roichman, J. Stern, and J.-P. Tillich. The action of a few random permutations on rr-tuples and an application to cryptography. In STACS 96 (Grenoble, 1996), volume 1046 of Lecture Notes in Comput. Sci., pages 375–386. Springer, Berlin, 1996.
  • [52] J. Fronk, T. Krüger, and Y. Nemish. Norm convergence rate for multivariate quadratic polynomials of Wigner matrices. J. Funct. Anal., 287(12):Paper No. 110647, 59, 2024.
  • [53] W. Fulton. Algebraic topology, volume 153 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995. A first course.
  • [54] A. Gamburd, D. Jakobson, and P. Sarnak. Spectra of elements in the group ring of SU(2){\rm SU}(2). J. Eur. Math. Soc. (JEMS), 1(1):51–85, 1999.
  • [55] A. Guionnet. Asymptotics of random matrices and related models, volume 130 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, Providence, RI, 2019. The uses of Dyson-Schwinger equations, Published for the Conference Board of the Mathematical Sciences.
  • [56] R. D. Gulliver, II, R. Osserman, and H. L. Royden. A theory of branched immersions of surfaces. Amer. J. Math., 95:750–812, 1973.
  • [57] U. Haagerup. An example of a nonnuclear CC^{\ast}-algebra, which has the metric approximation property. Invent. Math., 50(3):279–293, 1978/79.
  • [58] U. Haagerup. Injectivity and decomposition of completely bounded maps. In Operator algebras and their connections with topology and ergodic theory (Buşteni, 1983), volume 1132 of Lecture Notes in Math., pages 170–222. Springer, Berlin, 1985.
  • [59] U. Haagerup, H. Schultz, and S. Thorbjørnsen. A random matrix approach to the lack of projections in Cred(𝔽2)C^{*}_{\rm red}(\mathbb{F}_{2}). Adv. Math., 204(1):1–83, 2006.
  • [60] U. Haagerup and S. Thorbjørnsen. A new application of random matrices: ${\rm Ext}(C^{*}_{\rm red}(F_{2}))$ is not a group. Ann. of Math. (2), 162(2):711–775, 2005.
  • [61] P. R. Halmos. Commutators of operators. II. Amer. J. Math., 76:191–198, 1954.
  • [62] P. R. Halmos. A Hilbert space problem book, volume 17 of Encyclopedia of Mathematics and its Applications. Springer-Verlag, New York-Berlin, second edition, 1982. Graduate Texts in Mathematics, 19.
  • [63] L. Hanany and D. Puder. Word measures on symmetric groups. Int. Math. Res. Not. IMRN, (11):9221–9297, 2023.
  • [64] A. Hatcher. Algebraic topology. Cambridge University Press, Cambridge, 2002.
  • [65] B. Hayes. A random matrix approach to the Peterson-Thom conjecture. Indiana Univ. Math. J., 71(3):1243–1297, 2022.
  • [66] B. Hayes, D. Jekel, and S. Kunnawalkam Elayavalli. Consequences of the random matrix solution to the Peterson-Thom conjecture. Anal. PDE, 18(7):1805–1834, 2025.
  • [67] J. W. Helton, R. Rashidi Far, and R. Speicher. Operator-valued semicircular elements: solving a quadratic matrix equation with positivity constraints. Int. Math. Res. Not. IMRN, (22):Art. ID rnm086, 15, 2007.
  • [68] W. Hide, D. Macera, and J. Thomas. Spectral gap with polynomial rate for random covering surfaces, 2025. Preprint arXiv:2505.08479.
  • [69] W. Hide and M. Magee. Near optimal spectral gaps for hyperbolic surfaces. Ann. of Math. (2), 198(2):791–824, 2023.
  • [70] W. Hide, J. Moy, and F. Naud. On the spectral gap of negatively curved surface covers, 2025. Preprint arXiv:2502.10733.
  • [71] S. Hoory, N. Linial, and A. Wigderson. Expander graphs and their applications. Bull. Amer. Math. Soc. (N.S.), 43(4):439–561, 2006.
  • [72] L. Hörmander. The analysis of linear partial differential operators. I. Classics in Mathematics. Springer-Verlag, Berlin, 2003. Distribution theory and Fourier analysis.
  • [73] B. Huang and M. Rahman. On the local geometry of graphs in terms of their spectra. European J. Combin., 81:378–393, 2019.
  • [74] J. Huang, T. McKenzie, and H.-T. Yau. Optimal eigenvalue rigidity of random regular graphs, 2024. Preprint arXiv:2405.12161.
  • [75] J. Huang, T. McKenzie, and H.-T. Yau. Ramanujan property and edge universality of random regular graphs, 2024. Preprint arXiv:2412.20263.
  • [76] J. Huang and H.-T. Yau. Spectrum of random $d$-regular graphs up to the edge. Comm. Pure Appl. Math., 77(3):1635–1723, 2024.
  • [77] H. Huber. Über den ersten Eigenwert des Laplace-Operators auf kompakten Riemannschen Flächen. Comment. Math. Helv., 49:251–259, 1974.
  • [78] G. D. James. The representation theory of the symmetric groups, volume 682 of Lecture Notes in Mathematics. Springer, Berlin, 1978.
  • [79] M. Kassabov. Symmetric groups and expander graphs. Invent. Math., 170(2):327–354, 2007.
  • [80] H. Kesten. Symmetric random walks on groups. Trans. Amer. Math. Soc., 92:336–354, 1959.
  • [81] J. Lafferty and D. Rockmore. Numerical investigation of the spectrum for certain families of Cayley graphs. In Expanding graphs (Princeton, NJ, 1992), volume 10 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci., pages 63–73. Amer. Math. Soc., Providence, RI, 1993.
  • [82] J. D. Lafferty and D. Rockmore. Fast Fourier analysis for ${\rm SL}_{2}$ over a finite field and related numerical experiments. Experiment. Math., 1(2):115–139, 1992.
  • [83] M. Ledoux. The concentration of measure phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2001.
  • [84] F. Lehner. Computing norms of free operators with matrix coefficients. Amer. J. Math., 121(3):453–486, 1999.
  • [85] N. Linial and D. Puder. Word maps and spectra of random graph lifts. Random Structures Algorithms, 37(1):100–135, 2010.
  • [86] L. Louder, M. Magee, and W. Hide. Strongly convergent unitary representations of limit groups. J. Funct. Anal., 288(6):Paper No. 110803, 2025.
  • [87] A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, 8(3):261–277, 1988.
  • [88] M. Magee. Strong convergence of unitary and permutation representations of discrete groups, 2024. Proceedings of the ECM, to appear.
  • [89] M. Magee and M. de la Salle. ${\rm SL}_{4}(\mathbb{Z})$ is not purely matricial field. C. R. Math. Acad. Sci. Paris, 362:903–910, 2024.
  • [90] M. Magee and M. de la Salle. Strong asymptotic freeness of Haar unitaries in quasi-exponential dimensional representations, 2024. Preprint arXiv:2409.03626.
  • [91] M. Magee, F. Naud, and D. Puder. A random cover of a compact hyperbolic surface has relative spectral gap $\frac{3}{16}-\varepsilon$. Geom. Funct. Anal., 32(3):595–661, 2022.
  • [92] M. Magee, D. Puder, and R. van Handel. Strong convergence of uniformly random permutation representations of surface groups, 2025. Preprint arXiv:2504.08988.
  • [93] M. Magee and J. Thomas. Strongly convergent unitary representations of right-angled Artin groups, 2023. Preprint arXiv:2308.00863.
  • [94] W. Magnus. Noneuclidean tesselations and their groups, volume 61 of Pure and Applied Mathematics. Academic Press [Harcourt Brace Jovanovich, Publishers], New York-London, 1974.
  • [95] C. Male. The norm of polynomials in large random and deterministic matrices. Probab. Theory Related Fields, 154(3-4):477–532, 2012. With an appendix by Dimitri Shlyakhtenko.
  • [96] A. W. Marcus, D. A. Spielman, and N. Srivastava. Interlacing families I: Bipartite Ramanujan graphs of all degrees. Ann. of Math. (2), 182(1):307–325, 2015.
  • [97] G. A. Margulis. Explicit group-theoretic constructions of combinatorial schemes and their applications in the construction of expanders and concentrators. Problemy Peredachi Informatsii, 24(1):51–60, 1988.
  • [98] W. H. Meeks, III and J. Pérez. The classical theory of minimal surfaces. Bull. Amer. Math. Soc. (N.S.), 48(3):325–407, 2011.
  • [99] J. A. Mingo and R. Speicher. Free probability and random matrices, volume 35 of Fields Institute Monographs. Springer, New York; Fields Institute for Research in Mathematical Sciences, Toronto, ON, 2017.
  • [100] A. Miyagawa. A short note on strong convergence of $q$-Gaussians. Internat. J. Math., 34(14):Paper No. 2350087, 8, 2023.
  • [101] S. Mohanty, R. O’Donnell, and P. Paredes. Explicit near-Ramanujan graphs of every degree, 2019. Preprint arXiv:1909.06988v3.
  • [102] J. D. Moore. Introduction to global analysis, volume 187 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2017. Minimal surfaces in Riemannian manifolds.
  • [103] J. Moy. Spectral gap of random covers of negatively curved noncompact surfaces, 2025. Preprint arXiv:2505.07056.
  • [104] M. R. Murty. An introduction to Selberg’s trace formula. J. Indian Math. Soc. (N.S.), 52:91–126, 1987.
  • [105] A. Nica. On the number of cycles of given length of a free word in several random permutations. Random Structures Algorithms, 5(5):703–730, 1994.
  • [106] A. Nica and R. Speicher. Lectures on the combinatorics of free probability, volume 335 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2006.
  • [107] R. O’Donnell and X. Wu. Explicit near-fully X-Ramanujan graphs, 2020. Preprint arXiv:2009.02595.
  • [108] F. Parraud. On the operator norm of non-commutative polynomials in deterministic matrices and iid Haar unitary matrices, 2021. Preprint arXiv:2005.13834.
  • [109] F. Parraud. Asymptotic expansion of smooth functions in deterministic and iid Haar unitary matrices, and application to tensor products of matrices, 2023. Preprint arXiv:2302.02943.
  • [110] F. Parraud. Asymptotic expansion of smooth functions in polynomials in deterministic matrices and iid GUE matrices. Comm. Math. Phys., 399(1):249–294, 2023.
  • [111] F. Parraud. The spectrum of a tensor of random and deterministic matrices, 2024. Preprint arXiv:2410.04481.
  • [112] S. Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probab. Theory Related Fields, 134(1):127–173, 2006.
  • [113] J. Peterson and A. Thom. Group cocycles and the ring of affiliated operators. Invent. Math., 185(3):561–592, 2011.
  • [114] G. Pisier. A simple proof of a theorem of Kirchberg and related results on $C^{*}$-norms. J. Operator Theory, 35(2):317–335, 1996.
  • [115] G. Pisier. Introduction to operator space theory, volume 294 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 2003.
  • [116] G. Pisier. Random matrices and subexponential operator spaces. Israel J. Math., 203(1):223–273, 2014.
  • [117] G. Pisier. On a linearization trick. Enseign. Math., 64(3-4):315–326, 2018.
  • [118] R. T. Powers. Simplicity of the $C^{\ast}$-algebra associated with the free group on two generators. Duke Math. J., 42:151–156, 1975.
  • [119] I. Rivin and N. T. Sardari. Quantum chaos on random Cayley graphs of ${\rm SL}_{2}[\mathbb{Z}/p\mathbb{Z}]$. Exp. Math., 28(3):328–341, 2019.
  • [120] P. Sarnak. Selberg’s eigenvalue conjecture. Notices Amer. Math. Soc., 42(11):1272–1277, 1995.
  • [121] H. Schultz. Non-commutative polynomials of independent Gaussian random matrices. The real and symplectic cases. Probab. Theory Related Fields, 131(2):261–309, 2005.
  • [122] A. Song. Random harmonic maps into spheres, 2025. Preprint arXiv:2402.10287v2.
  • [123] T. Tao. Expansion in finite simple groups of Lie type, volume 164 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2015.
  • [124] A. Terras. Zeta functions of graphs, volume 128 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2011. A stroll through the garden.
  • [125] J. A. Tropp. Second-order matrix concentration inequalities. Appl. Comput. Harmon. Anal., 44(3):700–736, 2018.
  • [126] D. Voiculescu. Limit laws for random matrices and free products. Invent. Math., 104(1):201–220, 1991.
  • [127] D. Voiculescu. Around quasidiagonal operators. Integral Equations Operator Theory, 17(1):137–149, 1993.
  • [128] D.-V. Voiculescu, N. Stammeier, and M. Weber, editors. Free probability and operator algebras. Münster Lectures in Mathematics. European Mathematical Society (EMS), Zürich, 2016. Lecture notes from the masterclass held in Münster, September 2–6, 2013.
  • [129] E. P. Wigner. Random matrices in physics. SIAM Review, 9(1):1–23, 1967.