Global Universality of Singular Values in Products
of Many Large Random Matrices
Abstract
We study the singular values (and Lyapunov exponents) for products of independent random matrices with i.i.d. entries. Such matrix products have been extensively analyzed using free probability, which applies when at fixed , and the multiplicative ergodic theorem, which holds when while remains fixed. The regime when simultaneously is considerably less well understood, and our work is the first to prove universality for the global distribution of singular values in this setting. Our main result gives non-asymptotic upper bounds on the Kolmogorov-Smirnoff distance between the empirical measure of (normalized) squared singular values and the uniform measure on that go to zero when at any relative rate. We assume only that the distribution of matrix entries has zero mean, unit variance, bounded fourth moment, and a bounded density. Our proofs rely on two key ingredients. The first is a novel small-ball estimate on singular vectors of random matrices from which we deduce a non-asymptotic variant of the multiplicative ergodic theorem that holds for growing matrix size . The second is a martingale concentration argument, which shows that while Lyapunov exponents at large are not universal at fixed matrix size, their empirical distribution becomes universal as soon as the matrix size grows with .
1 Introduction
This article concerns the distribution of singular values for products of independent random matrices
(1) |
with entries of drawn i.i.d. from a fixed distribution . We assume satisfies the following
Condition 1.
The probability measure has zero mean, unit variance, a finite fourth moment , and a density with respect to the Lebesgue measure bounded above by .
Our main result, Theorem˜1, is a quantitative universality result for
the empirical distribution of rescaled singular values of .
Theorem 1.
Under Condition˜1, there exist constants depending on with the following property. For all , if and , then
(2) |
where is the Kolmogorov-Smirnoff distance and is the uniform distribution on .
Theorem˜1 is the first universality result for which holds for a general class of distributions regardless of the relative size of . It guarantees that as soon as matches the first two moments of a standard Gaussian (and has bounded fourth moment and norm), the empirical singular value distribution is close to the case when is a (real) standard Gaussian (see Theorem 1.2 in [19]). Our result also extends to complex distributions with independent real and imaginary parts (see Section˜4). The nature of the universality underlying Theorem˜1, however, is unusual in the following two senses:
-
•
Requirement of bounded density. Universality is a hallmark of random matrix theory in the regime where the matrix size tends to infinity. It is uncommon in such universality results to include the hypothesis that has a bounded density. But such an assumption is essential in our setting because we are interested in the setting of growing . For example, the matrix product has rank one with positive probability if contains an atom and is exponential in . In the ergodic limit of fixed and diverging , moreover, the limiting empirical measure of singular values is known to be non-universal and depends on , see e.g. [22, 2, 4]. This is in contrast the free probability limit where it is typical to consider polynomials of fixed degree (e.g. fixed ) evaluated at a collection of random matrices with (see e.g. [16, 31, 12]).
-
•
One global shape, many local shapes. For many classical random matrix ensembles, universality holds not just for the global distribution of eigenvalues or singular values but persists also at the microscopic scale where consecutive eigenvalues or singular values remains order apart as . In our setting, however, even in the simplest case when is a standard complex Gaussian, the local distribution depends on the limiting value of [2, 25]. The relative size of therefore determines the local statistics but does not impact the global properties of . It remains open both to determine at what scale the local distribution of singular values begins to depend on and whether the local limits, derived using methods from integrable systems, are universal (see [18] for some partial progress).
While the effect of simultaneously large on the statistics is still far from understood, prior work showed that the global distribution of singular values converges to if one either first takes and then or vice versa [20, 21]. These articles use very different tools: the work [21], which first takes , relies on free probability while [20] uses the multiplicative ergodic theorem to analyze what happens when one first takes .
Neither free probability nor ergodic techniques are simple to make effective when both and are large but finite. To make progress in this direction the article [19] used small ball probabilities to quantify, at finite , the rate of the convergence in the multiplicative ergodic theorem and obtain a sharper version of Theorem˜1 in the special case when is the standard (real) Gaussian (see Section˜4 for a discussion of our optimality). We take a similar approach. The core difference is that the distribution of the individual matrices is no longer isotropic (invariant under left or right rotations). As we explain in Section˜3, this means we must obtain new small ball probabilities for the inner product between a fixed -frame in and the projection onto the span of the top singular vectors for .
Outline of Remainder of the Article.
The rest of this article is organized as follows. First, in Section˜2, we give a more thorough review of the relation between our results and prior work. Then in Section˜3 we state the main results needed to prove Theorem˜1. We make some further remarks on the results as well as future directions in Section˜4. The remaining proofs of these results are provided in Section˜6, after a brief recall of auxiliary technical results needed in Section˜5.
2 Related works
Products of random matrices are a vast subject. We provide here some representative references, focusing mainly on work in which the number of matrices, , is large or growing.
The setting where the matrix size is fixed while the the number of terms in the matrix product grows has attracted much interest starting from the seminal work of Furstenberg [11] and later of Oseledec [32] on the multiplicative ergodic theorem. Particularly relevant to the present article are the works of Newman [29] and Isopsi-Newman [20]. Since then, the study of Lyapunov exponents of random matrix products has found applications to the study of random Schödinger operators [8], number theory and dynamics [23, 28], and beyond [42, 6].
Matrix products when while is potentially large but fixed have also been extensively studied. For instance, classical results in free probability concern the spectrum of products of a fixed number of (freely) independent matrices [27, 40, 30]. In this vein, the articles [38, 21] both use tools from free probability to obtain the analog of Theorem˜1 in the setting where first and then . Prior work has also taken up a non-asymptotic analysis of eigenvalues [16, 12] for such matrix products as well as the local distribution of their singular values [26].
The setting when simultaneously grow is less well understood but has nonetheless attracted significant interest in recent years. For example, we point the reader to a beautiful set of the articles that use techniques from integrable systems and integrable probability to study singular values for products of iid complex Ginibre matrices and related integrable matrix ensembles. These include the works [2, 1, 3, 7, 10, 9] which, at a physics level of rigor, were the first to analyze the asymptotic distribution of singular values for products of iid complex Ginibre matrices. Some of the results in the preceding articles were proved rigorously in [25]. We also point the interested reader to [5, 14] another perspective on how to use techniques from integrable probability to study such matrix products. The study of the singular values of when are both large has also received attention due to its connection with the spectrum of input-output Jacobians in randomly initialized neural networks [33, 34, 35, 18, 17].
3 Main ideas and proof outline
As we will explain in this section, there are three key steps in the proof of Theorem˜1. To present them, let us agree on some notations. We write for the space of -frames in (i.e. orthonormal systems of vectors). For any matrix , we write: the product of top singular values. For any matrix we thus have
(3) |
following the Gram identity. Unless specified otherwise, all constants are finite, positive, and may depend on , the moments in Condition˜1.
Step 1: From many singular values to the top singular value.
To study the singular values of , it will be convenient to study their partial products:
(4) |
where the first equality is a definition and the second equality follows from standard linear algebra. The representation on the right of (4) recasts the product of the top singular values of as the top singular value for the action of on the space of -frames in . This is useful since analyzing the top singular value, or equivalently top Lyapunov exponent, is a natural and well-studied way to understand the long time behavior of a dynamical system. This is precisely the philosophy of most prior work in the regime where (see e.g. [20, 11, 19, 24]).
Step 2: Removing the supremum in (4).
One advantage of the representation on the right of (4) is that each term inside the supremum can naturally be thought of as simple sub-multiplicative functional of the state of a random dynamical system after steps. In this analogy, the frame determines the initial condition and the evolution from time to time consists of multiplying by .
The second and most important technical step in our proof of Theorem˜1 is to show that once is large, we can approximately drop the supremum over frames in (4). That this is possible is a key conceptual insight that goes back to [11], which showed that for a wide range of entry distributions , we have
on fixed . The previous displayed equation is a consequence of the multiplicative ergodic theorem, which guarantees that as the supremum, the average, and the pointwise behavior of is the same for almost every frame . Since we seek to describe the distribution of singular values of when is finite, we will need a quantitative version of this result. This is the content of Proposition˜1, which is more conveniently phrased in terms of the Lyapunov exponents of :
Proposition 1 (Reduction from sup norm to pointwise norm).
Denote by the -frame whose columns are the first standard basis vectors of . Then, assuming Condition˜1 holds, there exist constants depending only on , such that for any :
(5) |
for all .
We prove Proposition˜1 in Section˜6.1 and emphasize here only the main ideas. The key observation is that the estimate (5) follows from proving that the subspace spanned by the top singular vectors of is “well-spread” on the Grassmannian with high probability when . To explain this, let us consider the simple but illustrative case of . Our goal is then to obtain a lower bound for
where . If are the singular values of and are the corresponding right and left singular vectors, then
and we have
(6) |
Obtaining lower bounds on is the same as obtaining small ball probabilities for around the orthogonal complement to . Repeating this argument for general shows that Proposition˜1 will follow from the statement that “the distribution of the top singular vectors does not concentrate on a fixed co-dimension subspace”. See Lemma˜11 for the key result to this end.
Prior work [10, 9, 2, 5, 15, 19] assumed that the matrices are rotationally invariant and hence that the law of right singular vectors is Haar for every . In particular, when this implies
for a universal constant . The main technical difficulty in our present setting is that we do not know how to characterize the (joint) distribution of the top singular vectors of , even when . Nonetheless, we obtain small ball probabilities for in Section˜6.1 relying only on small ball probabilities for the entrywise measure .
Finally, we mention that the frame from Proposition˜1 does not have to be the canonical frame. A different proof shows, that a slightly weaker version of (5) holds for any (see eq.˜20). This yields seemingly new information on the problem of studying singular vectors of , which we believe is of independent interest. We defer this discussion to Section˜6.1.1.
Step 3: Doob decomposition and concentration for in (4).
In light of Proposition˜1, estimating the partial products comes down to bounding the “point-wise” norms for a fixed frame , which is done in the following
Proposition 2.
Assuming Condition˜1, there exist constants depending on such that for any , any , and for all :
(7) |
We prove Proposition˜2 in Section˜6.2. The main idea is to express as an average. For this, let and define inductively through the singular value decomposition of :
Then, recalling (3) and noting that is a measurable function of , a simple computation gives the following equality in distribution
(8) |
A direct computation now shows that given any we have
(9) |
These expectations determine the constants around which concentrates in Proposition˜1. The result then follows from an Azuma-type concentration inequality for random variables with sub-exponential tails (see Lemma˜9).
Combining everything together.
Putting Proposition˜1 and Proposition˜2 we get for that
(10) |
Together with some elementary algebra in Section˜6.3, we are able to control the cumulative distribution function for the empirical measure of singular values with high probability.
4 Discussions
In this section, we discuss some extensions and limitations of the present work.
Dependence on .
We begin by briefly remarking on the dependence in Theorem˜1 and relation (10) on . In particular, for fixed , consider an i.i.d. sequence of and let one has:
where . By Borel-Cantelli lemma, this implies that with probability one,
The rate can be seen as a Berry-Esseen type bound, see also [19, Section 1.2]. This suggests that our dependence on is at least comparable to standard CLT rates. However, since we require in (10) that (even when ), is unclear whether the dependence on is optimal. This dependence, unfortunately, cannot be improved significantly based on current techniques. To illustrate this, consider . The dependence of the mean on can be shown to be (i.e. there exists different ’s such that the expectation differs by ). The studies of more fine-grained behaviors of Lyapunov exponents in the ergodic regime (when universality does not hold for ) are left open to future works.
Extension to complex matrices.
While our proofs are formulated for real matrices, we remark that all results can be directly extended to complex random variables under the following assumptions on :
Condition 2.
Suppose that each entry of the ’s are i.i.d. drawn from a distribution , where and are independent (real) random variables satisfying (1) zero mean and unit variance: ; (2) finite fourth moment ; and (3) density bounded above .
Extending both Proposition˜1 and Proposition˜2 mainly requires replacing transpose with conjugate transpose and absolute values with norms does not require much conversions at all. The only nontrivial technical difference lies in transforming existing small ball estimates for real random variables (which appears in Lemma˜4) to their complex analog. This can be done by noting that a complex frame can be converted into a real frame (and defining the density of a complex distribution to be the density of the distribution of their canonical real decomposition in ). As a result, Theorem˜1 and (10) holds under Condition˜2 as well with different set of constants (versus Condition˜1). Moreover, in fact, our proof only needs that ’s are independent and distributed according to where is any fixed orthogonal matrix and for some satisfying prescribed conditions.
Dependence on constants .
A careful analysis of the proof shows that the constants appearing in the statement of Theorem˜1 can be taken to depend on constants as follows
(11) |
where is arbitrary but fixed and the implicit constants in the big-O terms are universal.
5 Review of auxiliary technical results
Before we complete all the deferred proofs, we collect below we several technical results use in our main proofs and establish some notation.
Notation.
We use to denote equivalence in distribution. Unless specified otherwise, we use to denote the minimum and the maximum of two numbers. We denote and . When not specified otherwise, we write (for a matrix ) as the -th row of for and as the submatrix for . Furthermore, we denote be the ordered singular values of any matrix .
An isotropic inequality for right product with random uniform frames.
We will examine the effect of a “uniformly random” frame being applied on any matrix.
Lemma 1 (See also Section 9 in [19]).
There exists a constant with the following property. Suppose is sampled from the Haar measure on . For any invertible matrix
Proof.
Note that there exists such that:
so
and hence
where is a fixed frame. The rest follows from Corollary 9.4 and equation (9.1) in [19]. ∎
Sub-multiplicativity for products of top singular values.
We will use the following result, which shows that the norm is sub-multiplicative.
Lemma 2 (See also [13]).
For any two matrices , one has:
(12) |
Proof.
For any two matrices and frames such that , the standard SVD gives: and where so:
which concludes our claim. ∎
Useful results on small-ball probability.
To control the small-ball density of projections, we will use the following result. In fact, this is the only place in which we needed bounded Lebesgue density in Condition˜1.
Lemma 3 (Theorem 1.1 of [36]).
Let where are real-valued independent random variables. Assume that the densities of are bounded by almost everywhere. Let be an orthogonal projection from onto . Then the density of the random vector is bounded by almost everywhere, where is a positive absolute constant. Furthermore, when and is a vector with norm 1, the max density of is at most .
As a corollary, we can show that
Lemma 4.
Let where are real-valued independent random variables. Assume that the densities of are bounded by almost everywhere. Let be a orthogonal projection in onto . Then
where is an absolute constant.
Proof.
Consider the Lebesgue measure on a -dimensional ball with radius . Its volume satisfies:
where is a universal constant independent of . Hence, via Lemma˜3 we get:
for some universal . ∎
Useful results on sub-exponential random variables.
We collect here some simple results on sub-exponential random variables. We begin with an elementary result:
Lemma 5.
If a random variable with constants and such that:
for all then for all .
Proof.
Since probabilities are always bounded above by 1, this can be easily verified via checking:
for all . ∎
We now recall the usual equivalent definitions of a sub-exponential random variable.
Lemma 6 (Sub-exponential properties, Proposition 2.7.1 in [41]).
Let be a random variable. Then the following properties are equivalent: the parameters appearing in these properties differ from each other by at most an absolute constant factor.
-
1.
The tails of satisfy
-
2.
The moments of satisfy
-
3.
The MGF of satisfies
-
4.
The MGF of is bounded at some point, namely
Moreover, if then previous properties are also equivalent to the following one: the MGF of satisfies
for all such that .
Lemma 7 (Sub-exp properties for almost centered random variables).
There exists a universal , such that the following holds. If a random variable satisfies for some constants :
then
Moment inequalities for almost-martingale stochastic processes
Lemma 8 (See also [37]).
Suppose there is a stochastic process along with a filtration and that for all and some function one has:
for a sequence of positive numbers , then for any fixed :
Proof.
We use the law of total expectations multiple times
∎
Lemma 9.
There exists a universal such that the following holds. For stochastic process on a filtration where for all , with probability one over , the next item satisfies:
for some . Then, for any ,
(13) |
An inequality concerning the partial sums of ’s.
We adapt two inequalities from prior literature that will be useful. Their proofs involve converting sums into respective integrals.
Lemma 10 (Adapted from the proof of Lemma 12.1 in [19]).
Fix positive integers satisfying . If , then
and (even for )
6 Remainder of the proofs
6.1 Remaining proofs in Step 2
The key estimate result (5), is follows immediately once we show that (recall (3) and (4))
(14) |
for . We prove this estimate by first obtaining an upper bounding the inverse moment of the determinant of (and thus also a small-ball estimate this determinant).
Lemma 11.
For all , there exists a constant with the following property. For all and , where satisfies:
where is the upper bound for the density of in condition˜1.
Proof.
For matrix and vector we define . Note that
where for each fixed we’ve set
Note that the rows of are projections of the rows of of onto the column space of . Hence, we may also write
(15) |
Note that is independent of . Hence, by Lemma˜8, the conclusion of Lemma˜11 will follow once we show that there exists such that for all (and frame ):
To obtain this estimate, we will show that
(16) | ||||
(17) |
where is some universal constant such that . To obtain (16) and (17), we return to (15) and denote by a frame consisting an orthonormal basis for the orthogonal complement to . We then have
(18) |
where is independent of . Since , we have has
Moreover, by Lemma˜4,
for some universal constant , which we assume is sufficiently large that . In particular, we have that
and hence that
This confirms (16). Next, to verify (17), note that since is a convex function, there exists a finite such that for all it is bounded above by it’s second order Taylor expansion around :
The right hand side is always positive and hence
(19) |
To deduce (17) we must therefore show that the expression on the right is . For this observe, recall from (18) that
Write
where we emphasize the dependence on the distribution . Note that is a degree four polynomial in the ’s and that
Since the first two moments of are the same as those of a standard Gaussian we therefore find
Moreover, writing for a chi-squared distribution with degrees of freedom we find
Hence,
and so
When combined with (19) this yields Therefore , and
where suffices. This verifies (17) and completes the proof. ∎
Completion of Proof of Proposition˜1.
Given the above tools, we are now in place to show Proposition˜1, for which we only needed to establish (14). To do this, note that via Lemma˜2:
and that there exists via the SVD of such that
and hence the determinant is
Thus,
where is the un-normalized random matrix. To complete the derivation of (14), it now suffices to show that there exists constants such that for (see Lemma˜1 for the exact definition):
are all at most
whenever . We bound these probabilities separately below:
-
1.
We consider, for any full-rank , with randomness over :
This quantity is bounded directly by Lemma˜1 which states that the above objective is at most:
for any and some universal constant . This means that
for a universal .
- 2.
-
3.
For being the truncated identity, by Lemma˜11, there exists such that:
holds for any directly, since .
Combining these three points along with a union bound concludes our proof of (14) and Proposition˜1.
6.1.1 A different proof without restricting
Of perhaps separate interest, we show a similar result to Proposition˜1 without restricting on to be the truncated identity. Specifically, we show that:
Proposition 3.
Assuming Condition˜1, there exist constants depending only on , such that for any and any :
(20) |
for all .
Following the exact same recipe for the proof in Proposition˜1, the only distinction lies in the following lemma, which we find interesting in its own rights.
Lemma 12.
For any fixed frames and where satisfies Condition˜1, one has that for all :
Proof.
First, we look at what linear transformations we can do to while preserving :
-
•
Adding a constant multiple of one column to another column. This, by definition, does not change the determinant.
-
•
Row exchanges. This changes the determinant but preserves the law of as the law of is invariant with respect to row and column exchanges.
Note that for any frame, these two operations allow us to turn them into the form:
where , by following the algorithm below. Algorithm: Diagonalizing frame column by column. 1. Initialize . For , do the following to get from : (1) row-exchanging the argmax (of absolute value) entry row of the -th column to the -th row; (2) use column elimination (adding appropriate scalar multiples of the -th column) to make the -th row all zero except at the -th column. 2. Output .
To analyze this procedure, note that under time , the norm of the -th column is at least 1 because (ignoring row exchanges which are irrelevant) it has only been added a linear combination of the first columns which are all orthogonal to itself. Hence, the arg-max absolute value is at least at time . Hence, the result must have the top submatrix diagonalized with diagonal entries at least .
Note that we can take any matrices that have bounded determinant and left (right) multiply to our product. Let two matrices be
then both and share the form of Note that since
we only need to study the small ball probability for
(21) |
To analyze this determinant, we need the following result.
Lemma 13.
Under Condition˜1, let . Fix any , one has:
Proof.
Note this simple observation (Lemma 5.1, [39]): let be any matrix and then the -th row of satisfies:
and this is simply because always for any . Hence,
To use a union bound on the rows of , we only need to show that for any fixed ,
This is because, fix and only consider . Let the unit vector of null space (which is independent with ) be then:
where are -measurable. The probability that this is small is directly concluded Lemma˜3. ∎
To complete the proof, let us write the product as a linear combination of
where . Note that for where denotes the rank-1 matrix with only the th entry being 1 and else 0. Hence, conditioned on the irrelevant entries (treating them as constant) we get
6.2 Remaining proofs in Step 3: Derivation of Proposition˜2
Our goal in this section is to derive Proposition˜2. As mentioned in Section˜3 the main idea is to express the norm as an average. To do this, we repeatedly use the SVD yields to obtain an alternative representation of as follows. First, let and be defined via (for ) the singular value decomposition of as:
for and . Then can be also written as (recall (3)):
Since is independent with for all , we only need to study the objective in (8):
Let us define the product matrix where each rows are independent (conditioned on ) from the following law
with for all . Thus, for any submatrix (indexed by a row subset of size ), the expected determinant squared (by independence of rows) is exactly
This gives
To complete the proof of Proposition˜2 note that
(22) |
where the upper tail follows from Markov’s inequality. Observe thatLemma˜11 gives the following inverse moment bound:
for some constant depending only on . Therefore, one can apply Jensen’s inequality on to get:
This gives the following bound on the lower tail:
(23) |
To conclude the proof of Proposition˜2, we combine (8), (23) and (13) in Lemma˜9 with
to see that the conditions for Lemma˜9 hold with and . We get as a result that for some constants depending only on and for all :
This concludes the proof of (7).
6.3 Completion of Proof of Theorem˜1
We are now in a position to complete the proof of Theorem˜1. For this, note that (10) follows immediately when combining (5), (7), and a union bound. Given this, we can check Theorem˜1 via a union bound as follows.
By twisting constants (multiplying by universal constants) in (2), we may assume that and that . Note that under the given conditions of , one has that (via (10)):
if we pick large enough constants . As a result, a union bound may be applied such that
for some . In fact, we will show that so long as:
(24) |
holds for all , one has that for all :
(25) |
which, if true, concludes the proof of Theorem˜1 (by, again, twisting constants). First let us check two basic inequalities following from Lemma˜10 directly. In particular, it follows that for any one has:
(26) |
Furthermore for any , we also have
(27) |
To show (25), it suffices to check that for and where is an integer. The reason is that is non-decreasing. Hence if where then
and if . The case for is trivial. Suppose where is an integer. We will show that for .
The case of
The case of , where is an integer
Suppose
Then . Again, if , then we are already done.
- •
- •
Our proof is concluded.
References
- AB [12] Gernot Akemann and Zdzislaw Burda. Universal microscopic correlation functions for products of independent ginibre matrices. Journal of Physics A: Mathematical and Theoretical, 45(46):465201, 2012.
- ABK [14] Gernot Akemann, Zdzislaw Burda, and Mario Kieburg. Universal distribution of lyapunov exponents for products of ginibre matrices. Journal of Physics A: Mathematical and Theoretical, 47(39):395202, 2014.
- ABK [19] Gernot Akemann, Zdzislaw Burda, and Mario Kieburg. From integrable to chaotic systems: Universal local statistics of lyapunov exponents. EPL (Europhysics Letters), 126(4):40001, 2019.
- AEV [23] Artur Avila, Alex Eskin, and Marcelo Viana. Continuity of the lyapunov exponents of random matrix products. arXiv preprint arXiv:2305.06009, 2023.
- Ahn [22] Andrew Ahn. Fluctuations of -jacobi product processes. Probability Theory and Related Fields, 183(1):57–123, 2022.
- AJM+ [95] Ludwig Arnold, Christopher KRT Jones, Konstantin Mischaikow, Geneviève Raugel, and Ludwig Arnold. Random dynamical systems. Springer, 1995.
- AKMP [19] Gernot Akemann, Mario Kieburg, Adam Mielke, and Tomaž Prosen. Universal signature from integrability to chaos in dissipative open quantum systems. Physical review letters, 123(25):254101, 2019.
- B+ [12] Philippe Bougerol et al. Products of random matrices with applications to Schrödinger operators, volume 8. Springer Science & Business Media, 2012.
- BLS [13] Zdzislaw Burda, Giacomo Livan, and Artur Swiech. Commutative law for products of infinitely large isotropic random matrices. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 88(2):022107, 2013.
- BNS [12] Zdzislaw Burda, Maciej A Nowak, and Artur Swiech. Spectral relations between products and powers of isotropic random matrices. Physical Review E—Statistical, Nonlinear, and Soft Matter Physics, 86(6):061137, 2012.
- FK [60] Harry Furstenberg and Harry Kesten. Products of random matrices. The Annals of Mathematical Statistics, 31(2):457–469, 1960.
- GJ [21] Friedrich Götze and Jonas Jalowy. Rate of convergence to the circular law via smoothing inequalities for log-potentials. Random Matrices: Theory and Applications, 10(03):2150026, 2021.
- GN [50] Izrail Moiseevich Gel’fand and Mark Aronovich Naimark. The relation between the unitary representations of the complex unimodular group and its unitary subgroup. Izvestiya Rossiiskoi Akademii Nauk. Seriya Matematicheskaya, 14(3):239–260, 1950.
- GS [18] Vadim Gorin and Yi Sun. Gaussian fluctuations for products of random matrices. arXiv preprint arXiv:1812.06532, 2018.
- GS [22] Vadim Gorin and Yi Sun. Gaussian fluctuations for products of random matrices. American Journal of Mathematics, 144(2):287–393, 2022.
- GT [10] Friedrich Götze and Alexander Tikhomirov. On the asymptotic spectrum of products of independent random matrices. arXiv preprint arXiv:1012.2710, 2010.
- Han [18] Boris Hanin. Which neural net architectures give rise to exploding and vanishing gradients? In Advances in Neural Information Processing Systems, 2018.
- HN [20] Boris Hanin and Mihai Nica. Products of many large random matrices and gradients in deep neural networks. Communications in Mathematical Physics, 376(1):287–322, 2020.
- HP [21] Boris Hanin and Grigoris Paouris. Non-asymptotic results for singular values of gaussian matrix products. Geometric and Functional Analysis, 31(2):268–324, 2021.
- IN [92] Marco Isopi and Charles M. Newman. The triangle law for lyapunov exponents of large random matrices. Communications in Mathematical Physics, 143(3):591–598, 1992.
- Kar [08] Vladislav Kargin. Lyapunov exponents of free operators. Journal of Functional Analysis, 255(8):1874–1888, 2008.
- Kar [14] Vladislav Kargin. On the largest lyapunov exponent for products of gaussian matrices. Journal of Statistical Physics, 157(1):70–83, 2014.
- KZ [97] Maxim Kontsevich and Anton Zorich. Lyapunov exponents and hodge theory. arXiv preprint hep-th/9701164, 1997.
- LP [06] Émile Le Page. Théorèmes limites pour les produits de matrices aléatoires. In Probability Measures on Groups: Proceedings of the Sixth Conference Held at Oberwolfach, Germany, June 28–July 4, 1981, pages 258–303. Springer, 2006.
- LWW [18] Dang-Zheng Liu, Dong Wang, and Yanhui Wang. Lyapunov exponent, universality and phase transition for products of random matrices. arXiv preprint arXiv:1810.00433, 2018.
- LWZ [19] Dang-Zheng Liu, Dong Wang, and Lun Zhang. Bulk and soft-edge universality for singular values of products of ginibre random matrices. Annals Henri Poincare, 55(1):98–126, 2019.
- MS [17] James A Mingo and Roland Speicher. Free probability and random matrices, volume 35. Springer, 2017.
- MT [02] Howard Masur and Serge Tabachnikov. Rational billiards and flat structures. In Handbook of dynamical systems, volume 1, pages 1015–1089. Elsevier, 2002.
- New [86] Charles M Newman. The distribution of lyapunov exponents: exact results for random matrices. Communications in mathematical physics, 103(1):121–126, 1986.
- NS [06] Alexandru Nica and Roland Speicher. Lectures on the combinatorics of free probability, volume 13. Cambridge University Press, 2006.
- OS [10] Sean O’Rourke and Alexander Soshnikov. Products of independent non-hermitian random matrices. 2010.
- Ose [68] Valery Iustinovich Oseledec. A multiplicative ergodic theorem, lyapunov characteristic numbers for dynamical systems. Transactions of the Moscow Mathematical Society, 19:197–231, 1968.
- PSG [17] Jeffrey Pennington, Samuel Schoenholz, and Surya Ganguli. Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. In Advances in neural information processing systems, pages 4788–4798, 2017.
- [34] Jeffrey Pennington, Samuel Schoenholz, and Surya Ganguli. The emergence of spectral universality in deep networks. In International Conference on Artificial Intelligence and Statistics, pages 1924–1932. PMLR, 2018.
- [35] Jeffrey Pennington, Samuel S. Schoenholz, and Surya Ganguli. The emergence of spectral universality in deep networks. In International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain, pages 1924–1932, 2018.
- RV [15] Mark Rudelson and Roman Vershynin. Small ball probabilities for linear images of high-dimensional distributions. International Mathematics Research Notices, 2015(19):9594–9617, 2015.
- Sha [11] Ohad Shamir. A variant of azuma’s inequality for martingales with subgaussian tails. arXiv preprint arXiv:1110.2392, 2011.
- Tuc [10] Gabriel H Tucci. Limits laws for geometric means of free random variables. Indiana University mathematics journal, pages 1–13, 2010.
- TV [10] Terence Tao and Van Vu. Random matrices: The distribution of the smallest singular values. Geometric And Functional Analysis, 20:260–297, 2010.
- VDN [92] Dan V Voiculescu, Ken J Dykema, and Alexandru Nica. Free random variables, volume 1. American Mathematical Soc., 1992.
- Ver [18] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.
- Wil [17] Amie Wilkinson. What are lyapunov exponents, and why are they interesting? Bulletin of the American Mathematical Society, 54(1):79–105, 2017.