
When random tensors meet random matrices

Mohamed El Amine Seddik (mohamed.seddik@huawei.com), Maxime Guillaud (maxime.guillaud@huawei.com), Romain Couillet (romain.couillet@univ-grenoble-alpes.fr)
Mathematical and Algorithmic Sciences Laboratory, Huawei Technologies France; Université Grenoble Alpes, CNRS, GIPSA-lab, Grenoble INP
Abstract

Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise. Using the variational definition of the singular vectors and values of Lim (2005) [15], we show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix, constructed from contractions of the studied tensor with the singular vectors associated to its best rank-1 approximation. Our approach allows the exact characterization of the almost sure asymptotic singular value and of the alignments of the corresponding singular vectors with the true spike components, when $\frac{n_{i}}{\sum_{j=1}^{d}n_{j}}\to c_{i}\in(0,1)$ with $n_{i}$ the tensor dimensions. In contrast to other works that rely mostly on tools from statistical physics to study random tensors, our results rely solely on classical RMT tools such as Stein's lemma. Finally, classical RMT results concerning spiked random matrices are recovered as a particular case.

MSC classification: 60B20, 15B52
Keywords: random matrix theory, random tensor theory, spiked models


Notations:

$[n]$ denotes the set $\{1,\ldots,n\}$. The set of rectangular matrices of size $m\times n$ is denoted ${\mathbb{M}}_{m,n}$, and the set of square matrices of size $n$ is denoted ${\mathbb{M}}_{n}$. The set of $d$-order tensors of size $n_{1}\times\cdots\times n_{d}$ is denoted ${\mathbb{T}}_{n_{1},\ldots,n_{d}}$, and the set of hyper-cubic tensors of size $n$ and order $d$ is denoted ${\mathbb{T}}_{n}^{d}$. The notation ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$ means that ${\bm{\mathsfit{X}}}$ is a random tensor with i.i.d. Gaussian $\mathcal{N}(0,1)$ entries. Scalars are denoted by lowercase letters $a,b,c$; vectors by bold lowercase letters ${\bm{a}},{\bm{b}},{\bm{c}}$; matrices by bold uppercase letters ${\bm{A}},{\bm{B}},{\bm{C}}$; and tensors by ${\bm{\mathsfit{A}}},{\bm{\mathsfit{B}}},{\bm{\mathsfit{C}}}$. ${\bm{e}}_{i}^{d}$ denotes the canonical vector in ${\mathbb{R}}^{d}$ with $[{\bm{e}}_{i}^{d}]_{j}=\delta_{ij}$. $T_{i_{1},\ldots,i_{d}}$ denotes the entry $(i_{1},\ldots,i_{d})$ of the tensor ${\bm{\mathsfit{T}}}$. $\langle{\bm{u}},{\bm{v}}\rangle=\sum_{i}u_{i}v_{i}$ denotes the scalar product between ${\bm{u}}$ and ${\bm{v}}$, and the $\ell_{2}$-norm of a vector ${\bm{u}}$ is given by $\|{\bm{u}}\|^{2}=\langle{\bm{u}},{\bm{u}}\rangle$. $\|\cdot\|$ also denotes the spectral norm for tensors. ${\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\equiv\sum_{i_{1},\ldots,i_{d}}u_{i_{1}}^{(1)}\cdots u_{i_{d}}^{(d)}T_{i_{1},\ldots,i_{d}}$ denotes the contraction of the tensor ${\bm{\mathsfit{T}}}$ on the vectors given as arguments. Given some vectors ${\bm{u}}^{(1)},\ldots,{\bm{u}}^{(k)}$ with $k<d$, the contraction ${\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(k)},:,\ldots,:)$ denotes the resulting $(d-k)$-th order tensor.
${\mathbb{S}}^{N-1}$ denotes the unit sphere in ${\mathbb{R}}^{N}$.

1 Introduction

The extraction of latent and low-dimensional structures from raw data is a key step in various machine learning and signal processing applications. Our present interest is in those modern techniques which rely on the extraction of such structures from a low-rank random tensor model [1] and which extend the ideas from matrix-type data to tensor-structured data. We refer the reader to [23, 21, 25] and the references therein, which introduce an extensive set of applications of tensor decomposition methods to machine learning, including dimensionality reduction, supervised and unsupervised learning, learning subspaces for feature extraction, and low-rank tensor recovery. Although random matrix models have been extensively studied and are well understood in the literature, the understanding of random tensor models is still in its infancy, and the ideas from random matrix analysis do not easily extend to higher-order tensors. Indeed, the resolvent notion (see Definition 1), which is at the heart of random matrix theory, does not generalize to tensors. In our present investigation, we consider the spiked tensor model, which consists of an observed $d$-order tensor ${\bm{\mathsfit{T}}}\in{\mathbb{T}}_{n_{1},\ldots,n_{d}}$ of the form

\displaystyle{\bm{\mathsfit{T}}}=\beta\,{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (1)

where $({\bm{x}}^{(1)},\ldots,{\bm{x}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}$, ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$ and $N=\sum_{i=1}^{d}n_{i}$. Note that the tensor noise is normalized by $\sqrt{\sum_{i=1}^{d}n_{i}}$, with $n_{i}$ the tensor dimensions, since the spectral norm of ${\bm{\mathsfit{X}}}$ is of order $\sqrt{\sum_{i=1}^{d}n_{i}}$ by Lemma 4. One aims at retrieving the rank-1 component (or spike) $\beta\,{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}$ from the noisy tensor ${\bm{\mathsfit{T}}}$, where $\beta$ can be seen as controlling the signal-to-noise ratio (SNR). The identification of the dominant rank-1 component is an important special case of the low-rank tensor approximation problem, a noisy version of the classical canonical polyadic decomposition (CPD) [10, 13]. Extensive efforts have been made to study the performance of low-rank tensor approximation methods in the large-dimensional regime, i.e., when the tensor dimensions $n_{i}\to\infty$; however, these works consider symmetric tensor models where $n_{1}=\ldots=n_{d}$ and ${\bm{x}}^{(1)}=\ldots={\bm{x}}^{(d)}$, and assume that the noise is symmetric [17, 20, 14, 9, 12, 8].
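For concreteness, the model in Eq. (1) is straightforward to simulate. The following is a minimal NumPy sketch; the function name `spiked_tensor` and its interface are ours, introduced for illustration only:

```python
import numpy as np

def spiked_tensor(beta, dims, seed=None):
    """Draw T = beta x^(1) x ... x x^(d) + X / sqrt(N), as in Eq. (1)."""
    rng = np.random.default_rng(seed)
    # unit-norm spike components x^(i) drawn uniformly on the spheres S^{n_i - 1}
    xs = [rng.standard_normal(n) for n in dims]
    xs = [x / np.linalg.norm(x) for x in xs]
    # rank-one signal: outer product over all d modes
    signal = xs[0]
    for x in xs[1:]:
        signal = np.multiply.outer(signal, x)
    N = sum(dims)  # noise normalization: ||X|| is of order sqrt(N)
    noise = rng.standard_normal(dims) / np.sqrt(N)
    return beta * signal + noise, xs

T, xs = spiked_tensor(beta=3.0, dims=(40, 50, 60), seed=0)
```

The noise scaling by $\sqrt{N}$ keeps the spectral norm of the noise term of order one, so that an order-one $\beta$ is the interesting regime.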

In particular, in the matrix case (i.e., $d=2$), the above spiked tensor model becomes a so-called spiked matrix model. It is well known that in the large-dimensional regime, there exists an order-one critical value $\beta_{c}$ of the SNR below which it is information-theoretically impossible to detect/recover the spike, while above $\beta_{c}$ it is possible to detect the spike and recover the corresponding components in (at least) polynomial time using the singular value decomposition (SVD). This phenomenon is known as the BBP (Baik, Ben Arous, and Péché) phase transition [2, 5, 7, 19].

In the (symmetric) spiked tensor model for $d\geq 3$, there also exists an order-one critical value $\beta_{c}(d)$ (depending on the tensor order $d$; we sometimes omit this dependence when there is no ambiguity) in the high-dimensional asymptotic below which it is information-theoretically impossible to detect/recover the spike, while above $\beta_{c}(d)$ recovery is theoretically possible with the maximum likelihood (ML) estimator. Computing the ML estimate in the matrix case corresponds to computing the largest singular vectors of the considered matrix, which has polynomial time complexity, while for $d\geq 3$, computing the ML estimate is NP-hard [17, 6]. As such, a more practical phase transition for tensors is characterized by the algorithmic critical value $\beta_{a}(d,n)$ (which might depend on the tensor dimension $n$) above which the recovery of the spike is possible in polynomial time. Richard and Montanari [17] first introduced the symmetric spiked tensor model (of the form ${\bm{\mathsfit{Y}}}=\mu{\bm{x}}^{\otimes d}+{\bm{\mathsfit{W}}}\in{\mathbb{T}}_{n}^{d}$ with symmetric ${\bm{\mathsfit{W}}}$) and also considered the related algorithmic aspects. In particular, they used heuristics to highlight that spike recovery is possible in polynomial time, with Approximate Message Passing (AMP) or the tensor power iteration method under random initialization, provided $\mu\gtrsim n^{\frac{d-1}{2}}$. This phase transition was later proven rigorously for AMP by [14, 12] and recently for tensor power iteration by [11].

Richard and Montanari [17] further introduced a method for tensor decomposition based on tensor unfolding, which consists in unfolding ${\bm{\mathsfit{Y}}}$ into an $n^{q}\times n^{d-q}$ matrix $\operatorname{Mat}({\bm{\mathsfit{Y}}})=\mu{\bm{x}}{\bm{y}}^{\top}+{\bm{Z}}$ for $q\in[d-1]$, to which an SVD is then applied. Setting $q=1$, they predicted that their proposed method successfully recovers the spike if $\mu\gtrsim n^{\frac{d-2}{4}}$. In a very recent paper, Ben Arous et al. [3] proposed a study of spiked long rectangular random matrices (where the number of rows $m$ is allowed to grow polynomially in the number of columns $n$, i.e., $\frac{m}{n}=n^{\alpha}$) under fairly general (bounded fourth-order moment) noise distribution assumptions. They notably proved the existence of a critical SNR at which the extreme singular value and singular vectors exhibit a BBP-type phase transition. They applied their result to the asymmetric rank-one spiked model in Eq. (1) (with equal dimensions) using the tensor unfolding method, and recovered the exact threshold obtained by [17], i.e., $\beta\gtrsim n^{\frac{d-2}{4}}$, for tensor unfolding to succeed in signal recovery.

For the asymmetric spiked tensor model in Eq. (1), few results are available in the literature (to the best of our knowledge, only [3] considered this setting, by applying the tensor unfolding method proposed by [17]). This is precisely the model we consider in the present work, and our more general result is derived as follows. Given the asymmetric model from Eq. (1), the ML estimator of the best rank-one approximation of ${\bm{\mathsfit{T}}}$ is given by

\displaystyle(\lambda_{*},{\bm{u}}_{*}^{(i)})=\operatorname*{arg\,min}_{\lambda\,\in\,{\mathbb{R}}^{+},\,({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\lambda\,{\bm{u}}^{(1)}\otimes\cdots\otimes{\bm{u}}^{(d)}\|_{\text{F}}^{2}. (2)

In the above equation, $\lambda_{*}$ and the ${\bm{u}}_{*}^{(i)}$ can respectively be interpreted as the generalization to the tensor case of the concepts of dominant singular value and associated singular vectors [15]. Following the variational arguments therein, Eq. (2) can be reformulated using contractions of ${\bm{\mathsfit{T}}}$ as

\displaystyle\max_{\prod_{i=1}^{d}\|{\bm{u}}^{(i)}\|=1}|{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})|, (3)

the Lagrangian of which writes as $\mathcal{L}\equiv{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})-\lambda\left(\prod_{i=1}^{d}\|{\bm{u}}^{(i)}\|-1\right)$ with $\lambda>0$. Hence, the stationary points $(\lambda,{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})$, with the ${\bm{u}}^{(i)}$'s being unit vectors, must satisfy the Karush-Kuhn-Tucker conditions, for $i\in[d]$:

\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})=\lambda{\bm{u}}^{(i)},\\ \lambda={\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}).\end{cases} (4)
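The conditions in Eq. (4) are precisely the fixed points of the classical alternating (tensor power) iteration: contract ${\bm{\mathsfit{T}}}$ on all modes but one and renormalize, cycling over the modes. A minimal NumPy sketch of this idea, assuming our own illustrative implementation (it converges to a stationary point of Eq. (4), not necessarily the global optimum of Eq. (3)):

```python
import numpy as np

def rank1_power_iteration(T, iters=100, seed=None):
    """Alternating iteration on the stationarity conditions of Eq. (4)."""
    d = T.ndim
    axes = ''.join(chr(ord('a') + k) for k in range(d))  # e.g. 'abc' for d = 3
    rng = np.random.default_rng(seed)
    us = [rng.standard_normal(n) for n in T.shape]
    us = [u / np.linalg.norm(u) for u in us]
    for _ in range(iters):
        for i in range(d):
            # T(u^(1),...,:,...,u^(d)) = lambda u^(i): contract all modes but i
            spec = (axes + ',' +
                    ','.join(axes[j] for j in range(d) if j != i) + '->' + axes[i])
            v = np.einsum(spec, T, *[us[j] for j in range(d) if j != i])
            us[i] = v / np.linalg.norm(v)
    full = axes + ',' + ','.join(axes) + '->'
    lam = float(np.einsum(full, T, *us))  # lambda = T(u^(1),...,u^(d))
    return lam, us
```

On a noiseless rank-one tensor this recovers the spike components exactly after a few sweeps.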

An interesting question concerns the computation of the expected number of stationary points (local optima or saddle points) satisfying the identities in Eq. (4). [4] studied the landscape of a symmetric spiked tensor model and found that for $\beta<\beta_{c}$, the values of the objective function at all local maxima (including the global one) tend to concentrate on a small interval, while for $\beta>\beta_{c}$, the value achieved by the global maximum exits that interval and increases with $\beta$. In contrast, very recently Goulart et al. [8] studied an order-$3$ symmetric spiked random tensor ${\bm{\mathsfit{Y}}}$ using an RMT approach, showing that there exists a threshold $0<\beta_{s}<\beta_{c}$ such that for $\beta\in[\beta_{s},\beta_{c}]$ there exists a local optimum of the ML problem that correlates with the spike, and such a local optimum coincides with the global one for $\beta>\beta_{c}$. We conjecture that these observations extend to asymmetric spiked tensors, namely that there exists an order-one critical value $\beta_{c}$ above which the ML problem in Eq. (3) admits a global maximum that correlates with the spike. As in [8], our present findings do not allow us to express such a $\beta_{c}$, and its exact characterization is left for future investigation. However, for asymmetric spiked random tensors, we also exhibit a threshold $\beta_{s}$ such that for $\beta>\beta_{s}$ there exists a local optimum of the ML objective that correlates with the true spike. Figure 1 provides an illustration of the different thresholds of $\beta$; see the last part of Subsection 3.2 for a more detailed discussion.

Figure 1: Illustration of the different thresholds for the SNR $\beta$. Our approach exhibits $\beta_{s}$ such that for $\beta>\beta_{s}$ there exists a local optimum that correlates with the spike; the threshold $\beta_{c}$ is unknown for asymmetric tensors and corresponds to the ML phase transition (above $\beta_{c}$ the global maximum correlates with the spike), while $\beta_{a}$ corresponds to the algorithmic phase transition (recovery of the spike in polynomial time).

Main Contributions

Starting from the conditions in Eq. (4), we provide an exact expression of the asymptotic singular value and alignments $\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle$ when the tensor dimensions $n_{i}\to\infty$ with $\frac{n_{i}}{\sum_{j=1}^{d}n_{j}}\to c_{i}\in(0,1)$, where the tuple $(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)})$ is associated to a local optimum of the ML problem verifying some technical conditions (detailed in Assumption 4). We conjecture that when the SNR $\beta$ is large enough, there is a unique local optimum verifying Assumption 4, for which our results characterize the corresponding alignments. We further conjecture that $(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)})$ coincides with the global maximum above some $\beta_{c}$ that remains to be characterized (such a critical value has been characterized by [12] for symmetric tensors; see [8] for a detailed discussion of this aspect in the symmetric case).

Technically, we first show that the considered random tensor ${\bm{\mathsfit{T}}}$ can be mapped to an equivalent symmetric random matrix ${\bm{T}}\in{\mathbb{M}}_{N}$, constructed through contractions of ${\bm{\mathsfit{T}}}$ with $d-2$ directions among ${\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}$. Then, leveraging random matrix theory, we characterize the limiting spectral measure of ${\bm{T}}$ and provide estimates of the asymptotic alignments $\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle$. We precisely show (see Theorem 8) that under Assumption 4, for $d\geq 3$, there exists $\beta_{s}>0$ such that for $\beta>\beta_{s}$

\displaystyle\begin{cases}\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}q_{i}(\lambda^{\infty}(\beta)),\end{cases} (5)

where $\lambda^{\infty}(\beta)$ satisfies $f(\lambda^{\infty}(\beta),\beta)=0$ with $f(z,\beta)=z+g(z)-\beta\prod_{i=1}^{d}q_{i}(z)$ and

\displaystyle q_{i}(z)=\sqrt{1-\frac{g_{i}^{2}(z)}{c_{i}}},\quad g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

$g(z)$ being the solution of the fixed-point equation $g(z)=\sum_{i=1}^{d}g_{i}(z)$ (for $z$ large enough); see Section B.11 for the existence of $g(z)$. Besides, for $\beta\in[0,\beta_{s}]$ with $c_{i}=\frac{1}{d}$ for all $i\in[d]$ (for arbitrary values of the $c_{i}$'s, the upper bound on $\lambda^{\infty}$ can be computed numerically as the minimum non-negative real $z$ for which Algorithm 1 converges),

\displaystyle\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{d-1}{d}},\quad\left|\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0, (6)

that is, ${\bm{u}}_{*}^{(i)}$ (asymptotically) ceases to correlate with ${\bm{x}}^{(i)}$.
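The quantities in Eqs. (5)-(6) are easy to evaluate numerically: for $z$ large enough, the fixed-point equation $g(z)=\sum_{i=1}^{d}g_{i}(z)$ can be iterated directly. A minimal sketch, assuming a plain fixed-point iteration (the function names are ours, and the paper's Algorithm 1 may differ in its details):

```python
import numpy as np

def solve_g(z, cs, iters=1000, tol=1e-12):
    """Iterate g <- sum_i g_i(z) with g_i = (g+z)/2 - sqrt(4 c_i + (g+z)^2)/2."""
    g = 0.0
    for _ in range(iters):
        gis = [(g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2 for c in cs]
        g_new = sum(gis)
        if abs(g_new - g) < tol:
            g = g_new
            break
        g = g_new
    return g, gis

def alignments(z, cs):
    """q_i(z) = sqrt(1 - g_i(z)^2 / c_i), the asymptotic alignments of Eq. (5)."""
    _, gis = solve_g(z, cs)
    return [np.sqrt(1.0 - gi ** 2 / c) for gi, c in zip(gis, cs)]

# example: d = 3 with equal dimensions (c_i = 1/3), z above 2 sqrt(2/3) ~ 1.63
qs = alignments(3.0, [1 / 3] * 3)
```

By symmetry, equal $c_i$'s yield equal alignments $q_i$.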

Remark 1.

Note that $q_{i}(z)$ can be equivalently expressed as $q_{i}(z)=\left(\frac{\alpha_{i}(z)^{d-3}}{\prod_{j\neq i}\alpha_{j}(z)}\right)^{\frac{1}{2d-4}}$ with $\alpha_{i}(z)=\frac{\beta}{z+g(z)-g_{i}(z)}$. Such an expression is defined for $c_{i}\in[0,1]$ with $d\geq 3$. See details in Section B.13.

We highlight that in our formulation, the threshold $\beta_{s}$ corresponds to the minimal value of the SNR $\beta$ above which the derived asymptotic formulas are algebraically defined, which may differ from the true phase transition $\beta_{c}$ of the ML problem: in the case of symmetric tensors, the results from [8] seem to indicate that $\beta_{s}$ lies slightly below the $\beta_{c}$ obtained by [12], where the ML problem was studied.

Figure 2: Asymptotic singular value and alignments as per Eq. (5) when all the dimensions $n_{i}$ are equal ($c_{i}=\frac{1}{d}$ for all $i\in[d]$) for different values of the tensor order $d$. The matrix case corresponds to setting $d=3$ and $c_{3}=0$ (see Section 4 for more details).

Figure 2 notably depicts the asymptotic alignments of Eq. (5) when all the tensor dimensions $n_{i}$ are equal. Since our result characterizes the alignments for $d\geq 3$, it is not possible to recover the matrix case by simply setting $d=2$, since the equations are not defined for $d=2$ (see Remark 1). However, the matrix case is recovered by considering an order-$3$ tensor ($d=3$) and then taking the limit $c_{3}\to 0$ (equivalent to the degenerate case $n_{3}=1$, which results in a spiked random matrix model); see Section 4 for more details. As Figure 2-(b) shows, unlike in the matrix case $d=2$, the predicted asymptotic alignments are not continuous for orders $d\geq 3$; this phenomenon has already been observed in the case of symmetric random tensors [12]. In particular, the predicted theoretical threshold $\beta_{s}$ in the matrix case $d=2$ coincides with the classical BBP (Baik, Ben Arous, and Péché) phase transition $\beta_{c}(2)$ [2, 5, 7, 19]. Moreover, our result for the matrix case characterizes the asymptotic alignments for the long rectangular matrices studied by [3], and we also recover the threshold $\beta\gtrsim n^{\frac{d-2}{4}}$ for the tensor unfolding method (see Remark 9). From a methodological viewpoint, our results are derived based solely on Stein's lemma, without the use of the complex contour integrals classically employed in RMT. In essence, we follow the approach in [8] and further introduce a new object (the mapping $\Phi_{d}$ defined subsequently in Eq. (31)) that drastically simplifies the use of RMT tools.

The remainder of the paper is organized as follows. Section 2 recalls some fundamental random matrix theory tools. In Section 3, we study asymmetric spiked tensors of order $3$ and present the main steps of our approach. Section 4 characterizes the behavior of spiked random matrices given our result on spiked order-$3$ tensor models from Section 3. The generalization of our results to arbitrary $d$-order tensors is then presented in Section 5. Section 6 discusses the application of our findings to arbitrary rank-$k$ tensors with mutually orthogonal components. Further discussions are presented in Section 7. In Appendix A, we provide some simulations to support our findings and discuss some algorithmic aspects. Finally, Appendix B provides the proofs of the main results.

2 Random matrix theory tools

Before digging into our main findings, we briefly recall some random matrix theory results that are at the heart of our analysis. Specifically, we recall the resolvent formalism, which allows one to assess the spectral behavior of large symmetric random matrices. In particular, given a symmetric matrix ${\bm{S}}\in{\mathbb{M}}_{n}$, and denoting $\lambda_{1}({\bm{S}})\leq\ldots\leq\lambda_{n}({\bm{S}})$ its $n$ eigenvalues with corresponding eigenvectors ${\bm{u}}_{i}({\bm{S}})$ for $i\in[n]$, its spectral decomposition writes as

\displaystyle{\bm{S}}=\sum_{i=1}^{n}\lambda_{i}({\bm{S}}){\bm{u}}_{i}({\bm{S}}){\bm{u}}_{i}({\bm{S}})^{\top}.

In the sequel, we omit the dependence on ${\bm{S}}$ by simply writing $\lambda_{i}$ and ${\bm{u}}_{i}$ when there is no ambiguity.

2.1 The resolvent matrix

We start by defining the resolvent of a symmetric matrix and present its main properties.

Definition 1.

Given a symmetric matrix ${\bm{S}}\in{\mathbb{M}}_{n}$, the resolvent of ${\bm{S}}$ is defined as

\displaystyle{\bm{R}}_{\bm{S}}(z)\equiv\left({\bm{S}}-z{\bm{I}}_{n}\right)^{-1},\quad z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}({\bm{S}}),

where $\operatorname{\mathcal{S}}({\bm{S}})=\{\lambda_{1},\ldots,\lambda_{n}\}$ is the spectrum of ${\bm{S}}$.

The resolvent is a fundamental object since it retrieves the spectral characteristics (spectrum and eigenvectors) of ${\bm{S}}$. It notably verifies the following identity, which we will use extensively to derive our main results:

\displaystyle{\bm{R}}_{\bm{S}}(z)=-\frac{1}{z}{\bm{I}}_{n}+\frac{1}{z}{\bm{S}}{\bm{R}}_{\bm{S}}(z)=-\frac{1}{z}{\bm{I}}_{n}+\frac{1}{z}{\bm{R}}_{\bm{S}}(z){\bm{S}}. (7)

The above identity, coupled with Stein's lemma (Lemma 1 in Section 2.3 below), is a fundamental tool used to derive fixed-point equations that allow the evaluation of functionals of interest involving ${\bm{R}}_{\bm{S}}(z)$. Another interesting property of the resolvent concerns its spectral norm, which we denote by $\|{\bm{R}}_{\bm{S}}(z)\|$. Indeed, if the spectrum of ${\bm{S}}$ has bounded support, then the spectral norm of ${\bm{R}}_{\bm{S}}(z)$ is bounded. This is a consequence of the inequality

\displaystyle\|{\bm{R}}_{\bm{S}}(z)\|\leq\frac{1}{\operatorname{dist}(z,\operatorname{\mathcal{S}}({\bm{S}}))}, (8)

where $\operatorname{dist}(z,\cdot)$ denotes the distance of $z$ to a set. The resolvent encodes rich information about the behavior of the eigenvalues of ${\bm{S}}$ through the so-called Stieltjes transform, which we describe subsequently.
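Both the identity (7) and the bound (8) follow directly from $({\bm{S}}-z{\bm{I}}_{n}){\bm{R}}_{\bm{S}}(z)={\bm{I}}_{n}$ and are easy to verify numerically; the following small sketch (ours, for illustration) does so on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
S = (A + A.T) / 2                      # a symmetric test matrix
z = 1.0 + 2.0j                         # a point away from the (real) spectrum
R = np.linalg.inv(S - z * np.eye(n))   # resolvent R_S(z)

# identity (7): R = -I/z + S R / z = -I/z + R S / z
id1 = np.allclose(R, -np.eye(n) / z + S @ R / z)
id2 = np.allclose(R, -np.eye(n) / z + R @ S / z)

# bound (8): ||R_S(z)|| <= 1 / dist(z, spectrum(S))
eigvals = np.linalg.eigvalsh(S)
dist = np.min(np.abs(eigvals - z))
bound_ok = np.linalg.norm(R, 2) <= 1.0 / dist + 1e-12
```

For a symmetric (hence normal) matrix, the bound in (8) actually holds with equality.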

2.2 The Stieltjes transform

Random matrix theory originally aims at describing the limiting spectral measure of random matrices when their dimensions grow large. Typically, under certain technical conditions on ${\bm{S}}$, the empirical spectral measure of ${\bm{S}}$, defined as

\displaystyle\nu_{\bm{S}}\equiv\frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_{i}({\bm{S}})}, (9)

where $\delta_{x}$ is a Dirac mass at $x$, converges to a deterministic probability measure $\nu$. To characterize such an asymptotic measure, the Stieltjes transform (defined below) is a widely used tool.

Definition 2 (Stieltjes transform).

Given a probability measure $\nu$, the Stieltjes transform of $\nu$ is defined by

\displaystyle g_{\nu}(z)\equiv\int\frac{d\nu(\lambda)}{\lambda-z},\quad z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu).

In particular, the Stieltjes transform of $\nu_{\bm{S}}$ is closely related to the associated resolvent ${\bm{R}}_{\bm{S}}(z)$ through the algebraic identity

\displaystyle g_{\nu_{\bm{S}}}(z)=\frac{1}{n}\sum_{i=1}^{n}\int\frac{\delta_{\lambda_{i}({\bm{S}})}(d\lambda)}{\lambda-z}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\lambda_{i}({\bm{S}})-z}=\frac{1}{n}\operatorname{tr}{\bm{R}}_{\bm{S}}(z).
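This trace identity is immediate to check numerically; a short sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
S = (A + A.T) / np.sqrt(2)             # symmetric test matrix
z = 0.5 + 1.0j
eigvals = np.linalg.eigvalsh(S)
g_spectrum = np.mean(1.0 / (eigvals - z))   # (1/n) sum_i 1/(lambda_i - z)
R = np.linalg.inv(S - z * np.eye(n))
g_resolvent = np.trace(R) / n               # (1/n) tr R_S(z)
```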

The Stieltjes transform $g_{\nu}$ has several useful analytical properties, among which:

  1. $g_{\nu}$ is complex analytic on its definition domain ${\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu)$, and $\Im[g_{\nu}(z)]>0$ if $\Im[z]>0$;

  2. $g_{\nu}(z)$ is bounded for $z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu)$ if $\operatorname{\mathcal{S}}(\nu)$ is bounded, since $|g_{\nu}(z)|\leq\operatorname{dist}(z,\operatorname{\mathcal{S}}(\nu))^{-1}$;

  3. since $g_{\nu}^{\prime}(z)=\int(\lambda-z)^{-2}d\nu(\lambda)>0$, $g_{\nu}$ is monotonically increasing for real $z$ outside $\operatorname{\mathcal{S}}(\nu)$.

The Stieltjes transform $g_{\nu}$ admits an inverse formula which gives access to the underlying probability measure $\nu$, as per the following theorem.

Theorem 1 (Inverse formula of the Stieltjes transform).

Let $a,b$ be continuity points of the probability measure $\nu$. Then the measure of the segment $[a,b]$ is given by

\displaystyle\nu([a,b])=\frac{1}{\pi}\lim_{\epsilon\to 0}\int_{a}^{b}\Im[g_{\nu}(x+i\epsilon)]dx.

Moreover, if $\nu$ admits a density $f$ at some point $x$, i.e., $\nu$ is differentiable in a neighborhood of $x$ with $\lim_{\epsilon\to 0}\epsilon^{-1}\nu([x-\epsilon/2,x+\epsilon/2])=f(x)$, then we have the inverse formula

\displaystyle f(x)=\frac{1}{\pi}\lim_{\epsilon\to 0}\Im[g_{\nu}(x+i\epsilon)].
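The inverse formula can be illustrated on a Wigner matrix, whose limiting spectral measure is the semicircle law with density $\sqrt{4-x^{2}}/(2\pi)$ on $[-2,2]$. The example below is ours, and the quality of the approximation depends on the chosen $n$ and small $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
A = rng.standard_normal((n, n))
S = (A + A.T) / np.sqrt(2 * n)         # Wigner matrix, spectrum close to [-2, 2]
eigvals = np.linalg.eigvalsh(S)

x, eps = 0.5, 0.05                     # evaluation point and small imaginary shift
g = np.mean(1.0 / (eigvals - (x + 1j * eps)))    # g_{nu_S}(x + i eps)
f_est = g.imag / np.pi                           # density estimate via inverse formula
f_semicircle = np.sqrt(4.0 - x ** 2) / (2.0 * np.pi)
```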

When it comes to large random matrices, the Stieltjes transform enjoys a continuity property, in the sense that if a sequence of random probability measures converges to a deterministic measure, then the corresponding Stieltjes transforms converge almost surely to the Stieltjes transform of that measure, and vice versa. The following theorem from [24] states this continuity property precisely.

Theorem 2 (Stieltjes transform continuity).

A sequence of random probability measures $\nu_{n}$ supported on ${\mathbb{R}}$, with corresponding Stieltjes transforms $g_{\nu_{n}}(z)$, converges almost surely weakly to a deterministic measure $\nu$ with corresponding Stieltjes transform $g_{\nu}$ if and only if $g_{\nu_{n}}(z)\to g_{\nu}(z)$ almost surely for all $z\in{\mathbb{R}}+i{\mathbb{R}}_{+}$.

In particular, Theorems 1 and 2 combined allow the description of the spectrum of large symmetric random matrices.

2.3 Gaussian calculations

The following lemma by Stein (also called Stein's identity or Gaussian integration by parts) allows one to replace the expectation of the product of a Gaussian variable with a differentiable function $f$ of that variable by the variance of the variable times the expectation of $f^{\prime}$.

Lemma 1 (Stein’s Lemma [22]).

Let $X\sim\mathcal{N}(0,\sigma^{2})$ and let $f$ be a continuously differentiable function having at most polynomial growth. Then

\displaystyle\operatorname{\mathbb{E}}\left[Xf(X)\right]=\sigma^{2}\operatorname{\mathbb{E}}\left[f^{\prime}(X)\right],

whenever the above expectations exist.
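As a quick Monte Carlo sanity check of the lemma (an illustrative experiment of ours, not part of the paper), take $f(x)=x^{3}$, for which both sides equal $3\sigma^{4}$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
X = sigma * rng.standard_normal(10**6)

lhs = np.mean(X * X**3)                # E[X f(X)] with f(x) = x^3
rhs = sigma**2 * np.mean(3 * X**2)     # sigma^2 E[f'(X)], with f'(x) = 3 x^2
# both estimate 3 sigma^4 = 48
```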

We will further need the Poincaré inequality, which allows one to control the variance of a functional of i.i.d. Gaussian random variables.

Lemma 2 (Poincaré inequality).

Let $F:{\mathbb{R}}^{n}\to{\mathbb{R}}$ be a continuously differentiable function having at most polynomial growth, and let $X_{1},\ldots,X_{n}$ be a collection of i.i.d. standard Gaussian variables. Then,

\displaystyle\mathrm{Var}\,F(X_{1},\ldots,X_{n})\leq\sum_{i=1}^{n}\operatorname{\mathbb{E}}\left|\frac{\partial F}{\partial X_{i}}\right|^{2}.
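For instance, with $F(x_{1},x_{2},x_{3})=x_{1}^{2}+x_{2}x_{3}$ one has $\mathrm{Var}\,F=3$, while the right-hand side of the inequality equals $6$. A Monte Carlo sketch (this example is ours):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((10**6, 3))

F = X[:, 0]**2 + X[:, 1] * X[:, 2]      # F(x1, x2, x3) = x1^2 + x2 x3
var_F = np.var(F)                       # Var F = Var(x1^2) + Var(x2 x3) = 2 + 1 = 3
# gradient (2 x1, x3, x2): sum_i E|dF/dX_i|^2 = 4 E[x1^2] + E[x3^2] + E[x2^2] = 6
grad_bound = np.mean(4 * X[:, 0]**2 + X[:, 2]**2 + X[:, 1]**2)
```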

3 The asymmetric rank-one spiked tensor model of order 3

To best illustrate our approach, we start by considering the asymmetric rank-one tensor model of order $3$, of the form

\displaystyle{\bm{\mathsfit{T}}}=\beta\,{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (10)

where ${\bm{\mathsfit{T}}}$ and ${\bm{\mathsfit{X}}}$ both have dimensions $m\times n\times p$ and $N=m+n+p$. We assume that ${\bm{x}},{\bm{y}}$ and ${\bm{z}}$ lie on the unit spheres ${\mathbb{S}}^{m-1}$, ${\mathbb{S}}^{n-1}$ and ${\mathbb{S}}^{p-1}$ respectively, and that ${\bm{\mathsfit{X}}}$ is a Gaussian noise tensor with i.i.d. entries $X_{ijk}\sim\mathcal{N}(0,1)$, independent of ${\bm{x}},{\bm{y}},{\bm{z}}$.

3.1 Tensor singular value and vectors

According to Eq. (4), the $\ell_{2}$-singular value and vectors corresponding to the best rank-one approximation $\lambda_{*}{\bm{u}}_{*}\otimes{\bm{v}}_{*}\otimes{\bm{w}}_{*}$ of ${\bm{\mathsfit{T}}}$ satisfy the identities

\displaystyle{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\lambda{\bm{u}},\quad{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}=\lambda{\bm{v}},\quad{\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}=\lambda{\bm{w}},\quad\text{with}\quad({\bm{u}},{\bm{v}},{\bm{w}})\,\in\,{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}, (11)

where we denote ${\bm{\mathsfit{T}}}({\bm{u}})={\bm{\mathsfit{T}}}({\bm{u}},:,:)\in{\mathbb{M}}_{n,p}$, ${\bm{\mathsfit{T}}}({\bm{v}})={\bm{\mathsfit{T}}}(:,{\bm{v}},:)\in{\mathbb{M}}_{m,p}$ and ${\bm{\mathsfit{T}}}({\bm{w}})={\bm{\mathsfit{T}}}(:,:,{\bm{w}})\in{\mathbb{M}}_{m,n}$. Furthermore, the singular value $\lambda$ can be characterized through the contraction of ${\bm{\mathsfit{T}}}$ along all its singular vectors ${\bm{u}},{\bm{v}},{\bm{w}}$, i.e.,

\displaystyle\lambda={\bm{\mathsfit{T}}}({\bm{u}},{\bm{v}},{\bm{w}})=\sum_{ijk}u_{i}v_{j}w_{k}T_{ijk}. (12)
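In the noiseless case $\beta\,{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}$, the triple $({\bm{x}},{\bm{y}},{\bm{z}})$ itself satisfies Eqs. (11)-(12) with $\lambda=\beta$, which is easy to verify with explicit contractions. A sanity-check sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(5)
unit = lambda v: v / np.linalg.norm(v)
m, n, p = 4, 5, 6
u, v, w = (unit(rng.standard_normal(m)), unit(rng.standard_normal(n)),
           unit(rng.standard_normal(p)))
lam = 2.5
T = lam * np.einsum('i,j,k->ijk', u, v, w)       # noiseless rank-one tensor

Tu = np.einsum('ijk,i->jk', T, u)                # T(u) in M_{n,p}
Tv = np.einsum('ijk,j->ik', T, v)                # T(v) in M_{m,p}

# Eq. (11): coupled singular-vector equations
eq11 = (np.allclose(Tv @ w, lam * u)
        and np.allclose(Tu @ w, lam * v)
        and np.allclose(Tv.T @ u, lam * w))
# Eq. (12): lambda = T(u, v, w)
eq12 = np.isclose(np.einsum('ijk,i,j,k->', T, u, v, w), lam)
```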

3.2 Associated random matrix model

We follow the approach developed in [8], which consists in studying random matrices obtained through contractions of a random tensor model, and extend it to the asymmetric spiked tensor model of Eq. (10). Indeed, it has been shown in [8] that the study of a rank-one symmetric spiked tensor model ${\bm{\mathsfit{Y}}}\in{\mathbb{T}}_{q}^{3}$ boils down to the analysis of the symmetric random matrix ${\bm{\mathsfit{Y}}}({\bm{a}})\in{\mathbb{M}}_{q}$ (the contraction of the tensor ${\bm{\mathsfit{Y}}}$ with its eigenvector ${\bm{a}}$), where ${\bm{a}}\in{\mathbb{S}}^{q-1}$ stands for the eigenvector of ${\bm{\mathsfit{Y}}}$ corresponding to the best symmetric rank-one approximation of ${\bm{\mathsfit{Y}}}$, i.e.,

\displaystyle(\mu,{\bm{a}})=\operatorname*{arg\,min}_{\mu\in{\mathbb{R}},\,{\bm{a}}\in{\mathbb{S}}^{q-1}}\|{\bm{\mathsfit{Y}}}-\mu{\bm{a}}^{\otimes 3}\|_{\text{F}}^{2}\quad\Leftrightarrow\quad{\bm{\mathsfit{Y}}}({\bm{a}}){\bm{a}}=\mu{\bm{a}}. (13)

In the asymmetric case of Eq. (10), the choice of a "relevant" random matrix to study is not trivial, since the corresponding contractions ${\bm{\mathsfit{T}}}({\bm{u}}),{\bm{\mathsfit{T}}}({\bm{v}})$ and ${\bm{\mathsfit{T}}}({\bm{w}})$ yield asymmetric random matrices, which present more technical difficulties from the random matrix theory perspective; therefore, the extension of the approach in [8] to asymmetric tensors is not straightforward. We will see in the following that such a choice of the relevant matrix for studying an asymmetric ${\bm{\mathsfit{T}}}$ is naturally obtained through the use of Pastur's Stein-based approach [16].

As described in [8], and since the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} depend statistically on 𝑿{\bm{\mathsfit{X}}}, the first technical challenge consists in expressing the derivatives of the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} w.r.t. the entries of the Gaussian noise tensor 𝑿{\bm{\mathsfit{X}}}. Indeed, one can show that there exists a differentiable mapping :𝕋m,n,pm+n+p+1\mathcal{F}:{\mathbb{T}}_{m,n,p}\to{\mathbb{R}}^{m+n+p+1} that maps 𝑿{\bm{\mathsfit{X}}} to (𝑿)=(λ(𝑿),𝒖(𝑿),𝒗(𝑿),𝒘(𝑿))\mathcal{F}({\bm{\mathsfit{X}}})=(\lambda({\bm{\mathsfit{X}}}),{\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}),{\bm{w}}({\bm{\mathsfit{X}}})), a singular value of 𝑻{\bm{\mathsfit{T}}} together with its associated singular vectors, since the components of 𝒖(𝑿),𝒗(𝑿){\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}) and 𝒘(𝑿){\bm{w}}({\bm{\mathsfit{X}}}) are bounded and λ(𝑿)\lambda({\bm{\mathsfit{X}}}) has polynomial growth. Precisely, we have the following Lemma, which is analogous to [8, Lemma 8] and justifies the subsequent application of Stein’s Lemma.

Lemma 3.

There exists an almost everywhere continuously differentiable function :𝕋m,n,pm+n+p+1\mathcal{F}:{\mathbb{T}}_{m,n,p}\to{\mathbb{R}}^{m+n+p+1} such that, for almost every 𝑿{\bm{\mathsfit{X}}}, (𝑿)=(λ(𝑿),𝐮(𝑿),𝐯(𝑿),𝐰(𝑿))\mathcal{F}({\bm{\mathsfit{X}}})=(\lambda({\bm{\mathsfit{X}}}),{\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}),{\bm{w}}({\bm{\mathsfit{X}}})) is a singular value of 𝑻{\bm{\mathsfit{T}}} along with its associated singular vectors.

Proof.

The proof relies on the same arguments as [8, Lemma 8]. ∎

Calculations (detailed in Appendix B.1) show that differentiating the identities in Eq. (11) w.r.t. an entry XijkX_{ijk} of 𝑿{\bm{\mathsfit{X}}} with (i,j,k)[m]×[n]×[p](i,j,k)\in[m]\times[n]\times[p] results in

[𝒖Xijk𝒗Xijk𝒘Xijk]=1N([𝟎m×m𝑻(𝒘)𝑻(𝒗)𝑻(𝒘)𝟎n×n𝑻(𝒖)𝑻(𝒗)𝑻(𝒖)𝟎p×p]λ𝑰N)1[vjwk(𝒆imui𝒖)uiwk(𝒆jnvj𝒗)uivj(𝒆kpwk𝒘)]N,\displaystyle\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{\mathsfit{T}}}({\bm{w}})&{\bm{\mathsfit{T}}}({\bm{v}})\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\intercal}&{\bm{0}}_{n\times n}&{\bm{\mathsfit{T}}}({\bm{u}})\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\intercal}&{\bm{\mathsfit{T}}}({\bm{u}})^{\intercal}&{\bm{0}}_{p\times p}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)^{-1}\begin{bmatrix}v_{j}w_{k}({\bm{e}}_{i}^{m}-u_{i}{\bm{u}})\\ u_{i}w_{k}({\bm{e}}_{j}^{n}-v_{j}{\bm{v}})\\ u_{i}v_{j}({\bm{e}}_{k}^{p}-w_{k}{\bm{w}})\end{bmatrix}\in{\mathbb{R}}^{N}, (14)

where we recall that 𝒆id{\bm{e}}_{i}^{d} denotes the canonical vector in d{\mathbb{R}}^{d} with [𝒆id]j=δij[{\bm{e}}_{i}^{d}]_{j}=\delta_{ij}. We further have the identity

λXijk=1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}. (15)

Denoting by 𝑴{\bm{M}} the symmetric block-wise random matrix which appears in the matrix inverse in Eq. (14), the derivatives of 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} are therefore expressed in terms of the resolvent 𝑹𝑴(z)=(𝑴z𝑰N)1{\bm{R}}_{\bm{M}}(z)=\left({\bm{M}}-z{\bm{I}}_{N}\right)^{-1} evaluated at λ\lambda, and we will see subsequently that the assessment of the spectral properties of 𝑻{\bm{\mathsfit{T}}} boils down to the estimation of the normalized trace 1Ntr𝑹𝑴(z)\frac{1}{N}\operatorname{tr}{\bm{R}}_{\bm{M}}(z). As such, the matrix 𝑴{\bm{M}} provides the associated random matrix model that encodes the spectral properties of 𝑻{\bm{\mathsfit{T}}}. We will henceforth focus our analysis on this random matrix, in order to assess the spectral behavior of 𝑻{\bm{\mathsfit{T}}}. More generally, we will be interested in studying random matrices from the 33-order block-wise tensor contraction ensemble 3(𝑿)\mathcal{B}_{3}({\bm{\mathsfit{X}}}) for 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) defined as

3(𝑿){Φ3(𝑿,𝒂,𝒃,𝒄)|(𝒂,𝒃,𝒄)𝕊m1×𝕊n1×𝕊p1},\displaystyle\mathcal{B}_{3}({\bm{\mathsfit{X}}})\equiv\left\{\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}})\,\,\big{|}\,\,({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}\right\}, (16)

where Φ3\Phi_{3} is the mapping

Φ3:𝕋m,n,p×𝕊m1×𝕊n1×𝕊p1𝕄m+n+p(𝑿,𝒂,𝒃,𝒄)[𝟎m×m𝑿(𝒄)𝑿(𝒃)𝑿(𝒄)𝟎n×n𝑿(𝒂)𝑿(𝒃)𝑿(𝒂)𝟎p×p].\begin{split}\Phi_{3}:{\mathbb{T}}_{m,n,p}\times{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}&\longrightarrow{\mathbb{M}}_{m+n+p}\\ ({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}})&\longmapsto\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{\mathsfit{X}}}({\bm{c}})&{\bm{\mathsfit{X}}}({\bm{b}})\\ {\bm{\mathsfit{X}}}({\bm{c}})^{\intercal}&{\bm{0}}_{n\times n}&{\bm{\mathsfit{X}}}({\bm{a}})\\ {\bm{\mathsfit{X}}}({\bm{b}})^{\intercal}&{\bm{\mathsfit{X}}}({\bm{a}})^{\intercal}&{\bm{0}}_{p\times p}\end{bmatrix}.\end{split} (17)

We associate with the tensor 𝑻{\bm{\mathsfit{T}}} the random matrix

Φ3(𝑻,𝒖,𝒗,𝒘)=β𝑽𝑩𝑽+1NΦ3(𝑿,𝒖,𝒗,𝒘)𝕄N,\displaystyle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})=\beta\,{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}})\in{\mathbb{M}}_{N}, (18)

where

𝑩[0𝒛,𝒘𝒚,𝒗𝒛,𝒘0𝒙,𝒖𝒚,𝒗𝒙,𝒖0]𝕄3,𝑽[𝒙𝟎m𝟎m𝟎n𝒚𝟎n𝟎p𝟎p𝒛]𝕄N,3.\displaystyle{\bm{B}}\equiv\begin{bmatrix}0&\langle{\bm{z}},{\bm{w}}\rangle&\langle{\bm{y}},{\bm{v}}\rangle\\ \langle{\bm{z}},{\bm{w}}\rangle&0&\langle{\bm{x}},{\bm{u}}\rangle\\ \langle{\bm{y}},{\bm{v}}\rangle&\langle{\bm{x}},{\bm{u}}\rangle&0\end{bmatrix}\in{\mathbb{M}}_{3},\quad{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}&{\bm{0}}_{m}&{\bm{0}}_{m}\\ {\bm{0}}_{n}&{\bm{y}}&{\bm{0}}_{n}\\ {\bm{0}}_{p}&{\bm{0}}_{p}&{\bm{z}}\end{bmatrix}\in{\mathbb{M}}_{N,3}.

and 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} are on the unit spheres 𝕊m1{\mathbb{S}}^{m-1}, 𝕊n1{\mathbb{S}}^{n-1} and 𝕊p1{\mathbb{S}}^{p-1} respectively. Note that Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) is symmetric and behaves as a so-called spiked random matrix model where the signal part β𝑽𝑩𝑽\beta\,{\bm{V}}{\bm{B}}{\bm{V}}^{\top} correlates with the true signals 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} for a sufficiently large β\beta. However, the noise part (i.e., the term 1NΦ3(𝑿,𝒖,𝒗,𝒘)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}) in Eq. (18)) has a non-trivial structure due to the statistical dependencies between the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and the tensor noise 𝑿{\bm{\mathsfit{X}}}. Despite this statistical dependency, we will show in the next subsection that the spectral measure of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) (and that of 1NΦ3(𝑿,𝒖,𝒗,𝒘)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}})) converges to a deterministic measure (see Theorem 4) that coincides with the limiting spectral measure of Φ3(𝑻,𝒂,𝒃,𝒄)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{a}},{\bm{b}},{\bm{c}}) where 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} are unit vectors and independent of 𝑿{\bm{\mathsfit{X}}}. Furthermore, the matrix Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) admits 2λ2\lambda as an eigenvalue (regardless of the value of β\beta), which is a simple consequence of the identities in Eq. (11), since

Φ3(𝑻,𝒖,𝒗,𝒘)[𝒖𝒗𝒘]=[𝑻(𝒘)𝒗+𝑻(𝒗)𝒘𝑻(𝒘)𝒖+𝑻(𝒖)𝒘𝑻(𝒗)𝒖+𝑻(𝒖)𝒗]=2λ[𝒖𝒗𝒘].\displaystyle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})\begin{bmatrix}{\bm{u}}\\ {\bm{v}}\\ {\bm{w}}\end{bmatrix}=\begin{bmatrix}{\bm{\mathsfit{T}}}({\bm{w}}){\bm{v}}+{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\top}{\bm{u}}+{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}+{\bm{\mathsfit{T}}}({\bm{u}})^{\top}{\bm{v}}\end{bmatrix}=2\lambda\begin{bmatrix}{\bm{u}}\\ {\bm{v}}\\ {\bm{w}}\end{bmatrix}. (19)
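This eigen-identity is easy to verify numerically. The sketch below (our own minimal implementation, in the spirit of the power method of Algorithm 3; the dimensions, SNR and warm start are arbitrary choices of ours) computes a critical point (λ, 𝒖, 𝒗, 𝒘) of Eq. (11) by alternating power iterations on a spiked tensor, builds Φ3(𝑻, 𝒖, 𝒗, 𝒘) as in Eq. (17), and checks that [𝒖; 𝒗; 𝒘] is an eigenvector with eigenvalue 2λ:

```python
import numpy as np

rng = np.random.default_rng(1)
m = n = p = 30
N = m + n + p
beta = 5.0  # strong SNR so the iterations converge fast from the warm start

x, y, z = (rng.standard_normal(d) for d in (m, n, p))
x, y, z = x/np.linalg.norm(x), y/np.linalg.norm(y), z/np.linalg.norm(z)
T = beta*np.einsum('i,j,k->ijk', x, y, z) + rng.standard_normal((m, n, p))/np.sqrt(N)

# Alternating power iteration for a critical point of Eq. (11)
u, v, w = x.copy(), y.copy(), z.copy()   # warm start at the planted signal
for _ in range(500):
    u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
lam = np.einsum('ijk,i,j,k->', T, u, v, w)

# Block-wise contraction matrix Phi_3(T, u, v, w), Eq. (17)
Tu = np.einsum('ijk,i->jk', T, u)
Tv = np.einsum('ijk,j->ik', T, v)
Tw = np.einsum('ijk,k->ij', T, w)
Phi = np.block([[np.zeros((m, m)), Tw,               Tv],
                [Tw.T,             np.zeros((n, n)), Tu],
                [Tv.T,             Tu.T,             np.zeros((p, p))]])

h = np.concatenate([u, v, w])
assert np.allclose(Phi @ h, 2*lam*h, atol=1e-8)        # Eq. (19)
assert np.isclose(2*lam, np.linalg.eigvalsh(Phi)[-1])  # 2*lam is the top eigenvalue
```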

Note however that the expression in Eq. (14) exists only if λ\lambda is not an eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}). As discussed in [8] in the symmetric case, such a condition is related to the local optimality of the ML estimator. Indeed, we first have the following remark that concerns the Hessian of the underlying objective function.

Remark 2.

Recall the Lagrangian of the ML problem as (𝐮,𝐯,𝐰)=𝑻(𝐮,𝐯,𝐰)λ(𝐮𝐯𝐰1)\mathcal{L}({\bm{u}},{\bm{v}},{\bm{w}})={\bm{\mathsfit{T}}}({\bm{u}},{\bm{v}},{\bm{w}})-\lambda\left(\|{\bm{u}}\|\|{\bm{v}}\|\|{\bm{w}}\|-1\right) and denote 𝐡[𝐮,𝐯,𝐰]3{\bm{h}}\equiv\frac{[{\bm{u}}^{\top},{\bm{v}}^{\top},{\bm{w}}^{\top}]^{\top}}{\sqrt{3}}. If λ\lambda is a singular value of 𝑻{\bm{\mathsfit{T}}} with associated singular vectors 𝐮,𝐯,𝐰{\bm{u}},{\bm{v}},{\bm{w}}, then (λ,𝐡)(\lambda,{\bm{h}}) is an eigenpair of the Hessian 22𝐡2\nabla^{2}\mathcal{L}\equiv\frac{\partial^{2}\mathcal{L}}{\partial{\bm{h}}^{2}} evaluated at 𝐡{\bm{h}}. Indeed, 2(𝐡)=Φ3(𝑻,𝐮,𝐯,𝐰)λ𝐈N\nabla^{2}\mathcal{L}({\bm{h}})=\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})-\lambda{\bm{I}}_{N}, thus 2(𝐡)𝐡=Φ3(𝑻,𝐮,𝐯,𝐰)𝐡λ𝐡=λ𝐡\nabla^{2}\mathcal{L}({\bm{h}}){\bm{h}}=\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{h}}-\lambda{\bm{h}}=\lambda{\bm{h}} since Φ3(𝑻,𝐮,𝐯,𝐰)𝐡=2λ𝐡\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{h}}=2\lambda{\bm{h}} (from Eq. (19)).

From [18, Theorem 12.5], a necessary condition for 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} to be a local maximum of the ML problem is that

2(𝒉)𝒌,𝒌0,𝒌𝒉{𝒌N𝒉,𝒌=0}.\displaystyle\langle\nabla^{2}\mathcal{L}({\bm{h}}){\bm{k}},{\bm{k}}\rangle\leq 0,\quad\forall{\bm{k}}\in{\bm{h}}^{\perp}\equiv\left\{{\bm{k}}\in{\mathbb{R}}^{N}\,\mid\,\langle{\bm{h}},{\bm{k}}\rangle=0\right\}.

which yields (by Remark 2) the condition

max𝒌𝕊N1𝒉Φ3(𝑻,𝒖,𝒗,𝒘)𝒌,𝒌λ.\displaystyle\max_{{\bm{k}}\in{\mathbb{S}}^{N-1}\cap{\bm{h}}^{\perp}}\langle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{k}},{\bm{k}}\rangle\leq\lambda. (20)

As such, for λ>0\lambda>0, 2λ2\lambda must be the largest eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) as per Eq. (19), while its second largest eigenvalue cannot exceed λ\lambda as shown by Eq. (20). Thus our analysis applies only to a local optimum of the ML problem for which the corresponding singular value λ\lambda lies outside the bulk of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}); namely, we suppose that there exists a tuple (λ,𝒖,𝒗,𝒘)(\lambda,{\bm{u}},{\bm{v}},{\bm{w}}) verifying the identities in Eq. (11) such that λ\lambda is not an eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) (we will find that this assumption is satisfied provided β>βs\beta>\beta_{s} for some βs>0\beta_{s}>0 that we will determine). This assumption guarantees the existence of the matrix inverse in Eq. (14) and is not restrictive, in the sense that it is satisfied for a sufficiently large value of the SNR β\beta.
In the sequel, for any (λ,𝒖,𝒗,𝒘)(\lambda,{\bm{u}},{\bm{v}},{\bm{w}}) verifying the identities in Eq. (11), we find that the largest eigenvalue 2λ2\lambda is always isolated from the bulk of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}), independently of the SNR β\beta. In addition, there exists βs>0\beta_{s}>0 such that for ββs\beta\leq\beta_{s}, the observed spike is not informative (in the sense that the corresponding alignments vanish) yet remains visible (an isolated eigenvalue appears in the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}), see Figure 4) because of the statistical dependencies between 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and 𝑿{\bm{\mathsfit{X}}}. The same phenomenon was observed in the case of symmetric random tensors in [8].

3.3 Limiting spectral measure of block-wise 33-order tensor contractions

We start by presenting our first result which characterizes the limiting spectral measure of the ensemble 3(𝑿)\mathcal{B}_{3}({\bm{\mathsfit{X}}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)). We characterize this measure in the limit when the tensor dimensions grow large as in the following assumption.

Assumption 1 (Growth rate).

As m,n,pm,n,p\to\infty, the dimension ratios mm+n+pc1(0,1)\frac{m}{m+n+p}\to c_{1}\in(0,1), nm+n+pc2(0,1)\frac{n}{m+n+p}\to c_{2}\in(0,1) and pm+n+pc3=1(c1+c2)(0,1)\frac{p}{m+n+p}\to c_{3}=1-(c_{1}+c_{2})\in(0,1).

We have the following theorem which characterizes the spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with arbitrary deterministic unit vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}}.

Theorem 3.

Let 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) be a sequence of random asymmetric Gaussian tensors and (𝐚,𝐛,𝐜)𝕊m1×𝕊n1×𝕊p1({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1} a sequence of deterministic vectors of increasing dimensions, following Assumption 1. Then the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[3]i\in[3] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.2. ∎

Remark 3.

The equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) yields a polynomial equation of degree 55 in gg which can be solved numerically or via an iteration procedure which converges to a fixed point for zz large enough.
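Such an iteration can be sketched in a few lines (our own implementation; the branch of each quadratic is chosen so that each gᵢ behaves as −cᵢ/z at infinity, consistent with gᵢ being a normalized partial trace of a resolvent, and the sketch targets real z to the right of the support):

```python
import numpy as np

def stieltjes(z, c=(1/3, 1/3, 1/3), n_iter=500):
    """Solve g(z) = sum_i g_i(z), with g_i^2 - (g+z) g_i - c_i = 0, by fixed point.

    Valid here for real z to the right of the support; the branch of each
    quadratic is the one with g_i(z) ~ -c_i / z as z -> infinity.
    """
    g = -1.0 / z  # initialization with the large-z behavior
    for _ in range(n_iter):
        s = g + z
        g = sum((s - np.sqrt(s**2 + 4*ci)) / 2 for ci in c)
    return g

# Cubic case: compare with the explicit semi-circle Stieltjes transform
# of Corollary 1 at a point outside the support [-2*sqrt(2/3), 2*sqrt(2/3)]
z = 3.0
g_num = stieltjes(z)
g_exact = (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4
assert abs(g_num - g_exact) < 1e-10
```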

Figure 3: Spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} independently sampled from the unit spheres 𝕊m1,𝕊n1{\mathbb{S}}^{m-1},{\mathbb{S}}^{n-1}, 𝕊p1{\mathbb{S}}^{p-1} respectively. In black the semi-circle law as per Theorem 3.

In particular, in the cubic case when c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, the empirical spectral measure of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges to a semi-circle law as precisely stated by the following Corollary of Theorem 3.

Corollary 1.

Given the setting of Theorem 3 with c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to the semi-circle distribution supported on 𝒮(ν)[223,223]\mathcal{S}(\nu)\equiv\left[-2\sqrt{\frac{2}{3}},2\sqrt{\frac{2}{3}}\right], whose density and Stieltjes transform are given respectively by

ν(dx)=34π(83x2)+dx,g(z)3z+3z2834,wherez𝒮(ν).\displaystyle\nu(dx)=\frac{3}{4\pi}\sqrt{\left(\frac{8}{3}-x^{2}\right)^{+}}dx,\quad g(z)\equiv\frac{-3z+3\sqrt{z^{2}-\frac{8}{3}}}{4},\quad\text{where}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.3. ∎

Figure 3 depicts the spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and independent unit vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}}; in particular, it illustrates the convergence in law of this spectrum as the dimensions m,n,pm,n,p grow large.
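Such a simulation takes only a few lines (a sketch of ours; the chosen dimensions and the tolerance on the edge location are heuristic):

```python
import numpy as np

rng = np.random.default_rng(2)
m = n = p = 100
N = m + n + p

X = rng.standard_normal((m, n, p))
a, b, c = (rng.standard_normal(d) for d in (m, n, p))
a, b, c = a/np.linalg.norm(a), b/np.linalg.norm(b), c/np.linalg.norm(c)

# Build Phi_3(X, a, b, c) / sqrt(N) as in Eq. (17)
Xa = np.einsum('ijk,i->jk', X, a)
Xb = np.einsum('ijk,j->ik', X, b)
Xc = np.einsum('ijk,k->ij', X, c)
S = np.block([[np.zeros((m, m)), Xc,               Xb],
              [Xc.T,             np.zeros((n, n)), Xa],
              [Xb.T,             Xa.T,             np.zeros((p, p))]]) / np.sqrt(N)

# The extreme eigenvalues should approach the semi-circle edges of Corollary 1
eigs = np.linalg.eigvalsh(S)
edge = 2*np.sqrt(2/3)
assert abs(eigs[-1] - edge) < 0.25 and abs(eigs[0] + edge) < 0.25
```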

Remark 4.

More generally, the spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges to a deterministic measure with connected support if 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) is almost surely full rank, i.e., if max(m,n)min(m,n)pm+n\max(m,n)-\min(m,n)\leq p\leq m+n, since the rank of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) is min(m,n+p)+min(n,m+p)+min(p,m+n)\min(m,n+p)+\min(n,m+p)+\min(p,m+n). In contrast, if it is not full rank, its spectral measure converges to a deterministic measure with disconnected support (see the case of matrices in Corollary 4 subsequently).

The analysis of the tensor 𝑻{\bm{\mathsfit{T}}} relies on describing the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) where the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} depend statistically on 𝑿{\bm{\mathsfit{X}}} (the noise part of 𝑻{\bm{\mathsfit{T}}}). Despite these dependencies, it turns out that the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) converges in law to the same deterministic measure described by Theorem 3. In addition, we need a further technical assumption on the singular value and vectors of 𝑻{\bm{\mathsfit{T}}}.

Assumption 2.

We assume that there exists a sequence of critical points (λ,𝐮,𝐯,𝐰)(\lambda_{*},{\bm{u}}_{*},{\bm{v}}_{*},{\bm{w}}_{*}) satisfying Eq. (11) such that λa.s.λ(β)\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta), |𝐮,𝐱|a.s.ax(β)|\langle{\bm{u}}_{*},{\bm{x}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{x}^{\infty}(\beta), |𝐯,𝐲|a.s.ay(β)|\langle{\bm{v}}_{*},{\bm{y}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{y}^{\infty}(\beta), |𝐰,𝐳|a.s.az(β)|\langle{\bm{w}}_{*},{\bm{z}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{z}^{\infty}(\beta) with λ(β)𝒮(ν)\lambda^{\infty}(\beta)\notin\mathcal{S}(\nu) and ax(β),ay(β),az(β)>0a_{x}^{\infty}(\beta),a_{y}^{\infty}(\beta),a_{z}^{\infty}(\beta)>0.

We precisely have the following result.

Theorem 4.

Let 𝑻{\bm{\mathsfit{T}}} be a sequence of random tensors defined as in Eq. (10). Under Assumptions 1 and 2, the empirical spectral measure of Φ3(𝑻,𝐮,𝐯,𝐰)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}}_{*},{\bm{v}}_{*},{\bm{w}}_{*}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[3]i\in[3] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.4. ∎

However, the statistical dependency between 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and 𝑿{\bm{\mathsfit{X}}} gives rise to an isolated eigenvalue in the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) at the value 2λ2\lambda, independently of the value of the SNR β\beta, which is a consequence of Eq. (19). Figure 4 depicts iterations of the power method, where the leftmost histogram corresponds to the fixed-point solution; one sees that the spike converges to the value 2λ2\lambda.

Figure 4: Spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) at iterations 0,50,5 and \infty of the power method (see Algorithm 3) applied on 𝑻{\bm{\mathsfit{T}}}, for m=n=p=100m=n=p=100 and β=0\beta=0.
Remark 5.

Note that, from Assumption 2, the almost sure limit λ\lambda^{\infty} of λ\lambda must lie outside the support 𝒮(ν)\mathcal{S}(\nu) of the deterministic measure ν\nu described by Theorem 4. The same phenomenon was noticed in [8] where it is assumed that the almost sure limit μ\mu^{\infty} of μ\mu must satisfy μ>223\mu^{\infty}>2\sqrt{\frac{2}{3}} in the case of symmetric tensors.

Let us denote the blocks of the resolvent of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) as

𝑹Φ3(𝑻,𝒖,𝒗,𝒘)(z)=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹13(z)𝑹23(z)𝑹33(z)],\displaystyle{\bm{R}}_{\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})}(z)=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)\end{bmatrix}, (21)

where 𝑹11(z)𝕄m,𝑹22(z)𝕄n{\bm{R}}^{11}(z)\in{\mathbb{M}}_{m},{\bm{R}}^{22}(z)\in{\mathbb{M}}_{n} and 𝑹33(z)𝕄p{\bm{R}}^{33}(z)\in{\mathbb{M}}_{p}. The following corollary of Theorem 4 will be useful subsequently.

Corollary 2.

Recall the setting and notations of Theorem 4. For all z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu), we have

1Ntr𝑹11(z)a.s.g1(z),1Ntr𝑹22(z)a.s.g2(z),1Ntr𝑹33(z)a.s.g3(z).\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}^{11}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{1}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{R}}^{22}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{R}}^{33}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{3}(z).

3.4 Concentration of the singular value and the alignments

When the dimensions of 𝑻{\bm{\mathsfit{T}}} grow large under Assumption 1, the singular value λ\lambda and the alignments 𝒖,𝒙,𝒗,𝒚\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle converge almost surely to some deterministic limits. This can be shown by controlling the variances of these quantities using Poincaré’s inequality (Lemma 2). Precisely, for λ\lambda, invoking Eq. (15) we have

Varλijk𝔼|λXijk|2=1Nijkui2vj2wk2=1N.\displaystyle\mathrm{Var}\lambda\leq\sum_{ijk}\operatorname{\mathbb{E}}\left|\frac{\partial\lambda}{\partial X_{ijk}}\right|^{2}=\frac{1}{N}\sum_{ijk}u_{i}^{2}v_{j}^{2}w_{k}^{2}=\frac{1}{N}.

Bounding higher-order moments of λ\lambda similarly allows one to obtain the concentration of λ\lambda; e.g., by Chebyshev’s inequality, we have for all t>0t>0

(|λ𝔼λ|t)1Nt2.\displaystyle\mathbb{P}\left(|\lambda-\operatorname{\mathbb{E}}\lambda|\geq t\right)\leq\frac{1}{N\,t^{2}}.

Similarly, with Eq. (14), there exists C>0C>0 such that for all t>0t>0

(|𝒖,𝒙𝔼𝒖,𝒙|t)CNt2,(|𝒗,𝒚𝔼𝒗,𝒚|t)CNt2,(|𝒘,𝒛𝔼𝒘,𝒛|t)CNt2.\displaystyle\mathbb{P}\left(|\langle{\bm{u}},{\bm{x}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{u}},{\bm{x}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}},\,\,\mathbb{P}\left(|\langle{\bm{v}},{\bm{y}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{v}},{\bm{y}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}},\,\,\mathbb{P}\left(|\langle{\bm{w}},{\bm{z}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{w}},{\bm{z}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}}.

For the remainder of the manuscript, we denote the almost sure limits of λ\lambda and of the alignments 𝒖,𝒙,𝒗,𝒚\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle respectively as

λ(β)limNλ,a𝒙(β)limN𝒖,𝒙,a𝒚(β)limN𝒗,𝒚,a𝒛(β)limN𝒘,𝒛.\displaystyle\lambda^{\infty}(\beta)\equiv\lim_{N\to\infty}\lambda,\quad a_{\bm{x}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{u}},{\bm{x}}\rangle,\quad a_{\bm{y}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{v}},{\bm{y}}\rangle,\quad a_{\bm{z}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{w}},{\bm{z}}\rangle. (22)

3.5 Asymptotic singular value and alignments

Given the concentration results of the previous subsection, it remains to estimate the expectations 𝔼λ,𝔼𝒖,𝒙,𝔼𝒗,𝒚\operatorname{\mathbb{E}}\lambda,\operatorname{\mathbb{E}}\langle{\bm{u}},{\bm{x}}\rangle,\operatorname{\mathbb{E}}\langle{\bm{v}},{\bm{y}}\rangle and 𝔼𝒘,𝒛\operatorname{\mathbb{E}}\langle{\bm{w}},{\bm{z}}\rangle. Usually, the evaluation of these quantities using tools from random matrix theory relies on computing Cauchy integrals involving 𝑹Φ3(𝑻,𝒖,𝒗,𝒘)(z){\bm{R}}_{\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})}(z) (the resolvent of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})). Here, we take a different (yet analytically simpler) approach by directly taking the expectation of the identities in Eq. (11) and Eq. (12), and then applying Stein’s Lemma (Lemma 1) and Lemma 3. For instance, for λ\lambda, we have

𝔼λ=1Nijk𝔼[vjwkuiXijk]+𝔼[uiwkvjXijk]+𝔼[uivjwkXijk]+β𝔼[𝒖,𝒙𝒗,𝒚𝒘,𝒛].\displaystyle\operatorname{\mathbb{E}}\lambda=\frac{1}{\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[v_{j}w_{k}\frac{\partial u_{i}}{\partial X_{ijk}}\right]+\operatorname{\mathbb{E}}\left[u_{i}w_{k}\frac{\partial v_{j}}{\partial X_{ijk}}\right]+\operatorname{\mathbb{E}}\left[u_{i}v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]+\beta\operatorname{\mathbb{E}}\left[\langle{\bm{u}},{\bm{x}}\rangle\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle\right].

From Eq. (14), when NN\to\infty, it turns out that the only contributing terms (i.e., those yielding non-vanishing terms in the expression of λ(β)\lambda^{\infty}(\beta)) of the derivatives uiXijk,vjXijk\frac{\partial u_{i}}{\partial X_{ijk}},\frac{\partial v_{j}}{\partial X_{ijk}} and wkXijk\frac{\partial w_{k}}{\partial X_{ijk}} in the above sum are respectively

uiXijk1NvjwkRii11(λ),vjXijk1NuiwkRjj22(λ),wkXijk1NuivjRkk33(λ).\displaystyle\frac{\partial u_{i}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}v_{j}w_{k}R^{11}_{ii}(\lambda),\quad\frac{\partial v_{j}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}u_{i}w_{k}R^{22}_{jj}(\lambda),\quad\frac{\partial w_{k}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}u_{i}v_{j}R^{33}_{kk}(\lambda).

This yields

𝔼λ=1N(tr𝑹11(λ)+tr𝑹22(λ)+tr𝑹33(λ))+β𝔼[𝒖,𝒙𝒗,𝒚𝒘,𝒛]+𝒪(N1).\displaystyle\operatorname{\mathbb{E}}\lambda=-\frac{1}{N}\left(\operatorname{tr}{\bm{R}}^{11}(\lambda)+\operatorname{tr}{\bm{R}}^{22}(\lambda)+\operatorname{tr}{\bm{R}}^{33}(\lambda)\right)+\beta\operatorname{\mathbb{E}}\left[\langle{\bm{u}},{\bm{x}}\rangle\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle\right]+\mathcal{O}\left(N^{-1}\right).

Therefore, the almost sure limit λ(β)\lambda^{\infty}(\beta) as NN\to\infty of λ\lambda satisfies

λ(β)+g(λ(β))=βa𝒙(β)a𝒚(β)a𝒛(β),\displaystyle\lambda^{\infty}(\beta)+g(\lambda^{\infty}(\beta))=\beta a_{\bm{x}}^{\infty}(\beta)a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),

where a𝒙(β),a𝒚(β)a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) are defined in Eq. (22). From Eq. (11), proceeding similarly as above with the identities

𝒙𝑻(𝒗)𝒘=λ𝒖,𝒙,𝒚𝑻(𝒖)𝒘=λ𝒗,𝒚,𝒛𝑻(𝒗)𝒖=λ𝒘,𝒛,\displaystyle{\bm{x}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\lambda\langle{\bm{u}},{\bm{x}}\rangle,\quad{\bm{y}}^{\top}{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}=\lambda\langle{\bm{v}},{\bm{y}}\rangle,\quad{\bm{z}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}=\lambda\langle{\bm{w}},{\bm{z}}\rangle,

we obtain the following result.

Theorem 5.

Recall the notations in Theorem 4. Under Assumptions 1 and 2, there exists βs>0\beta_{s}>0 such that for β>βs\beta>\beta_{s},

{λa.s.λ(β),|𝒖,𝒙|a.s.1α2(λ(β))α3(λ(β)),|𝒗,𝒚|a.s.1α1(λ(β))α3(λ(β)),|𝒘,𝒛|a.s.1α1(λ(β))α2(λ(β)),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{2}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))}},\\ \left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))}},\\ \left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{2}(\lambda^{\infty}(\beta))}},\end{cases}

where αi(z)βz+g(z)gi(z)\alpha_{i}(z)\equiv\frac{\beta}{z+g(z)-g_{i}(z)} and λ(β)\lambda^{\infty}(\beta) satisfies f(λ(β),β)=0f(\lambda^{\infty}(\beta),\beta)=0 with f(z,β)=z+g(z)βα1(z)α2(z)α3(z)f(z,\beta)=z+g(z)-\frac{\beta}{\alpha_{1}(z)\alpha_{2}(z)\alpha_{3}(z)}. Besides, for β[0,βs]\beta\in[0,\beta_{s}], λ\lambda^{\infty} is bounded by an order-one constant (such a bound may be computed numerically and corresponds to the right edge of the limiting spectral measure in Theorem 4) and |𝐮,𝐱|,|𝐯,𝐲|,|𝐰,𝐳|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.5. ∎
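Theorem 5 can be evaluated numerically. The sketch below (our own code; the cubic case is used because g is then explicit via Corollary 1 and gᵢ = g/3, and the bisection bracket [1.64, 10] is a heuristic of ours placing the root just to the right of the bulk edge) solves f(λ∞(β), β) = 0 and reads off the common alignment:

```python
import numpy as np

def g(z):
    """Explicit Stieltjes transform in the cubic case (Corollary 1)."""
    return (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4

def f(z, beta):
    # In the cubic case g_i = g/3, hence alpha_i(z) = beta / (z + 2 g(z)/3)
    # and f(z, beta) = z + g(z) - beta / (alpha1 alpha2 alpha3)
    return z + g(z) - (z + 2*g(z)/3)**3 / beta**2

def lam_inf(beta, lo=1.64, hi=10.0, n_iter=200):
    """Bisection for the root of f(., beta) to the right of the bulk edge."""
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid, beta) > 0 else (lo, mid)
    return (lo + hi) / 2

beta = 2.0
lam = lam_inf(beta)
align = (lam + 2*g(lam)/3) / beta  # 1 / sqrt(alpha_i alpha_j) in the cubic case

assert abs(lam - 2.2558) < 1e-3    # matches the closed form of Corollary 3 below
assert abs(align - 0.9530) < 1e-3
```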

Remark 6.

A more compact expression for the alignments is provided in Theorem 8.

Figure 5: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} as predicted by Theorem 5 for c1=16,c2=13c_{1}=\frac{1}{6},c_{2}=\frac{1}{3} and c3=12c_{3}=\frac{1}{2}.

Figure 5 depicts the predicted asymptotic dominant singular value λ(β)\lambda^{\infty}(\beta) of 𝑻{\bm{\mathsfit{T}}} and the corresponding alignments from Theorem 5. As we can see, the result of Theorem 5 predicts that a non-zero correlation between the population signals 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}} and their estimated counterparts is possible only when β>βs1.1134\beta>\beta_{s}\approx 1.1134, which corresponds to the value after which λ(β)\lambda^{\infty}(\beta) starts to increase with β\beta.

Remark 7.

Note that, given g(z)g(z), the inverse formula expressing β\beta in terms of λ\lambda^{\infty} is explicit. Specifically, we have β(λ)=i=13(λ+g(λ)gi(λ))λ+g(λ)\beta(\lambda^{\infty})=\sqrt{\frac{\prod_{i=1}^{3}(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))}{\lambda^{\infty}+g(\lambda^{\infty})}}. In particular, this inverse formula provides an estimator for the SNR β\beta given the largest singular value λ\lambda of 𝑻{\bm{\mathsfit{T}}}.
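In the cubic case, where g is explicit (Corollary 1) and gᵢ = g/3, this estimator takes one line; the sketch below (our own code, not the authors') checks that it inverts the map β ↦ λ∞(β) given by the closed form of Corollary 3 below:

```python
import numpy as np

def g(z):
    """Explicit Stieltjes transform in the cubic case (Corollary 1)."""
    return (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4

def beta_hat(lam):
    """SNR estimator of Remark 7, cubic case where g_i = g/3."""
    return np.sqrt((lam + 2*g(lam)/3)**3 / (lam + g(lam)))

def lam_inf(beta):
    """Closed form of Corollary 3, valid for beta > 2/sqrt(3)."""
    return np.sqrt(beta**2/2 + 2 + np.sqrt(3*(3*beta**2 - 4)**3)/(18*beta))

# The estimator inverts beta -> lam_inf(beta) exactly above the threshold
for beta in (1.5, 2.0, 3.0, 5.0):
    assert abs(beta_hat(lam_inf(beta)) - beta) < 1e-8
```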

3.6 Cubic 33-order tensors: case c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}

In this section, we study the particular case where all the tensor dimensions are equal. As such, the three alignments 𝒖,𝒙,𝒗,𝒚,𝒘,𝒛\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle,\langle{\bm{w}},{\bm{z}}\rangle converge almost surely to the same quantity. In this case, the almost sure limits of λ\lambda and 𝒖,𝒙,𝒗,𝒚,𝒘,𝒛\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle,\langle{\bm{w}},{\bm{z}}\rangle can be obtained explicitly in terms of the signal strength β\beta as per the following corollary of Theorem 5.

Corollary 3.

Under Assumptions 1 and 2 with c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, for β>βs=233\beta>\beta_{s}=\frac{2\sqrt{3}}{3}

{λa.s.λ(β)β22+2+3(3β24)318β,|𝒖,𝒙|,|𝒗,𝒚|,|𝒘,𝒛|a.s.9β212+3(3β24)3β+9β2+36+3(3β24)3β62β.\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta)\equiv\sqrt{\frac{\beta^{2}}{2}+2+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{18\beta}},\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{\sqrt{9\beta^{2}-12+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}+\sqrt{9\beta^{2}+36+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}}{6\sqrt{2}\beta}.\end{cases}

Besides, for β[0,233]\beta\in\left[0,\frac{2\sqrt{3}}{3}\right], λa.s.λ223\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{2}{3}} and |𝐮,𝐱|,|𝐯,𝐲|,|𝐰,𝐳|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.6. ∎

Figure 6 provides plots of the almost sure limits of the singular value and alignments when 𝑻{\bm{\mathsfit{T}}} is cubic as per Corollary 3 (see Subsection A.2 for simulations supporting the above result). In particular, this result predicts a possible correlation between the singular vectors and the underlying signal components above the value βs=2331.154\beta_{s}=\frac{2\sqrt{3}}{3}\approx 1.154, with corresponding singular value λs=2231.633\lambda_{s}=2\sqrt{\frac{2}{3}}\approx 1.633 and alignment as=220.707a_{s}=\frac{\sqrt{2}}{2}\approx 0.707. In addition, we can easily check from the formulas above that λ(β)β1\frac{\lambda^{\infty}(\beta)}{\beta}\to 1 and |𝒖,𝒙|1\left|\langle{\bm{u}},{\bm{x}}\rangle\right|\to 1 for large values of β\beta. Besides, for values of β\beta around βs\beta_{s}, the expression of λ(β)\lambda^{\infty}(\beta) admits the following expansion

λ(β)=223+24(ββs)+2 3144(ββs)32+3664(ββs)2+o((ββs)2),\displaystyle\lambda^{\infty}(\beta)=2\sqrt{\frac{2}{3}}+\frac{\sqrt{2}}{4}\left(\beta-\beta_{s}\right)+\frac{\sqrt{2}\,3^{\frac{1}{4}}}{4}\left(\beta-\beta_{s}\right)^{\frac{3}{2}}+\frac{3\sqrt{6}}{64}\left(\beta-\beta_{s}\right)^{2}+o(\left(\beta-\beta_{s}\right)^{2}),

whereas the corresponding alignment expands as

22+2 3144ββs616(ββs)2 33416(ββs)32+212256(ββs)2+o((ββs)2).\displaystyle\frac{\sqrt{2}}{2}+\frac{\sqrt{2}\,3^{\frac{1}{4}}}{4}\sqrt{\beta-\beta_{s}}-\frac{\sqrt{6}}{16}\left(\beta-\beta_{s}\right)-\frac{\sqrt{2}\,3^{\frac{3}{4}}}{16}\left(\beta-\beta_{s}\right)^{\frac{3}{2}}+\frac{21\sqrt{2}}{256}\left(\beta-\beta_{s}\right)^{2}+o(\left(\beta-\beta_{s}\right)^{2}).
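As a sanity check of the formulas above, the following short numerical sketch (not part of the original analysis; the offset t = 0.01 is an illustrative choice) evaluates the closed-form limit λ∞(β) of Corollary 3 and compares it against the expansion near βs:

```python
import numpy as np

def lam_inf(beta):
    # Closed-form limit of Corollary 3, valid for beta >= 2/sqrt(3).
    return np.sqrt(beta**2 / 2 + 2
                   + np.sqrt(3) * np.sqrt((3 * beta**2 - 4) ** 3) / (18 * beta))

beta_s = 2 * np.sqrt(3) / 3          # phase transition, approx 1.154
t = 0.01                             # small offset beta - beta_s
expansion = (2 * np.sqrt(2 / 3) + np.sqrt(2) / 4 * t
             + np.sqrt(2) * 3 ** 0.25 / 4 * t ** 1.5
             + 3 * np.sqrt(6) / 64 * t ** 2)
gap = abs(lam_inf(beta_s + t) - expansion)   # remainder is o(t^2)
```

The gap is of the order of the neglected t^{5/2} term, well below t².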
Figure 6: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} for c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3} as per Corollary 3.

4 Random tensors meet random matrices

In this section, we investigate the application of Theorem 5 to the particular case of spiked random matrices. Indeed, for instance when p=1p=1 the spiked tensor model in Eq. (10) becomes a spiked matrix model which we will now denote as

𝑴=β𝒙𝒚+1N𝑿,\displaystyle{\bm{M}}=\beta{\bm{x}}{\bm{y}}^{\top}+\frac{1}{\sqrt{N}}{\bm{X}}, (23)

where again 𝒙{\bm{x}} and 𝒚{\bm{y}} are on the unit spheres 𝕊m1{\mathbb{S}}^{m-1} and 𝕊n1{\mathbb{S}}^{n-1} respectively, N=m+nN=m+n and 𝑿{\bm{X}} is a Gaussian noise matrix with i.i.d. entries Xij𝒩(0,1)X_{ij}\sim\mathcal{N}(0,1).

Our approach does not apply directly to the matrix 𝑴{\bm{M}} above. Indeed, the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} of 𝑴{\bm{M}} corresponding to its largest singular value λ\lambda satisfy the identities

𝑴𝒗=λ𝒖,𝑴𝒖=λ𝒗,λ=𝒖𝑴𝒗,\displaystyle{\bm{M}}{\bm{v}}=\lambda{\bm{u}},\quad{\bm{M}}^{\top}{\bm{u}}=\lambda{\bm{v}},\quad\lambda={\bm{u}}^{\top}{\bm{M}}{\bm{v}}, (24)

and deriving 𝒖,𝒗{\bm{u}},{\bm{v}} w.r.t. an entry XijX_{ij} of 𝑿{\bm{X}} will result in

([𝟎m×m𝑴𝑴𝟎n×n]λ𝑰N)[𝒖Xij𝒗Xij]=1N[vj(𝒆imui𝒖)ui(𝒆jnvj𝒗)].\displaystyle\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ij}}\\ \frac{\partial{\bm{v}}}{\partial X_{ij}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\begin{bmatrix}v_{j}\left({\bm{e}}_{i}^{m}-u_{i}{\bm{u}}\right)\\ u_{i}\left({\bm{e}}_{j}^{n}-v_{j}{\bm{v}}\right)\end{bmatrix}. (25)

Now since λ\lambda is an eigenvalue of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}, the matrix ([𝟎m×m𝑴𝑴𝟎n×n]λ𝑰N)\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}-\lambda{\bm{I}}_{N}\right) is not invertible, which makes our approach inapplicable for d=2d=2. However, we can retrieve the behavior of the spiked matrix model in Eq. (23) by considering an order 33 tensor and setting for instance c30c_{3}\to 0 as we will discuss subsequently.
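This singularity is immediate to verify numerically: for any matrix with top singular triplet (λ, 𝒖, 𝒗), the concatenation [𝒖; 𝒗] is an eigenvector of the symmetrized block matrix with eigenvalue λ. A minimal sketch (dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
M = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(M)
u, v, lam = U[:, 0], Vt[0], s[0]

# [u; v] is an eigenvector of [[0, M], [M^T, 0]] with eigenvalue lam,
# hence the shifted matrix in Eq. (25) is singular.
B = np.block([[np.zeros((m, m)), M], [M.T, np.zeros((n, n))]])
x = np.concatenate([u, v])
ok = np.allclose(B @ x, lam * x)
```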

4.1 Limiting spectral measure

Given the result of Theorem 5, the asymptotic singular value and alignments for the spiked matrix model in Eq. (23) correspond to the particular case c30c_{3}\to 0. We start by characterizing the corresponding limiting spectral measure; the following corollary of Theorem 4 is obtained when c30c_{3}\to 0.

Corollary 4.

Given the setting and notations of Theorem 3, under Assumptions 2 and 1 with c1=c,c2=1cc_{1}=c,c_{2}=1-c for c(0,1)c\in(0,1), the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to a deterministic measure ν\nu defined on the support 𝒮(ν)[1+2η,12η][12η,1+2η]\mathcal{S}(\nu)\equiv\left[-\sqrt{1+2\sqrt{\eta}},-\sqrt{1-2\sqrt{\eta}}\right]\cup\left[\sqrt{1-2\sqrt{\eta}},\sqrt{1+2\sqrt{\eta}}\right] with η=c(1c)\eta=c(1-c), whose density function writes as

ν(dx)=1πxsin(arctan2(0,qc(x))2)|qc(x)|sign(sin(arctan2(0,qc(x))2)x)dx+(12min(c,1c))δ(x),\displaystyle\nu(dx)=\frac{1}{\pi x}\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)\sqrt{|q_{c}(x)|}\operatorname{sign}\left(\frac{\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)}{x}\right)dx+\left(1-2\min(c,1-c)\right)\delta(x),

where qc(x)=(x21)2+4c(c1)q_{c}(x)=(x^{2}-1)^{2}+4c(c-1), and the corresponding Stieltjes transform writes as

g(z)=z+qc(z)z,forz𝒮(ν).\displaystyle g(z)=-z+\frac{\sqrt{q_{c}(z)}}{z},\quad\text{for}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.7. ∎
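The Stieltjes transform of Corollary 4 can be checked against a direct simulation; the sketch below (dimensions, seed and the evaluation point z = 3 are illustrative) compares the empirical transform of the symmetrized noise matrix of Remark 8 with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 300
N = m + n
c = m / N  # c = 1/4
X = rng.standard_normal((m, n))
B = np.block([[np.zeros((m, m)), X], [X.T, np.zeros((n, n))]]) / np.sqrt(N)
eigs = np.linalg.eigvalsh(B)

z = 3.0  # real point to the right of the support
q = (z ** 2 - 1) ** 2 + 4 * c * (c - 1)
g_theory = -z + np.sqrt(q) / z      # Corollary 4 (includes the atom at 0)
g_emp = np.mean(1.0 / (eigs - z))   # empirical Stieltjes transform
```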

Figure 7: Limiting spectral measure of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) for c1=c,c2=1cc_{1}=c,c_{2}=1-c as per Corollary 4.
Remark 8.

Corollary 4 describes the limiting spectral measure of 1m+nΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{m+n}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) for 𝑿𝕋m,n,1(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,1}(\mathcal{N}(0,1)), (𝐚,𝐛)𝕊m1×𝕊n1({\bm{a}},{\bm{b}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1} and 𝐜=1{\bm{c}}=1 (a scalar), which is equivalent to random matrices of the form 1m+n[𝟎m×m𝐗𝐗𝟎n×n]\frac{1}{\sqrt{m+n}}\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{X}}\\ {\bm{X}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} with 𝐗𝕄m,n(𝒩(0,1)){\bm{X}}\sim{\mathbb{M}}_{m,n}(\mathcal{N}(0,1)).

Figure 7 depicts the limiting spectral measures as per Corollary 4 for various values of cc (see also simulations in Appendix A.1). In particular, the limiting measure is a semi-circle law for square matrices (i.e., c=12c=\frac{1}{2}), while it decomposes into a two-mode distribution (with a Dirac δ(x)\delta(x) at 0 weighted by 12min(c,1c)1-2\min(c,1-c), not shown in Figure 7) for c(0,1){12}c\in(0,1)\setminus\{\frac{1}{2}\}, since the underlying random matrix model is not full rank (Φ3(𝑿,𝒂,𝒃,1)\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},1) is of rank 2min(m,n)2\min(m,n)).

4.2 Limiting singular value and alignments

Given the limiting Stieltjes transform from the previous subsection (Corollary 4), the limiting largest singular value of 𝑴{\bm{M}} (in Eq. (23)) and the alignments of the corresponding singular vectors are obtained thanks to Theorem 5, yielding the following corollary.

Corollary 5.

Under Assumption 1 with c1=c,c2=1cc_{1}=c,c_{2}=1-c, we have for β>βs=c(1c)4\beta>\beta_{s}=\sqrt[4]{c(1-c)}

{λa.s.λ(β)=β2+1+c(1c)β2,|𝒖,𝒙|a.s.1κ(β,c),|𝒗,𝒚|a.s.1κ(β,1c),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta)=\sqrt{\beta^{2}+1+\frac{c(1-c)}{\beta^{2}}},\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\kappa(\beta,c)},\quad\left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\kappa(\beta,1-c)},\end{cases}

where 𝐮𝕊m1,𝐯𝕊n1{\bm{u}}\in{\mathbb{S}}^{m-1},{\bm{v}}\in{\mathbb{S}}^{n-1} are the singular vectors of 𝐌{\bm{M}} corresponding to its largest singular value λ\lambda and κ(β,c)\kappa(\beta,c) is given by

κ(β,c)=ββ2(β2+1)c(c1)(β4+c(c1))(β2+1c).\displaystyle\kappa(\beta,c)=\beta\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{(\beta^{4}+c(c-1))\left(\beta^{2}+1-c\right)}}.

Besides, for β[0,βs]\beta\in[0,\beta_{s}], λa.s.1+2c(1c)\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\sqrt{1+2\sqrt{c(1-c)}} and |𝐮,𝐱|,|𝐯,𝐲|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.8. ∎

Figure 8: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} (and of 𝑴{\bm{M}} in Eq. (23)) for c1=cc_{1}=c and c2=1cc_{2}=1-c as per Corollary 5, for different values of c{12,110,150}c\in\left\{\frac{1}{2},\frac{1}{10},\frac{1}{50}\right\}.

Figure 8 provides the curves of the asymptotic singular value and alignments of Corollary 5 (see Subsection A.1 for simulations supporting the above result). Unlike the tensor case from the previous section, we see that the asymptotic alignments a𝒙(β),a𝒚(β)a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) are continuous, and a positive alignment is observed for β>βs\beta>\beta_{s}, which corresponds to the classical BBP phase transition of spiked random matrix models [2, 5, 19].
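A Monte-Carlo sketch of Corollary 5 (the dimensions, seed, β = 2 and the choice of spike x = y = ones/√n are illustrative, not from the original simulations):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 400
N = m + n
c = m / N  # square case, c = 1/2
beta = 2.0
x = np.ones(m) / np.sqrt(m)
y = np.ones(n) / np.sqrt(n)
M = beta * np.outer(x, y) + rng.standard_normal((m, n)) / np.sqrt(N)

U, s, Vt = np.linalg.svd(M)
# Corollary 5: limiting top singular value and alignment 1/kappa(beta, c).
lam_theory = np.sqrt(beta ** 2 + 1 + c * (1 - c) / beta ** 2)
kappa = beta * np.sqrt((beta ** 2 * (beta ** 2 + 1) - c * (c - 1))
                       / ((beta ** 4 + c * (c - 1)) * (beta ** 2 + 1 - c)))
align_emp = abs(U[:, 0] @ x)
```

For these parameters the predicted values are λ∞ = 2.25 and |⟨u, x⟩| ≈ 0.935, matched by the simulation up to finite-size fluctuations.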

Remark 9 (Application to tensor unfolding).

The tensor unfolding method consists in estimating the spike components of a given tensor 𝑻=γ𝐱(1)𝐱(d)+1n𝑿𝕋nd{\bm{\mathsfit{T}}}=\gamma{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}+\frac{1}{\sqrt{n}}{\bm{\mathsfit{X}}}\in{\mathbb{T}}_{n}^{d} with 𝑿𝕋nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n}^{d}(\mathcal{N}(0,1)) by applying an SVD to its unfolded matrices (for i[d]i\in[d]) Mati(𝑻)=γ𝐱(i)(𝐲(i))+1n𝐗i\operatorname{Mat}_{i}({\bm{\mathsfit{T}}})=\gamma{\bm{x}}^{(i)}({\bm{y}}^{(i)})^{\top}+\frac{1}{\sqrt{n}}{\bm{X}}_{i} of size n×nd1n\times n^{d-1}. Applying Corollary 5 to this case predicts a phase transition at βs=c(1c)4\beta_{s}=\sqrt[4]{c(1-c)} with c=nn+nd1=11+nd20c=\frac{n}{n+n^{d-1}}=\frac{1}{1+n^{d-2}}\to 0 (our assumptions take cc to be a constant that does not depend on the tensor dimensions; still, we believe this can be relaxed in the same vein as in [4]), which yields βs=nd2(1+nd2)24\beta_{s}=\sqrt[4]{\frac{n^{d-2}}{(1+n^{d-2})^{2}}}. After re-scaling the noise component (by multiplying Eq. (23) by 1+mn\sqrt{1+\frac{m}{n}} with m=nd1m=n^{d-1}), this yields γ>nd24\gamma>n^{\frac{d-2}{4}}, i.e., the phase transition for tensor unfolding obtained by [3] (see Theorem 3.3 therein). More generally, for any order-dd tensor 𝑻{\bm{\mathsfit{T}}} of arbitrary dimensions n1××ndn_{1}\times\cdots\times n_{d}, the tensor unfolding method succeeds provided that β(i=1dni)1/4/i=1dni\beta\geq\left(\prod_{i=1}^{d}n_{i}\right)^{1/4}/\sqrt{\sum_{i=1}^{d}n_{i}}.
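Tensor unfolding itself is a one-line reshape followed by an SVD; the sketch below (d = 3, with illustrative n, γ and spike x = ones/√n) recovers the first spike component well above the predicted threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 30, 3, 10.0  # threshold here is about n**((d-2)/4) ~ 2.34
x = np.ones(n) / np.sqrt(n)
T = gamma * np.einsum('i,j,k->ijk', x, x, x) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)

Mat1 = T.reshape(n, n ** (d - 1))      # unfolding along the first mode
u = np.linalg.svd(Mat1)[0][:, 0]       # top left singular vector
align = abs(u @ x)                     # close to 1 well above the threshold
```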

5 Generalization to arbitrary dd-order tensors

We now show that our approach can be generalized straightforwardly to the dd-order spiked tensor model of Eq. (1). Indeed, from Eq. (4), the 2\ell_{2}-singular value λ\lambda and vectors 𝒖(1),,𝒖(d)𝕊n11××𝕊nd1{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}, corresponding to the best rank-one approximation λ𝒖(1)𝒖(d)\lambda{\bm{u}}^{(1)}\otimes\ldots\otimes{\bm{u}}^{(d)} of the dd-order tensor 𝑻{\bm{\mathsfit{T}}} in Eq. (1), satisfy the identities

{𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d))=λ𝒖(i),λ=𝑻(𝒖(1),,𝒖(d))=i1,,idui1(1)uid(d)Ti1,,id.\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})=\lambda{\bm{u}}^{(i)},\\ \lambda={\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})=\sum_{i_{1},\ldots,i_{d}}u_{i_{1}}^{(1)}\ldots u_{i_{d}}^{(d)}T_{i_{1},\ldots,i_{d}}.\end{cases} (26)

5.1 Associated random matrix ensemble

Let 𝑻ij𝕄ni,nj{\bm{\mathsfit{T}}}^{ij}\in{\mathbb{M}}_{n_{i},n_{j}} denote the matrix obtained by contracting the tensor 𝑻{\bm{\mathsfit{T}}} with the singular vectors in {𝒖(1),,𝒖(d)}{𝒖(i),𝒖(j)}\{{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\}\setminus\{{\bm{u}}^{(i)},{\bm{u}}^{(j)}\}, i.e.,

𝑻ij𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(j1),:,𝒖(j+1),,𝒖(d)).\displaystyle{\bm{\mathsfit{T}}}^{ij}\equiv{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(j-1)},:,{\bm{u}}^{(j+1)},\ldots,{\bm{u}}^{(d)}). (27)

As in the order 33 case, from Eq. (26), the derivatives of the singular vectors 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} with respect to the entry Xi1,,idX_{i_{1},\ldots,i_{d}} of the noise tensor 𝑿{\bm{\mathsfit{X}}} express as

[𝒖(1)Xi1,,id𝒖(d)Xi1,,id]=1N([𝟎n1×n1𝑻12𝑻13𝑻1d(𝑻12)𝟎n2×n2𝑻23𝑻2d(𝑻13)(𝑻23)𝟎n3×n3𝑻3d(𝑻1d)(𝑻2d)(𝑻3d)𝟎nd×nd]λ𝑰N)1[{2,,d}ui()(𝒆i1n1ui1(i1)𝒖(i1)){1,,d1}ui()(𝒆idnduid(id)𝒖(id))]\displaystyle\begin{bmatrix}\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}}\\ \vdots\\ \frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\left(\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{\mathsfit{T}}}^{12}&{\bm{\mathsfit{T}}}^{13}&\cdots&{\bm{\mathsfit{T}}}^{1d}\\ ({\bm{\mathsfit{T}}}^{12})^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&{\bm{\mathsfit{T}}}^{23}&\cdots&{\bm{\mathsfit{T}}}^{2d}\\ ({\bm{\mathsfit{T}}}^{13})^{\top}&({\bm{\mathsfit{T}}}^{23})^{\top}&{\bm{0}}_{n_{3}\times n_{3}}&\ldots&{\bm{\mathsfit{T}}}^{3d}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ ({\bm{\mathsfit{T}}}^{1d})^{\top}&({\bm{\mathsfit{T}}}^{2d})^{\top}&({\bm{\mathsfit{T}}}^{3d})^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)^{-1}\begin{bmatrix}\prod_{\ell\in\{2,\ldots,d\}}u_{i_{\ell}}^{(\ell)}({\bm{e}}_{i_{1}}^{n_{1}}-u_{i_{1}}^{(i_{1})}{\bm{u}}^{(i_{1})})\\ \vdots\\ \prod_{\ell\in\{1,\ldots,d-1\}}u_{i_{\ell}}^{(\ell)}({\bm{e}}_{i_{d}}^{n_{d}}-u_{i_{d}}^{(i_{d})}{\bm{u}}^{(i_{d})})\end{bmatrix} (28)

where N=i[d]niN=\sum_{i\in[d]}n_{i}, and the derivative of λ\lambda w.r.t. Xi1,,idX_{i_{1},\ldots,i_{d}} writes as

λXi1,,id=1N[d]ui().\displaystyle\frac{\partial\lambda}{\partial X_{i_{1},\ldots,i_{d}}}=\frac{1}{\sqrt{N}}\prod_{\ell\in[d]}u_{i_{\ell}}^{(\ell)}. (29)

As such, the associated random matrix model of 𝑻{\bm{\mathsfit{T}}} is the matrix appearing in the resolvent in Eq. (28). More generally, the dd-order block-wise tensor contraction ensemble d(𝑿)\mathcal{B}_{d}({\bm{\mathsfit{X}}}) for 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) is defined as

d(𝑿){Φd(𝑿,𝒂(1),,𝒂(d))|(𝒂(1),,𝒂(d))𝕊n11××𝕊nd1}\displaystyle\mathcal{B}_{d}({\bm{\mathsfit{X}}})\equiv\left\{\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\,\,\big{|}\,\,({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}\right\} (30)

where Φd\Phi_{d} is the mapping

Φd:𝕋n1,,nd×𝕊n11××𝕊nd1𝕄ini(𝑿,𝒂(1),,𝒂(d))[𝟎n1×n1𝑿12𝑿13𝑿1d(𝑿12)𝟎n2×n2𝑿23𝑿2d(𝑿13)(𝑿23)𝟎n3×n3𝑿3d(𝑿1d)(𝑿2d)(𝑿3d)𝟎nd×nd],\begin{split}\Phi_{d}:{\mathbb{T}}_{n_{1},\ldots,n_{d}}\times{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}&\longrightarrow{\mathbb{M}}_{\sum_{i}n_{i}}\\ ({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})&\longmapsto\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{\mathsfit{X}}}^{12}&{\bm{\mathsfit{X}}}^{13}&\cdots&{\bm{\mathsfit{X}}}^{1d}\\ ({\bm{\mathsfit{X}}}^{12})^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&{\bm{\mathsfit{X}}}^{23}&\cdots&{\bm{\mathsfit{X}}}^{2d}\\ ({\bm{\mathsfit{X}}}^{13})^{\top}&({\bm{\mathsfit{X}}}^{23})^{\top}&{\bm{0}}_{n_{3}\times n_{3}}&\ldots&{\bm{\mathsfit{X}}}^{3d}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ ({\bm{\mathsfit{X}}}^{1d})^{\top}&({\bm{\mathsfit{X}}}^{2d})^{\top}&({\bm{\mathsfit{X}}}^{3d})^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix},\end{split} (31)

with 𝑿ij𝑿(𝒂(1),,𝒂(i1),:,𝒂(i+1),,𝒂(j1),:,𝒂(j+1),,𝒂(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(i-1)},:,{\bm{a}}^{(i+1)},\ldots,{\bm{a}}^{(j-1)},:,{\bm{a}}^{(j+1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}.

Remark 10.

As in the order 33 case, to ensure the existence of the matrix inverse in Eq. (28), we need to suppose that there exists a tuple (λ,𝐮(1),,𝐮(d))(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) verifying the identities in Eq. (26) such that λ\lambda_{*} is not an eigenvalue of Φd(𝑻,𝐮(1),,𝐮(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}).

5.2 Limiting spectral measure of block-wise dd-order tensor contractions

In this section, we characterize the limiting spectral measure of the ensemble d(𝑿)\mathcal{B}_{d}({\bm{\mathsfit{X}}}) for 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) in the limit when all tensor dimensions grow as per the following assumption.

Assumption 3.

For all i[d]i\in[d], assume that nin_{i}\to\infty with nijnjci(0,1)\frac{n_{i}}{\sum_{j}n_{j}}\to c_{i}\in(0,1).

We thus have the following result which characterizes the spectrum of 1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) for any deterministic unit norm vectors 𝒂(1),,𝒂(d){\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}.

Theorem 6.

Let 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) be a sequence of random asymmetric Gaussian tensors and (𝐚(1),,𝐚(d))𝕊n11××𝕊nd1({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} a sequence of deterministic vectors of increasing dimensions, following Assumption 3. Then the empirical spectral measure of 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[d]i\in[d] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.9. ∎

Algorithm 1 Fixed point equation to compute the Stieltjes transform in Theorem 6
Input: z𝒮(ν)z\in{\mathbb{R}}\setminus\mathcal{S}(\nu) and tensor dimension ratios 𝒄=[c1,,cd](0,1)d{\bm{c}}=[c_{1},\ldots,c_{d}]^{\top}\in(0,1)^{d}
Output: g,𝒈=[g1,,gd]dg\in{\mathbb{R}},\quad{\bm{g}}=[g_{1},\ldots,g_{d}]^{\top}\in{\mathbb{R}}^{d}
Repeat
𝒈g+z24𝒄+(g+z)22\quad{\bm{g}}\leftarrow\frac{g+z}{2}-\frac{\sqrt{4{\bm{c}}+(g+z)^{2}}}{2} \triangleright Element-wise vector operation.
gi=1dgi\quad g\leftarrow\sum_{i=1}^{d}g_{i}
until convergence of gg
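A minimal Python implementation of Algorithm 1 (for real z to the right of the support, with a fixed iteration budget instead of a convergence test for simplicity), checked against the closed-form semi-circle transform of Corollary 6 in the equal-ratio case:

```python
import numpy as np

def stieltjes(z, c, n_iter=200):
    # Fixed point of Theorem 6: g = sum_i g_i with
    # g_i = (g + z)/2 - sqrt(4 c_i + (g + z)^2)/2 (element-wise).
    c = np.asarray(c, dtype=float)
    g = 0.0
    for _ in range(n_iter):
        gi = (g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2
        g = gi.sum()
    return g, gi

# Equal ratios c_i = 1/d: the result should match Corollary 6.
d, z = 3, 3.0
g, _ = stieltjes(z, [1 / d] * d)
g_semicircle = d * (-z + np.sqrt(z ** 2 - 4 * (d - 1) / d)) / (2 * (d - 1))
```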

Algorithm 1 provides a pseudo-code to compute the Stieltjes transform g(z)g(z) in Theorem 6 through an iterative solution to the fixed point equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z). In particular, as for the order 33 case, when all the tensor dimensions are equal (i.e., ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d]), the spectral measure of 1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges to a semi-circle law. We have the following corollary of Theorem 6.

Corollary 6.

In the setting of Theorem 6, under Assumption 3 with ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], the empirical spectral measure of 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges weakly almost surely to the semi-circle distribution with support 𝒮(ν)[2d1d,2d1d]\mathcal{S}(\nu)\equiv\left[-2\sqrt{\frac{d-1}{d}},2\sqrt{\frac{d-1}{d}}\right], whose density and Stieltjes transform write respectively as

ν(dx)=d2(d1)π(4(d1)dx2)+,g(z)zd+dz24(d1)d2(d1),wherez𝒮(ν).\displaystyle\nu(dx)=\frac{d}{2(d-1)\pi}\sqrt{\left(\frac{4(d-1)}{d}-x^{2}\right)^{+}},\quad g(z)\equiv\frac{-zd+d\sqrt{z^{2}-\frac{4(d-1)}{d}}}{2(d-1)},\quad\text{where}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.10. ∎

Figure 9: Spectrum of 1NΦ4(𝑿,𝒂,𝒃,𝒄,𝒅)\frac{1}{\sqrt{N}}\Phi_{4}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}}) with 𝑿𝕋n1,,n4(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{4}}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄,𝒅{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}} independently sampled from the unit spheres 𝕊n11,,𝕊n41{\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{4}-1} respectively. In black the semi-circle law as per Theorem 6.

Figure 9 provides an illustration of the convergence in law (when the dimensions nin_{i} grow large) of the spectrum of 1NΦ4(𝑿,𝒂,𝒃,𝒄,𝒅)\frac{1}{\sqrt{N}}\Phi_{4}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}}) with a 44th-order tensor 𝑿𝕋n1,,n4(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{4}}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄,𝒅{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}} independently sampled from the unit spheres 𝕊n11,,𝕊n41{\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{4}-1} respectively.

Remark 11.

1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) is almost surely of rank imin(ni,jinj)\sum_{i}\min(n_{i},\sum_{j\neq i}n_{j}). As in the order 33 case, when 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) is almost surely full rank, its limiting spectral measure has connected support, while in general the limiting spectral measure has disconnected support.

The result of Theorem 6 still holds for Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) where 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} stand for the singular vectors of 𝑻{\bm{\mathsfit{T}}} defined through Eq. (26). Indeed, the statistical dependencies between these singular vectors and the noise tensor 𝑿{\bm{\mathsfit{X}}} do not affect the convergence of the spectrum of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) to the limiting measure described by Theorem 6. We further need the following assumption.

Assumption 4.

We assume that there exists a sequence of critical points (λ,𝐮(1),,𝐮(d))(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) satisfying Eq. (26) such that λa.s.λ(β)\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta), |𝐮(i),𝐱(i)|a.s.a𝐱(i)(β)|\langle{\bm{u}}_{*}^{(i)},{\bm{x}}^{(i)}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{{\bm{x}}^{(i)}}^{\infty}(\beta) with λ(β)𝒮(ν)\lambda^{\infty}(\beta)\notin\mathcal{S}(\nu) and a𝐱(i)(β)>0a_{{\bm{x}}^{(i)}}^{\infty}(\beta)>0.

Theorem 7.

Let 𝑻{\bm{\mathsfit{T}}} be a sequence of spiked random tensors defined as in Eq. (1). Under Assumptions 3 and 4, the empirical spectral measure of Φd(𝑻,𝐮(1),,𝐮(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[d]i\in[d] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.12. ∎

Figure 10: Spectrum of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) at iterations 0,50,5 and at convergence of the power method (see Algorithm 3) applied to 𝑻{\bm{\mathsfit{T}}}, for n1=n2=n3=n4=50n_{1}=n_{2}=n_{3}=n_{4}=50 and β=0\beta=0.

Figure 10 depicts the spectrum of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) for an order 44 tensor 𝑻{\bm{\mathsfit{T}}} with β=0\beta=0. As we saw previously, an isolated eigenvalue pops out from the continuous bulk of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) because of the statistical dependencies between the tensor noise 𝑿{\bm{\mathsfit{X}}} and the singular vectors 𝒖(1),,𝒖(4){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}. More generally, for an order-dd tensor 𝑻{\bm{\mathsfit{T}}}, the spectrum of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) admits an isolated spike at the value (d1)λ(d-1)\lambda independently of the signal strength β\beta. Indeed, (d1)λ(d-1)\lambda is an eigenvalue of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) with corresponding eigenvector the concatenation of the singular vectors 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}, i.e.,

Φd(𝑻,𝒖(1),,𝒖(d))[𝒖(1)𝒖(d)]=[i1𝑻(:,𝒖(2),,𝒖(i1),:,𝒖(i+1),,𝒖(d))𝒖(i)id𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d1),:)𝒖(i)]=(d1)λ[𝒖(1)𝒖(d)].\displaystyle\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\begin{bmatrix}{\bm{u}}^{(1)}\\ \vdots\\ {\bm{u}}^{(d)}\end{bmatrix}=\begin{bmatrix}\sum_{i\neq 1}{\bm{\mathsfit{T}}}(:,{\bm{u}}^{(2)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)}){\bm{u}}^{(i)}\\ \vdots\\ \sum_{i\neq d}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d-1)},:)^{\top}{\bm{u}}^{(i)}\end{bmatrix}=(d-1)\lambda\begin{bmatrix}{\bm{u}}^{(1)}\\ \vdots\\ {\bm{u}}^{(d)}\end{bmatrix}. (32)
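The identity in Eq. (32) can be verified numerically for d = 3, where the isolated eigenvalue is (d−1)λ = 2λ. The sketch below (dimensions, seed and the plain alternating power iteration are illustrative choices) first iterates to a critical point of Eq. (26), then checks the eigenvector relation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
T = rng.standard_normal((n, n, n))  # pure-noise case beta = 0 (scale irrelevant)
u = v = w = np.ones(n) / np.sqrt(n)
for _ in range(5000):  # alternating power iteration towards Eq. (26)
    u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
lam = np.einsum('ijk,i,j,k->', T, u, v, w)

T12 = np.einsum('ijk,k->ij', T, w)
T13 = np.einsum('ijk,j->ik', T, v)
T23 = np.einsum('ijk,i->jk', T, u)
Phi = np.block([[np.zeros((n, n)), T12, T13],
                [T12.T, np.zeros((n, n)), T23],
                [T13.T, T23.T, np.zeros((n, n))]])
s = np.concatenate([u, v, w])
resid = np.max(np.abs(Phi @ s - 2 * lam * s))  # small at a critical point
```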

5.3 Asymptotic singular value and alignments of hyper-rectangular tensors

Similarly to the order 33 case studied previously, when the dimensions of 𝑻{\bm{\mathsfit{T}}} grow large at the same rate, its singular value λ\lambda and the corresponding alignments 𝒖(i),𝒙(i)\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle for i[d]i\in[d] concentrate almost surely around some deterministic quantities which we denote λ(β)limNλ\lambda^{\infty}(\beta)\equiv\lim_{N\to\infty}\lambda and a𝒙(i)(β)limN|𝒖(i),𝒙(i)|a_{{\bm{x}}^{(i)}}^{\infty}(\beta)\equiv\lim_{N\to\infty}|\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle| respectively. Applying again Stein’s Lemma (Lemma 1) to the identities in Eq. (26), we obtain the following theorem which characterizes the aforementioned deterministic limits.

Theorem 8.

Recall the notations in Theorem 7. Under Assumptions 3 and 4, for d3d\geq 3, there exists βs>0\beta_{s}>0 such that for β>βs\beta>\beta_{s},

{λa.s.λ(β),|𝒙(i),𝒖(i)|a.s.qi(λ(β)),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{x}}^{(i)},{\bm{u}}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}q_{i}\left(\lambda^{\infty}(\beta)\right),\end{cases}

where qi(z)q_{i}(z) is given by qi(z)=1gi2(z)ciq_{i}(z)=\sqrt{1-\frac{g_{i}^{2}(z)}{c_{i}}} and λ(β)\lambda^{\infty}(\beta) satisfies f(λ(β),β)=0f(\lambda^{\infty}(\beta),\beta)=0 with f(z,β)=z+g(z)βi=1dqi(z)f(z,\beta)=z+g(z)-\beta\prod_{i=1}^{d}q_{i}(z). Besides, for β[0,βs]\beta\in[0,\beta_{s}], λ\lambda^{\infty} is bounded (in particular when ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], λa.s.λ2d1d\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{d-1}{d}}) and |𝐱(i),𝐮(i)|a.s.0\left|\langle{\bm{x}}^{(i)},{\bm{u}}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.13. ∎

Remark 12.

As in the order 33 case, note that the inverse formula expressing β\beta in terms of λ\lambda^{\infty} is explicit. Specifically, we have β(λ)=λ+g(λ)i=1dqi(λ)\beta(\lambda^{\infty})=\frac{\lambda^{\infty}+g(\lambda^{\infty})}{\prod_{i=1}^{d}q_{i}(\lambda^{\infty})}. In particular, this inverse formula provides an estimator for the SNR β\beta given the largest singular value λ\lambda of 𝑻{\bm{\mathsfit{T}}}. Algorithm 2 provides a pseudo-code to compute the asymptotic alignments.

Algorithm 2 Compute alignments as per Theorem 8
Input: SNR β+\beta\in{\mathbb{R}}_{+} and tensor dimension ratios 𝒄=[c1,,cd](0,1)d{\bm{c}}=[c_{1},\ldots,c_{d}]^{\top}\in(0,1)^{d}
Output: Asymptotic singular value λ\lambda^{\infty} and corresponding alignments 𝒂=[a𝒙(1),,a𝒙(d)][0,1]d{\bm{a}}=\left[a_{{\bm{x}}^{(1)}}^{\infty},\ldots,a_{{\bm{x}}^{(d)}}^{\infty}\right]^{\top}\in[0,1]^{d}
Set λ\lambda^{\infty} as the solution of f(z,β)=0f(z,\beta)=0 in zz.
Set the alignments as a𝒙(i)1gi2(λ)cia_{{\bm{x}}^{(i)}}^{\infty}\leftarrow\sqrt{1-\frac{g_{i}^{2}(\lambda^{\infty})}{c_{i}}} with gi(z)g_{i}(z) computed by Algorithm 1 for z=λz=\lambda^{\infty}.
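The explicit inverse formula of Remark 12 is easy to implement on top of the fixed point of Theorem 6; the sketch below (re-implementing the fixed point with a fixed iteration budget, an illustrative choice) performs a round trip for a cubic order-3 tensor, recovering β = 2 from the closed-form λ∞(β) of Corollary 3:

```python
import numpy as np

def stieltjes_parts(z, c, n_iter=500):
    # Fixed point of Theorem 6 (real z to the right of the support).
    c = np.asarray(c, dtype=float)
    g = 0.0
    for _ in range(n_iter):
        gi = (g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2
        g = gi.sum()
    return g, gi

def beta_of_lambda(lam, c):
    # Remark 12: beta = (lam + g(lam)) / prod_i q_i(lam).
    g, gi = stieltjes_parts(lam, c)
    q = np.sqrt(1 - gi ** 2 / np.asarray(c, dtype=float))
    return (lam + g) / np.prod(q)

beta = 2.0  # above the cubic threshold beta_s ~ 1.154
lam = np.sqrt(beta ** 2 / 2 + 2
              + np.sqrt(3) * np.sqrt((3 * beta ** 2 - 4) ** 3) / (18 * beta))
beta_rec = beta_of_lambda(lam, [1 / 3, 1 / 3, 1 / 3])
```

This also illustrates the SNR estimator mentioned in the remark: given an observed largest singular value, `beta_of_lambda` returns the corresponding β.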
Figure 11: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} as predicted by Theorem 8 for c1=110,c2=16,c3=14c_{1}=\frac{1}{10},c_{2}=\frac{1}{6},c_{3}=\frac{1}{4} and c4=1(c1+c2+c3)c_{4}=1-(c_{1}+c_{2}+c_{3}).

Figure 11 depicts the asymptotic singular value and alignments for an order 44 tensor as per Theorem 8. As discussed previously, the predicted alignments are discontinuous, and a strictly positive correlation between the singular vectors and the underlying signals is possible for the considered local optimum of the ML estimator only above the minimal signal strength (βs1.234\beta_{s}\approx 1.234 in the shown example). In the case of hyper-cubic tensors, i.e., when all the dimensions nin_{i} are equal, Figure 12 depicts the minimal SNR values in terms of the tensor order dd and the corresponding asymptotic singular value and alignments. As the tensor order increases, the minimal SNR and singular value converge respectively to 1.6\approx 1.6 and 22, while the corresponding alignment (at the minimal theoretical SNR βs\beta_{s}) gets closer to 11. The expressions of βs\beta_{s} and a(βs)a^{\infty}(\beta_{s}) for hyper-cubic tensors of order dd are explicitly given as

βs=d1d(d2d1)1d2,limββsa(β)=d2d1.\displaystyle\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\beta_{s}=\sqrt{\frac{d-1}{d}}\left(\frac{d-2}{d-1}\right)^{1-\frac{d}{2}},\quad\lim_{\beta\to\beta_{s}}a^{\infty}(\beta)=\sqrt{\frac{d-2}{d-1}}. (33)
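A short numerical sketch of Eq. (33) (illustrative, with the large-d limit √e stated as a consistency check rather than taken from the text): d = 3 recovers the cubic threshold 2√3/3 and alignment √2/2 of Corollary 3, and βs(d) approaches √e ≈ 1.649 as d grows, consistent with the ≈1.6 limit discussed above.

```python
import numpy as np

def beta_s(d):
    # Minimal SNR for hyper-cubic order-d tensors, Eq. (33).
    return np.sqrt((d - 1) / d) * ((d - 2) / (d - 1)) ** (1 - d / 2)

def align_s(d):
    # Corresponding alignment at beta_s, Eq. (33).
    return np.sqrt((d - 2) / (d - 1))

vals = [beta_s(d) for d in (3, 4, 5, 10, 100)]  # increasing towards sqrt(e)
```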
Figure 12: Minimal signal strength βs\beta_{s} and the corresponding singular value and alignments in terms of the tensor order dd, when all the tensor dimensions are equal.

6 Generalization to rank rr tensor with orthogonal components

In this section we discuss the generalization of our previous findings to the rank-rr spiked tensor model with r>1r>1, of the form

𝑻==1rβ𝒙(1)𝒙(d)+1N𝑿,\displaystyle{\bm{\mathsfit{T}}}=\sum_{\ell=1}^{r}\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (34)

where β1>>βr\beta_{1}>\cdots>\beta_{r} are the signal strengths, (𝒙(1),,𝒙(d))𝕊n11××𝕊nd1({\bm{x}}^{(1)}_{\ell},\ldots,{\bm{x}}^{(d)}_{\ell})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} for [r]\ell\in[r], 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) and N=i=1dniN=\sum_{i=1}^{d}n_{i}. Supposing that the components 𝒙1(i),,𝒙r(i){\bm{x}}^{(i)}_{1},\ldots,{\bm{x}}^{(i)}_{r} are mutually orthogonal, i.e., 𝒙(i),𝒙(i)=0\langle{\bm{x}}^{(i)}_{\ell},{\bm{x}}^{(i)}_{\ell^{\prime}}\rangle=0 for \ell\neq\ell^{\prime}, the above rank-rr spiked tensor model can be treated through equivalent rank-one tensors, one for each component β𝒙(1)𝒙(d)\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell} independently, by applying the rank-one results established in the previous section. Precisely, the best rank-rr approximation of 𝑻{\bm{\mathsfit{T}}} corresponds to

argminλ,(𝒖(1),,𝒖(d))𝕊n11××𝕊nd1𝑻=1rλ𝒖(1)𝒖(d)F2,\displaystyle\operatorname*{arg\,min}_{\lambda_{\ell}\,\in\,{\mathbb{R}},\,({\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\sum_{\ell=1}^{r}\lambda_{\ell}\,{\bm{u}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{u}}^{(d)}_{\ell}\|_{\text{F}}^{2}, (35)

and the 𝒖(i){\bm{u}}^{(i)}_{\ell}’s correlate with the 𝒙(i){\bm{x}}^{(i)}_{\ell}’s as a result of the uniqueness of the orthogonal tensor decomposition (see Theorem 4.1 in [1]). Therefore, the study of 𝑻{\bm{\mathsfit{T}}} boils down to the study of the random matrices Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell}) for [r]\ell\in[r], each of which behaves as in the rank-one case treated previously, with signal strength β\beta_{\ell} respectively. Indeed, since 𝒖(i)𝕊ni1{\bm{u}}^{(i)}_{\ell}\in{\mathbb{S}}^{n_{i}-1} by definition and given the orthogonality condition on the 𝒙(i){\bm{x}}^{(i)}_{\ell}’s, for \ell^{\prime}\neq\ell the inner product 𝒖(i),𝒙(i)a.s.0\langle{\bm{u}}^{(i)}_{\ell},{\bm{x}}^{(i)}_{\ell^{\prime}}\rangle\operatorname{\,\xrightarrow{\text{a.s.}}\,}0 in high dimension; as such

Φd(𝑻,𝒖(1),,𝒖(d))Φd(𝑻,𝒖(1),,𝒖(d))a.s.0,\displaystyle\|\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})-\Phi_{d}({\bm{\mathsfit{T}}}_{\ell},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})\|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0, (36)

where 𝑻β𝒙(1)𝒙(d)+1N𝑿{\bm{\mathsfit{T}}}_{\ell}\equiv\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}. In other words, the study of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell}) reduces to that of the rank-one spiked tensor model 𝑻{\bm{\mathsfit{T}}}_{\ell}.

7 Discussion

In this work, we characterized the asymptotic behavior of spiked asymmetric tensors by mapping them to equivalent (in a spectral sense) random matrices. Our starting point is mainly the identities in Eq. (26), which are verified by all critical points of the ML problem. Quite surprisingly, and as also discussed in [8] for symmetric tensors, we found that our asymptotic equations precisely describe the maximum of the ML problem which correlates with the true spike. Extrapolating the findings from [12, 8] in the symmetric case, we conjecture the existence of an order-one threshold βc\beta_{c} above which our equations describe the behavior of the global maximum of the ML problem. Unfortunately, characterizing such a βc\beta_{c} with our present approach remains an open question. The same holds for the characterization of the algorithmic threshold βa\beta_{a}, which is more interesting from a practical standpoint since computing the ML solution is NP-hard for β<βa\beta<\beta_{a}.

In the present work, our results were derived under a Gaussian assumption on the tensor noise components. We believe that the derived formulas are universal, in the sense that they extend to other distributions provided that the fourth-order moment is finite (as assumed by [3] for long random matrices). Other extensions concern the generalization to higher ranks with arbitrary components and possibly correlated noise components, since the present RMT tools are more flexible than tools from statistical physics.

{acks}

[Acknowledgments] This work was supported by the MIAI LargeDATA Chair at University Grenoble Alpes led by R. Couillet and the UGA-HUAWEI LarDist project led by M. Guillaud. We would like to thank Henrique Goulart, Pierre Comon and Gérard Ben Arous for valuable discussions on the topic of random tensors.

Appendix A Simulations

In this section we provide simulations to support our findings.

A.1 Matrix case

We start by considering the spiked random matrix model of the form

𝑴=β𝒙𝒚+1m+n𝑿,with𝑿𝕄m,n(𝒩(0,1)),\displaystyle{\bm{M}}=\beta{\bm{x}}{\bm{y}}^{\top}+\frac{1}{\sqrt{m+n}}{\bm{X}},\quad\text{with}\quad{\bm{X}}\sim{\mathbb{M}}_{m,n}(\mathcal{N}(0,1)), (37)

and 𝒙,𝒚{\bm{x}},{\bm{y}} are unit vectors of dimensions mm and nn respectively. Figure 13 depicts the spectrum of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} for β=0\beta=0, for different values of the dimensions m,nm,n, along with the limiting spectral measure predicted by Corollary 4. Figure 14 compares the asymptotic singular value and alignments obtained in Corollary 5 with their simulated counterparts, computed through an SVD of 𝑴{\bm{M}}. Figure 15 further compares theory and simulations in the case of long random matrices (m=200,n=m32m=200,n=m^{\frac{3}{2}}), which corresponds to tensor unfolding as per Remark 9; a perfect match between theory and simulations is again observed.
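The experiment behind these figures can be sketched in a few lines of NumPy. The snippet below uses illustrative values (m=200, n=300, β=2, random spike directions), not those of the figures: it builds the spiked model of Eq. (37), forms the symmetrized block matrix, and extracts the dominant singular value and alignments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 300
N = m + n
beta = 2.0

# unit spike vectors x, y (random directions, for illustration)
x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

# spiked model M = beta x y^T + X / sqrt(N), as in Eq. (37)
M = beta * np.outer(x, y) + rng.standard_normal((m, n)) / np.sqrt(N)

# symmetrized block matrix: its spectrum is +/- the singular values of M
# (padded with |m - n| zeros), hence symmetric about the origin
S = np.block([[np.zeros((m, m)), M], [M.T, np.zeros((n, n))]])
eigs = np.linalg.eigvalsh(S)

# dominant singular value and alignments with the spike components
U, s, Vt = np.linalg.svd(M)
lam = s[0]
align_x = abs(U[:, 0] @ x)
align_y = abs(Vt[0] @ y)
```

Since β=2 is above the detectability threshold here, the top singular value detaches from the bulk and the alignments are close to one.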

Refer to caption
Figure 13: Spectrum of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} for β=0\beta=0 with different values of the dimensions m,nm,n and the limiting spectral measure in Corollary 4.
Refer to caption
Figure 14: Asymptotic singular value and alignments of the spiked random matrix in Eq. (37) as per Corollary 5, and their simulated counterparts for different dimensions m,nm,n. Simulations are performed through SVD applied to 𝑴{\bm{M}}.
Refer to caption
Figure 15: Asymptotic singular value and alignments of the spiked random matrix in Eq. (37) as per Corollary 5, and their simulated counterparts in the long matrix regime (m=200,n=m32m=200,n=m^{\frac{3}{2}}) which simulates tensor unfolding as in Remark 9. Simulations are performed through SVD applied to 𝑴{\bm{M}}.

A.2 Order 33 tensors

We now consider an order-33 random tensor model of the form

𝑻=β𝒙𝒚𝒛+1m+n+p𝑿,with𝑿𝕋m,n,p(𝒩(0,1)),\displaystyle{\bm{\mathsfit{T}}}=\beta{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}+\frac{1}{\sqrt{m+n+p}}{\bm{\mathsfit{X}}},\quad\text{with}\quad{\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)), (38)

and 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}} are unit vectors of dimensions m,nm,n and pp respectively. In our simulations, the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} of 𝑻{\bm{\mathsfit{T}}} are estimated using the power iteration method described in Algorithm 3. We consider three initialization strategies for Algorithm 3:

  • (i) Random initialization: sample 𝒖0,𝒗0,𝒘0{\bm{u}}_{0},{\bm{v}}_{0},{\bm{w}}_{0} uniformly from the unit spheres 𝕊m1,𝕊n1{\mathbb{S}}^{m-1},{\mathbb{S}}^{n-1} and 𝕊p1{\mathbb{S}}^{p-1} respectively. We refer to this strategy in the figure legends as “Random init.”.

  • (ii) Initialization with the true components 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}}. We refer to this strategy in the figure legends as “Init. with 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}}”.

  • (iii) Run Algorithm 3 with strategy (i) for a large SNR β1\beta\gg 1, then progressively decrease β\beta, initializing Algorithm 3 with the components obtained for the previous value of β\beta. We refer to this strategy in the figure legends as “Init. with 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}}”.

Algorithm 3 Tensor power method [1]
Input: an order-dd tensor 𝑻𝕋n1,,nd{\bm{\mathsfit{T}}}\in{\mathbb{T}}_{n_{1},\ldots,n_{d}} and initial components (𝒖0(1),,𝒖0(d))𝕊n11××𝕊nd1({\bm{u}}^{(1)}_{0},\ldots,{\bm{u}}^{(d)}_{0})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}
Output: estimate of argminλ,(𝒖(1),,𝒖(d))𝕊n11××𝕊nd1𝑻λ𝒖(1)𝒖(d)F2\operatorname*{arg\,min}_{\lambda\,\in\,{\mathbb{R}},\,({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\lambda\,{\bm{u}}^{(1)}\otimes\cdots\otimes{\bm{u}}^{(d)}\|_{\text{F}}^{2}
Repeat
for i[d]i\in[d] do
     𝒖(i)𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d))𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d)){\bm{u}}^{(i)}\leftarrow\frac{{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})}{\|{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})\|} \triangleright Contract 𝑻{\bm{\mathsfit{T}}} on all the 𝒖(j){\bm{u}}^{(j)}’s for jij\neq i
end for
until convergence of the 𝒖(i){\bm{u}}^{(i)}’s
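As a complement, a minimal NumPy sketch of Algorithm 3 for d=3 might read as follows (an illustrative sketch; the dimensions and SNR are our own arbitrary choices, and the initialization shown corresponds to strategy (ii)).

```python
import numpy as np

def tensor_power_method(T, u, v, w, n_iter=200):
    # alternating contractions of Algorithm 3 specialized to d = 3
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, u, v, w)  # singular value estimate
    return lam, u, v, w

rng = np.random.default_rng(1)
m = n = p = 30
N = m + n + p
beta = 3.0

x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)
z = rng.standard_normal(p); z /= np.linalg.norm(z)

# spiked model of Eq. (38)
T = beta * np.einsum('i,j,k->ijk', x, y, z) \
    + rng.standard_normal((m, n, p)) / np.sqrt(N)

# strategy (ii): initialize with the true components
lam, u, v, w = tensor_power_method(T, x, y, z)
```

Each alternating update maximizes the contraction over one mode, so the objective is nondecreasing and the iterates converge to a critical point of the ML problem.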

Figure 16 compares the asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} (obtained from Corollary 3) with their simulated counterparts, where the singular vectors are estimated by Algorithm 3 under initialization strategy (i) (yellow dots) and strategy (ii) (green dots). As the tensor dimensions grow, the numerical estimates approach their asymptotic counterparts for (ii). In contrast, the random initialization (i) yields poor convergence for β\beta around its minimal value βs\beta_{s} when the tensor dimension is large enough (see m=n=p=150m=n=p=150). This phenomenon is related to the algorithmic time complexity of the power method, which is known to recover the underlying signal in polynomial time provided that βnd24\beta\gtrsim n^{\frac{d-2}{4}} [17, 6]; this is also noticeable in our simulations (the algorithmic phase transition seems to grow with the tensor dimensions). Figure 17 further compares theory and simulations when Algorithm 3 is initialized following strategy (iii), which makes it possible to follow the trajectory of the global maximum of the ML problem; a good match between the asymptotic curves and their simulated counterparts can again be observed.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 16: Asymptotic singular value and alignments of the order 33 tensor in Eq. (38) having equal dimensions as per Corollary 3, and their simulated counterparts for different dimensions {10,30,50,100,150}\{10,30,50,100,150\}. The yellow dots correspond to a random initialization of the power iteration method (strategy (i)), while the green dots correspond to an initialization with the true spike components 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} (strategy (ii)).
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 17: Asymptotic singular value and alignments of the order 33 tensor in Eq. (38) having equal dimensions as per Corollary 3, and their simulated counterparts for different dimensions {10,30,50,100,150}\{10,30,50,100,150\}. The green dots correspond to initialization with strategy (iii).

Appendix B Proofs

B.1 Derivative of tensor singular value and vectors

Differentiating the identities in Eq. (11) and Eq. (12) w.r.t. the entry XijkX_{ijk} of the tensor noise 𝑿{\bm{\mathsfit{X}}}, we obtain the following set of equations

{𝑻(𝒘)𝒗Xijk+𝑻(𝒗)𝒘Xijk+1Nvjwk𝒆im=λXijk𝒖+λ𝒖Xijk,𝑻(𝒘)𝒖Xijk+𝑻(𝒖)𝒘Xijk+1Nuiwk𝒆jn=λXijk𝒗+λ𝒗Xijk,𝑻(𝒗)𝒖Xijk+𝑻(𝒖)𝒗Xijk+1Nuivj𝒆kp=λXijk𝒘+λ𝒘Xijk,λXijk=𝑻(𝒖Xijk,𝒗,𝒘)+𝑻(𝒖,𝒗Xijk,𝒘)+𝑻(𝒖,𝒗,𝒘Xijk)+1Nuivjwk.\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{w}})\frac{\partial{\bm{v}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{v}})\frac{\partial{\bm{w}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}v_{j}w_{k}{\bm{e}}_{i}^{m}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{u}}+\lambda\frac{\partial{\bm{u}}}{\partial X_{ijk}},\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\top}\frac{\partial{\bm{u}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{u}})\frac{\partial{\bm{w}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}u_{i}w_{k}{\bm{e}}_{j}^{n}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{v}}+\lambda\frac{\partial{\bm{v}}}{\partial X_{ijk}},\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\top}\frac{\partial{\bm{u}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{u}})^{\top}\frac{\partial{\bm{v}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}u_{i}v_{j}{\bm{e}}_{k}^{p}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{w}}+\lambda\frac{\partial{\bm{w}}}{\partial X_{ijk}},\\ \frac{\partial\lambda}{\partial X_{ijk}}={\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)+{\bm{\mathsfit{T}}}\left({\bm{u}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},{\bm{w}}\right)+{\bm{\mathsfit{T}}}\left({\bm{u}},{\bm{v}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)+\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.\end{cases}

Writing 𝑻(𝒖Xijk,𝒗,𝒘){\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right) as 𝑻(𝒖Xijk,𝒗,𝒘)=(𝒖Xijk)𝑻(𝒗)𝒘{\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)=\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}, we can apply again the identities in Eq. (11) which results in 𝑻(𝒖Xijk,𝒗,𝒘)=λ(𝒖Xijk)𝒖{\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)=\lambda\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}. Doing similarly with 𝑻(𝒖,𝒗Xijk,𝒘){\bm{\mathsfit{T}}}\left({\bm{u}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},{\bm{w}}\right) and 𝑻(𝒖,𝒗,𝒘Xijk){\bm{\mathsfit{T}}}\left({\bm{u}},{\bm{v}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right), we have

λXijk=λ((𝒖Xijk)𝒖+(𝒗Xijk)𝒗+(𝒘Xijk)𝒘)+1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\lambda\left(\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}+\left(\frac{\partial{\bm{v}}}{\partial X_{ijk}}\right)^{\top}{\bm{v}}+\left(\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)^{\top}{\bm{w}}\right)+\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.

Furthermore, since 𝒖𝒖=𝒗𝒗=𝒘𝒘=1{\bm{u}}^{\top}{\bm{u}}={\bm{v}}^{\top}{\bm{v}}={\bm{w}}^{\top}{\bm{w}}=1, we have

(𝒖Xijk)𝒖=(𝒗Xijk)𝒗=(𝒘Xijk)𝒘=0.\displaystyle\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}=\left(\frac{\partial{\bm{v}}}{\partial X_{ijk}}\right)^{\top}{\bm{v}}=\left(\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)^{\top}{\bm{w}}=0.

Thus the derivative of λ\lambda writes simply as

λXijk=1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.

Hence, we find that

λ[𝒖Xijk𝒗Xijk𝒘Xijk]=1N[vjwk(𝒆imui𝒖)uiwk(𝒆jnvj𝒗)uivj(𝒆kpwk𝒘)]+Φ3(𝑻,𝒖,𝒗,𝒘)[𝒖Xijk𝒗Xijk𝒘Xijk].\displaystyle\lambda\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}=\frac{1}{\sqrt{N}}\begin{bmatrix}v_{j}w_{k}({\bm{e}}_{i}^{m}-u_{i}{\bm{u}})\\ u_{i}w_{k}({\bm{e}}_{j}^{n}-v_{j}{\bm{v}})\\ u_{i}v_{j}({\bm{e}}_{k}^{p}-w_{k}{\bm{w}})\end{bmatrix}+\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}.

This yields the expression in Eq. (14). The same calculations apply to the more general order-dd tensor case, yielding the identity in Eq. (28).
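The closed-form derivative of the singular value can be checked by finite differences. The sketch below is an illustrative NumPy check (small arbitrary dimensions, a large SNR so that the alternating power iteration converges to a stable critical point): it perturbs a single noise entry and compares the numerical derivative of λ with the formula u_i v_j w_k / √N derived above.

```python
import numpy as np

def rank1_critical(T, u, v, w, n_iter=300):
    # alternating contractions: converges to a critical point of Eq. (11)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, u, v, w)
    return lam, u, v, w

rng = np.random.default_rng(0)
m = n = p = 10
N = m + n + p
beta = 5.0  # large SNR so the critical point is stable under perturbation

x, y, z = [t / np.linalg.norm(t) for t in rng.standard_normal((3, m))]
T = beta * np.einsum('i,j,k->ijk', x, y, z) \
    + rng.standard_normal((m, n, p)) / np.sqrt(N)
lam, u, v, w = rank1_critical(T, x, y, z)

# perturbing X_ijk by eps shifts T_ijk by eps / sqrt(N)
i, j, k, eps = 1, 2, 3, 1e-6
E = np.zeros_like(T); E[i, j, k] = eps / np.sqrt(N)
lam_p, *_ = rank1_critical(T + E, u, v, w)
lam_m, *_ = rank1_critical(T - E, u, v, w)

fd = (lam_p - lam_m) / (2 * eps)        # central finite difference
pred = u[i] * v[j] * w[k] / np.sqrt(N)  # closed-form derivative
```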

B.2 Proof of Theorem 3

Denote the matrix model as

𝑵1NΦ3(𝑿,𝒂,𝒃,𝒄),\displaystyle{\bm{N}}\equiv\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}),

where we recall 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and (𝒂,𝒃,𝒄)𝕊m1×𝕊n1×𝕊p1({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1} are independent of 𝑿{\bm{\mathsfit{X}}}. We further denote the resolvent matrix of 𝑵{\bm{N}} as

𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸13(z)𝑸23(z)𝑸33(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)\end{bmatrix}.

In order to characterize the limiting Stieltjes transform g(z)g(z) of 𝑵{\bm{N}}, we need to estimate the quantity 1Ntr𝑸(z)a.s.g(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g(z) (as a consequence of Theorem 2). We further introduce the following limits

1Ntr𝑸11(z)a.s.g1(z),1Ntr𝑸22(z)a.s.g2(z),1Ntr𝑸33(z)a.s.g3(z).\displaystyle\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{1}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{Q}}^{33}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{3}(z).

From the identity in Eq. (7), we have

𝑵𝑸(z)z𝑸(z)=𝑰N,\displaystyle{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N},

from which we particularly have

1N[𝑿(𝒄)𝑸12(z)]ii+1N[𝑿(𝒃)𝑸13(z)]iizQii11(z)=1,\displaystyle\frac{1}{\sqrt{N}}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{\sqrt{N}}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-zQ^{11}_{ii}(z)=1,

or

1NNi=1m[𝑿(𝒄)𝑸12(z)]ii+1NNi=1m[𝑿(𝒃)𝑸13(z)]iizNtr𝑸11(z)=mN.\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{m}{N}. (39)

We thus need to compute the expectations of 1NNi=1m[𝑿(𝒄)𝑸12(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii} and 1NNi=1m[𝑿(𝒃)𝑸13(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}, which develop as

1NNi=1m𝔼[𝑿(𝒄)𝑸12(z)]ii=1NNi=1mj=1nk=1pck𝔼[XijkQij12]=1NNi=1mj=1nk=1pck𝔼[Qij12Xijk],\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[X_{ijk}Q^{12}_{ij}\right]=\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right],

where the last equality is obtained by applying Stein’s lemma (Lemma 1). For continuing the derivations, we need to express the derivative of the resolvent 𝑸(z){\bm{Q}}(z) with respect to an entry XijkX_{ijk} of the tensor noise 𝑿{\bm{\mathsfit{X}}}. Indeed, since 𝑵𝑸(z)z𝑸(z)=𝑰N{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, we have

𝑵Xijk𝑸(z)+𝑵𝑸(z)Xijkz𝑸(z)Xijk=𝟎N×N(𝑵z𝑰N)𝑸(z)Xijk=𝑵Xijk𝑸(z),\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z)+{\bm{N}}\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}-z\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}={\bm{0}}_{N\times N}\quad\Rightarrow\quad({\bm{N}}-z{\bm{I}}_{N})\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z),

from which we get

𝑸(z)Xijk=𝑸(z)𝑵Xijk𝑸(z),\displaystyle\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z),

where

𝑵Xijk=1N[𝟎m×mck𝒆im(𝒆jn)bj𝒆im(𝒆kp)ck𝒆jn(𝒆im)𝟎n×nai𝒆jn(𝒆kp)bj𝒆kp(𝒆im)ai𝒆kp(𝒆jn)𝟎p×p],\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}\begin{bmatrix}{\bm{0}}_{m\times m}&c_{k}{\bm{e}}_{i}^{m}({\bm{e}}_{j}^{n})^{\top}&b_{j}{\bm{e}}_{i}^{m}({\bm{e}}_{k}^{p})^{\top}\\ c_{k}{\bm{e}}_{j}^{n}({\bm{e}}_{i}^{m})^{\top}&{\bm{0}}_{n\times n}&a_{i}{\bm{e}}_{j}^{n}({\bm{e}}_{k}^{p})^{\top}\\ b_{j}{\bm{e}}_{k}^{p}({\bm{e}}_{i}^{m})^{\top}&a_{i}{\bm{e}}_{k}^{p}({\bm{e}}_{j}^{n})^{\top}&{\bm{0}}_{p\times p}\end{bmatrix},

and we finally obtain the following derivatives

Qab11Xijk\displaystyle\frac{\partial Q_{ab}^{11}}{\partial X_{ijk}} =1N[ai(Qaj12Qbk13+Qak13Qbj12)+bj(Qai11Qbk13+Qak13Qib11)+ck(Qai11Qbj12+Qaj12Qib11)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{bk}^{13}+Q_{ak}^{13}Q_{bj}^{12})+b_{j}(Q_{ai}^{11}Q_{bk}^{13}+Q_{ak}^{13}Q_{ib}^{11})+c_{k}(Q_{ai}^{11}Q_{bj}^{12}+Q_{aj}^{12}Q_{ib}^{11})\right],
Qab12Xijk\displaystyle\frac{\partial Q_{ab}^{12}}{\partial X_{ijk}} =1N[ai(Qaj12Qbk23+Qak13Qjb22)+bj(Qai11Qbk23+Qak13Qib12)+ck(Qai11Qjb22+Qaj12Qib12)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{bk}^{23}+Q_{ak}^{13}Q_{jb}^{22})+b_{j}(Q_{ai}^{11}Q_{bk}^{23}+Q_{ak}^{13}Q_{ib}^{12})+c_{k}(Q_{ai}^{11}Q_{jb}^{22}+Q_{aj}^{12}Q_{ib}^{12})\right],
Qab13Xijk\displaystyle\frac{\partial Q_{ab}^{13}}{\partial X_{ijk}} =1N[ai(Qaj12Qkb33+Qak13Qjb23)+bj(Qai11Qkb33+Qak13Qib13)+ck(Qai11Qjb23+Qaj12Qib13)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{kb}^{33}+Q_{ak}^{13}Q_{jb}^{23})+b_{j}(Q_{ai}^{11}Q_{kb}^{33}+Q_{ak}^{13}Q_{ib}^{13})+c_{k}(Q_{ai}^{11}Q_{jb}^{23}+Q_{aj}^{12}Q_{ib}^{13})\right],
Qab22Xijk\displaystyle\frac{\partial Q_{ab}^{22}}{\partial X_{ijk}} =1N[ai(Qaj22Qbk23+Qak23Qjb22)+bj(Qia12Qbk23+Qak23Qib12)+ck(Qia12Qjb22+Qaj22Qib12)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{22}Q_{bk}^{23}+Q_{ak}^{23}Q_{jb}^{22})+b_{j}(Q_{ia}^{12}Q_{bk}^{23}+Q_{ak}^{23}Q_{ib}^{12})+c_{k}(Q_{ia}^{12}Q_{jb}^{22}+Q_{aj}^{22}Q_{ib}^{12})\right],
Qab23Xijk\displaystyle\frac{\partial Q_{ab}^{23}}{\partial X_{ijk}} =1N[ai(Qaj22Qkb33+Qak23Qjb23)+bj(Qia12Qkb33+Qak23Qib13)+ck(Qia12Qjb23+Qaj22Qib13)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{22}Q_{kb}^{33}+Q_{ak}^{23}Q_{jb}^{23})+b_{j}(Q_{ia}^{12}Q_{kb}^{33}+Q_{ak}^{23}Q_{ib}^{13})+c_{k}(Q_{ia}^{12}Q_{jb}^{23}+Q_{aj}^{22}Q_{ib}^{13})\right],
Qab33Xijk\displaystyle\frac{\partial Q_{ab}^{33}}{\partial X_{ijk}} =1N[ai(Qja23Qkb33+Qak33Qjb23)+bj(Qia13Qkb33+Qak33Qib13)+ck(Qia13Qjb23+Qja23Qib13)].\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{ja}^{23}Q_{kb}^{33}+Q_{ak}^{33}Q_{jb}^{23})+b_{j}(Q_{ia}^{13}Q_{kb}^{33}+Q_{ak}^{33}Q_{ib}^{13})+c_{k}(Q_{ia}^{13}Q_{jb}^{23}+Q_{ja}^{23}Q_{ib}^{13})\right].

In particular,

Qij12Xijk\displaystyle\frac{\partial Q_{ij}^{12}}{\partial X_{ijk}} =1N[ai(Qij12Qjk23+Qik13Qjj22)+bj(Qii11Qjk23+Qik13Qij12)+ck(Qii11Qjj22+Qij12Qij12)].\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{ij}^{12}Q_{jk}^{23}+Q_{ik}^{13}Q_{jj}^{22})+b_{j}(Q_{ii}^{11}Q_{jk}^{23}+Q_{ik}^{13}Q_{ij}^{12})+c_{k}(Q_{ii}^{11}Q_{jj}^{22}+Q_{ij}^{12}Q_{ij}^{12})\right].
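The master identity ∂𝑸(z)/∂X_ijk = −𝑸(z) (∂𝑵/∂X_ijk) 𝑸(z) behind all these derivatives is purely algebraic and holds for any perturbation of a matrix resolvent; the following illustrative NumPy check (a generic symmetric matrix, an arbitrary entry and spectral argument of our own choosing) confirms it numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 8
A = rng.standard_normal((n_dim, n_dim))
A = (A + A.T) / 2          # a generic symmetric matrix standing in for N
z = 0.3 + 1.0j
I = np.eye(n_dim)
Q = np.linalg.inv(A - z * I)

# perturb a single entry and compare with the identity dQ = -Q (dA) Q
i, j, eps = 1, 4, 1e-7
dA = np.zeros((n_dim, n_dim)); dA[i, j] = 1.0
fd = (np.linalg.inv(A + eps * dA - z * I) - Q) / eps
pred = -Q @ dA @ Q
```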

Going back to the computation of A1NNi=1mj=1nk=1pck𝔼[Qij12Xijk]A\equiv\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right], we will see that the only contributing term in the derivative Qij12Xijk\frac{\partial Q_{ij}^{12}}{\partial X_{ijk}} is 1NckQii11Qjj22-\frac{1}{\sqrt{N}}c_{k}Q_{ii}^{11}Q_{jj}^{22}. Indeed,

A=1N2ijkck𝔼[ai(Qij12Qjk23+Qik13Qjj22)+bj(Qii11Qjk23+Qik13Qij12)+ck(Qii11Qjj22+Qij12Qij12)],\displaystyle A=-\frac{1}{N^{2}}\sum_{ijk}c_{k}\operatorname{\mathbb{E}}\left[a_{i}(Q_{ij}^{12}Q_{jk}^{23}+Q_{ik}^{13}Q_{jj}^{22})+b_{j}(Q_{ii}^{11}Q_{jk}^{23}+Q_{ik}^{13}Q_{ij}^{12})+c_{k}(Q_{ii}^{11}Q_{jj}^{22}+Q_{ij}^{12}Q_{ij}^{12})\right],
=1N2𝔼[𝒂𝑸12𝑸23𝒄+𝒂𝑸13𝒄tr𝑸22+tr𝑸11𝒃𝑸23𝒄+𝒃(𝑸12)𝑸13𝒄+tr𝑸11tr𝑸22+tr(𝑸12(𝑸12))].\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{a}}^{\top}{\bm{Q}}^{12}{\bm{Q}}^{23}{\bm{c}}+{\bm{a}}^{\top}{\bm{Q}}^{13}{\bm{c}}\operatorname{tr}{\bm{Q}}^{22}+\operatorname{tr}{\bm{Q}}^{11}{\bm{b}}^{\top}{\bm{Q}}^{23}{\bm{c}}+{\bm{b}}^{\top}({\bm{Q}}^{12})^{\top}{\bm{Q}}^{13}{\bm{c}}+\operatorname{tr}{\bm{Q}}^{11}\operatorname{tr}{\bm{Q}}^{22}+\operatorname{tr}({\bm{Q}}^{12}({\bm{Q}}^{12})^{\top})\right].

Now, since the vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} are of bounded norms and assuming 𝑸(z){\bm{Q}}(z) is of bounded spectral norm (see condition in Eq. (8)), under Assumption 1 (as NN\to\infty), the terms 1N2𝒂𝑸12𝑸23𝒄\frac{1}{N^{2}}{\bm{a}}^{\top}{\bm{Q}}^{12}{\bm{Q}}^{23}{\bm{c}}, 1N2𝒂𝑸13𝒄tr𝑸22\frac{1}{N^{2}}{\bm{a}}^{\top}{\bm{Q}}^{13}{\bm{c}}\operatorname{tr}{\bm{Q}}^{22}, 1N2tr𝑸11𝒃𝑸23𝒄\frac{1}{N^{2}}\operatorname{tr}{\bm{Q}}^{11}{\bm{b}}^{\top}{\bm{Q}}^{23}{\bm{c}}, 1N2𝒃(𝑸12)𝑸13𝒄\frac{1}{N^{2}}{\bm{b}}^{\top}({\bm{Q}}^{12})^{\top}{\bm{Q}}^{13}{\bm{c}} and 1N2tr(𝑸12(𝑸12))\frac{1}{N^{2}}\operatorname{tr}({\bm{Q}}^{12}({\bm{Q}}^{12})^{\top}) are vanishing almost surely. As such, we find that

1NNi=1m[𝑿(𝒄)𝑸12(z)]ii=1Ntr𝑸11(z)1Ntr𝑸22(z)+𝒪(N1)a.s.g1(z)g2(z)+𝒪(N1).\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z)+\mathcal{O}(N^{-1})\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{2}(z)+\mathcal{O}(N^{-1}).

Similarly, we find that

1NNi=1m[𝑿(𝒃)𝑸13(z)]ii=1Ntr𝑸11(z)1Ntr𝑸33(z)+𝒪(N1)a.s.g1(z)g3(z)+𝒪(N1).\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{33}(z)+\mathcal{O}(N^{-1})\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{3}(z)+\mathcal{O}(N^{-1}).

From Eq. (39), g1(z)g_{1}(z) satisfies

g1(z)(g2(z)+g3(z))zg1(z)=c1,\displaystyle-g_{1}(z)\left(g_{2}(z)+g_{3}(z)\right)-zg_{1}(z)=c_{1},

where we recall c1=limmNc_{1}=\lim\frac{m}{N}. Similarly, g2(z)g_{2}(z) and g3(z)g_{3}(z) satisfy

{g2(z)(g1(z)+g3(z))zg2(z)=c2,g3(z)(g1(z)+g2(z))zg3(z)=c3,\displaystyle\begin{cases}-g_{2}(z)(g_{1}(z)+g_{3}(z))-zg_{2}(z)=c_{2},\\ -g_{3}(z)(g_{1}(z)+g_{2}(z))-zg_{3}(z)=c_{3},\end{cases}

where we recall again c2=limnNc_{2}=\lim\frac{n}{N} and c3=limpNc_{3}=\lim\frac{p}{N}. Moreover, by definition, g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z), thus we have for each i[3]i\in[3]

gi(z)(g(z)gi(z))+zgi(z)+ci=0,\displaystyle g_{i}(z)(g(z)-g_{i}(z))+zg_{i}(z)+c_{i}=0,

yielding

gi(z)=g(z)+z24ci+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

with g(z)g(z) solution of the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) satisfying [g(z)]>0\Im[g(z)]>0 for zz\in{\mathbb{C}} with [z]>0\Im[z]>0 (see Property 2 of the Stieltjes transform in Subsection 2.2).
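These coupled equations can be solved by fixed-point iteration and compared against the empirical normalized trace of the resolvent. The sketch below is an illustrative NumPy check (arbitrary dimensions, a spectral argument z = 2i away from the limiting support, and the principal square-root branch, which is the branch consistent with ℑ[g(z)]>0 at this z).

```python
import numpy as np

def g_fixed_point(z, cs, n_iter=500):
    # iterate g -> sum_i g_i(g) with the principal square-root branch
    g = -1.0 / z
    for _ in range(n_iter):
        g = sum((g + z) / 2 - np.sqrt(4 * ci + (g + z) ** 2) / 2 for ci in cs)
    return g

rng = np.random.default_rng(0)
m, n, p = 120, 150, 180
N = m + n + p
X = rng.standard_normal((m, n, p))
a = rng.standard_normal(m); a /= np.linalg.norm(a)
b = rng.standard_normal(n); b /= np.linalg.norm(b)
cv = rng.standard_normal(p); cv /= np.linalg.norm(cv)

# Phi_3(X, a, b, c) / sqrt(N): symmetric block matrix of mode contractions
Xc = np.einsum('ijk,k->ij', X, cv)  # m x n
Xb = np.einsum('ijk,j->ik', X, b)   # m x p
Xa = np.einsum('ijk,i->jk', X, a)   # n x p
S = np.block([
    [np.zeros((m, m)), Xc, Xb],
    [Xc.T, np.zeros((n, n)), Xa],
    [Xb.T, Xa.T, np.zeros((p, p))],
]) / np.sqrt(N)

z = 2.0j  # spectral argument away from the limiting support
g_emp = np.mean(1.0 / (np.linalg.eigvalsh(S) - z))
g_th = g_fixed_point(z, (m / N, n / N, p / N))
```

The empirical trace matches the fixed-point solution up to finite-size fluctuations.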

B.3 Proof of Corollary 1

Given the result of Theorem 3, setting c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, we have for all i[3]i\in[3]

gi(z)=g(z)+z243+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{\frac{4}{3}+(g(z)+z)^{2}}}{2},

where g(z)g(z) satisfies g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z), thus g(z)g(z) is the solution to

z+g(z)+z2343+(g(z)+z)22=0,\displaystyle z+\frac{g(z)+z}{2}-\frac{3\sqrt{\frac{4}{3}+(g(z)+z)^{2}}}{2}=0,

and solving for g(z)g(z) yields

g(z){3z433z284,3z4+33z284},\displaystyle g(z)\in\left\{-\frac{3z}{4}-\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4},-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}\right\},

and the limiting Stieltjes transform (with [g(z)]>0\Im[g(z)]>0 for zz with (z)>0\Im(z)>0 by property 2 of the Stieltjes transform, see Subsection 2.2) is therefore

g(z)=3z4+33z284.\displaystyle g(z)=-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}.
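By the inverse Stieltjes transform, ρ(x) = (1/π) lim_{ε→0⁺} ℑ[g(x+iε)], this closed form corresponds to the semicircle-type density ρ(x) = (√3/(4π))√(8−3x²) supported on [−√(8/3), √(8/3)]. A quick numerical sanity check (illustrative NumPy, trapezoidal quadrature) confirms that this density integrates to one:

```python
import numpy as np

# density recovered from the closed-form Stieltjes transform:
# rho(x) = sqrt(3) * sqrt(8 - 3 x^2) / (4 pi) on [-sqrt(8/3), sqrt(8/3)]
edge = np.sqrt(8 / 3)
xs = np.linspace(-edge, edge, 200001)
rho = np.sqrt(3) * np.sqrt(np.clip(8 - 3 * xs**2, 0, None)) / (4 * np.pi)

# trapezoidal rule: total mass should be 1
mass = np.sum((rho[1:] + rho[:-1]) / 2 * np.diff(xs))
```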

B.4 Proof of Theorem 4

Given the random tensor model in Eq. (10) and its singular vectors characterized by Eq. (11), we denote the associated random matrix model as

𝑻Φ3(𝑻,𝒖,𝒗,𝒘)=β𝑽𝑩𝑽+𝑵,\displaystyle{\bm{T}}\equiv\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})=\beta{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+{\bm{N}},

where

𝑵=1NΦ3(𝑿,𝒖,𝒗,𝒘),𝑩[0𝒛,𝒘𝒚,𝒗𝒛,𝒘0𝒙,𝒖𝒚,𝒗𝒙,𝒖0]𝕄3,𝑽[𝒙𝟎m𝟎m𝟎n𝒚𝟎n𝟎p𝟎p𝒛]𝕄N,3.\displaystyle{\bm{N}}=\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}),\quad{\bm{B}}\equiv\begin{bmatrix}0&\langle{\bm{z}},{\bm{w}}\rangle&\langle{\bm{y}},{\bm{v}}\rangle\\ \langle{\bm{z}},{\bm{w}}\rangle&0&\langle{\bm{x}},{\bm{u}}\rangle\\ \langle{\bm{y}},{\bm{v}}\rangle&\langle{\bm{x}},{\bm{u}}\rangle&0\end{bmatrix}\in{\mathbb{M}}_{3},\quad{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}&{\bm{0}}_{m}&{\bm{0}}_{m}\\ {\bm{0}}_{n}&{\bm{y}}&{\bm{0}}_{n}\\ {\bm{0}}_{p}&{\bm{0}}_{p}&{\bm{z}}\end{bmatrix}\in{\mathbb{M}}_{N,3}.

We further denote the resolvents of 𝑻{\bm{T}} and 𝑵{\bm{N}} respectively as

𝑹(z)\displaystyle{\bm{R}}(z) =(𝑻z𝑰N)1=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹13(z)𝑹23(z)𝑹33(z)],\displaystyle=\left({\bm{T}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)\end{bmatrix},
𝑸(z)\displaystyle{\bm{Q}}(z) =(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸13(z)𝑸23(z)𝑸33(z)].\displaystyle=\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)\end{bmatrix}.

By Woodbury matrix identity (Lemma 5), we have

𝑹(z)=𝑸(z)𝑸(z)𝑽(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸(z),\displaystyle{\bm{R}}(z)={\bm{Q}}(z)-{\bm{Q}}(z){\bm{V}}\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}(z), (40)

In particular, taking the normalized trace operator, we get

1Ntr𝑹(z)\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}(z) =1Ntr𝑸(z)1Ntr[(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽],\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)-\frac{1}{N}\operatorname{tr}\left[\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}}\right],
=1Ntr𝑸(z)+𝒪(N1),\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)+\mathcal{O}(N^{-1}),

since the matrix (1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}} is of bounded spectral norm (assuming 𝑸(z)\|{\bm{Q}}(z)\| is bounded, see the condition in Eq. (8)) and of finite size (a 3×33\times 3 matrix). As such, the asymptotic spectral measure of 𝑻{\bm{T}} is the same as that of 𝑵{\bm{N}}, which can be estimated through 1Ntr𝑸(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}(z). In contrast to the setting of Appendix B.2, the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} now depend statistically on the tensor noise 𝑿{\bm{\mathsfit{X}}}, and this dependence must be handled.
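The Woodbury step leading to Eq. (40) is purely algebraic and can be verified on generic matrices; the following illustrative NumPy snippet (a random symmetric matrix standing in for 𝑵, a generic invertible 3×3 matrix standing in for 𝑩, arbitrary β and z of our own choosing) checks the identity to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, r = 12, 3
A = rng.standard_normal((n_dim, n_dim)); A = (A + A.T) / 2  # stands in for N
V = np.linalg.qr(rng.standard_normal((n_dim, r)))[0]        # orthonormal columns
B = rng.standard_normal((r, r)); B = B @ B.T + np.eye(r)    # invertible 3x3
beta, z = 1.5, 0.2 + 1.0j
I = np.eye(n_dim)

Q = np.linalg.inv(A - z * I)
# direct inversion of the perturbed resolvent ...
R_direct = np.linalg.inv(beta * V @ B @ V.T + A - z * I)
# ... versus the Woodbury identity of Eq. (40)
R_woodbury = Q - Q @ V @ np.linalg.inv(
    np.linalg.inv(B) / beta + V.T @ Q @ V) @ V.T @ Q
```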

From the identity Eq. (7), we have 𝑵𝑸(z)z𝑸(z)=𝑰N{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, from which

1NNi=1m[𝑿(𝒘)𝑸12(z)]ii+1NNi=1m[𝑿(𝒗)𝑸13(z)]iizNtr𝑸11(z)=mN.\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{v}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{m}{N}. (41)

We thus need to compute the expectations of 1NNi=1m[𝑿(𝒘)𝑸12(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii} and 1NNi=1m[𝑿(𝒗)𝑸13(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{v}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}. In particular,

A1NNi=1m𝔼[𝑿(𝒘)𝑸12(z)]ii=1NNijk𝔼[XijkwkQij12]=1NNijk𝔼[(wkQij12)Xijk],\displaystyle A\equiv\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[X_{ijk}w_{k}Q^{12}_{ij}\right]=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[\frac{\partial(w_{k}Q^{12}_{ij})}{\partial X_{ijk}}\right],

where the last equality is obtained by Stein’s lemma (Lemma 1). Due to the statistical dependency between 𝒘{\bm{w}} and 𝑿{\bm{\mathsfit{X}}}, the above sum decomposes into the following two terms

A=1NNijk𝔼[wkQij12Xijk]+1NNijk𝔼[wkXijkQij12]=A1+A2,\displaystyle A=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[w_{k}\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right]+\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[\frac{\partial w_{k}}{\partial X_{ijk}}Q^{12}_{ij}\right]=A_{1}+A_{2},

where the first term A1A_{1} has already been handled in the previous subsection (replacing 𝒄{\bm{c}} with 𝒘{\bm{w}}). We now show that the second term A2A_{2} vanishes asymptotically under Assumption 1. Indeed, by Eq. (14), we have

wkXijk\displaystyle\frac{\partial w_{k}}{\partial X_{ijk}} =1N(vjwk(Rik13(λ)ui𝒖𝑹:,k13(λ)))\displaystyle=-\frac{1}{\sqrt{N}}\left(v_{j}w_{k}(R_{ik}^{13}(\lambda)-u_{i}{\bm{u}}^{\top}{\bm{R}}_{:,k}^{13}(\lambda))\right)
1N(uiwk(Rjk23(λ)vj𝒗𝑹:,k23(λ)))1N(uivj(Rkk33(λ)wk𝒘𝑹:,k33(λ))).\displaystyle-\frac{1}{\sqrt{N}}\left(u_{i}w_{k}(R_{jk}^{23}(\lambda)-v_{j}{\bm{v}}^{\top}{\bm{R}}_{:,k}^{23}(\lambda))\right)-\frac{1}{\sqrt{N}}\left(u_{i}v_{j}(R_{kk}^{33}(\lambda)-w_{k}{\bm{w}}^{\top}{\bm{R}}_{:,k}^{33}(\lambda))\right).

As such, A2A_{2} decomposes into three terms A2=A21+A22+A23A_{2}=A_{21}+A_{22}+A_{23}, where

A21\displaystyle A_{21} =1N2ijk𝔼[vjwkRik13(λ)Qij12]+1N2ijkl𝔼[uivjwkulRlk13(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[v_{j}w_{k}R_{ik}^{13}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}u_{l}R_{lk}^{13}(\lambda)Q_{ij}^{12}\right],
=1N2𝔼[𝒗𝑸12(z)𝑹13(λ)𝒘]+1N2𝔼[𝒖𝑸12(z)𝒗𝒖𝑹13(λ)𝒘]0,\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{v}}^{\top}{\bm{Q}}^{12}(z)^{\top}{\bm{R}}^{13}(\lambda){\bm{w}}\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{u}}^{\top}{\bm{R}}^{13}(\lambda){\bm{w}}\right]\to 0,

as NN\to\infty, since the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} are of bounded norms and assuming the resolvents 𝑸(z){\bm{Q}}(z) and 𝑹(λ){\bm{R}}(\lambda) are of bounded spectral norms (𝑹(λ){\bm{R}}(\lambda) has bounded spectral norm by Assumption 2 and Eq. (8)). Similarly, we further have

A22\displaystyle A_{22} =1N2ijk𝔼[uiwkRjk23(λ)Qij12]+1N2ijkl𝔼[uivjwkvlRlk23(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[u_{i}w_{k}R^{23}_{jk}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}v_{l}R^{23}_{lk}(\lambda)Q_{ij}^{12}\right],
=1N2𝔼[𝒖𝑸12(z)𝑹23(λ)𝒘]+1N2𝔼[𝒖𝑸12(z)𝒗𝒗𝑹23(λ)𝒘]0.\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{R}}^{23}(\lambda){\bm{w}}\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]\to 0.

And finally,

A23\displaystyle A_{23} =1N2ijk𝔼[uivjRkk33(λ)Qij12]+1N2ijkl𝔼[uivjwkwlRlk33(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[u_{i}v_{j}R^{33}_{kk}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}w_{l}R^{33}_{lk}(\lambda)Q_{ij}^{12}\right],
=1N𝔼[𝒖𝑸12(z)𝒗1Ntr𝑹33(λ)]+1N2𝔼[𝒖𝑸12(z)𝒗𝒘𝑹33(λ)𝒘]0.\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}\frac{1}{N}\operatorname{tr}{\bm{R}}^{33}(\lambda)\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{w}}^{\top}{\bm{R}}^{33}(\lambda){\bm{w}}\right]\to 0.

Therefore,

A=1NNijk𝔼[wkQij12Xijk]+𝒪(N1).\displaystyle A=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[w_{k}\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right]+\mathcal{O}(N^{-1}).

As in the previous subsection, the derivative of 𝑸(z){\bm{Q}}(z) w.r.t. the entry XijkX_{ijk} expresses as 𝑸(z)Xijk=𝑸(z)𝑵Xijk𝑸(z)\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z) but now with

𝑵Xijk=1N[𝟎m×mwk𝒆im(𝒆jn)vj𝒆im(𝒆kp)wk𝒆jn(𝒆im)𝟎n×nui𝒆jn(𝒆kp)vj𝒆kp(𝒆im)ui𝒆kp(𝒆jn)𝟎p×p]+1NΦ3(𝑿,𝒖Xijk,𝒗Xijk,𝒘Xijk),\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}\begin{bmatrix}{\bm{0}}_{m\times m}&w_{k}{\bm{e}}_{i}^{m}({\bm{e}}_{j}^{n})^{\top}&v_{j}{\bm{e}}_{i}^{m}({\bm{e}}_{k}^{p})^{\top}\\ w_{k}{\bm{e}}_{j}^{n}({\bm{e}}_{i}^{m})^{\top}&{\bm{0}}_{n\times n}&u_{i}{\bm{e}}_{j}^{n}({\bm{e}}_{k}^{p})^{\top}\\ v_{j}{\bm{e}}_{k}^{p}({\bm{e}}_{i}^{m})^{\top}&u_{i}{\bm{e}}_{k}^{p}({\bm{e}}_{j}^{n})^{\top}&{\bm{0}}_{p\times p}\end{bmatrix}+\frac{1}{\sqrt{N}}\Phi_{3}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}}{\partial X_{ijk}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right),

where 𝑶1NΦ3(𝑿,𝒖Xijk,𝒗Xijk,𝒘Xijk){\bm{O}}\equiv\frac{1}{\sqrt{N}}\Phi_{3}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}}{\partial X_{ijk}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right) is of vanishing spectral norm. Indeed, from Eq. (14), there exists C>0C>0 independent of NN such that 𝒖Xijk,𝒗Xijk,𝒘XijkCN\|\frac{\partial{\bm{u}}}{\partial X_{ijk}}\|,\|\frac{\partial{\bm{v}}}{\partial X_{ijk}}\|,\|\frac{\partial{\bm{w}}}{\partial X_{ijk}}\|\leq\frac{C}{\sqrt{N}} yielding that the spectral norm of 𝑶{\bm{O}} is bounded by CN32\frac{C^{\prime}}{N^{\frac{3}{2}}} for some constant C>0C^{\prime}>0 independent of NN. Therefore, we find that Ag1(z)g2(z)+𝒪(N1)A\to-g_{1}(z)g_{2}(z)+\mathcal{O}(N^{-1}) (with g1(z),g2(z)g_{1}(z),g_{2}(z) the almost sure limits of 1Ntr𝑸11(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z) and 1Ntr𝑸22(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z) respectively), thus yielding the same limiting Stieltjes transform as the one obtained in Appendix B.2.

B.5 Proof of Theorem 5

Given the identities in Eq. (11), we have

λ𝒖,𝒙=𝒙𝑻(𝒗)𝒘=β𝒗,𝒚𝒘,𝒛+1N𝒙𝑿(𝒗)𝒘,\displaystyle\lambda\langle{\bm{u}},{\bm{x}}\rangle={\bm{x}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\beta\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle+\frac{1}{\sqrt{N}}{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}, (42)

with λ,𝒖,𝒙,𝒗,𝒚\lambda,\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle converging almost surely to their asymptotic limits λ(β),a𝒙(β),a𝒚(β)\lambda^{\infty}(\beta),a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) respectively given the concentration properties in Subsection 3.4. To characterize such limits we need to evaluate the expectation of 1N𝒙𝑿(𝒗)𝒘\frac{1}{\sqrt{N}}{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}. Indeed,

1N𝔼[𝒙𝑿(𝒗)𝒘]=1Nijkxi𝔼[vjwkXijk]=1Nijkxi𝔼[vjXijkwk]+1Nijkxi𝔼[vjwkXijk]\displaystyle\frac{1}{\sqrt{N}}\operatorname{\mathbb{E}}\left[{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}\right]=\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}w_{k}X_{ijk}\right]=\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[\frac{\partial v_{j}}{\partial X_{ijk}}w_{k}\right]+\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]

where the last equality is obtained by Stein’s lemma (Lemma 1). From Eq. (14), we have

vjXijk\displaystyle\frac{\partial v_{j}}{\partial X_{ijk}} =1N(vjwk(Rij12(λ)ui𝒖𝑹:,j12(λ)))\displaystyle=-\frac{1}{\sqrt{N}}\left(v_{j}w_{k}(R_{ij}^{12}(\lambda)-u_{i}{\bm{u}}^{\top}{\bm{R}}_{:,j}^{12}(\lambda))\right)
1N(uiwk(Rjj22(λ)vj𝒗𝑹:,j22(λ)))1N(uivj(Rjk23(λ)wk𝒘𝑹:,j32(λ))).\displaystyle-\frac{1}{\sqrt{N}}\left(u_{i}w_{k}(R_{jj}^{22}(\lambda)-v_{j}{\bm{v}}^{\top}{\bm{R}}_{:,j}^{22}(\lambda))\right)-\frac{1}{\sqrt{N}}\left(u_{i}v_{j}(R_{jk}^{23}(\lambda)-w_{k}{\bm{w}}^{\top}{\bm{R}}_{:,j}^{32}(\lambda))\right).

Hence A\equiv\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[\frac{\partial v_{j}}{\partial X_{ijk}}w_{k}\right]=A_{1}+A_{2}+A_{3} decomposes into three terms A_{1},A_{2} and A_{3}. The terms A_{1} and A_{3} vanish asymptotically, and only A_{2} contributes a non-vanishing term. Indeed,

A1\displaystyle A_{1} =1Nijk𝔼[xiwkvjwkRij12(λ)]+1Nijkl𝔼[xiwkvjwkuiulRlj12(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}v_{j}w_{k}R^{12}_{ij}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}v_{j}w_{k}u_{i}u_{l}R^{12}_{lj}(\lambda)\right],
=1N𝔼[𝒙𝑹12(λ)𝒗]+1N𝔼[𝒙,𝒖𝒖𝑹12(λ)𝒗]0,\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[{\bm{x}}^{\top}{\bm{R}}^{12}(\lambda){\bm{v}}\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{u}}^{\top}{\bm{R}}^{12}(\lambda){\bm{v}}\right]\to 0,

as N\to\infty, since {\bm{x}},{\bm{u}} and {\bm{v}} have bounded norms and {\bm{R}}(\lambda) has bounded spectral norm for \lambda outside the support of \frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}) (through the identity in Eq. (40)). Similarly, for A_{3} we have

A3\displaystyle A_{3} =1Nijk𝔼[xiwkuivjRjk23(λ)]+1Nijkl𝔼[xiwkuivjwkwlRjl23(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}v_{j}R^{23}_{jk}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}v_{j}w_{k}w_{l}R_{jl}^{23}(\lambda)\right],
=1N𝔼[𝒙,𝒖𝒗𝑹23(λ)𝒘]+1N𝔼[𝒙,𝒖𝒗𝑹23(λ)𝒘]0.\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]\to 0.

The term A_{2}, in contrast, does not vanish; precisely,

A2\displaystyle A_{2} =1Nijk𝔼[xiwkuiwkRjj22(λ)]+1Nijkl𝔼[xiwkuiwkvjvlRjl22(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}w_{k}R^{22}_{jj}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}w_{k}v_{j}v_{l}R^{22}_{jl}(\lambda)\right],
=1N𝔼[𝒙,𝒖tr𝑹22(λ)]+1N𝔼[𝒙,𝒖𝒗𝑹22(λ)𝒗],\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\operatorname{tr}{\bm{R}}^{22}(\lambda)\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{22}(\lambda){\bm{v}}\right],
g2(λ)𝔼[𝒙,𝒖]+𝒪(N1),\displaystyle\to-g_{2}(\lambda)\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\right]+\mathcal{O}(N^{-1}),

where the last line results from the fact that 1Ntr𝑹22(λ)a.s.g2(λ)\frac{1}{N}\operatorname{tr}{\bm{R}}^{22}(\lambda)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(\lambda) as we saw in the previous subsection. Similarly, we find that

B1Nijkxi𝔼[vjwkXijk]g3(λ)𝔼[𝒙,𝒖]+𝒪(N1).\displaystyle B\equiv\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]\to-g_{3}(\lambda)\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\right]+\mathcal{O}(N^{-1}).

Therefore, by Eq. (42), the almost sure limits λ(β),a𝒙(β),a𝒚(β)\lambda^{\infty}(\beta),a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) satisfy the equation

λ(β)a𝒙(β)=βa𝒚(β)a𝒛(β)[g2(λ(β))+g3(λ(β))]a𝒙(β).\displaystyle\lambda^{\infty}(\beta)a_{\bm{x}}^{\infty}(\beta)=\beta a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta)-[g_{2}(\lambda^{\infty}(\beta))+g_{3}(\lambda^{\infty}(\beta))]a_{\bm{x}}^{\infty}(\beta).

Hence,

a𝒙(β)=α1(λ(β))a𝒚(β)a𝒛(β),\displaystyle a_{\bm{x}}^{\infty}(\beta)=\alpha_{1}(\lambda^{\infty}(\beta))a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),

with α1(z)βz+g(z)g1(z)\alpha_{1}(z)\equiv\frac{\beta}{z+g(z)-g_{1}(z)}. Similarly, we find that

{a𝒚(β)=α2(λ(β))a𝒙(β)a𝒛(β),a𝒛(β)=α3(λ(β))a𝒙(β)a𝒚(β),\displaystyle\begin{cases}a_{\bm{y}}^{\infty}(\beta)=\alpha_{2}(\lambda^{\infty}(\beta))a_{\bm{x}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),\\ a_{\bm{z}}^{\infty}(\beta)=\alpha_{3}(\lambda^{\infty}(\beta))a_{\bm{x}}^{\infty}(\beta)a_{\bm{y}}^{\infty}(\beta),\end{cases}

with \alpha_{i}(z)\equiv\frac{\beta}{z+g(z)-g_{i}(z)}. Solving the above system of equations provides the final asymptotic alignments. Proceeding similarly with Eq. (12), we obtain the asymptotic singular value \lambda^{\infty}(\beta), thereby ending the proof.
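For completeness, a short sketch of how the system can be solved (valid for non-vanishing alignments, i.e., beyond the phase transition): multiplying the three equations together, then substituting back into a_{\bm{x}}^{\infty}(\beta)=\alpha_{1}a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta), gives

```latex
a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}
  =\alpha_{1}\alpha_{2}\alpha_{3}\left(a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}\right)^{2}
  \quad\Longrightarrow\quad
  a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}=\frac{1}{\alpha_{1}\alpha_{2}\alpha_{3}},
\qquad
\left(a_{\bm{x}}^{\infty}\right)^{2}=\frac{1}{\alpha_{2}\alpha_{3}},\quad
\left(a_{\bm{y}}^{\infty}\right)^{2}=\frac{1}{\alpha_{1}\alpha_{3}},\quad
\left(a_{\bm{z}}^{\infty}\right)^{2}=\frac{1}{\alpha_{1}\alpha_{2}},
```

where each \alpha_{i} is evaluated at \lambda^{\infty}(\beta).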

B.6 Proof of Corollary 3

By Corollary 1, the limiting Stieltjes transform is given by

g(z)=3z4+33z284,\displaystyle g(z)=-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4},

and since g(z)=\sum_{i=1}^{3}g_{i}(z) with all g_{i}(z) equal, we have g_{i}(z)=\frac{g(z)}{3} for all i\in[3]. Hence, for each i\in[3], \alpha_{i}(z) defined in Theorem 5 reads

αi(z)=βz2+33z286,\displaystyle\alpha_{i}(z)=\frac{\beta}{\frac{z}{2}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{6}},

and λ(β)\lambda^{\infty}(\beta) is solution to the equation

z4+33z284(z2+33z286)3β2=0.\displaystyle\frac{z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}-\frac{\left(\frac{z}{2}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{6}\right)^{3}}{\beta^{2}}=0.

We first compute the critical value of \beta by solving the above equation for \beta and letting z tend to the right edge of the semi-circle law (i.e., z\to 2\sqrt{\frac{2}{3}}^{+}). Solving for \beta yields

β(z)=2z3+2z29z22434z49z2249z+33z28,\displaystyle\beta(z)=\sqrt{\frac{2z^{3}+\frac{2z^{2}\sqrt{9z^{2}-24}}{3}-4z-\frac{4\sqrt{9z^{2}-24}}{9}}{z+\sqrt{3}\sqrt{3z^{2}-8}}},

hence

βs=limz223+β(z)=233.\displaystyle\beta_{s}=\lim_{z\to 2\sqrt{\frac{2}{3}}^{+}}\beta(z)=\frac{2\sqrt{3}}{3}.

Now, to express \lambda^{\infty}(\beta) in terms of \beta, we solve the equation \beta-\beta(z)=0 for z and choose the unique positive solution that is non-decreasing in \beta, which yields

λ(β)=β22+2+3(3β24)318β.\displaystyle\lambda^{\infty}(\beta)=\sqrt{\frac{\beta^{2}}{2}+2+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{18\beta}}.

Plugging the above expression of λ(β)\lambda^{\infty}(\beta) into the expressions of the asymptotic alignments in Theorem 5, we obtain for all i[3]i\in[3]

αi(λ(β))=62β9β212+3(3β24)3β+9β2+36+3(3β24)3β,\displaystyle\alpha_{i}(\lambda^{\infty}(\beta))=\frac{6\sqrt{2}\beta}{\sqrt{9\beta^{2}-12+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}+\sqrt{9\beta^{2}+36+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}},

yielding the final result.
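As a purely numerical sanity check (not part of the proof; the test value \beta=3/2 and the tolerances are arbitrary), one can verify that the closed form of \lambda^{\infty}(\beta) solves the scalar equation above, and that \lambda^{\infty}(\beta_{s}) coincides with the right edge 2\sqrt{2/3} of the limiting semi-circle law:

```python
from math import sqrt

def lam(beta):
    # closed-form asymptotic singular value (cubic case, c_i = 1/3)
    t = max(3 * beta**2 - 4, 0.0)  # clamp tiny negative rounding at beta = beta_s
    return sqrt(beta**2 / 2 + 2 + sqrt(3) * sqrt(t**3) / (18 * beta))

def f(z, beta):
    # z/4 + sqrt(3(3z^2-8))/4 - (z/2 + sqrt(3(3z^2-8))/6)^3 / beta^2
    s = sqrt(3) * sqrt(3 * z**2 - 8)
    return z / 4 + s / 4 - (z / 2 + s / 6)**3 / beta**2

beta_s = 2 * sqrt(3) / 3
assert abs(f(lam(1.5), 1.5)) < 1e-9                # lam solves the defining equation
assert abs(lam(beta_s) - 2 * sqrt(2 / 3)) < 1e-10  # transition hits the bulk edge
```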

B.7 Proof of Corollary 4

Setting c1=c,c2=1cc_{1}=c,c_{2}=1-c and c3=0c_{3}=0, we get

{g1(z)=g(z)+z24c+(g(z)+z)22,g2(z)=g(z)+z24(1c)+(g(z)+z)22,g3(z)=0.\displaystyle\begin{cases}g_{1}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c+(g(z)+z)^{2}}}{2},\\ g_{2}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4(1-c)+(g(z)+z)^{2}}}{2},\\ g_{3}(z)=0.\end{cases}

Since g(z)=\sum_{i=1}^{3}g_{i}(z), g(z) satisfies the equation

z4c+(g(z)+z)224(1c)+(g(z)+z)22=0,\displaystyle z-\frac{\sqrt{4c+(g(z)+z)^{2}}}{2}-\frac{\sqrt{4(1-c)+(g(z)+z)^{2}}}{2}=0,

the solution of which belongs to

g(z){z4c(c1)+(z21)2z,z+4c(c1)+(z21)2z},\displaystyle g(z)\in\left\{-z-\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z},-z+\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z}\right\},

thus the limiting Stieltjes transform (having [g(z)]>0\Im[g(z)]>0) is given by

g(z)=z+4c(c1)+(z21)2z.\displaystyle g(z)=-z+\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z}.

In particular, the edges of the support of the corresponding limiting distribution ν\nu are the roots of 4c(c1)+(z21)24c(c-1)+(z^{2}-1)^{2}, yielding

𝒮(ν)=[1+2c(1c),12c(1c)][12c(1c),1+2c(1c)].\displaystyle\mathcal{S}(\nu)=\left[-\sqrt{1+2\sqrt{c(1-c)}},-\sqrt{1-2\sqrt{c(1-c)}}\right]\cup\left[\sqrt{1-2\sqrt{c(1-c)}},\sqrt{1+2\sqrt{c(1-c)}}\right].
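A hedged numerical illustration (with an arbitrary ratio c=0.3 and the naive principal branch of the complex square root, whence the absolute value on the imaginary part): the fixed-point equation and the support \mathcal{S}(\nu) can be checked directly from g(z):

```python
import cmath
from math import sqrt, pi

def g(z, c):
    # limiting Stieltjes transform of Corollary 4 (principal branch)
    return -z + cmath.sqrt(4 * c * (c - 1) + (z**2 - 1)**2) / z

c, eps = 0.3, 1e-9
inner = sqrt(1 - 2 * sqrt(c * (1 - c)))  # inner edge of the positive bulk
outer = sqrt(1 + 2 * sqrt(c * (1 - c)))  # outer edge
assert inner < 1.0 < outer < 2.0

# fixed-point check of the defining equation at z = 2 (outside the support)
h = (g(2.0, c) + 2.0).real  # g(z) + z
assert abs(2.0 - sqrt(4 * c + h**2) / 2 - sqrt(4 * (1 - c) + h**2) / 2) < 1e-10

dens_in = abs(g(1.0 + 1j * eps, c).imag) / pi   # x = 1 lies inside the bulk
dens_out = abs(g(2.0 + 1j * eps, c).imag) / pi  # x = 2 lies outside
assert dens_in > 0.1 and dens_out < 1e-6
```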

The density of \nu is obtained by computing the limit \frac{1}{\pi}\lim_{\varepsilon\to 0}|\Im[g(x+i\varepsilon)]|, yielding

ν(dx)=1πxsin(arctan2(0,qc(x))2)|qc(x)|sign(sin(arctan2(0,qc(x))2)x)dx+(12min(c,1c))δ(x),\displaystyle\nu(dx)=\frac{1}{\pi x}\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)\sqrt{|q_{c}(x)|}\operatorname{sign}\left(\frac{\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)}{x}\right)dx+\left(1-2\min(c,1-c)\right)\delta(x),

where q_{c}(x)=(x^{2}-1)^{2}+4c(c-1). The Dirac mass \delta(x) in the above expression reflects the fact that the underlying matrix model has rank 2\min(m,n), so that a fraction 1-2\min(c,1-c) of its eigenvalues is exactly zero.

B.8 Proof of Corollary 5

From Appendix B.7, plugging the expression of the limiting Stieltjes transform g(z) into the expressions defining g_{1}(z) and g_{2}(z) yields

{g1(z)=4c(c1)+(z21)22z4c(c1+z2)+(z21)22z,g2(z)=4c(c1)+(z21)22z4c(c1z2)+(z2+1)22z.\displaystyle\begin{cases}g_{1}(z)=\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{2z}-\frac{\sqrt{4c(c-1+z^{2})+(z^{2}-1)^{2}}}{2z},\\ g_{2}(z)=\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{2z}-\frac{\sqrt{4c(c-1-z^{2})+(z^{2}+1)^{2}}}{2z}.\end{cases}

Thus, by Theorem 5, we have

{α1(z)=β4c24c+z42z2+12z+4c2+4cz24c+z42z2+12zα2(z)=β4c24c+z42z2+12z+4c24cz24c+z4+2z2+12zα3(z)=βz4c24c+z42z2+1.\displaystyle\begin{cases}\alpha_{1}(z)=\frac{\beta}{\frac{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}{2z}+\frac{\sqrt{4c^{2}+4cz^{2}-4c+z^{4}-2z^{2}+1}}{2z}}\\ \alpha_{2}(z)=\frac{\beta}{\frac{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}{2z}+\frac{\sqrt{4c^{2}-4cz^{2}-4c+z^{4}+2z^{2}+1}}{2z}}\\ \alpha_{3}(z)=\frac{\beta z}{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}.\end{cases}

Then, solving the equation z+g(z)-\frac{\beta}{\alpha_{1}(z)\alpha_{2}(z)\alpha_{3}(z)}=0 for z provides the almost sure limit of \lambda in terms of \beta. Specifically,

λ(β)=β2+1+c(1c)β2.\displaystyle\lambda^{\infty}(\beta)=\sqrt{\beta^{2}+1+\frac{c(1-c)}{\beta^{2}}}.

Plugging the above expression of \lambda^{\infty}(\beta) into \frac{1}{\sqrt{\alpha_{2}(z)\alpha_{3}(z)}} provides the asymptotic alignment a_{\bm{x}}^{\infty}(\beta) as \frac{1}{\kappa(\beta,c)}, where \kappa(\beta,c)=\sqrt{\alpha_{2}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))} is given by

κ(β,c)=2β2(β2+1)c2+cβ2(β2(β2+1)c(c1))4β4c(c1)+(β4c2+c)2β4β2(β2+1)c(c1)β2(4β2c(β2(β2c+2)c(c1))+(β2(β2+2)c(c1))2β4+4β4c(c1)+(β4c(c1))2β4).\displaystyle\kappa(\beta,c)=\sqrt{\frac{2\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c^{2}+c}{\beta^{2}}}\left(\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)\right)}{\sqrt{\frac{4\beta^{4}c\left(c-1\right)+\left(\beta^{4}-c^{2}+c\right)^{2}}{\beta^{4}}}\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{\beta^{2}}}\left(\sqrt{\frac{-4\beta^{2}c\left(\beta^{2}\left(\beta^{2}-c+2\right)-c\left(c-1\right)\right)+\left(\beta^{2}\left(\beta^{2}+2\right)-c\left(c-1\right)\right)^{2}}{\beta^{4}}}+\sqrt{\frac{4\beta^{4}c\left(c-1\right)+\left(\beta^{4}-c\left(c-1\right)\right)^{2}}{\beta^{4}}}\right)}}.

From the identities

4β4c(c1)+(β4c(c1))2=(β4+c(c1))2\displaystyle 4\beta^{4}c(c-1)+(\beta^{4}-c(c-1))^{2}=(\beta^{4}+c(c-1))^{2}
4β2c(β2(β2c+2)c(c1))+(β2(β2+2)c(c1))2=(β4+β2(22c)c2+c)2\displaystyle-4\beta^{2}c\left(\beta^{2}\left(\beta^{2}-c+2\right)-c\left(c-1\right)\right)+\left(\beta^{2}\left(\beta^{2}+2\right)-c\left(c-1\right)\right)^{2}=\left(\beta^{4}+\beta^{2}\left(2-2c\right)-c^{2}+c\right)^{2}

\kappa(\beta,c) simplifies to

κ(β,c)=ββ2(β2+1)c(c1)(β4+c(c1))(β2+1c).\displaystyle\kappa(\beta,c)=\beta\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{(\beta^{4}+c(c-1))\left(\beta^{2}+1-c\right)}}.
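A quick numerical check (a sketch with arbitrary test values \beta=1.5 and c=0.3) that the two identities above indeed collapse the long expression for \kappa(\beta,c) to this compact form:

```python
from math import sqrt

def kappa_long(b, c):
    # the unsimplified expression of kappa(beta, c)
    num = 2 * sqrt((b**2 * (b**2 + 1) - c**2 + c) / b**2) * (b**2 * (b**2 + 1) - c * (c - 1))
    d1 = sqrt((4 * b**4 * c * (c - 1) + (b**4 - c**2 + c)**2) / b**4)
    d2 = sqrt((b**2 * (b**2 + 1) - c * (c - 1)) / b**2)
    d3 = sqrt((-4 * b**2 * c * (b**2 * (b**2 - c + 2) - c * (c - 1))
               + (b**2 * (b**2 + 2) - c * (c - 1))**2) / b**4)
    d4 = sqrt((4 * b**4 * c * (c - 1) + (b**4 - c * (c - 1))**2) / b**4)
    return sqrt(num / (d1 * d2 * (d3 + d4)))

def kappa_simple(b, c):
    # the simplified closed form
    return b * sqrt((b**2 * (b**2 + 1) - c * (c - 1))
                    / ((b**4 + c * (c - 1)) * (b**2 + 1 - c)))

assert abs(kappa_long(1.5, 0.3) - kappa_simple(1.5, 0.3)) < 1e-10
```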

The asymptotic alignment a_{\bm{y}}^{\infty}(\beta) is given by \frac{1}{\kappa(\beta,1-c)} since the dimension ratio of the component {\bm{y}} is c_{2}=1-c. Moreover, the critical value of \beta is obtained by solving the equation \lambda^{\infty}(\beta_{s})=\sqrt{1+2\sqrt{c(1-c)}}, i.e., the value at which the limiting singular value reaches the right edge of the support of the corresponding limiting spectral measure (see Corollary 4). Finally, proceeding as above, one can check that \sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{2}(\lambda^{\infty}(\beta))} equals 1 for \beta\geq\beta_{s}, i.e., the alignment along the third dimension is 1 (corresponding to the component {\bm{z}} in Eq. (10)).

B.9 Proof of Theorem 6

Denote the matrix model as

𝑵1NΦd(𝑿,𝒂(1),,𝒂(d)),\displaystyle{\bm{N}}\equiv\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}),

with 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) and (𝒂(1),,𝒂(d))𝕊n11××𝕊nd1({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} are independent of 𝑿{\bm{\mathsfit{X}}}. We further denote the resolvent of 𝑵{\bm{N}} as

𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸1d(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸2d(z)𝑸13(z)𝑸23(z)𝑸33(z)𝑸3d(z)𝑸1d(z)𝑸2d(z)𝑸3d(z)𝑸dd(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)&\cdots&{\bm{Q}}^{1d}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)&\cdots&{\bm{Q}}^{2d}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)&\ldots&{\bm{Q}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{Q}}^{1d}(z)^{\top}&{\bm{Q}}^{2d}(z)^{\top}&{\bm{Q}}^{3d}(z)^{\top}&\cdots&{\bm{Q}}^{dd}(z)\end{bmatrix}.

By the Borel--Cantelli lemma, we have \frac{1}{N}\operatorname{tr}{\bm{Q}}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g(z) and, for all i\in[d], \frac{1}{N}\operatorname{tr}{\bm{Q}}^{ii}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{i}(z). Applying the identity in Eq. (7) to the symmetric matrix {\bm{N}}, we get {\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, from which we get in particular

1Nj=2d[𝑿1j𝑸1j(z)]i1i1zQi1i111(z)=1,\displaystyle\frac{1}{\sqrt{N}}\sum_{j=2}^{d}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-zQ_{i_{1}i_{1}}^{11}(z)=1,

or

\displaystyle\frac{1}{N\sqrt{N}}\sum_{j=2}^{d}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{n_{1}}{N}, (43)

where we recall that 𝑿ij𝑿(𝒂(1),,𝒂(i1),:,𝒂(i+1),,𝒂(j1),:,𝒂(j+1),,𝒂(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(i-1)},:,{\bm{a}}^{(i+1)},\ldots,{\bm{a}}^{(j-1)},:,{\bm{a}}^{(j+1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}.

We thus need to compute the expectation of 1NNi1=1n1[𝑿1j𝑸1j(z)]i1i1\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}} which develops as

Aj1NNi1=1n1𝔼[𝑿1j𝑸1j(z)]i1i1=1NNi1ij𝔼[[𝑿1j]i1ijQi1ij1j]=1NNi1,,idk1,kjaik(k)𝔼[Qi1ij1jXi1,,id]\displaystyle A_{j}\equiv\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}=\frac{1}{N\sqrt{N}}\sum_{i_{1}i_{j}}\operatorname{\mathbb{E}}\left[\left[{\bm{\mathsfit{X}}}^{1j}\right]_{i_{1}i_{j}}Q^{1j}_{i_{1}i_{j}}\right]=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\right]

where the last equality follows from Stein’s lemma (Lemma 1). In particular, as in Appendix B.2 for the order-3 case, it turns out that the only contributing term in the derivative \frac{\partial Q_{i_{1}i_{j}}^{1j}}{\partial X_{i_{1},\ldots,i_{d}}} is -\frac{1}{\sqrt{N}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}, the other terms yielding quantities of order \mathcal{O}(N^{-1}). Therefore, we find that

Aj\displaystyle A_{j} =1NNi1,,idk1,kjaik(k)𝔼[Qi1ij1jXi1,,id]=1N2i1,,idk1,kj(aik(k))2𝔼[Qi1i111Qijijjj]+𝒪(N1)\displaystyle=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\right]=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}\left(a_{i_{k}}^{(k)}\right)^{2}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}\right]+\mathcal{O}(N^{-1})
=1N2i1ij𝔼[Qi1i111Qijijjj]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1}i_{j}}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}\right]+\mathcal{O}(N^{-1})
=1Ntr𝑸11(z)1Ntr𝑸jj(z)+𝒪(N1)\displaystyle=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{jj}(z)+\mathcal{O}(N^{-1})
a.s.g1(z)gj(z)+𝒪(N1).\displaystyle\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{j}(z)+\mathcal{O}(N^{-1}).

From Eq. (43), g_{1}(z) satisfies -g_{1}(z)\sum_{j\neq 1}g_{j}(z)-zg_{1}(z)=c_{1} with c_{1}=\lim\frac{n_{1}}{N}. Similarly, for all i\in[d], g_{i}(z) satisfies -g_{i}(z)\sum_{j\neq i}g_{j}(z)-zg_{i}(z)=c_{i} with c_{i}=\lim\frac{n_{i}}{N}. Since g(z)=\sum_{i=1}^{d}g_{i}(z), we have for each i\in[d], g_{i}(z)(g(z)-g_{i}(z))+zg_{i}(z)+c_{i}=0, yielding

gi(z)=g(z)+z24ci+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

with g(z)g(z) solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) satisfying [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0.
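Under the stated assumptions, the coupled system for the g_{i}(z) can be solved numerically by fixed-point iteration; the following minimal sketch (illustrative values d=3, (c_{1},c_{2},c_{3})=(0.5,0.3,0.2) and z=3 taken outside the bulk) checks the self-consistency g(z)=\sum_{i}g_{i}(z) together with the quadratic relations above:

```python
from math import sqrt

c, z = [0.5, 0.3, 0.2], 3.0

g = 0.0
for _ in range(500):  # fixed-point iteration on g = sum_i g_i(g)
    g = sum((g + z) / 2 - sqrt(4 * ci + (g + z)**2) / 2 for ci in c)

gi = [(g + z) / 2 - sqrt(4 * ci + (g + z)**2) / 2 for ci in c]
assert abs(g - sum(gi)) < 1e-12          # self-consistency g = sum_i g_i
for ci, gv in zip(c, gi):
    # each g_i solves g_i (g - g_i) + z g_i + c_i = 0
    assert abs(gv * (g - gv) + z * gv + ci) < 1e-12
```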

B.10 Proof of Corollary 6

Given Theorem 6 and setting ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], we have gi(z)=g(z)dg_{i}(z)=\frac{g(z)}{d}, thus g(z)g(z) satisfies the equation

g(z)d=g(z)+z24d+(g(z)+z)22.\displaystyle\frac{g(z)}{d}=\frac{g(z)+z}{2}-\frac{\sqrt{\frac{4}{d}+(g(z)+z)^{2}}}{2}.

Solving in g(z)g(z) yields

g(z){dz2(d1)d(dz24d+4)2(d1),dz2(d1)+d(dz24d+4)2(d1)},\displaystyle g(z)\in\left\{-\frac{dz}{2\left(d-1\right)}-\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)},-\frac{dz}{2\left(d-1\right)}+\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)}\right\},

and the limiting Stieltjes transform with [g(z)]0\Im[g(z)]\geq 0 is

g(z)=dz2(d1)+d(dz24d+4)2(d1).\displaystyle g(z)=-\frac{dz}{2\left(d-1\right)}+\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)}.
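As a small check, at d=4 and z=2 the closed form above gives g(2)=-2/3, and one can verify numerically that it solves the scalar equation of the previous display:

```python
from math import sqrt

def g_equal(z, d):
    # limiting Stieltjes transform for c_i = 1/d (evaluated outside the bulk)
    return -d * z / (2 * (d - 1)) + sqrt(d * (d * z**2 - 4 * d + 4)) / (2 * (d - 1))

d, z = 4, 2.0
g = g_equal(z, d)
lhs = g / d
rhs = (g + z) / 2 - sqrt(4 / d + (g + z)**2) / 2
assert abs(g + 2 / 3) < 1e-12       # g(2) = -2/3 at d = 4
assert abs(lhs - rhs) < 1e-12       # g solves g/d = (g+z)/2 - sqrt(4/d+(g+z)^2)/2
```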

B.11 Existence of g(z)g(z)

Define the following function for (g,z)×+(g,z)\in{\mathbb{R}}\times{\mathbb{R}}_{+} and d3d\geq 3

k(g,z)=gg+z2d+12i=1d4ci+(g+z)2,\displaystyle k(g,z)=g-\frac{g+z}{2}d+\frac{1}{2}\sum_{i=1}^{d}\sqrt{4c_{i}+(g+z)^{2}},

with i=1dci=1\sum_{i=1}^{d}c_{i}=1. By concavity of the function \sqrt{\cdot} we have i=1d4ci+x2d4d+x2\sum_{i=1}^{d}\sqrt{4c_{i}+x^{2}}\leq d\sqrt{\frac{4}{d}+x^{2}} for all xx\in{\mathbb{R}}. Therefore, k(g,z)k(g,z) is bounded as

k(g,z)k¯(g,z)gg+z2d+d24d+(g+z)2.\displaystyle k(g,z)\leq\bar{k}(g,z)\equiv g-\frac{g+z}{2}d+\frac{d}{2}\sqrt{\frac{4}{d}+(g+z)^{2}}.

From Appendix B.10, for any z>2\sqrt{\frac{d-1}{d}} there exists g_{*}\in{\mathbb{R}} such that \bar{k}(g_{*},z)=0, and therefore k(g_{*},z)\leq\bar{k}(g_{*},z)=0. Besides, we have \lim_{g\to-\infty}k(g,z)=\lim_{g\to+\infty}k(g,z)=+\infty; hence, by continuity of k(\cdot,z), there exists g(z) such that k(g(z),z)=0 for z large enough (e.g., z>2\sqrt{\frac{d-1}{d}}).

B.12 Proof of Theorem 7

Given the random tensor model in Eq. (1) and its singular vectors characterized by Eq. (26), we denote the associated random matrix model as

𝑻Φd(𝑻,𝒖(1),,𝒖(d))=β𝑽𝑩𝑽+𝑵,\displaystyle{\bm{T}}\equiv\Phi_{d}\left({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\right)=\beta{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+{\bm{N}},

where {\bm{N}}=\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\right), {\bm{B}}\in{\mathbb{M}}_{d} with entries B_{ij}=(1-\delta_{ij})\prod_{k\neq i,j}\langle{\bm{x}}^{(k)},{\bm{u}}^{(k)}\rangle, and

\displaystyle{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}^{(1)}&{\bm{0}}_{n_{1}}&\cdots&{\bm{0}}_{n_{1}}\\ {\bm{0}}_{n_{2}}&{\bm{x}}^{(2)}&\cdots&{\bm{0}}_{n_{2}}\\ \vdots&\vdots&\ddots&\vdots\\ {\bm{0}}_{n_{d}}&{\bm{0}}_{n_{d}}&\cdots&{\bm{x}}^{(d)}\end{bmatrix}\in{\mathbb{M}}_{N,d}.

We further denote the resolvent of 𝑻{\bm{T}} and 𝑵{\bm{N}} respectively as

𝑹(z)(𝑻z𝑰N)1=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹1d(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹2d(z)𝑹13(z)𝑹23(z)𝑹33(z)𝑹3d(z)𝑹1d(z)𝑹2d(z)𝑹3d(z)𝑹dd(z)].\displaystyle{\bm{R}}(z)\equiv\left({\bm{T}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)&\cdots&{\bm{R}}^{1d}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)&\cdots&{\bm{R}}^{2d}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)&\ldots&{\bm{R}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{R}}^{1d}(z)^{\top}&{\bm{R}}^{2d}(z)^{\top}&{\bm{R}}^{3d}(z)^{\top}&\cdots&{\bm{R}}^{dd}(z)\end{bmatrix}.
𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸1d(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸2d(z)𝑸13(z)𝑸23(z)𝑸33(z)𝑸3d(z)𝑸1d(z)𝑸2d(z)𝑸3d(z)𝑸dd(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)&\cdots&{\bm{Q}}^{1d}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)&\cdots&{\bm{Q}}^{2d}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)&\ldots&{\bm{Q}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{Q}}^{1d}(z)^{\top}&{\bm{Q}}^{2d}(z)^{\top}&{\bm{Q}}^{3d}(z)^{\top}&\cdots&{\bm{Q}}^{dd}(z)\end{bmatrix}.

As in the order-3 case, by the Woodbury matrix identity (Lemma 5), we have

1Ntr𝑹(z)\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}(z) =1Ntr𝑸(z)1Ntr[(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽],\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)-\frac{1}{N}\operatorname{tr}\left[\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}}\right],
=1Ntr𝑸(z)+𝒪(N1),\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)+\mathcal{O}(N^{-1}),

since the perturbation matrix \left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}} is of bounded spectral norm (if \|{\bm{Q}}(z)\| is bounded, see the condition in Eq. (8)) and of finite size (a d\times d matrix). Therefore, the characterization of the spectrum of {\bm{T}} boils down to the estimation of \frac{1}{N}\operatorname{tr}{\bm{Q}}(z). It remains to handle the statistical dependence between the tensor noise {\bm{\mathsfit{X}}} and the singular vectors of {\bm{\mathsfit{T}}}. Recalling the proof of Appendix B.9, we have again

\displaystyle\frac{1}{N\sqrt{N}}\sum_{j=2}^{d}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{n_{1}}{N},

with 𝑿ij𝑿(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(j1),:,𝒖(j+1),,𝒖(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(j-1)},:,{\bm{u}}^{(j+1)},\ldots,{\bm{u}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}. Taking the expectation of 1NNi1=1n1[𝑿1j𝑸1j(z)]i1i1\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}} yields

Aj=1NNi1,,id𝔼[Qi1ij1jXi1,,idk1,kjuik(k)]+1NNi1,,id𝔼[Qi1ij1jk1,kjuik(k)Xi1,,id]=Aj1+Aj2,\displaystyle A_{j}=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\prod_{k\neq 1,k\neq j}u_{i_{k}}^{(k)}\right]+\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[Q^{1j}_{i_{1}i_{j}}\prod_{k\neq 1,k\neq j}\frac{\partial u_{i_{k}}^{(k)}}{\partial X_{i_{1},\ldots,i_{d}}}\right]=A_{j1}+A_{j2},

where we already computed the first term (A_{j1}) in Appendix B.9. We now show that A_{j2} vanishes asymptotically under Assumption 3. Indeed, by Eq. (28), the higher-order terms arise from the term -\frac{1}{\sqrt{N}}\prod_{\ell\neq k}u^{(\ell)}_{i_{\ell}}R^{kk}_{i_{k}i_{k}}(\lambda); we thus only need to show that the contribution of this term is also vanishing. Precisely,

Aj2\displaystyle A_{j2} =1N2i1,,id𝔼[Qi1ij1jk1,kjkui()Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{j}}^{1j}\prod_{k\neq 1,k\neq j}\prod_{\ell\neq k}u^{(\ell)}_{i_{\ell}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})
=1N2i1,,id𝔼[(ui1(1))d2Qi1ij1j(uij(j))d2k1,kjuik(k)Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[\left(u_{i_{1}}^{(1)}\right)^{d-2}Q_{i_{1}i_{j}}^{1j}\left(u_{i_{j}}^{(j)}\right)^{d-2}\prod_{k\neq 1,k\neq j}u^{(k)}_{i_{k}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})
=1N2𝔼[((𝒖(1))d2)𝑸1j(z)(𝒖(j))d2i2,,ij1,ij+1,,idk1,kjuik(k)Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[\left(({\bm{u}}^{(1)})^{\odot d-2}\right)^{\top}{\bm{Q}}^{1j}(z)\left({\bm{u}}^{(j)}\right)^{\odot d-2}\sum_{i_{2},\ldots,i_{j-1},i_{j+1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}u^{(k)}_{i_{k}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})

where {\bm{a}}^{\odot q} denotes the vector with entries a_{i}^{q}. As such, A_{j2} vanishes asymptotically since the singular vectors {\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} have unit norm and their entries are bounded by 1.

We finally need to check that the derivative of 𝑸(z){\bm{Q}}(z) w.r.t. the entry Xi1,,idX_{i_{1},\ldots,i_{d}} has the same expression asymptotically as the one in the independent case of Appendix B.9. Indeed, we have 𝑸(z)Xi1,,id=𝑸(z)𝑵Xi1,,id𝑸(z)\frac{\partial{\bm{Q}}(z)}{\partial X_{i_{1},\ldots,i_{d}}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{i_{1},\ldots,i_{d}}}{\bm{Q}}(z) with

\displaystyle\frac{\partial{\bm{N}}}{\partial X_{i_{1},\ldots,i_{d}}}=\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{C}}_{12}&\cdots&{\bm{C}}_{1d}\\ {\bm{C}}_{12}^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&\cdots&{\bm{C}}_{2d}\\ \vdots&\vdots&\ddots&\vdots\\ {\bm{C}}_{1d}^{\top}&{\bm{C}}_{2d}^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix}+\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}},\ldots,\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\right),

where {\bm{C}}_{ij}=\prod_{k\neq i,k\neq j}u^{(k)}_{i_{k}}{\bm{e}}_{i_{i}}^{n_{i}}({\bm{e}}_{i_{j}}^{n_{j}})^{\top} and {\bm{O}}=\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}},\ldots,\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\right) is of vanishing norm. Indeed, by Eq. (28), there exists C>0 independent of N such that \|\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}}\|,\ldots,\|\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\|\leq\frac{C}{\sqrt{N}}; therefore, the spectral norm of {\bm{O}} is bounded by \frac{C^{\prime}}{N^{\frac{d}{2}}} for some constant C^{\prime}>0 independent of N. Finally, A_{j}=-g_{1}(z)g_{j}(z)+\mathcal{O}(N^{-1}) (with g_{1}(z),g_{j}(z) the almost sure limits of \frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z) and \frac{1}{N}\operatorname{tr}{\bm{Q}}^{jj}(z) respectively), hence yielding the same limiting Stieltjes transform as the one obtained in Appendix B.9.

B.13 Proof of Theorem 8

Given the identities in Eq. (26), we have for all i[d]i\in[d]

1Nj1,,jdxji(i)kiujk(k)Xj1,,jd+βki𝒖(k),𝒙(k)=λ𝒖(i),𝒙(i),\displaystyle\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\prod_{k\neq i}u_{j_{k}}^{(k)}X_{j_{1},\ldots,j_{d}}+\beta\prod_{k\neq i}\langle{\bm{u}}^{(k)},{\bm{x}}^{(k)}\rangle=\lambda\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle,

where \lambda and \langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle concentrate almost surely around their asymptotic limits, denoted \lambda^{\infty}(\beta) and a_{{\bm{x}}^{(i)}}^{\infty}(\beta) respectively. Taking the expectation of the first term and applying Stein’s lemma (Lemma 1), we get

Ai=1Nj1,,jdxji(i)𝔼[(kiujk(k))Xj1,,jd]=1Nj1,,jdxji(i)ki𝔼[ujk(k)Xj1,,jdk,iuj()],\displaystyle A_{i}=\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\operatorname{\mathbb{E}}\left[\frac{\partial\left(\prod_{k\neq i}u_{j_{k}}^{(k)}\right)}{\partial X_{j_{1},\ldots,j_{d}}}\right]=\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\sum_{k\neq i}\operatorname{\mathbb{E}}\left[\frac{\partial u_{j_{k}}^{(k)}}{\partial X_{j_{1},\ldots,j_{d}}}\prod_{\ell\neq k,\ell\neq i}u_{j_{\ell}}^{(\ell)}\right],

where the only contributing term in the expression of $\frac{\partial u_{j_{k}}^{(k)}}{\partial X_{j_{1},\ldots,j_{d}}}$ from Eq. (28) is $-\frac{1}{\sqrt{N}}\prod_{\ell\neq k}u_{j_{\ell}}^{(\ell)}R_{j_{k}j_{k}}^{kk}(\lambda)$, which yields

\displaystyle A_{i} =-\frac{1}{N}\sum_{j_{1},\ldots,j_{d}}x^{(i)}_{j_{i}}\sum_{k\neq i}\operatorname{\mathbb{E}}\left[R_{j_{k}j_{k}}^{kk}(\lambda)\prod_{\ell\neq k}u_{j_{\ell}}^{(\ell)}\prod_{\ell\neq k,\ell\neq i}u_{j_{\ell}}^{(\ell)}\right]+\mathcal{O}(N^{-1})
\displaystyle =-\operatorname{\mathbb{E}}\left[\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle\sum_{k\neq i}\frac{1}{N}\operatorname{tr}{\bm{R}}^{kk}(\lambda)\right]+\mathcal{O}(N^{-1})
\displaystyle \to-\operatorname{\mathbb{E}}\left[\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle\right]\sum_{k\neq i}g_{k}(\lambda).

Therefore, the almost sure limits $\lambda^{\infty}$ and $a_{{\bm{x}}^{(i)}}^{\infty}$ for each $i\in[d]$ satisfy

\displaystyle\lambda^{\infty}a_{{\bm{x}}^{(i)}}^{\infty}=\beta\prod_{k\neq i}a_{{\bm{x}}^{(k)}}^{\infty}-a_{{\bm{x}}^{(i)}}^{\infty}\sum_{k\neq i}g_{k}(\lambda^{\infty}),

therefore

\displaystyle a_{{\bm{x}}^{(i)}}^{\infty}=\alpha_{i}(\lambda^{\infty})\prod_{k\neq i}a_{{\bm{x}}^{(k)}}^{\infty},\quad\text{with}\quad\alpha_{i}(z)=\frac{\beta}{z+g(z)-g_{i}(z)},

since $g(z)=\sum_{k=1}^{d}g_{k}(z)$. To solve the above system, we simply write $x_{i}=a_{{\bm{x}}^{(i)}}^{\infty}$ and $\alpha_{i}=\alpha_{i}(z)$, omitting the dependence on $z$. We therefore have

\displaystyle x_{i}=\alpha_{i}\prod_{k\neq i}x_{k}\quad\Rightarrow\quad x_{i}=\alpha_{i}x_{j}\prod_{k\neq i,k\neq j}x_{k}=\alpha_{i}x_{j}\prod_{k\neq i,k\neq j}\left(\alpha_{k}\prod_{\ell\neq k}x_{\ell}\right),

from which we have

\displaystyle x_{i}=x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k}x_{\ell}\right)=x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-2},

thus

\displaystyle x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-3}=1,

and we remark that $\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-3}=\left(\frac{x_{j}}{\alpha_{j}}\right)^{d-3}x_{j}^{d-2}$, hence $x_{j}$ is given by

\displaystyle x_{j}=\left(\frac{\alpha_{j}^{d-3}}{\prod_{k\neq j}\alpha_{k}}\right)^{\frac{1}{2d-4}},

which ends the proof.
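As a quick numerical sanity check (not part of the proof), one can verify that this closed form indeed solves the fixed-point system $x_{i}=\alpha_{i}\prod_{k\neq i}x_{k}$; the dimension $d$ and the positive values of $\alpha_{i}$ below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
alpha = rng.uniform(0.5, 2.0, size=d)  # arbitrary positive values of alpha_i(z)

# Closed form: x_j = (alpha_j^{d-3} / prod_{k != j} alpha_k)^{1/(2d-4)}
#                  = (alpha_j^{d-2} / prod_k alpha_k)^{1/(2d-4)}
x = (alpha ** (d - 2) / np.prod(alpha)) ** (1.0 / (2 * d - 4))

# Residual of the fixed-point equations x_i = alpha_i * prod_{k != i} x_k
residual = x - alpha * (np.prod(x) / x)
print(np.max(np.abs(residual)))  # numerically zero
```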

Alternative expression of $q_{i}(z)$

From the above, we have $\lambda^{\infty}+g(\lambda^{\infty})=\beta\prod_{i=1}^{d}x_{i}$ and

\displaystyle(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))x_{i}=\beta\prod_{j\neq i}x_{j}\quad\Rightarrow\quad(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))x_{i}^{2}=\beta\prod_{i=1}^{d}x_{i}.

Therefore,

\displaystyle x_{i}=\sqrt{\frac{\lambda^{\infty}+g(\lambda^{\infty})}{\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty})}}=\sqrt{1+\frac{g_{i}(\lambda^{\infty})}{\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty})}},

with $z+g(z)-g_{i}(z)=\frac{-c_{i}}{g_{i}(z)}$ since $g_{i}(z)$ satisfies $g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0$. Hence, we find

\displaystyle x_{i}=\sqrt{1-\frac{g_{i}^{2}(\lambda^{\infty})}{c_{i}}}.
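This last algebraic step can be checked numerically: for any values satisfying $c_{i}=g_{i}^{2}-(g+z)g_{i}$ (i.e., such that $g_{i}$ solves the quadratic), the two expressions for $x_{i}^{2}$ coincide. A small sketch with arbitrary (hypothetical) values:

```python
import numpy as np

rng = np.random.default_rng(1)
z = 3.0
gi = -rng.uniform(0.1, 0.5, size=3)  # hypothetical values of g_i(z), negative
g = gi.sum()                         # g(z) = sum_k g_k(z)
ci = gi**2 - (g + z) * gi            # so that g_i solves g_i^2-(g+z)g_i-c_i=0

lhs = (z + g) / (z + g - gi)         # x_i^2 written as (z+g)/(z+g-g_i)
rhs = 1.0 - gi**2 / ci               # x_i^2 written as 1 - g_i^2/c_i
print(np.max(np.abs(lhs - rhs)))     # numerically zero
```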

B.14 Additional lemmas

Lemma 4 (Spectral norm of random Gaussian tensors).

Let ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$. Then the spectral norm $\|{\bm{\mathsfit{X}}}\|$ can be bounded, with probability at least $1-\delta$ for any $\delta>0$, as

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq\sqrt{8\left[\left(\sum_{i=1}^{d}n_{i}\right)\log\left(\frac{2d}{\log(3/2)}\right)+\log\left(\frac{2}{\delta}\right)\right]}.
Proof.

By definition, the spectral norm of ${\bm{\mathsfit{X}}}$ is given as

\displaystyle\|{\bm{\mathsfit{X}}}\|=\sup_{{\bm{u}}^{(i)}\in{\mathbb{S}}^{n_{i}-1},\,i\in[d]}{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}). (44)

For some $\varepsilon>0$, let ${\mathcal{E}}_{1},\ldots,{\mathcal{E}}_{d}$ be $\varepsilon$-nets of ${\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{d}-1}$ respectively. Since ${\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}$ is compact, the supremum in Eq. (44) is attained, and by the $\varepsilon$-net argument there exists ${\bm{e}}^{(i)}\in{\mathcal{E}}_{i}$ for each $i\in[d]$ such that

\displaystyle\|{\bm{\mathsfit{X}}}\|={\bm{\mathsfit{X}}}({\bm{e}}^{(1)}+{\bm{\delta}}^{(1)},\ldots,{\bm{e}}^{(d)}+{\bm{\delta}}^{(d)}),

with $\|{\bm{\delta}}^{(i)}\|\leq\varepsilon$ for $i\in[d]$. Therefore, by multilinearity, one has

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)})+\left(\sum_{i=1}^{d}\varepsilon^{i}\binom{d}{i}\right)\|{\bm{\mathsfit{X}}}\|.

For $\varepsilon=\frac{\log(3/2)}{d}$, one has $\sum_{i=1}^{d}\varepsilon^{i}\binom{d}{i}\leq\sum_{i=1}^{d}\frac{(\varepsilon d)^{i}}{i!}\leq e^{\varepsilon d}-1=\frac{1}{2}$. As such, the spectral norm of ${\bm{\mathsfit{X}}}$ can be bounded as

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq 2\max_{{\bm{e}}^{(i)}\in{\mathcal{E}}_{i},\,i\in[d]}{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)}). (45)

Since the entries of ${\bm{\mathsfit{X}}}$ are i.i.d. standard Gaussian random variables, we have $\operatorname{\mathbb{E}}\left[e^{tX_{i_{1}\ldots i_{d}}}\right]\leq e^{t^{2}/2}$; hence, by a Chernoff bound, we have for any ${\bm{u}}^{(i)}\in{\mathbb{S}}^{n_{i}-1}$ with $i\in[d]$

\displaystyle{\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\geq t\right\}={\mathbb{P}}\left\{e^{s{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})}\geq e^{st}\right\}\leq e^{-st}\operatorname{\mathbb{E}}\left[e^{s{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})}\right]
\displaystyle\leq\exp\left\{-st+\frac{s^{2}}{2}\sum_{i_{1},\ldots,i_{d}}(u_{i_{1}}^{(1)})^{2}\cdots(u_{i_{d}}^{(d)})^{2}\right\}=\exp\left\{-st+\frac{s^{2}}{2}\right\}.

Minimizing the right-hand side over $s$ (optimally at $s=t$) yields ${\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\geq t\right\}\leq e^{-t^{2}/2}$. By the same argument, we further have ${\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\leq-t\right\}\leq e^{-t^{2}/2}$, and a union bound over the two cases yields

\displaystyle{\mathbb{P}}\left\{|{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})|\geq t\right\}\leq 2e^{-t^{2}/2}.
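The exponent in this tail bound relies on the fact that $\sum_{i_{1},\ldots,i_{d}}(u_{i_{1}}^{(1)})^{2}\cdots(u_{i_{d}}^{(d)})^{2}=\prod_{i}\|{\bm{u}}^{(i)}\|^{2}=1$ for unit vectors. A quick numerical sketch of this factorization, with arbitrary small dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
us = [rng.standard_normal(n) for n in (3, 4, 5)]  # arbitrary dimensions n_i
us = [u / np.linalg.norm(u) for u in us]          # unit vectors u^{(i)}

# Sum of squared entries of the rank-1 tensor u^(1) (x) u^(2) (x) u^(3);
# it factorizes as prod_i ||u^{(i)}||^2 = 1
T = np.einsum('i,j,k->ijk', *us)
print(np.sum(T**2))  # 1.0 up to rounding
```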

Back to Eq. (45), since $|{\mathcal{E}}_{i}|\leq\left(\frac{2}{\varepsilon}\right)^{n_{i}}$, applying the union bound gives

\displaystyle{\mathbb{P}}\left\{\|{\bm{\mathsfit{X}}}\|\geq t\right\}\leq\sum_{{\bm{e}}^{(1)}\in{\mathcal{E}}_{1},\ldots,{\bm{e}}^{(d)}\in{\mathcal{E}}_{d}}{\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)})\geq\frac{t}{2}\right\}
\displaystyle\leq\left(\frac{2d}{\log(3/2)}\right)^{\sum_{i=1}^{d}n_{i}}2\exp\left(-\frac{t^{2}}{8}\right),

which yields the final bound for an appropriate choice of $t$. ∎
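For $d=2$, the tensor spectral norm reduces to the matrix operator norm, so the lemma can be sanity-checked numerically on a single draw (a sketch, not a proof; the sizes are arbitrary). Here $t$ is chosen so that the right-hand side of the last display equals $\delta$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n1, n2, delta = 2, 50, 50, 0.01
X = rng.standard_normal((n1, n2))

# t such that (2d/log(3/2))^(n1+n2) * 2*exp(-t^2/8) = delta
t = np.sqrt(8 * ((n1 + n2) * np.log(2 * d / np.log(1.5))
                 + np.log(2 / delta)))
spec = np.linalg.norm(X, 2)  # largest singular value of X
print(spec <= t)             # True on this draw (holds w.p. >= 1 - delta)
```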

Lemma 5 (Woodbury matrix identity).

Let ${\bm{A}}\in{\mathbb{M}}_{n}$ and ${\bm{B}}\in{\mathbb{M}}_{k}$ be invertible, and let ${\bm{U}}\in{\mathbb{M}}_{n,k}$, ${\bm{V}}\in{\mathbb{M}}_{k,n}$. Then

\displaystyle\left({\bm{A}}+{\bm{U}}{\bm{B}}{\bm{V}}\right)^{-1}={\bm{A}}^{-1}-{\bm{A}}^{-1}{\bm{U}}\left({\bm{B}}^{-1}+{\bm{V}}{\bm{A}}^{-1}{\bm{U}}\right)^{-1}{\bm{V}}{\bm{A}}^{-1}.
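A minimal numerical check of the identity, with arbitrary matrices (${\bm{A}}$ and ${\bm{B}}$ are shifted by multiples of the identity so that they are well conditioned):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to be well conditioned
B = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ B @ V)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(B) + V @ Ainv @ U) @ V @ Ainv
print(np.max(np.abs(lhs - rhs)))  # numerically zero
```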

References

  • [1] Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M. and Telgarsky, M. (2014). Tensor decompositions for learning latent variable models. Journal of Machine Learning Research 15, 2773–2832.
  • [2] Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability 33, 1643–1697.
  • [3] Ben Arous, G., Huang, D. Z. and Huang, J. (2021). Long random matrices and tensor unfolding. arXiv preprint arXiv:2110.10210.
  • [4] Ben Arous, G., Mei, S., Montanari, A. and Nica, M. (2019). The landscape of the spiked tensor model. Communications on Pure and Applied Mathematics 72, 2282–2330.
  • [5] Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227, 494–521.
  • [6] Biroli, G., Cammarota, C. and Ricci-Tersenghi, F. (2020). How to iron out rough landscapes and get optimal performances: averaged gradient descent and its application to tensor PCA. Journal of Physics A: Mathematical and Theoretical 53, 174003.
  • [7] Capitaine, M., Donati-Martin, C. and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability 37, 1–47.
  • [8] Goulart, J. H. de M., Couillet, R. and Comon, P. (2021). A random matrix perspective on random tensors. arXiv preprint arXiv:2108.00774.
  • [9] Handschy, M. C. (2019). Phase Transition in Random Tensors with Multiple Spikes. PhD thesis, University of Minnesota.
  • [10] Hitchcock, F. L. (1927). The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 164–189.
  • [11] Huang, J., Huang, D. Z., Yang, Q. and Cheng, G. (2020). Power iteration for tensor PCA. arXiv preprint arXiv:2012.13669.
  • [12] Jagannath, A., Lopatto, P. and Miolane, L. (2020). Statistical thresholds for tensor PCA. The Annals of Applied Probability 30, 1910–1933.
  • [13] Landsberg, J. M. (2012). Tensors: geometry and applications. Representation theory 381, 3.
  • [14] Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F. and Zdeborová, L. (2017). Statistical and computational phase transitions in spiked tensor estimation. In Proc. IEEE International Symposium on Information Theory (ISIT), 511–515.
  • [15] Lim, L.-H. (2005). Singular values and eigenvalues of tensors: a variational approach. In Proc. IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 129–132.
  • [16] Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1, 457.
  • [17] Montanari, A. and Richard, E. (2014). A statistical model for tensor PCA. arXiv preprint arXiv:1411.1076.
  • [18] Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer Science & Business Media.
  • [19] Péché, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields 134, 127–173.
  • [20] Perry, A., Wein, A. S. and Bandeira, A. S. (2020). Statistical limits of spiked tensor models. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 56, 230–264.
  • [21] Rabanser, S., Shchur, O. and Günnemann, S. (2017). Introduction to tensor decompositions and their applications in machine learning. arXiv preprint arXiv:1711.10781.
  • [22] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 1135–1151.
  • [23] Sun, W. W., Hao, B. and Li, L. (2014). Tensors in modern statistical learning. Wiley StatsRef: Statistics Reference Online, 1–25.
  • [24] Tao, T. (2012). Topics in Random Matrix Theory, vol. 132. American Mathematical Society.
  • [25] Zare, A., Ozdemir, A., Iwen, M. A. and Aviyente, S. (2018). Extension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA. Proceedings of the IEEE 106, 1341–1358.
  • [25] {barticle}[author] \bauthor\bsnmZare, \bfnmAli\binitsA., \bauthor\bsnmOzdemir, \bfnmAlp\binitsA., \bauthor\bsnmIwen, \bfnmMark A\binitsM. A. and \bauthor\bsnmAviyente, \bfnmSelin\binitsS. (\byear2018). \btitleExtension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA. \bjournalProceedings of the IEEE \bvolume106 \bpages1341–1358. \endbibitem