
When random tensors meet random matrices

Mohamed El Amine Seddik (mohamed.seddik@huawei.com), Maxime Guillaud (maxime.guillaud@huawei.com), Romain Couillet (romain.couillet@univ-grenoble-alpes.fr)
Mathematical and Algorithmic Sciences Laboratory, Huawei Technologies France; Université Grenoble Alpes, CNRS, GIPSA-lab, Grenoble INP
Abstract

Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise. Using the variational definition of the singular vectors and values of Lim (2005) [15], we show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix, constructed from contractions of the studied tensor with the singular vectors associated to its best rank-1 approximation. Our approach allows the exact characterization of the almost sure asymptotic singular value and of the alignments of the corresponding singular vectors with the true spike components, when $\frac{n_{i}}{\sum_{j=1}^{d}n_{j}}\to c_{i}\in(0,1)$ with $n_{i}$ the tensor dimensions. In contrast to other works that rely mostly on tools from statistical physics to study random tensors, our results rely solely on classical RMT tools such as Stein's lemma. Finally, classical RMT results concerning spiked random matrices are recovered as a particular case.

MSC classification: 60B20, 15B52
Keywords: random matrix theory, random tensor theory, spiked models


Notations:

$[n]$ denotes the set $\{1,\ldots,n\}$. The set of rectangular matrices of size $m\times n$ is denoted ${\mathbb{M}}_{m,n}$, and the set of square matrices of size $n$ is denoted ${\mathbb{M}}_{n}$. The set of $d$-order tensors of size $n_{1}\times\cdots\times n_{d}$ is denoted ${\mathbb{T}}_{n_{1},\ldots,n_{d}}$, and the set of hyper-cubic tensors of size $n$ and order $d$ is denoted ${\mathbb{T}}_{n}^{d}$. The notation ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$ means that ${\bm{\mathsfit{X}}}$ is a random tensor with i.i.d. Gaussian $\mathcal{N}(0,1)$ entries. Scalars are denoted by lowercase letters $a,b,c$; vectors by bold lowercase letters ${\bm{a}},{\bm{b}},{\bm{c}}$; matrices by bold uppercase letters ${\bm{A}},{\bm{B}},{\bm{C}}$; and tensors by ${\bm{\mathsfit{A}}},{\bm{\mathsfit{B}}},{\bm{\mathsfit{C}}}$. ${\bm{e}}_{i}^{d}$ denotes the canonical vector in ${\mathbb{R}}^{d}$ with $[{\bm{e}}_{i}^{d}]_{j}=\delta_{ij}$. $T_{i_{1},\ldots,i_{d}}$ denotes the entry $(i_{1},\ldots,i_{d})$ of the tensor ${\bm{\mathsfit{T}}}$. $\langle{\bm{u}},{\bm{v}}\rangle=\sum_{i}u_{i}v_{i}$ denotes the scalar product between ${\bm{u}}$ and ${\bm{v}}$, and the $\ell_{2}$-norm of a vector ${\bm{u}}$ is given by $\|{\bm{u}}\|^{2}=\langle{\bm{u}},{\bm{u}}\rangle$. $\|\cdot\|$ also denotes the spectral norm for tensors. ${\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\equiv\sum_{i_{1},\ldots,i_{d}}u_{i_{1}}^{(1)}\cdots u_{i_{d}}^{(d)}T_{i_{1},\ldots,i_{d}}$ denotes the contraction of the tensor ${\bm{\mathsfit{T}}}$ on the vectors given as arguments. Given some vectors ${\bm{u}}^{(1)},\ldots,{\bm{u}}^{(k)}$ with $k<d$, the contraction ${\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(k)},:,\ldots,:)$ denotes the resulting $(d-k)$-th order tensor.
${\mathbb{S}}^{N-1}$ denotes the unit sphere in ${\mathbb{R}}^{N}$.

1 Introduction

The extraction of latent and low-dimensional structures from raw data is a key step in various machine learning and signal processing applications. Our present interest is in those modern techniques which rely on the extraction of such structures from a low-rank random tensor model [1] and which extend the ideas from matrix-type data to tensor-structured data. We refer the reader to [23, 21, 25] and the references therein, which introduce an extensive set of applications of tensor decomposition methods to machine learning, including dimensionality reduction, supervised and unsupervised learning, learning subspaces for feature extraction, and low-rank tensor recovery. Although random matrix models have been extensively studied and are well understood in the literature, the understanding of random tensor models is still in its infancy, and the ideas from random matrix analysis do not easily extend to higher-order tensors. Indeed, the resolvent notion (see Definition 1), which is at the heart of random matrix theory, does not generalize to tensors. In our present investigation, we consider the spiked tensor model, which consists of an observed $d$-order tensor ${\bm{\mathsfit{T}}}\in{\mathbb{T}}_{n_{1},\ldots,n_{d}}$ of the form

\displaystyle{\bm{\mathsfit{T}}}=\beta\,{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (1)

where $({\bm{x}}^{(1)},\ldots,{\bm{x}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}$, ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$ and $N=\sum_{i=1}^{d}n_{i}$. Note that the tensor noise is normalized by $\sqrt{\sum_{i=1}^{d}n_{i}}$, with $n_{i}$ the tensor dimensions, since the spectral norm of ${\bm{\mathsfit{X}}}$ is of order $\sqrt{\sum_{i=1}^{d}n_{i}}$ by Lemma 4. One aims at retrieving the rank-1 component (or spike) $\beta\,{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}$ from the noisy tensor ${\bm{\mathsfit{T}}}$, where $\beta$ can be seen as controlling the signal-to-noise ratio (SNR). The identification of the dominant rank-1 component is an important special case of the low-rank tensor approximation problem, a noisy version of the classical canonical polyadic decomposition (CPD) [10, 13]. Extensive efforts have been made to study the performance of low-rank tensor approximation methods in the large-dimensional regime, i.e., when the tensor dimensions $n_{i}\to\infty$; however, these works consider symmetric tensor models where $n_{1}=\ldots=n_{d}$ and ${\bm{x}}^{(1)}=\ldots={\bm{x}}^{(d)}$, and assume that the noise is symmetric [17, 20, 14, 9, 12, 8].
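For concreteness, the model in Eq. (1) is straightforward to simulate. The following is a minimal NumPy sketch; the function name `spiked_tensor` and its interface are ours, introduced for illustration only:

```python
import numpy as np

def spiked_tensor(beta, dims, seed=None):
    """Draw T = beta x^(1) x ... x x^(d) + X / sqrt(N), as in Eq. (1)."""
    rng = np.random.default_rng(seed)
    # unit-norm spike components x^(i) drawn uniformly on the spheres S^{n_i - 1}
    xs = [rng.standard_normal(n) for n in dims]
    xs = [x / np.linalg.norm(x) for x in xs]
    # rank-one signal: outer product over all d modes
    signal = xs[0]
    for x in xs[1:]:
        signal = np.multiply.outer(signal, x)
    N = sum(dims)  # noise normalization: ||X|| is of order sqrt(N)
    noise = rng.standard_normal(dims) / np.sqrt(N)
    return beta * signal + noise, xs

T, xs = spiked_tensor(beta=3.0, dims=(40, 50, 60), seed=0)
```

The noise scaling by $\sqrt{N}$ keeps the spectral norm of the noise term of order one, so that an order-one $\beta$ is the interesting regime.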

In particular, in the matrix case (i.e., $d=2$), the above spiked tensor model becomes a so-called spiked matrix model. It is well known that in the large-dimensional regime, there exists an order-one critical value $\beta_{c}$ of the SNR below which it is information-theoretically impossible to detect/recover the spike, while above $\beta_{c}$ it is possible to detect the spike and recover the corresponding components in (at least) polynomial time using the singular value decomposition (SVD). This phenomenon is known as the BBP (Baik, Ben Arous, and Péché) phase transition [2, 5, 7, 19].

In the (symmetric) spiked tensor model for $d\geq 3$, there also exists an order-one critical value $\beta_{c}(d)$ (depending on the tensor order $d$; we sometimes omit this dependence when there is no ambiguity) in the high-dimensional asymptotic below which it is information-theoretically impossible to detect/recover the spike, while above $\beta_{c}(d)$ recovery is theoretically possible with the maximum likelihood (ML) estimator. Computing the ML estimate in the matrix case corresponds to computing the largest singular vectors of the considered matrix, which has polynomial time complexity, while for $d\geq 3$, computing the ML estimate is NP-hard [17, 6]. As such, a more practical phase transition for tensors is characterized by the algorithmic critical value $\beta_{a}(d,n)$ (which might depend on the tensor dimension $n$) above which the recovery of the spike is possible in polynomial time. Richard and Montanari [17] first introduced the symmetric spiked tensor model (of the form ${\bm{\mathsfit{Y}}}=\mu{\bm{x}}^{\otimes d}+{\bm{\mathsfit{W}}}\in{\mathbb{T}}_{n}^{d}$ with symmetric ${\bm{\mathsfit{W}}}$) and also considered the related algorithmic aspects. In particular, they used heuristics to highlight that spike recovery is possible in polynomial time, with Approximate Message Passing (AMP) or the tensor power iteration method under random initialization, provided $\mu\gtrsim n^{\frac{d-1}{2}}$. This phase transition was later proven rigorously for AMP by [14, 12] and recently for tensor power iteration by [11].

Richard and Montanari [17] further introduced a method for tensor decomposition based on tensor unfolding, which consists in unfolding ${\bm{\mathsfit{Y}}}$ into an $n^{q}\times n^{d-q}$ matrix $\operatorname{Mat}({\bm{\mathsfit{Y}}})=\mu{\bm{x}}{\bm{y}}^{\top}+{\bm{Z}}$ for $q\in[d-1]$, to which an SVD is then applied. Setting $q=1$, they predicted that their proposed method successfully recovers the spike if $\mu\gtrsim n^{\frac{d-2}{4}}$. In a very recent paper, Ben Arous et al. [3] proposed a study of spiked long rectangular random matrices (where the number of rows $m$ is allowed to grow polynomially in the number of columns $n$, i.e., $\frac{m}{n}=n^{\alpha}$) under fairly general (bounded fourth-order moment) noise distribution assumptions. They notably proved the existence of a critical SNR at which the extreme singular value and singular vectors exhibit a BBP-type phase transition. They applied their result to the asymmetric rank-one spiked model in Eq. (1) (with equal dimensions) using the tensor unfolding method, and recovered the exact threshold obtained by [17], i.e., $\beta\gtrsim n^{\frac{d-2}{4}}$, for tensor unfolding to succeed in signal recovery.

For the asymmetric spiked tensor model in Eq. (1), few results are available in the literature (to the best of our knowledge, only [3] considered this setting, by applying the tensor unfolding method proposed by [17]). This is precisely the model we consider in the present work, and our more general result is derived as follows. Given the asymmetric model from Eq. (1), the ML estimator of the best rank-one approximation of ${\bm{\mathsfit{T}}}$ is given by

\displaystyle(\lambda_{*},{\bm{u}}_{*}^{(i)})=\operatorname*{arg\,min}_{\lambda\,\in\,{\mathbb{R}}^{+},\,({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\lambda\,{\bm{u}}^{(1)}\otimes\cdots\otimes{\bm{u}}^{(d)}\|_{\text{F}}^{2}. (2)

In the above equation, $\lambda_{*}$ and the ${\bm{u}}_{*}^{(i)}$ can respectively be interpreted as the generalization to the tensor case of the concepts of dominant singular value and associated singular vectors [15]. Following the variational arguments therein, Eq. (2) can be reformulated using contractions of ${\bm{\mathsfit{T}}}$ as

\displaystyle\max_{\prod_{i=1}^{d}\|{\bm{u}}^{(i)}\|=1}|{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})|, (3)

the Lagrangian of which writes as $\mathcal{L}\equiv{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})-\lambda\left(\prod_{i=1}^{d}\|{\bm{u}}^{(i)}\|-1\right)$ with $\lambda>0$. Hence, the stationary points $(\lambda,{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})$, with the ${\bm{u}}^{(i)}$'s being unit vectors, must satisfy the Karush-Kuhn-Tucker conditions, for $i\in[d]$:

\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})=\lambda{\bm{u}}^{(i)},\\ \lambda={\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}).\end{cases} (4)
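The conditions in Eq. (4) are precisely the fixed points of the classical alternating (tensor power) iteration: contract ${\bm{\mathsfit{T}}}$ on all modes but one and renormalize, cycling over the modes. A minimal NumPy sketch of this idea, assuming our own illustrative implementation (it converges to a stationary point of Eq. (4), not necessarily the global optimum of Eq. (3)):

```python
import numpy as np

def rank1_power_iteration(T, iters=100, seed=None):
    """Alternating iteration on the stationarity conditions of Eq. (4)."""
    d = T.ndim
    axes = ''.join(chr(ord('a') + k) for k in range(d))  # e.g. 'abc' for d = 3
    rng = np.random.default_rng(seed)
    us = [rng.standard_normal(n) for n in T.shape]
    us = [u / np.linalg.norm(u) for u in us]
    for _ in range(iters):
        for i in range(d):
            # T(u^(1),...,:,...,u^(d)) = lambda u^(i): contract all modes but i
            spec = (axes + ',' +
                    ','.join(axes[j] for j in range(d) if j != i) + '->' + axes[i])
            v = np.einsum(spec, T, *[us[j] for j in range(d) if j != i])
            us[i] = v / np.linalg.norm(v)
    full = axes + ',' + ','.join(axes) + '->'
    lam = float(np.einsum(full, T, *us))  # lambda = T(u^(1),...,u^(d))
    return lam, us
```

On a noiseless rank-one tensor this recovers the spike components exactly after a few sweeps.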

An interesting question concerns the computation of the expected number of stationary points (local optima or saddle points) satisfying the identities in Eq. (4). [4] studied the landscape of a symmetric spiked tensor model and found that for $\beta<\beta_{c}$, the values of the objective function at all local maxima (including the global one) tend to concentrate on a small interval, while for $\beta>\beta_{c}$, the value achieved by the global maximum exits that interval and increases with $\beta$. In contrast, very recently Goulart et al. [8] studied an order-$3$ symmetric spiked random tensor ${\bm{\mathsfit{Y}}}$ using an RMT approach, showing that there exists a threshold $0<\beta_{s}<\beta_{c}$ such that for $\beta\in[\beta_{s},\beta_{c}]$ there exists a local optimum of the ML problem that correlates with the spike, and such a local optimum coincides with the global one for $\beta>\beta_{c}$. We conjecture that these observations extend to asymmetric spiked tensors, namely that there exists an order-one critical value $\beta_{c}$ above which the ML problem in Eq. (3) admits a global maximum that correlates with the spike. As in [8], our present findings do not allow us to express such a $\beta_{c}$, and its exact characterization is left for future investigation. However, for asymmetric spiked random tensors, we also exhibit a threshold $\beta_{s}$ such that for $\beta>\beta_{s}$ there exists a local optimum of the ML objective that correlates with the true spike. Figure 1 provides an illustration of the different thresholds of $\beta$; see the last part of Subsection 3.2 for a more detailed discussion.

Figure 1: Illustration of the different thresholds for the SNR $\beta$. Our approach exhibits $\beta_{s}$ such that for $\beta>\beta_{s}$ there exists a local optimum that correlates with the spike; the threshold $\beta_{c}$ is unknown for asymmetric tensors and corresponds to the ML phase transition (above $\beta_{c}$ the global maximum correlates with the spike), while $\beta_{a}$ corresponds to the algorithmic phase transition (recovery of the spike in polynomial time).

Main Contributions

Starting from the conditions in Eq. (4), we provide an exact expression of the asymptotic singular value and alignments $\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle$ when the tensor dimensions $n_{i}\to\infty$ with $\frac{n_{i}}{\sum_{j=1}^{d}n_{j}}\to c_{i}\in(0,1)$, where the tuple $(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)})$ is associated to a local optimum of the ML problem verifying some technical conditions (detailed in Assumption 4). We conjecture that when the SNR $\beta$ is large enough, there is a unique local optimum verifying Assumption 4, for which our results characterize the corresponding alignments. We further conjecture that $(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)})$ coincides with the global maximum above some $\beta_{c}$ that remains to be characterized (such a critical value has been characterized by [12] for symmetric tensors; see [8] for a detailed discussion of this aspect in the symmetric case).

Technically, we first show that the considered random tensor ${\bm{\mathsfit{T}}}$ can be mapped to an equivalent symmetric random matrix ${\bm{T}}\in{\mathbb{M}}_{N}$, constructed through contractions of ${\bm{\mathsfit{T}}}$ with $d-2$ directions among ${\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}$. Then, leveraging random matrix theory, we characterize the limiting spectral measure of ${\bm{T}}$ and provide estimates of the asymptotic alignments $\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle$. We precisely show (see Theorem 8) that under Assumption 4, for $d\geq 3$, there exists $\beta_{s}>0$ such that for $\beta>\beta_{s}$

\displaystyle\begin{cases}\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}q_{i}(\lambda^{\infty}(\beta)),\end{cases} (5)

where $\lambda^{\infty}(\beta)$ satisfies $f(\lambda^{\infty}(\beta),\beta)=0$ with $f(z,\beta)=z+g(z)-\beta\prod_{i=1}^{d}q_{i}(z)$ and

\displaystyle q_{i}(z)=\sqrt{1-\frac{g_{i}^{2}(z)}{c_{i}}},\quad g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

$g(z)$ being the solution of the fixed-point equation $g(z)=\sum_{i=1}^{d}g_{i}(z)$ (for $z$ large enough); see Section B.11 for the existence of $g(z)$. Besides, for $\beta\in[0,\beta_{s}]$ with $c_{i}=\frac{1}{d}$ for all $i\in[d]$ (for arbitrary values of the $c_{i}$'s, the upper bound on $\lambda^{\infty}$ can be computed numerically as the minimum non-negative real $z$ for which Algorithm 1 converges),

\displaystyle\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{d-1}{d}},\quad\left|\langle{\bm{x}}^{(i)},{\bm{u}}_{*}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0, (6)

that is, ${\bm{u}}_{*}^{(i)}$ (asymptotically) ceases to correlate with ${\bm{x}}^{(i)}$.
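The quantities in Eqs. (5)-(6) are easy to evaluate numerically: for $z$ large enough, the fixed-point equation $g(z)=\sum_{i=1}^{d}g_{i}(z)$ can be iterated directly. A minimal sketch, assuming a plain fixed-point iteration (the function names are ours, and the paper's Algorithm 1 may differ in its details):

```python
import numpy as np

def solve_g(z, cs, iters=1000, tol=1e-12):
    """Iterate g <- sum_i g_i(z) with g_i = (g+z)/2 - sqrt(4 c_i + (g+z)^2)/2."""
    g = 0.0
    for _ in range(iters):
        gis = [(g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2 for c in cs]
        g_new = sum(gis)
        if abs(g_new - g) < tol:
            g = g_new
            break
        g = g_new
    return g, gis

def alignments(z, cs):
    """q_i(z) = sqrt(1 - g_i(z)^2 / c_i), the asymptotic alignments of Eq. (5)."""
    _, gis = solve_g(z, cs)
    return [np.sqrt(1.0 - gi ** 2 / c) for gi, c in zip(gis, cs)]

# example: d = 3 with equal dimensions (c_i = 1/3), z above 2 sqrt(2/3) ~ 1.63
qs = alignments(3.0, [1 / 3] * 3)
```

By symmetry, equal $c_i$'s yield equal alignments $q_i$.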

Remark 1.

Note that $q_{i}(z)$ can be equivalently expressed as $q_{i}(z)=\left(\frac{\alpha_{i}(z)^{d-3}}{\prod_{j\neq i}\alpha_{j}(z)}\right)^{\frac{1}{2d-4}}$ with $\alpha_{i}(z)=\frac{\beta}{z+g(z)-g_{i}(z)}$. Such an expression is defined for $c_{i}\in[0,1]$ with $d\geq 3$. See details in Section B.13.

We highlight that in our formulation, the threshold $\beta_{s}$ corresponds to the minimal value of the SNR $\beta$ above which the derived asymptotic formulas are algebraically defined, which may differ from the true phase transition $\beta_{c}$ of the ML problem: in the case of symmetric tensors, the results from [8] seem to indicate that $\beta_{s}$ lies slightly below the $\beta_{c}$ obtained by [12], where the ML problem was studied.

Figure 2: Asymptotic singular value and alignments as per Eq. (5) when all the dimensions $n_{i}$ are equal ($c_{i}=\frac{1}{d}$ for all $i\in[d]$) for different values of the tensor order $d$. The matrix case corresponds to setting $d=3$ and $c_{3}=0$ (see Section 4 for more details).

Figure 2 notably depicts the asymptotic alignments of Eq. (5) when all the tensor dimensions $n_{i}$ are equal. Since our result characterizes the alignments for $d\geq 3$, it is not possible to recover the matrix case by simply setting $d=2$, since the equations are not defined for $d=2$ (see Remark 1). However, the matrix case is recovered by considering an order-$3$ tensor ($d=3$) and then taking the limit $c_{3}\to 0$ (equivalent to the degenerate case $n_{3}=1$, which results in a spiked random matrix model); see Section 4 for more details. As Figure 2-(b) shows, unlike in the matrix case $d=2$, the predicted asymptotic alignments are not continuous for orders $d\geq 3$; this phenomenon has already been observed in the case of symmetric random tensors [12]. In particular, the predicted theoretical threshold $\beta_{s}$ in the matrix case $d=2$ coincides with the classical BBP (Baik, Ben Arous, and Péché) phase transition $\beta_{c}(2)$ [2, 5, 7, 19]. Moreover, our result for the matrix case characterizes the asymptotic alignments for the long rectangular matrices studied by [3], and we also recover the threshold $\beta\gtrsim n^{\frac{d-2}{4}}$ for the tensor unfolding method (see Remark 9). From a methodological viewpoint, our results are derived based solely on Stein's lemma, without the use of the complex contour integrals classically employed in RMT. In essence, we follow the approach in [8] and further introduce a new object (the mapping $\Phi_{d}$ defined subsequently in Eq. (31)) that drastically simplifies the use of RMT tools.

The remainder of the paper is organized as follows. Section 2 recalls some fundamental random matrix theory tools. In Section 3, we study asymmetric spiked tensors of order $3$ and present the main steps of our approach. Section 4 characterizes the behavior of spiked random matrices given our result on spiked order-$3$ tensor models from Section 3. The generalization of our results to arbitrary $d$-order tensors is then presented in Section 5. Section 6 discusses the application of our findings to arbitrary rank-$k$ tensors with mutually orthogonal components. Further discussions are presented in Section 7. In Appendix A, we provide some simulations to support our findings and discuss some algorithmic aspects. Finally, Appendix B provides the proofs of the main results.

2 Random matrix theory tools

Before digging into our main findings, we briefly recall some random matrix theory results that are at the heart of our analysis. Specifically, we recall the resolvent formalism, which allows one to assess the spectral behavior of large symmetric random matrices. In particular, given a symmetric matrix ${\bm{S}}\in{\mathbb{M}}_{n}$, and denoting $\lambda_{1}({\bm{S}})\leq\ldots\leq\lambda_{n}({\bm{S}})$ its $n$ eigenvalues with corresponding eigenvectors ${\bm{u}}_{i}({\bm{S}})$ for $i\in[n]$, its spectral decomposition writes as

\displaystyle{\bm{S}}=\sum_{i=1}^{n}\lambda_{i}({\bm{S}}){\bm{u}}_{i}({\bm{S}}){\bm{u}}_{i}({\bm{S}})^{\top}.

In the sequel, we omit the dependence on ${\bm{S}}$ by simply writing $\lambda_{i}$ and ${\bm{u}}_{i}$ when there is no ambiguity.

2.1 The resolvent matrix

We start by defining the resolvent of a symmetric matrix and present its main properties.

Definition 1.

Given a symmetric matrix ${\bm{S}}\in{\mathbb{M}}_{n}$, the resolvent of ${\bm{S}}$ is defined as

\displaystyle{\bm{R}}_{\bm{S}}(z)\equiv\left({\bm{S}}-z{\bm{I}}_{n}\right)^{-1},\quad z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}({\bm{S}}),

where $\operatorname{\mathcal{S}}({\bm{S}})=\{\lambda_{1},\ldots,\lambda_{n}\}$ is the spectrum of ${\bm{S}}$.

The resolvent is a fundamental object since it retrieves the spectral characteristics (spectrum and eigenvectors) of ${\bm{S}}$. It notably verifies the following identity, which we will use extensively to derive our main results:

\displaystyle{\bm{R}}_{\bm{S}}(z)=-\frac{1}{z}{\bm{I}}_{n}+\frac{1}{z}{\bm{S}}{\bm{R}}_{\bm{S}}(z)=-\frac{1}{z}{\bm{I}}_{n}+\frac{1}{z}{\bm{R}}_{\bm{S}}(z){\bm{S}}. (7)

The above identity, coupled with Stein's lemma (Lemma 1 in Section 2.3 below), is a fundamental tool used to derive fixed-point equations that allow the evaluation of functionals of interest involving ${\bm{R}}_{\bm{S}}(z)$. Another interesting property of the resolvent concerns its spectral norm, which we denote by $\|{\bm{R}}_{\bm{S}}(z)\|$. Indeed, if the spectrum of ${\bm{S}}$ has bounded support, then the spectral norm of ${\bm{R}}_{\bm{S}}(z)$ is bounded. This is a consequence of the inequality

\displaystyle\|{\bm{R}}_{\bm{S}}(z)\|\leq\frac{1}{\operatorname{dist}(z,\operatorname{\mathcal{S}}({\bm{S}}))}, (8)

where $\operatorname{dist}(z,\cdot)$ denotes the distance of $z$ to a set. The resolvent encodes rich information about the behavior of the eigenvalues of ${\bm{S}}$ through the so-called Stieltjes transform, which we describe subsequently.
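Both the identity (7) and the bound (8) follow directly from $({\bm{S}}-z{\bm{I}}_{n}){\bm{R}}_{\bm{S}}(z)={\bm{I}}_{n}$ and are easy to verify numerically; the following small sketch (ours, for illustration) does so on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
S = (A + A.T) / 2                      # a symmetric test matrix
z = 1.0 + 2.0j                         # a point away from the (real) spectrum
R = np.linalg.inv(S - z * np.eye(n))   # resolvent R_S(z)

# identity (7): R = -I/z + S R / z = -I/z + R S / z
id1 = np.allclose(R, -np.eye(n) / z + S @ R / z)
id2 = np.allclose(R, -np.eye(n) / z + R @ S / z)

# bound (8): ||R_S(z)|| <= 1 / dist(z, spectrum(S))
eigvals = np.linalg.eigvalsh(S)
dist = np.min(np.abs(eigvals - z))
bound_ok = np.linalg.norm(R, 2) <= 1.0 / dist + 1e-12
```

For a symmetric (hence normal) matrix, the bound in (8) actually holds with equality.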

2.2 The Stieltjes transform

Random matrix theory originally aims at describing the limiting spectral measure of random matrices when their dimensions grow large. Typically, under certain technical conditions on ${\bm{S}}$, the empirical spectral measure of ${\bm{S}}$, defined as

\displaystyle\nu_{\bm{S}}\equiv\frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_{i}({\bm{S}})}, (9)

where $\delta_{x}$ is a Dirac mass at $x$, converges to a deterministic probability measure $\nu$. To characterize such an asymptotic measure, the Stieltjes transform (defined below) is a widely used tool.

Definition 2 (Stieltjes transform).

Given a probability measure $\nu$, the Stieltjes transform of $\nu$ is defined by

\displaystyle g_{\nu}(z)\equiv\int\frac{d\nu(\lambda)}{\lambda-z},\quad z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu).

In particular, the Stieltjes transform of $\nu_{\bm{S}}$ is closely related to the associated resolvent ${\bm{R}}_{\bm{S}}(z)$ through the algebraic identity

\displaystyle g_{\nu_{\bm{S}}}(z)=\frac{1}{n}\sum_{i=1}^{n}\int\frac{\delta_{\lambda_{i}({\bm{S}})}(d\lambda)}{\lambda-z}=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{\lambda_{i}({\bm{S}})-z}=\frac{1}{n}\operatorname{tr}{\bm{R}}_{\bm{S}}(z).
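This trace identity is immediate to check numerically; a short sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
S = (A + A.T) / np.sqrt(2)             # symmetric test matrix
z = 0.5 + 1.0j
eigvals = np.linalg.eigvalsh(S)
g_spectrum = np.mean(1.0 / (eigvals - z))   # (1/n) sum_i 1/(lambda_i - z)
R = np.linalg.inv(S - z * np.eye(n))
g_resolvent = np.trace(R) / n               # (1/n) tr R_S(z)
```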

The Stieltjes transform $g_{\nu}$ has several useful analytical properties, among which:

  1. $g_{\nu}$ is complex analytic on its definition domain ${\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu)$, and $\Im[g_{\nu}(z)]>0$ if $\Im[z]>0$;

  2. $g_{\nu}(z)$ is bounded for $z\in{\mathbb{C}}\setminus\operatorname{\mathcal{S}}(\nu)$ if $\operatorname{\mathcal{S}}(\nu)$ is bounded, since $|g_{\nu}(z)|\leq\operatorname{dist}(z,\operatorname{\mathcal{S}}(\nu))^{-1}$;

  3. since $g_{\nu}^{\prime}(z)=\int(\lambda-z)^{-2}d\nu(\lambda)>0$, $g_{\nu}$ is monotonically increasing for real $z$ outside $\operatorname{\mathcal{S}}(\nu)$.

The Stieltjes transform $g_{\nu}$ admits an inverse formula which gives access to the underlying probability measure $\nu$, as per the following theorem.

Theorem 1 (Inverse formula of the Stieltjes transform).

Let $a,b$ be continuity points of the probability measure $\nu$. Then the measure of the segment $[a,b]$ is given by

\displaystyle\nu([a,b])=\frac{1}{\pi}\lim_{\epsilon\to 0}\int_{a}^{b}\Im[g_{\nu}(x+i\epsilon)]dx.

Moreover, if $\nu$ admits a density $f$ at some point $x$, i.e., $\nu$ is differentiable in a neighborhood of $x$ with $\lim_{\epsilon\to 0}\epsilon^{-1}\nu([x-\epsilon/2,x+\epsilon/2])=f(x)$, then we have the inverse formula

\displaystyle f(x)=\frac{1}{\pi}\lim_{\epsilon\to 0}\Im[g_{\nu}(x+i\epsilon)].
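The inverse formula can be illustrated on a Wigner matrix, whose limiting spectral measure is the semicircle law with density $\sqrt{4-x^{2}}/(2\pi)$ on $[-2,2]$. The example below is ours, and the quality of the approximation depends on the chosen $n$ and small $\epsilon$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
A = rng.standard_normal((n, n))
S = (A + A.T) / np.sqrt(2 * n)         # Wigner matrix, spectrum close to [-2, 2]
eigvals = np.linalg.eigvalsh(S)

x, eps = 0.5, 0.05                     # evaluation point and small imaginary shift
g = np.mean(1.0 / (eigvals - (x + 1j * eps)))    # g_{nu_S}(x + i eps)
f_est = g.imag / np.pi                           # density estimate via inverse formula
f_semicircle = np.sqrt(4.0 - x ** 2) / (2.0 * np.pi)
```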

When it comes to large random matrices, the Stieltjes transform enjoys a continuity property, in the sense that if a sequence of random probability measures converges to a deterministic measure, then the corresponding Stieltjes transforms converge almost surely to the Stieltjes transform of that measure, and vice versa. The following theorem from [24] states this continuity property precisely.

Theorem 2 (Stieltjes transform continuity).

A sequence of random probability measures $\nu_{n}$ supported on ${\mathbb{R}}$, with corresponding Stieltjes transforms $g_{\nu_{n}}(z)$, converges almost surely weakly to a deterministic measure $\nu$ with corresponding Stieltjes transform $g_{\nu}$ if and only if $g_{\nu_{n}}(z)\to g_{\nu}(z)$ almost surely for all $z\in{\mathbb{R}}+i{\mathbb{R}}_{+}$.

In particular, Theorems 1 and 2 combined allow the description of the spectrum of large symmetric random matrices.

2.3 Gaussian calculations

The following lemma by Stein (also called Stein's identity or Gaussian integration by parts) allows one to replace the expectation of the product of a Gaussian variable with a differentiable function $f$ of that variable by the variance of the variable times the expectation of $f^{\prime}$.

Lemma 1 (Stein’s Lemma [22]).

Let $X\sim\mathcal{N}(0,\sigma^{2})$ and let $f$ be a continuously differentiable function having at most polynomial growth. Then

\displaystyle\operatorname{\mathbb{E}}\left[Xf(X)\right]=\sigma^{2}\operatorname{\mathbb{E}}\left[f^{\prime}(X)\right],

whenever the above expectations exist.
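As a quick Monte Carlo sanity check of the lemma (an illustrative experiment of ours, not part of the paper), take $f(x)=x^{3}$, for which both sides equal $3\sigma^{4}$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
X = sigma * rng.standard_normal(10**6)

lhs = np.mean(X * X**3)                # E[X f(X)] with f(x) = x^3
rhs = sigma**2 * np.mean(3 * X**2)     # sigma^2 E[f'(X)], with f'(x) = 3 x^2
# both estimate 3 sigma^4 = 48
```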

We will further need the Poincaré inequality, which allows one to control the variance of a functional of i.i.d. Gaussian random variables.

Lemma 2 (Poincaré inequality).

Let $F:{\mathbb{R}}^{n}\to{\mathbb{R}}$ be a continuously differentiable function having at most polynomial growth, and let $X_{1},\ldots,X_{n}$ be a collection of i.i.d. standard Gaussian variables. Then,

\displaystyle\mathrm{Var}\,F(X_{1},\ldots,X_{n})\leq\sum_{i=1}^{n}\operatorname{\mathbb{E}}\left|\frac{\partial F}{\partial X_{i}}\right|^{2}.
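For instance, with $F(x_{1},x_{2},x_{3})=x_{1}^{2}+x_{2}x_{3}$ one has $\mathrm{Var}\,F=3$, while the right-hand side of the inequality equals $6$. A Monte Carlo sketch (this example is ours):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((10**6, 3))

F = X[:, 0]**2 + X[:, 1] * X[:, 2]      # F(x1, x2, x3) = x1^2 + x2 x3
var_F = np.var(F)                       # Var F = Var(x1^2) + Var(x2 x3) = 2 + 1 = 3
# gradient (2 x1, x3, x2): sum_i E|dF/dX_i|^2 = 4 E[x1^2] + E[x3^2] + E[x2^2] = 6
grad_bound = np.mean(4 * X[:, 0]**2 + X[:, 2]**2 + X[:, 1]**2)
```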

3 The asymmetric rank-one spiked tensor model of order 3

To best illustrate our approach, we start by considering the asymmetric rank-one tensor model of order $3$, of the form

\displaystyle{\bm{\mathsfit{T}}}=\beta\,{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (10)

where ${\bm{\mathsfit{T}}}$ and ${\bm{\mathsfit{X}}}$ both have dimensions $m\times n\times p$ and $N=m+n+p$. We assume that ${\bm{x}},{\bm{y}}$ and ${\bm{z}}$ lie on the unit spheres ${\mathbb{S}}^{m-1}$, ${\mathbb{S}}^{n-1}$ and ${\mathbb{S}}^{p-1}$ respectively, and that ${\bm{\mathsfit{X}}}$ is a Gaussian noise tensor with i.i.d. entries $X_{ijk}\sim\mathcal{N}(0,1)$, independent of ${\bm{x}},{\bm{y}},{\bm{z}}$.

3.1 Tensor singular value and vectors

According to Eq. (4), the $\ell_{2}$-singular value and vectors corresponding to the best rank-one approximation $\lambda_{*}{\bm{u}}_{*}\otimes{\bm{v}}_{*}\otimes{\bm{w}}_{*}$ of ${\bm{\mathsfit{T}}}$ satisfy the identities

\displaystyle{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\lambda{\bm{u}},\quad{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}=\lambda{\bm{v}},\quad{\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}=\lambda{\bm{w}},\quad\text{with}\quad({\bm{u}},{\bm{v}},{\bm{w}})\,\in\,{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}, (11)

where we denote ${\bm{\mathsfit{T}}}({\bm{u}})={\bm{\mathsfit{T}}}({\bm{u}},:,:)\in{\mathbb{M}}_{n,p}$, ${\bm{\mathsfit{T}}}({\bm{v}})={\bm{\mathsfit{T}}}(:,{\bm{v}},:)\in{\mathbb{M}}_{m,p}$ and ${\bm{\mathsfit{T}}}({\bm{w}})={\bm{\mathsfit{T}}}(:,:,{\bm{w}})\in{\mathbb{M}}_{m,n}$. Furthermore, the singular value $\lambda$ can be characterized through the contraction of ${\bm{\mathsfit{T}}}$ along all its singular vectors ${\bm{u}},{\bm{v}},{\bm{w}}$, i.e.,

\displaystyle\lambda={\bm{\mathsfit{T}}}({\bm{u}},{\bm{v}},{\bm{w}})=\sum_{ijk}u_{i}v_{j}w_{k}T_{ijk}. (12)
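In the noiseless case $\beta\,{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}$, the triple $({\bm{x}},{\bm{y}},{\bm{z}})$ itself satisfies Eqs. (11)-(12) with $\lambda=\beta$, which is easy to verify with explicit contractions. A sanity-check sketch (ours):

```python
import numpy as np

rng = np.random.default_rng(5)
unit = lambda v: v / np.linalg.norm(v)
m, n, p = 4, 5, 6
u, v, w = (unit(rng.standard_normal(m)), unit(rng.standard_normal(n)),
           unit(rng.standard_normal(p)))
lam = 2.5
T = lam * np.einsum('i,j,k->ijk', u, v, w)       # noiseless rank-one tensor

Tu = np.einsum('ijk,i->jk', T, u)                # T(u) in M_{n,p}
Tv = np.einsum('ijk,j->ik', T, v)                # T(v) in M_{m,p}

# Eq. (11): coupled singular-vector equations
eq11 = (np.allclose(Tv @ w, lam * u)
        and np.allclose(Tu @ w, lam * v)
        and np.allclose(Tv.T @ u, lam * w))
# Eq. (12): lambda = T(u, v, w)
eq12 = np.isclose(np.einsum('ijk,i,j,k->', T, u, v, w), lam)
```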

3.2 Associated random matrix model

We follow the approach developed in [8], which consists in studying random matrices obtained through contractions of a random tensor model, and extend it to the asymmetric spiked tensor model of Eq. (10). Indeed, it has been shown in [8] that the study of a rank-one symmetric spiked tensor model ${\bm{\mathsfit{Y}}}\in{\mathbb{T}}_{q}^{3}$ boils down to the analysis of the symmetric random matrix ${\bm{\mathsfit{Y}}}({\bm{a}})\in{\mathbb{M}}_{q}$ (the contraction of the tensor ${\bm{\mathsfit{Y}}}$ with its eigenvector ${\bm{a}}$), where ${\bm{a}}\in{\mathbb{S}}^{q-1}$ stands for the eigenvector of ${\bm{\mathsfit{Y}}}$ corresponding to the best symmetric rank-one approximation of ${\bm{\mathsfit{Y}}}$, i.e.,

\displaystyle(\mu,{\bm{a}})=\operatorname*{arg\,min}_{\mu\in{\mathbb{R}},\,{\bm{a}}\in{\mathbb{S}}^{q-1}}\|{\bm{\mathsfit{Y}}}-\mu{\bm{a}}^{\otimes 3}\|_{\text{F}}^{2}\quad\Leftrightarrow\quad{\bm{\mathsfit{Y}}}({\bm{a}}){\bm{a}}=\mu{\bm{a}}. (13)

In the asymmetric case of Eq. (10), the choice of a "relevant" random matrix to study is not trivial, since the corresponding contractions ${\bm{\mathsfit{T}}}({\bm{u}}),{\bm{\mathsfit{T}}}({\bm{v}})$ and ${\bm{\mathsfit{T}}}({\bm{w}})$ yield asymmetric random matrices, which present more technical difficulties from the random matrix theory perspective; therefore, the extension of the approach in [8] to asymmetric tensors is not straightforward. We will see in the following that such a choice of the relevant matrix for studying an asymmetric ${\bm{\mathsfit{T}}}$ is naturally obtained through the use of Pastur's Stein-based approach [16].

As described in [8], and since the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} depend statistically on 𝑿{\bm{\mathsfit{X}}}, the first technical challenge consists in expressing the derivatives of the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} w.r.t. the entries of the Gaussian noise tensor 𝑿{\bm{\mathsfit{X}}}. Indeed, one can show that there exists a differentiable mapping :𝕋m,n,pm+n+p+1\mathcal{F}:{\mathbb{T}}_{m,n,p}\to{\mathbb{R}}^{m+n+p+1} that maps 𝑿{\bm{\mathsfit{X}}} to (𝑿)=(λ(𝑿),𝒖(𝑿),𝒗(𝑿),𝒘(𝑿))\mathcal{F}({\bm{\mathsfit{X}}})=(\lambda({\bm{\mathsfit{X}}}),{\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}),{\bm{w}}({\bm{\mathsfit{X}}})), a singular value of 𝑻{\bm{\mathsfit{T}}} together with its associated singular vectors, since the components of 𝒖(𝑿),𝒗(𝑿){\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}) and 𝒘(𝑿){\bm{w}}({\bm{\mathsfit{X}}}) are bounded and λ(𝑿)\lambda({\bm{\mathsfit{X}}}) has polynomial growth. Precisely, we have the following Lemma, which is analogous to [8, Lemma 8] and justifies the subsequent application of Stein’s Lemma.

Lemma 3.

There exists an almost everywhere continuously differentiable function :𝕋m,n,pm+n+p+1\mathcal{F}:{\mathbb{T}}_{m,n,p}\to{\mathbb{R}}^{m+n+p+1} such that, for almost every 𝑿{\bm{\mathsfit{X}}}, (𝑿)=(λ(𝑿),𝐮(𝑿),𝐯(𝑿),𝐰(𝑿))\mathcal{F}({\bm{\mathsfit{X}}})=(\lambda({\bm{\mathsfit{X}}}),{\bm{u}}({\bm{\mathsfit{X}}}),{\bm{v}}({\bm{\mathsfit{X}}}),{\bm{w}}({\bm{\mathsfit{X}}})) is a singular value of 𝑻{\bm{\mathsfit{T}}} along with its associated singular vectors.

Proof.

The proof relies on the same arguments as [8, Lemma 8]. ∎

Calculations (detailed in Appendix B.1) show that differentiating the identities in Eq. (11) w.r.t. an entry XijkX_{ijk} of 𝑿{\bm{\mathsfit{X}}} with (i,j,k)[m]×[n]×[p](i,j,k)\in[m]\times[n]\times[p] results in

[𝒖Xijk𝒗Xijk𝒘Xijk]=1N([𝟎m×m𝑻(𝒘)𝑻(𝒗)𝑻(𝒘)𝟎n×n𝑻(𝒖)𝑻(𝒗)𝑻(𝒖)𝟎p×p]λ𝑰N)1[vjwk(𝒆imui𝒖)uiwk(𝒆jnvj𝒗)uivj(𝒆kpwk𝒘)]N,\displaystyle\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{\mathsfit{T}}}({\bm{w}})&{\bm{\mathsfit{T}}}({\bm{v}})\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\intercal}&{\bm{0}}_{n\times n}&{\bm{\mathsfit{T}}}({\bm{u}})\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\intercal}&{\bm{\mathsfit{T}}}({\bm{u}})^{\intercal}&{\bm{0}}_{p\times p}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)^{-1}\begin{bmatrix}v_{j}w_{k}({\bm{e}}_{i}^{m}-u_{i}{\bm{u}})\\ u_{i}w_{k}({\bm{e}}_{j}^{n}-v_{j}{\bm{v}})\\ u_{i}v_{j}({\bm{e}}_{k}^{p}-w_{k}{\bm{w}})\end{bmatrix}\in{\mathbb{R}}^{N}, (14)

where we recall that 𝒆id{\bm{e}}_{i}^{d} denotes the canonical vector in d{\mathbb{R}}^{d} with [𝒆id]j=δij[{\bm{e}}_{i}^{d}]_{j}=\delta_{ij}. We further have the identity

λXijk=1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}. (15)

Denoting by 𝑴{\bm{M}} the symmetric block-wise random matrix which appears in the matrix inverse in Eq. (14), the derivatives of 𝒖,𝒗{\bm{u}},{\bm{v}} and 𝒘{\bm{w}} are therefore expressed in terms of the resolvent 𝑹𝑴(z)=(𝑴z𝑰N)1{\bm{R}}_{\bm{M}}(z)=\left({\bm{M}}-z{\bm{I}}_{N}\right)^{-1} evaluated at λ\lambda, and we will see subsequently that the assessment of the spectral properties of 𝑻{\bm{\mathsfit{T}}} boils down to the estimation of the normalized trace 1Ntr𝑹𝑴(z)\frac{1}{N}\operatorname{tr}{\bm{R}}_{\bm{M}}(z). As such, the matrix 𝑴{\bm{M}} provides the associated random matrix model that encodes the spectral properties of 𝑻{\bm{\mathsfit{T}}}. We will henceforth focus our analysis on this random matrix, in order to assess the spectral behavior of 𝑻{\bm{\mathsfit{T}}}. More generally, we will be interested in studying random matrices from the 33-order block-wise tensor contraction ensemble 3(𝑿)\mathcal{B}_{3}({\bm{\mathsfit{X}}}) for 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) defined as

3(𝑿){Φ3(𝑿,𝒂,𝒃,𝒄)|(𝒂,𝒃,𝒄)𝕊m1×𝕊n1×𝕊p1},\displaystyle\mathcal{B}_{3}({\bm{\mathsfit{X}}})\equiv\left\{\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}})\,\,\big{|}\,\,({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}\right\}, (16)

where Φ3\Phi_{3} is the mapping

Φ3:𝕋m,n,p×𝕊m1×𝕊n1×𝕊p1𝕄m+n+p(𝑿,𝒂,𝒃,𝒄)[𝟎m×m𝑿(𝒄)𝑿(𝒃)𝑿(𝒄)𝟎n×n𝑿(𝒂)𝑿(𝒃)𝑿(𝒂)𝟎p×p].\begin{split}\Phi_{3}:{\mathbb{T}}_{m,n,p}\times{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1}&\longrightarrow{\mathbb{M}}_{m+n+p}\\ ({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}})&\longmapsto\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{\mathsfit{X}}}({\bm{c}})&{\bm{\mathsfit{X}}}({\bm{b}})\\ {\bm{\mathsfit{X}}}({\bm{c}})^{\intercal}&{\bm{0}}_{n\times n}&{\bm{\mathsfit{X}}}({\bm{a}})\\ {\bm{\mathsfit{X}}}({\bm{b}})^{\intercal}&{\bm{\mathsfit{X}}}({\bm{a}})^{\intercal}&{\bm{0}}_{p\times p}\end{bmatrix}.\end{split} (17)

We associate with the tensor 𝑻{\bm{\mathsfit{T}}} the random matrix

Φ3(𝑻,𝒖,𝒗,𝒘)=β𝑽𝑩𝑽+1NΦ3(𝑿,𝒖,𝒗,𝒘)𝕄N,\displaystyle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})=\beta\,{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}})\in{\mathbb{M}}_{N}, (18)

where

𝑩[0𝒛,𝒘𝒚,𝒗𝒛,𝒘0𝒙,𝒖𝒚,𝒗𝒙,𝒖0]𝕄3,𝑽[𝒙𝟎m𝟎m𝟎n𝒚𝟎n𝟎p𝟎p𝒛]𝕄N,3.\displaystyle{\bm{B}}\equiv\begin{bmatrix}0&\langle{\bm{z}},{\bm{w}}\rangle&\langle{\bm{y}},{\bm{v}}\rangle\\ \langle{\bm{z}},{\bm{w}}\rangle&0&\langle{\bm{x}},{\bm{u}}\rangle\\ \langle{\bm{y}},{\bm{v}}\rangle&\langle{\bm{x}},{\bm{u}}\rangle&0\end{bmatrix}\in{\mathbb{M}}_{3},\quad{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}&{\bm{0}}_{m}&{\bm{0}}_{m}\\ {\bm{0}}_{n}&{\bm{y}}&{\bm{0}}_{n}\\ {\bm{0}}_{p}&{\bm{0}}_{p}&{\bm{z}}\end{bmatrix}\in{\mathbb{M}}_{N,3}.

and 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} are on the unit spheres 𝕊m1{\mathbb{S}}^{m-1}, 𝕊n1{\mathbb{S}}^{n-1} and 𝕊p1{\mathbb{S}}^{p-1} respectively. Note that Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) is symmetric and behaves as a so-called spiked random matrix model where the signal part β𝑽𝑩𝑽\beta\,{\bm{V}}{\bm{B}}{\bm{V}}^{\top} correlates with the true signals 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} for a sufficiently large β\beta. However, the noise part (i.e., the term 1NΦ3(𝑿,𝒖,𝒗,𝒘)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}) in Eq. (18)) has a non-trivial structure due to the statistical dependencies between the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and the tensor noise 𝑿{\bm{\mathsfit{X}}}. Despite this statistical dependency, we will show in the next subsection that the spectral measure of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) (and that of 1NΦ3(𝑿,𝒖,𝒗,𝒘)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}})) converges to a deterministic measure (see Theorem 4) that coincides with the limiting spectral measure of Φ3(𝑻,𝒂,𝒃,𝒄)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{a}},{\bm{b}},{\bm{c}}) where 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} are unit vectors and independent of 𝑿{\bm{\mathsfit{X}}}. Furthermore, the matrix Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) admits 2λ2\lambda as an eigenvalue (regardless of the value of β\beta), which is a simple consequence of the identities in Eq. (11), since

Φ3(𝑻,𝒖,𝒗,𝒘)[𝒖𝒗𝒘]=[𝑻(𝒘)𝒗+𝑻(𝒗)𝒘𝑻(𝒘)𝒖+𝑻(𝒖)𝒘𝑻(𝒗)𝒖+𝑻(𝒖)𝒗]=2λ[𝒖𝒗𝒘].\displaystyle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})\begin{bmatrix}{\bm{u}}\\ {\bm{v}}\\ {\bm{w}}\end{bmatrix}=\begin{bmatrix}{\bm{\mathsfit{T}}}({\bm{w}}){\bm{v}}+{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\top}{\bm{u}}+{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}+{\bm{\mathsfit{T}}}({\bm{u}})^{\top}{\bm{v}}\end{bmatrix}=2\lambda\begin{bmatrix}{\bm{u}}\\ {\bm{v}}\\ {\bm{w}}\end{bmatrix}. (19)
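This eigen-identity is easy to verify numerically. The sketch below (our own minimal implementation, in the spirit of the power method of Algorithm 3; the dimensions, SNR and warm start are arbitrary choices of ours) computes a critical point (λ, 𝒖, 𝒗, 𝒘) of Eq. (11) by alternating power iterations on a spiked tensor, builds Φ3(𝑻, 𝒖, 𝒗, 𝒘) as in Eq. (17), and checks that [𝒖; 𝒗; 𝒘] is an eigenvector with eigenvalue 2λ:

```python
import numpy as np

rng = np.random.default_rng(1)
m = n = p = 30
N = m + n + p
beta = 5.0  # strong SNR so the iterations converge fast from the warm start

x, y, z = (rng.standard_normal(d) for d in (m, n, p))
x, y, z = x/np.linalg.norm(x), y/np.linalg.norm(y), z/np.linalg.norm(z)
T = beta*np.einsum('i,j,k->ijk', x, y, z) + rng.standard_normal((m, n, p))/np.sqrt(N)

# Alternating power iteration for a critical point of Eq. (11)
u, v, w = x.copy(), y.copy(), z.copy()   # warm start at the planted signal
for _ in range(500):
    u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
lam = np.einsum('ijk,i,j,k->', T, u, v, w)

# Block-wise contraction matrix Phi_3(T, u, v, w), Eq. (17)
Tu = np.einsum('ijk,i->jk', T, u)
Tv = np.einsum('ijk,j->ik', T, v)
Tw = np.einsum('ijk,k->ij', T, w)
Phi = np.block([[np.zeros((m, m)), Tw,               Tv],
                [Tw.T,             np.zeros((n, n)), Tu],
                [Tv.T,             Tu.T,             np.zeros((p, p))]])

h = np.concatenate([u, v, w])
assert np.allclose(Phi @ h, 2*lam*h, atol=1e-8)        # Eq. (19)
assert np.isclose(2*lam, np.linalg.eigvalsh(Phi)[-1])  # 2*lam is the top eigenvalue
```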

Note however that the expression in Eq. (14) exists only if λ\lambda is not an eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}). As discussed in [8] in the symmetric case, such a condition is related to the local optimality of the ML estimator. Indeed, we first have the following remark that concerns the Hessian of the underlying objective function.

Remark 2.

Recall the Lagrangian of the ML problem as (𝐮,𝐯,𝐰)=𝑻(𝐮,𝐯,𝐰)λ(𝐮𝐯𝐰1)\mathcal{L}({\bm{u}},{\bm{v}},{\bm{w}})={\bm{\mathsfit{T}}}({\bm{u}},{\bm{v}},{\bm{w}})-\lambda\left(\|{\bm{u}}\|\|{\bm{v}}\|\|{\bm{w}}\|-1\right) and denote 𝐡[𝐮,𝐯,𝐰]3{\bm{h}}\equiv\frac{[{\bm{u}}^{\top},{\bm{v}}^{\top},{\bm{w}}^{\top}]^{\top}}{\sqrt{3}}. If λ\lambda is a singular value of 𝑻{\bm{\mathsfit{T}}} with associated singular vectors 𝐮,𝐯,𝐰{\bm{u}},{\bm{v}},{\bm{w}}, then (λ,𝐡)(\lambda,{\bm{h}}) is an eigenpair of the Hessian 22𝐡2\nabla^{2}\mathcal{L}\equiv\frac{\partial^{2}\mathcal{L}}{\partial{\bm{h}}^{2}} evaluated at 𝐡{\bm{h}}. Indeed, 2(𝐡)=Φ3(𝑻,𝐮,𝐯,𝐰)λ𝐈N\nabla^{2}\mathcal{L}({\bm{h}})=\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})-\lambda{\bm{I}}_{N}, thus 2(𝐡)𝐡=Φ3(𝑻,𝐮,𝐯,𝐰)𝐡λ𝐡=λ𝐡\nabla^{2}\mathcal{L}({\bm{h}}){\bm{h}}=\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{h}}-\lambda{\bm{h}}=\lambda{\bm{h}} since Φ3(𝑻,𝐮,𝐯,𝐰)𝐡=2λ𝐡\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{h}}=2\lambda{\bm{h}} (from Eq. (19)).

From [18, Theorem 12.5], a necessary condition for 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} to be a local maximum of the ML problem is that

2(𝒉)𝒌,𝒌0,𝒌𝒉{𝒌N𝒉,𝒌=0}.\displaystyle\langle\nabla^{2}\mathcal{L}({\bm{h}}){\bm{k}},{\bm{k}}\rangle\leq 0,\quad\forall{\bm{k}}\in{\bm{h}}^{\perp}\equiv\left\{{\bm{k}}\in{\mathbb{R}}^{N}\,\mid\,\langle{\bm{h}},{\bm{k}}\rangle=0\right\}.

which yields (by Remark 2) the condition

max𝒌𝕊N1𝒉Φ3(𝑻,𝒖,𝒗,𝒘)𝒌,𝒌λ.\displaystyle\max_{{\bm{k}}\in{\mathbb{S}}^{N-1}\cap{\bm{h}}^{\perp}}\langle\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}){\bm{k}},{\bm{k}}\rangle\leq\lambda. (20)

As such, for λ>0\lambda>0, 2λ2\lambda must be the largest eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) as per Eq. (19), while its second largest eigenvalue cannot exceed λ\lambda as shown by Eq. (20). Thus our analysis applies only to a local optimum of the ML problem for which the corresponding singular value λ\lambda lies outside the bulk of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}); namely, we suppose that there exists a tuple (λ,𝒖,𝒗,𝒘)(\lambda,{\bm{u}},{\bm{v}},{\bm{w}}) verifying the identities in Eq. (11) such that λ\lambda is not an eigenvalue of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) (we will find that this assumption is satisfied provided β>βs\beta>\beta_{s} for some βs>0\beta_{s}>0 that we will determine). This assumption guarantees the existence of the matrix inverse in Eq. (14) and is not restrictive, in the sense that it is satisfied for a sufficiently large value of the SNR β\beta.
In the sequel, for any (λ,𝒖,𝒗,𝒘)(\lambda,{\bm{u}},{\bm{v}},{\bm{w}}) verifying the identities in Eq. (11), we find that the largest eigenvalue 2λ2\lambda is always isolated from the bulk of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}), independently of the SNR β\beta. In addition, there exists βs>0\beta_{s}>0 such that for ββs\beta\leq\beta_{s}, the observed spike is not informative (in the sense that the corresponding alignments vanish) yet remains visible (an isolated eigenvalue appears in the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}), see Figure 4) because of the statistical dependencies between 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and 𝑿{\bm{\mathsfit{X}}}. The same phenomenon was observed in the case of symmetric random tensors in [8].

3.3 Limiting spectral measure of block-wise 33-order tensor contractions

We start by presenting our first result which characterizes the limiting spectral measure of the ensemble 3(𝑿)\mathcal{B}_{3}({\bm{\mathsfit{X}}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)). We characterize this measure in the limit when the tensor dimensions grow large as in the following assumption.

Assumption 1 (Growth rate).

As m,n,pm,n,p\to\infty, the dimension ratios mm+n+pc1(0,1)\frac{m}{m+n+p}\to c_{1}\in(0,1), nm+n+pc2(0,1)\frac{n}{m+n+p}\to c_{2}\in(0,1) and pm+n+pc3=1(c1+c2)(0,1)\frac{p}{m+n+p}\to c_{3}=1-(c_{1}+c_{2})\in(0,1).

We have the following theorem which characterizes the spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with arbitrary deterministic unit vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}}.

Theorem 3.

Let 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) be a sequence of random asymmetric Gaussian tensors and (𝐚,𝐛,𝐜)𝕊m1×𝕊n1×𝕊p1({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1} a sequence of deterministic vectors of increasing dimensions, following Assumption 1. Then the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[3]i\in[3] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.2. ∎

Remark 3.

The equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) yields a polynomial equation of degree 55 in gg which can be solved numerically or via an iteration procedure which converges to a fixed point for zz large enough.
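Such an iteration can be sketched in a few lines (our own implementation; the branch of each quadratic is chosen so that each gᵢ behaves as −cᵢ/z at infinity, consistent with gᵢ being a normalized partial trace of a resolvent, and the sketch targets real z to the right of the support):

```python
import numpy as np

def stieltjes(z, c=(1/3, 1/3, 1/3), n_iter=500):
    """Solve g(z) = sum_i g_i(z), with g_i^2 - (g+z) g_i - c_i = 0, by fixed point.

    Valid here for real z to the right of the support; the branch of each
    quadratic is the one with g_i(z) ~ -c_i / z as z -> infinity.
    """
    g = -1.0 / z  # initialization with the large-z behavior
    for _ in range(n_iter):
        s = g + z
        g = sum((s - np.sqrt(s**2 + 4*ci)) / 2 for ci in c)
    return g

# Cubic case: compare with the explicit semi-circle Stieltjes transform
# of Corollary 1 at a point outside the support [-2*sqrt(2/3), 2*sqrt(2/3)]
z = 3.0
g_num = stieltjes(z)
g_exact = (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4
assert abs(g_num - g_exact) < 1e-10
```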

Figure 3: Spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} independently sampled from the unit spheres 𝕊m1,𝕊n1{\mathbb{S}}^{m-1},{\mathbb{S}}^{n-1}, 𝕊p1{\mathbb{S}}^{p-1} respectively. In black the semi-circle law as per Theorem 3.

In particular, in the cubic case when c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, the empirical spectral measure of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges to a semi-circle law as precisely stated by the following Corollary of Theorem 3.

Corollary 1.

Given the setting of Theorem 3 with c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to the semi-circle distribution supported on 𝒮(ν)[223,223]\mathcal{S}(\nu)\equiv\left[-2\sqrt{\frac{2}{3}},2\sqrt{\frac{2}{3}}\right], whose density and Stieltjes transform are given respectively by

ν(dx)=34π(83x2)+dx,g(z)3z+3z2834,wherez𝒮(ν).\displaystyle\nu(dx)=\frac{3}{4\pi}\sqrt{\left(\frac{8}{3}-x^{2}\right)^{+}}dx,\quad g(z)\equiv\frac{-3z+3\sqrt{z^{2}-\frac{8}{3}}}{4},\quad\text{where}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.3. ∎

Figure 3 depicts the spectrum of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) with 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and independent unit vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}}; in particular, it illustrates the convergence in law of this spectrum as the dimensions m,n,pm,n,p grow large.
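Such a simulation takes only a few lines (a sketch of ours; the chosen dimensions and the tolerance on the edge location are heuristic):

```python
import numpy as np

rng = np.random.default_rng(2)
m = n = p = 100
N = m + n + p

X = rng.standard_normal((m, n, p))
a, b, c = (rng.standard_normal(d) for d in (m, n, p))
a, b, c = a/np.linalg.norm(a), b/np.linalg.norm(b), c/np.linalg.norm(c)

# Build Phi_3(X, a, b, c) / sqrt(N) as in Eq. (17)
Xa = np.einsum('ijk,i->jk', X, a)
Xb = np.einsum('ijk,j->ik', X, b)
Xc = np.einsum('ijk,k->ij', X, c)
S = np.block([[np.zeros((m, m)), Xc,               Xb],
              [Xc.T,             np.zeros((n, n)), Xa],
              [Xb.T,             Xa.T,             np.zeros((p, p))]]) / np.sqrt(N)

# The extreme eigenvalues should approach the semi-circle edges of Corollary 1
eigs = np.linalg.eigvalsh(S)
edge = 2*np.sqrt(2/3)
assert abs(eigs[-1] - edge) < 0.25 and abs(eigs[0] + edge) < 0.25
```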

Remark 4.

More generally, the spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges to a deterministic measure with connected support if 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) is almost surely full rank, i.e., if max(m,n)min(m,n)pm+n\max(m,n)-\min(m,n)\leq p\leq m+n, since the rank of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) is min(m,n+p)+min(n,m+p)+min(p,m+n)\min(m,n+p)+\min(n,m+p)+\min(p,m+n). In contrast, if it is not full rank, its spectral measure converges to a deterministic measure with disconnected support (see the case of matrices in Corollary 4 subsequently).

The analysis of the tensor 𝑻{\bm{\mathsfit{T}}} relies on describing the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) where the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} depend statistically on 𝑿{\bm{\mathsfit{X}}} (the noise part of 𝑻{\bm{\mathsfit{T}}}). Despite these dependencies, it turns out that the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) converges in law to the same deterministic measure described by Theorem 3. In addition, we need a further technical assumption on the singular value and vectors of 𝑻{\bm{\mathsfit{T}}}.

Assumption 2.

We assume that there exists a sequence of critical points (λ,𝐮,𝐯,𝐰)(\lambda_{*},{\bm{u}}_{*},{\bm{v}}_{*},{\bm{w}}_{*}) satisfying Eq. (11) such that λa.s.λ(β)\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta), |𝐮,𝐱|a.s.ax(β)|\langle{\bm{u}}_{*},{\bm{x}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{x}^{\infty}(\beta), |𝐯,𝐲|a.s.ay(β)|\langle{\bm{v}}_{*},{\bm{y}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{y}^{\infty}(\beta), |𝐰,𝐳|a.s.az(β)|\langle{\bm{w}}_{*},{\bm{z}}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{z}^{\infty}(\beta) with λ(β)𝒮(ν)\lambda^{\infty}(\beta)\notin\mathcal{S}(\nu) and ax(β),ay(β),az(β)>0a_{x}^{\infty}(\beta),a_{y}^{\infty}(\beta),a_{z}^{\infty}(\beta)>0.

We precisely have the following result.

Theorem 4.

Let 𝑻{\bm{\mathsfit{T}}} be a sequence of random tensors defined as in Eq. (10). Under Assumptions 1 and 2, the empirical spectral measure of Φ3(𝑻,𝐮,𝐯,𝐰)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}}_{*},{\bm{v}}_{*},{\bm{w}}_{*}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[3]i\in[3] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.4. ∎

However, the statistical dependency between 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} and 𝑿{\bm{\mathsfit{X}}} gives rise to an isolated eigenvalue in the spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) at the value 2λ2\lambda, independently of the value of the SNR β\beta, which is a consequence of Eq. (19). Figure 4 depicts iterations of the power method, where the leftmost histogram corresponds to the fixed-point solution; one sees that the spike converges to the value 2λ2\lambda.

Figure 4: Spectrum of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) at iterations 0,50,5 and \infty of the power method (see Algorithm 3) applied on 𝑻{\bm{\mathsfit{T}}}, for m=n=p=100m=n=p=100 and β=0\beta=0.
Remark 5.

Note that, from Assumption 2, the almost sure limit λ\lambda^{\infty} of λ\lambda must lie outside the support 𝒮(ν)\mathcal{S}(\nu) of the deterministic measure ν\nu described by Theorem 4. The same phenomenon was noticed in [8] where it is assumed that the almost sure limit μ\mu^{\infty} of μ\mu must satisfy μ>223\mu^{\infty}>2\sqrt{\frac{2}{3}} in the case of symmetric tensors.

Let us denote the blocks of the resolvent of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}}) as

𝑹Φ3(𝑻,𝒖,𝒗,𝒘)(z)=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹13(z)𝑹23(z)𝑹33(z)],\displaystyle{\bm{R}}_{\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})}(z)=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)\end{bmatrix}, (21)

where 𝑹11(z)𝕄m,𝑹22(z)𝕄n{\bm{R}}^{11}(z)\in{\mathbb{M}}_{m},{\bm{R}}^{22}(z)\in{\mathbb{M}}_{n} and 𝑹33(z)𝕄p{\bm{R}}^{33}(z)\in{\mathbb{M}}_{p}. The following corollary of Theorem 4 will be useful subsequently.

Corollary 2.

Recall the setting and notations of Theorem 4. For all z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu), we have

1Ntr𝑹11(z)a.s.g1(z),1Ntr𝑹22(z)a.s.g2(z),1Ntr𝑹33(z)a.s.g3(z).\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}^{11}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{1}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{R}}^{22}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{R}}^{33}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{3}(z).

3.4 Concentration of the singular value and the alignments

When the dimensions of 𝑻{\bm{\mathsfit{T}}} grow large under Assumption 1, the singular value λ\lambda and the alignments 𝒖,𝒙,𝒗,𝒚\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle converge almost surely to some deterministic limits. This can be shown by controlling the variances of these quantities using Poincaré’s inequality (Lemma 2). Precisely, for λ\lambda, invoking Eq. (15) we have

Varλijk𝔼|λXijk|2=1Nijkui2vj2wk2=1N.\displaystyle\mathrm{Var}\lambda\leq\sum_{ijk}\operatorname{\mathbb{E}}\left|\frac{\partial\lambda}{\partial X_{ijk}}\right|^{2}=\frac{1}{N}\sum_{ijk}u_{i}^{2}v_{j}^{2}w_{k}^{2}=\frac{1}{N}.

Bounding higher-order moments of λ\lambda similarly allows one to obtain the concentration of λ\lambda; e.g., by Chebyshev’s inequality, we have for all t>0t>0

(|λ𝔼λ|t)1Nt2.\displaystyle\mathbb{P}\left(|\lambda-\operatorname{\mathbb{E}}\lambda|\geq t\right)\leq\frac{1}{N\,t^{2}}.

Similarly, with Eq. (14), there exists C>0C>0 such that for all t>0t>0

(|𝒖,𝒙𝔼𝒖,𝒙|t)CNt2,(|𝒗,𝒚𝔼𝒗,𝒚|t)CNt2,(|𝒘,𝒛𝔼𝒘,𝒛|t)CNt2.\displaystyle\mathbb{P}\left(|\langle{\bm{u}},{\bm{x}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{u}},{\bm{x}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}},\,\,\mathbb{P}\left(|\langle{\bm{v}},{\bm{y}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{v}},{\bm{y}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}},\,\,\mathbb{P}\left(|\langle{\bm{w}},{\bm{z}}\rangle-\operatorname{\mathbb{E}}\langle{\bm{w}},{\bm{z}}\rangle|\geq t\right)\leq\frac{C}{N\,t^{2}}.

For the remainder of the manuscript, we denote the almost sure limits of λ\lambda and of the alignments 𝒖,𝒙,𝒗,𝒚\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle respectively as

λ(β)limNλ,a𝒙(β)limN𝒖,𝒙,a𝒚(β)limN𝒗,𝒚,a𝒛(β)limN𝒘,𝒛.\displaystyle\lambda^{\infty}(\beta)\equiv\lim_{N\to\infty}\lambda,\quad a_{\bm{x}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{u}},{\bm{x}}\rangle,\quad a_{\bm{y}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{v}},{\bm{y}}\rangle,\quad a_{\bm{z}}^{\infty}(\beta)\equiv\lim_{N\to\infty}\langle{\bm{w}},{\bm{z}}\rangle. (22)

3.5 Asymptotic singular value and alignments

Given the concentration results of the previous subsection, it remains to estimate the expectations 𝔼λ,𝔼𝒖,𝒙,𝔼𝒗,𝒚\operatorname{\mathbb{E}}\lambda,\operatorname{\mathbb{E}}\langle{\bm{u}},{\bm{x}}\rangle,\operatorname{\mathbb{E}}\langle{\bm{v}},{\bm{y}}\rangle and 𝔼𝒘,𝒛\operatorname{\mathbb{E}}\langle{\bm{w}},{\bm{z}}\rangle. Usually, the evaluation of these quantities using tools from random matrix theory relies on computing Cauchy integrals involving 𝑹Φ3(𝑻,𝒖,𝒗,𝒘)(z){\bm{R}}_{\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})}(z) (the resolvent of Φ3(𝑻,𝒖,𝒗,𝒘)\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})). Here, we take a different (yet analytically simpler) approach by directly taking the expectation of the identities in Eq. (11) and Eq. (12), and then applying Stein’s Lemma (Lemma 1) and Lemma 3. For instance, for λ\lambda, we have

𝔼λ=1Nijk𝔼[vjwkuiXijk]+𝔼[uiwkvjXijk]+𝔼[uivjwkXijk]+β𝔼[𝒖,𝒙𝒗,𝒚𝒘,𝒛].\displaystyle\operatorname{\mathbb{E}}\lambda=\frac{1}{\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[v_{j}w_{k}\frac{\partial u_{i}}{\partial X_{ijk}}\right]+\operatorname{\mathbb{E}}\left[u_{i}w_{k}\frac{\partial v_{j}}{\partial X_{ijk}}\right]+\operatorname{\mathbb{E}}\left[u_{i}v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]+\beta\operatorname{\mathbb{E}}\left[\langle{\bm{u}},{\bm{x}}\rangle\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle\right].

From Eq. (14), when NN\to\infty, it turns out that the only contributing terms (i.e., those yielding non-vanishing terms in the expression of λ(β)\lambda^{\infty}(\beta)) of the derivatives uiXijk,vjXijk\frac{\partial u_{i}}{\partial X_{ijk}},\frac{\partial v_{j}}{\partial X_{ijk}} and wkXijk\frac{\partial w_{k}}{\partial X_{ijk}} in the above sum are respectively

uiXijk1NvjwkRii11(λ),vjXijk1NuiwkRjj22(λ),wkXijk1NuivjRkk33(λ).\displaystyle\frac{\partial u_{i}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}v_{j}w_{k}R^{11}_{ii}(\lambda),\quad\frac{\partial v_{j}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}u_{i}w_{k}R^{22}_{jj}(\lambda),\quad\frac{\partial w_{k}}{\partial X_{ijk}}\simeq-\frac{1}{\sqrt{N}}u_{i}v_{j}R^{33}_{kk}(\lambda).

This yields

𝔼λ=1N(tr𝑹11(λ)+tr𝑹22(λ)+tr𝑹33(λ))+β𝔼[𝒖,𝒙𝒗,𝒚𝒘,𝒛]+𝒪(N1).\displaystyle\operatorname{\mathbb{E}}\lambda=-\frac{1}{N}\left(\operatorname{tr}{\bm{R}}^{11}(\lambda)+\operatorname{tr}{\bm{R}}^{22}(\lambda)+\operatorname{tr}{\bm{R}}^{33}(\lambda)\right)+\beta\operatorname{\mathbb{E}}\left[\langle{\bm{u}},{\bm{x}}\rangle\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle\right]+\mathcal{O}\left(N^{-1}\right).

Therefore, the almost sure limit λ(β)\lambda^{\infty}(\beta) as NN\to\infty of λ\lambda satisfies

λ(β)+g(λ(β))=βa𝒙(β)a𝒚(β)a𝒛(β),\displaystyle\lambda^{\infty}(\beta)+g(\lambda^{\infty}(\beta))=\beta a_{\bm{x}}^{\infty}(\beta)a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),

where a𝒙(β),a𝒚(β)a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) are defined in Eq. (22). From Eq. (11), proceeding similarly as above with the identities

𝒙𝑻(𝒗)𝒘=λ𝒖,𝒙,𝒚𝑻(𝒖)𝒘=λ𝒗,𝒚,𝒛𝑻(𝒗)𝒖=λ𝒘,𝒛,\displaystyle{\bm{x}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\lambda\langle{\bm{u}},{\bm{x}}\rangle,\quad{\bm{y}}^{\top}{\bm{\mathsfit{T}}}({\bm{u}}){\bm{w}}=\lambda\langle{\bm{v}},{\bm{y}}\rangle,\quad{\bm{z}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}})^{\top}{\bm{u}}=\lambda\langle{\bm{w}},{\bm{z}}\rangle,

we obtain the following result.

Theorem 5.

Recall the notations in Theorem 4. Under Assumptions 1 and 2, there exists βs>0\beta_{s}>0 such that for β>βs\beta>\beta_{s},

{λa.s.λ(β),|𝒖,𝒙|a.s.1α2(λ(β))α3(λ(β)),|𝒗,𝒚|a.s.1α1(λ(β))α3(λ(β)),|𝒘,𝒛|a.s.1α1(λ(β))α2(λ(β)),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{2}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))}},\\ \left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))}},\\ \left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{2}(\lambda^{\infty}(\beta))}},\end{cases}

where αi(z)βz+g(z)gi(z)\alpha_{i}(z)\equiv\frac{\beta}{z+g(z)-g_{i}(z)} and λ(β)\lambda^{\infty}(\beta) satisfies f(λ(β),β)=0f(\lambda^{\infty}(\beta),\beta)=0 with f(z,β)=z+g(z)βα1(z)α2(z)α3(z)f(z,\beta)=z+g(z)-\frac{\beta}{\alpha_{1}(z)\alpha_{2}(z)\alpha_{3}(z)}. Besides, for β[0,βs]\beta\in[0,\beta_{s}], λ\lambda^{\infty} is bounded by an order-one constant (such a bound may be computed numerically and corresponds to the right edge of the limiting spectral measure in Theorem 4) and |𝐮,𝐱|,|𝐯,𝐲|,|𝐰,𝐳|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.5. ∎
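Theorem 5 can be evaluated numerically. The sketch below (our own code; the cubic case is used because g is then explicit via Corollary 1 and gᵢ = g/3, and the bisection bracket [1.64, 10] is a heuristic of ours placing the root just to the right of the bulk edge) solves f(λ∞(β), β) = 0 and reads off the common alignment:

```python
import numpy as np

def g(z):
    """Explicit Stieltjes transform in the cubic case (Corollary 1)."""
    return (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4

def f(z, beta):
    # In the cubic case g_i = g/3, hence alpha_i(z) = beta / (z + 2 g(z)/3)
    # and f(z, beta) = z + g(z) - beta / (alpha1 alpha2 alpha3)
    return z + g(z) - (z + 2*g(z)/3)**3 / beta**2

def lam_inf(beta, lo=1.64, hi=10.0, n_iter=200):
    """Bisection for the root of f(., beta) to the right of the bulk edge."""
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid, beta) > 0 else (lo, mid)
    return (lo + hi) / 2

beta = 2.0
lam = lam_inf(beta)
align = (lam + 2*g(lam)/3) / beta  # 1 / sqrt(alpha_i alpha_j) in the cubic case

assert abs(lam - 2.2558) < 1e-3    # matches the closed form of Corollary 3 below
assert abs(align - 0.9530) < 1e-3
```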

Remark 6.

A more compact expression for the alignments is provided in Theorem 8.

Figure 5: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} as predicted by Theorem 5 for c1=16,c2=13c_{1}=\frac{1}{6},c_{2}=\frac{1}{3} and c3=12c_{3}=\frac{1}{2}.

Figure 5 depicts the predicted asymptotic dominant singular value λ(β)\lambda^{\infty}(\beta) of 𝑻{\bm{\mathsfit{T}}} and the corresponding alignments from Theorem 5. As we can see, the result of Theorem 5 predicts that a non-zero correlation between the population signals 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}} and their estimated counterparts is possible only when β>βs1.1134\beta>\beta_{s}\approx 1.1134, which corresponds to the value after which λ(β)\lambda^{\infty}(\beta) starts to increase with β\beta.

Remark 7.

Note that, given g(z)g(z), the inverse formula expressing β\beta in terms of λ\lambda^{\infty} is explicit. Specifically, we have β(λ)=i=13(λ+g(λ)gi(λ))λ+g(λ)\beta(\lambda^{\infty})=\sqrt{\frac{\prod_{i=1}^{3}(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))}{\lambda^{\infty}+g(\lambda^{\infty})}}. In particular, this inverse formula provides an estimator for the SNR β\beta given the largest singular value λ\lambda of 𝑻{\bm{\mathsfit{T}}}.
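In the cubic case, where g is explicit (Corollary 1) and gᵢ = g/3, this estimator takes one line; the sketch below (our own code, not the authors') checks that it inverts the map β ↦ λ∞(β) given by the closed form of Corollary 3 below:

```python
import numpy as np

def g(z):
    """Explicit Stieltjes transform in the cubic case (Corollary 1)."""
    return (-3*z + 3*np.sqrt(z**2 - 8/3)) / 4

def beta_hat(lam):
    """SNR estimator of Remark 7, cubic case where g_i = g/3."""
    return np.sqrt((lam + 2*g(lam)/3)**3 / (lam + g(lam)))

def lam_inf(beta):
    """Closed form of Corollary 3, valid for beta > 2/sqrt(3)."""
    return np.sqrt(beta**2/2 + 2 + np.sqrt(3*(3*beta**2 - 4)**3)/(18*beta))

# The estimator inverts beta -> lam_inf(beta) exactly above the threshold
for beta in (1.5, 2.0, 3.0, 5.0):
    assert abs(beta_hat(lam_inf(beta)) - beta) < 1e-8
```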

3.6 Cubic 33-order tensors: case c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}

In this section, we study the particular case where all the tensor dimensions are equal. As such, the three alignments 𝒖,𝒙,𝒗,𝒚,𝒘,𝒛\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle,\langle{\bm{w}},{\bm{z}}\rangle converge almost surely to the same quantity. In this case, the almost sure limits of λ\lambda and 𝒖,𝒙,𝒗,𝒚,𝒘,𝒛\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle,\langle{\bm{w}},{\bm{z}}\rangle can be obtained explicitly in terms of the signal strength β\beta as per the following corollary of Theorem 5.

Corollary 3.

Under Assumptions 1 and 2 with c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, for β>βs=233\beta>\beta_{s}=\frac{2\sqrt{3}}{3}

{λa.s.λ(β)β22+2+3(3β24)318β,|𝒖,𝒙|,|𝒗,𝒚|,|𝒘,𝒛|a.s.9β212+3(3β24)3β+9β2+36+3(3β24)3β62β.\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta)\equiv\sqrt{\frac{\beta^{2}}{2}+2+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{18\beta}},\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{\sqrt{9\beta^{2}-12+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}+\sqrt{9\beta^{2}+36+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}}{6\sqrt{2}\beta}.\end{cases}

Besides, for β[0,233]\beta\in\left[0,\frac{2\sqrt{3}}{3}\right], λa.s.λ223\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{2}{3}} and |𝐮,𝐱|,|𝐯,𝐲|,|𝐰,𝐳|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|,\left|\langle{\bm{w}},{\bm{z}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.6. ∎

Figure 6 provides plots of the almost sure limits of the singular value and alignments when 𝑻{\bm{\mathsfit{T}}} is cubic as per Corollary 3 (see Subsection A.2 for simulations supporting the above result). In particular, this result predicts a possible correlation between the singular vectors and the underlying signal components above the value βs=2331.154\beta_{s}=\frac{2\sqrt{3}}{3}\approx 1.154, with corresponding singular value λs=2231.633\lambda_{s}=2\sqrt{\frac{2}{3}}\approx 1.633 and alignment as=220.707a_{s}=\frac{\sqrt{2}}{2}\approx 0.707. In addition, we can easily check from the formulas above that λ(β)β1\frac{\lambda^{\infty}(\beta)}{\beta}\to 1 and |𝒖,𝒙|1\left|\langle{\bm{u}},{\bm{x}}\rangle\right|\to 1 for large values of β\beta. Besides, for values of β\beta around βs\beta_{s}, the expression of λ(β)\lambda^{\infty}(\beta) admits the following expansion

λ(β)=223+24(ββs)+2 3144(ββs)32+3664(ββs)2+o((ββs)2),\displaystyle\lambda^{\infty}(\beta)=2\sqrt{\frac{2}{3}}+\frac{\sqrt{2}}{4}\left(\beta-\beta_{s}\right)+\frac{\sqrt{2}\,3^{\frac{1}{4}}}{4}\left(\beta-\beta_{s}\right)^{\frac{3}{2}}+\frac{3\sqrt{6}}{64}\left(\beta-\beta_{s}\right)^{2}+o(\left(\beta-\beta_{s}\right)^{2}),

whereas the corresponding alignment expands as

22+2 3144ββs616(ββs)2 33416(ββs)32+212256(ββs)2+o((ββs)2).\displaystyle\frac{\sqrt{2}}{2}+\frac{\sqrt{2}\,3^{\frac{1}{4}}}{4}\sqrt{\beta-\beta_{s}}-\frac{\sqrt{6}}{16}\left(\beta-\beta_{s}\right)-\frac{\sqrt{2}\,3^{\frac{3}{4}}}{16}\left(\beta-\beta_{s}\right)^{\frac{3}{2}}+\frac{21\sqrt{2}}{256}\left(\beta-\beta_{s}\right)^{2}+o(\left(\beta-\beta_{s}\right)^{2}).
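As a sanity check of the formulas above, the following short numerical sketch (not part of the original analysis; the offset t = 0.01 is an illustrative choice) evaluates the closed-form limit λ∞(β) of Corollary 3 and compares it against the expansion near βs:

```python
import numpy as np

def lam_inf(beta):
    # Closed-form limit of Corollary 3, valid for beta >= 2/sqrt(3).
    return np.sqrt(beta**2 / 2 + 2
                   + np.sqrt(3) * np.sqrt((3 * beta**2 - 4) ** 3) / (18 * beta))

beta_s = 2 * np.sqrt(3) / 3          # phase transition, approx 1.154
t = 0.01                             # small offset beta - beta_s
expansion = (2 * np.sqrt(2 / 3) + np.sqrt(2) / 4 * t
             + np.sqrt(2) * 3 ** 0.25 / 4 * t ** 1.5
             + 3 * np.sqrt(6) / 64 * t ** 2)
gap = abs(lam_inf(beta_s + t) - expansion)   # remainder is o(t^2)
```

The gap is of the order of the neglected t^{5/2} term, well below t².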
Figure 6: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} for c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3} as per Corollary 3.

4 Random tensors meet random matrices

In this section, we investigate the application of Theorem 5 to the particular case of spiked random matrices. Indeed, for instance when p=1p=1 the spiked tensor model in Eq. (10) becomes a spiked matrix model which we will now denote as

𝑴=β𝒙𝒚+1N𝑿,\displaystyle{\bm{M}}=\beta{\bm{x}}{\bm{y}}^{\top}+\frac{1}{\sqrt{N}}{\bm{X}}, (23)

where again 𝒙{\bm{x}} and 𝒚{\bm{y}} are on the unit spheres 𝕊m1{\mathbb{S}}^{m-1} and 𝕊n1{\mathbb{S}}^{n-1} respectively, N=m+nN=m+n and 𝑿{\bm{X}} is a Gaussian noise matrix with i.i.d. entries Xij𝒩(0,1)X_{ij}\sim\mathcal{N}(0,1).

Our approach does not apply directly to the matrix 𝑴{\bm{M}} above. Indeed, the singular vectors 𝒖,𝒗{\bm{u}},{\bm{v}} of 𝑴{\bm{M}} corresponding to its largest singular value λ\lambda satisfy the identities

𝑴𝒗=λ𝒖,𝑴𝒖=λ𝒗,λ=𝒖𝑴𝒗,\displaystyle{\bm{M}}{\bm{v}}=\lambda{\bm{u}},\quad{\bm{M}}^{\top}{\bm{u}}=\lambda{\bm{v}},\quad\lambda={\bm{u}}^{\top}{\bm{M}}{\bm{v}}, (24)

and deriving 𝒖,𝒗{\bm{u}},{\bm{v}} w.r.t. an entry XijX_{ij} of 𝑿{\bm{X}} will result in

([𝟎m×m𝑴𝑴𝟎n×n]λ𝑰N)[𝒖Xij𝒗Xij]=1N[vj(𝒆imui𝒖)ui(𝒆jnvj𝒗)].\displaystyle\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ij}}\\ \frac{\partial{\bm{v}}}{\partial X_{ij}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\begin{bmatrix}v_{j}\left({\bm{e}}_{i}^{m}-u_{i}{\bm{u}}\right)\\ u_{i}\left({\bm{e}}_{j}^{n}-v_{j}{\bm{v}}\right)\end{bmatrix}. (25)

Now since λ\lambda is an eigenvalue of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}, the matrix ([𝟎m×m𝑴𝑴𝟎n×n]λ𝑰N)\left(\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix}-\lambda{\bm{I}}_{N}\right) is not invertible, which makes our approach inapplicable for d=2d=2. However, we can retrieve the behavior of the spiked matrix model in Eq. (23) by considering an order 33 tensor and setting for instance c30c_{3}\to 0 as we will discuss subsequently.
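This singularity is immediate to verify numerically: for any matrix with top singular triplet (λ, 𝒖, 𝒗), the concatenation [𝒖; 𝒗] is an eigenvector of the symmetrized block matrix with eigenvalue λ. A minimal sketch (dimensions and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 4
M = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(M)
u, v, lam = U[:, 0], Vt[0], s[0]

# [u; v] is an eigenvector of [[0, M], [M^T, 0]] with eigenvalue lam,
# hence the shifted matrix in Eq. (25) is singular.
B = np.block([[np.zeros((m, m)), M], [M.T, np.zeros((n, n))]])
x = np.concatenate([u, v])
ok = np.allclose(B @ x, lam * x)
```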

4.1 Limiting spectral measure

Given the result of Theorem 5, the asymptotic singular value and alignments for the spiked matrix model in Eq. (23) correspond to the particular case c30c_{3}\to 0. We start by characterizing the corresponding limiting spectral measure; the following corollary of Theorem 4 is obtained when c30c_{3}\to 0.

Corollary 4.

Given the setting and notations of Theorem 3, under Assumptions 2 and 1 with c1=c,c2=1cc_{1}=c,c_{2}=1-c for c(0,1)c\in(0,1), the empirical spectral measure of 1NΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) converges weakly almost surely to a deterministic measure ν\nu defined on the support 𝒮(ν)[1+2η,12η][12η,1+2η]\mathcal{S}(\nu)\equiv\left[-\sqrt{1+2\sqrt{\eta}},-\sqrt{1-2\sqrt{\eta}}\right]\cup\left[\sqrt{1-2\sqrt{\eta}},\sqrt{1+2\sqrt{\eta}}\right] with η=c(1c)\eta=c(1-c), whose density function writes as

ν(dx)=1πxsin(arctan2(0,qc(x))2)|qc(x)|sign(sin(arctan2(0,qc(x))2)x)dx+(12min(c,1c))δ(x),\displaystyle\nu(dx)=\frac{1}{\pi x}\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)\sqrt{|q_{c}(x)|}\operatorname{sign}\left(\frac{\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)}{x}\right)dx+\left(1-2\min(c,1-c)\right)\delta(x),

where qc(x)=(x21)2+4c(c1)q_{c}(x)=(x^{2}-1)^{2}+4c(c-1), and the corresponding Stieltjes transform writes as

g(z)=z+qc(z)z,forz𝒮(ν).\displaystyle g(z)=-z+\frac{\sqrt{q_{c}(z)}}{z},\quad\text{for}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.7. ∎
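The Stieltjes transform of Corollary 4 can be checked against a direct simulation; the sketch below (dimensions, seed and the evaluation point z = 3 are illustrative) compares the empirical transform of the symmetrized noise matrix of Remark 8 with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 300
N = m + n
c = m / N  # c = 1/4
X = rng.standard_normal((m, n))
B = np.block([[np.zeros((m, m)), X], [X.T, np.zeros((n, n))]]) / np.sqrt(N)
eigs = np.linalg.eigvalsh(B)

z = 3.0  # real point to the right of the support
q = (z ** 2 - 1) ** 2 + 4 * c * (c - 1)
g_theory = -z + np.sqrt(q) / z      # Corollary 4 (includes the atom at 0)
g_emp = np.mean(1.0 / (eigs - z))   # empirical Stieltjes transform
```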

Figure 7: Limiting spectral measure of 1NΦ3(𝑿,𝒂,𝒃,𝒄)\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) for c1=c,c2=1cc_{1}=c,c_{2}=1-c as per Corollary 4.
Remark 8.

Corollary 4 describes the limiting spectral measure of 1m+nΦ3(𝑿,𝐚,𝐛,𝐜)\frac{1}{\sqrt{m+n}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}) for 𝑿𝕋m,n,1(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,1}(\mathcal{N}(0,1)), (𝐚,𝐛)𝕊m1×𝕊n1({\bm{a}},{\bm{b}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1} and 𝐜=1{\bm{c}}=1 (a scalar), which is equivalent to random matrices of the form 1m+n[𝟎m×m𝐗𝐗𝟎n×n]\frac{1}{\sqrt{m+n}}\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{X}}\\ {\bm{X}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} with 𝐗𝕄m,n(𝒩(0,1)){\bm{X}}\sim{\mathbb{M}}_{m,n}(\mathcal{N}(0,1)).

Figure 7 depicts the limiting spectral measures as per Corollary 4 for various values of cc (see also simulations in Appendix A.1). In particular, the limiting measure is a semi-circle law for square matrices (i.e., c=12c=\frac{1}{2}), while it decomposes into a two-mode distribution (with a Dirac δ(x)\delta(x) at 0 weighted by 12min(c,1c)1-2\min(c,1-c), not shown in Figure 7) for c(0,1){12}c\in(0,1)\setminus\{\frac{1}{2}\}, since the underlying random matrix model is not full rank (Φ3(𝑿,𝒂,𝒃,1)\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},1) is of rank 2min(m,n)2\min(m,n)).

4.2 Limiting singular value and alignments

Given the limiting Stieltjes transform from the previous subsection (Corollary 4), the limiting largest singular value of 𝑴{\bm{M}} (in Eq. (23)) and the alignments of the corresponding singular vectors are obtained thanks to Theorem 5, yielding the following corollary.

Corollary 5.

Under Assumption 1 with c1=c,c2=1cc_{1}=c,c_{2}=1-c, we have for β>βs=c(1c)4\beta>\beta_{s}=\sqrt[4]{c(1-c)}

{λa.s.λ(β)=β2+1+c(1c)β2,|𝒖,𝒙|a.s.1κ(β,c),|𝒗,𝒚|a.s.1κ(β,1c),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta)=\sqrt{\beta^{2}+1+\frac{c(1-c)}{\beta^{2}}},\\ \left|\langle{\bm{u}},{\bm{x}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\kappa(\beta,c)},\quad\left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}\frac{1}{\kappa(\beta,1-c)},\end{cases}

where 𝐮𝕊m1,𝐯𝕊n1{\bm{u}}\in{\mathbb{S}}^{m-1},{\bm{v}}\in{\mathbb{S}}^{n-1} are the singular vectors of 𝐌{\bm{M}} corresponding to its largest singular value λ\lambda and κ(β,c)\kappa(\beta,c) is given by

κ(β,c)=ββ2(β2+1)c(c1)(β4+c(c1))(β2+1c).\displaystyle\kappa(\beta,c)=\beta\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{(\beta^{4}+c(c-1))\left(\beta^{2}+1-c\right)}}.

Besides, for β[0,βs]\beta\in[0,\beta_{s}], λa.s.1+2c(1c)\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\sqrt{1+2\sqrt{c(1-c)}} and |𝐮,𝐱|,|𝐯,𝐲|a.s.0\left|\langle{\bm{u}},{\bm{x}}\rangle\right|,\left|\langle{\bm{v}},{\bm{y}}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.8. ∎

Figure 8: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} (and of 𝑴{\bm{M}} in Eq. (23)) for c1=cc_{1}=c and c2=1cc_{2}=1-c as per Corollary 5, for different values of c{12,110,150}c\in\left\{\frac{1}{2},\frac{1}{10},\frac{1}{50}\right\}.

Figure 8 provides the curves of the asymptotic singular value and alignments of Corollary 5 (see Subsection A.1 for simulations supporting the above result). Unlike the tensor case from the previous section, we see that the asymptotic alignments a𝒙(β),a𝒚(β)a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) are continuous, and a positive alignment is observed for β>βs\beta>\beta_{s}, which corresponds to the classical BBP phase transition of spiked random matrix models [2, 5, 19].
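A Monte-Carlo sketch of Corollary 5 (the dimensions, seed, β = 2 and the choice of spike x = y = ones/√n are illustrative, not from the original simulations):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 400
N = m + n
c = m / N  # square case, c = 1/2
beta = 2.0
x = np.ones(m) / np.sqrt(m)
y = np.ones(n) / np.sqrt(n)
M = beta * np.outer(x, y) + rng.standard_normal((m, n)) / np.sqrt(N)

U, s, Vt = np.linalg.svd(M)
# Corollary 5: limiting top singular value and alignment 1/kappa(beta, c).
lam_theory = np.sqrt(beta ** 2 + 1 + c * (1 - c) / beta ** 2)
kappa = beta * np.sqrt((beta ** 2 * (beta ** 2 + 1) - c * (c - 1))
                       / ((beta ** 4 + c * (c - 1)) * (beta ** 2 + 1 - c)))
align_emp = abs(U[:, 0] @ x)
```

For these parameters the predicted values are λ∞ = 2.25 and |⟨u, x⟩| ≈ 0.935, matched by the simulation up to finite-size fluctuations.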

Remark 9 (Application to tensor unfolding).

The tensor unfolding method consists in estimating the spike components of a given tensor 𝑻=γ𝐱(1)𝐱(d)+1n𝑿𝕋nd{\bm{\mathsfit{T}}}=\gamma{\bm{x}}^{(1)}\otimes\cdots\otimes{\bm{x}}^{(d)}+\frac{1}{\sqrt{n}}{\bm{\mathsfit{X}}}\in{\mathbb{T}}_{n}^{d} with 𝑿𝕋nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n}^{d}(\mathcal{N}(0,1)) by applying an SVD to its unfolded matrices (for i[d]i\in[d]) Mati(𝑻)=γ𝐱(i)(𝐲(i))+1n𝐗i\operatorname{Mat}_{i}({\bm{\mathsfit{T}}})=\gamma{\bm{x}}^{(i)}({\bm{y}}^{(i)})^{\top}+\frac{1}{\sqrt{n}}{\bm{X}}_{i} of size n×nd1n\times n^{d-1}. Applying Corollary 5 to this case predicts a phase transition at βs=c(1c)4\beta_{s}=\sqrt[4]{c(1-c)} with c=nn+nd1=11+nd20c=\frac{n}{n+n^{d-1}}=\frac{1}{1+n^{d-2}}\to 0 (our assumptions take cc to be a constant that does not depend on the tensor dimensions; still, we believe this can be relaxed in the same vein as in [4]), which yields βs=nd2(1+nd2)24\beta_{s}=\sqrt[4]{\frac{n^{d-2}}{(1+n^{d-2})^{2}}}. After re-scaling the noise component (by multiplying Eq. (23) by 1+mn\sqrt{1+\frac{m}{n}} with m=nd1m=n^{d-1}), this yields γ>nd24\gamma>n^{\frac{d-2}{4}}, i.e., the phase transition for tensor unfolding obtained by [3] (see Theorem 3.3 therein). More generally, for any order-dd tensor 𝑻{\bm{\mathsfit{T}}} of arbitrary dimensions n1××ndn_{1}\times\cdots\times n_{d}, the tensor unfolding method succeeds provided that β(i=1dni)1/4/i=1dni\beta\geq\left(\prod_{i=1}^{d}n_{i}\right)^{1/4}/\sqrt{\sum_{i=1}^{d}n_{i}}.
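Tensor unfolding itself is a one-line reshape followed by an SVD; the sketch below (d = 3, with illustrative n, γ and spike x = ones/√n) recovers the first spike component well above the predicted threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 30, 3, 10.0  # threshold here is about n**((d-2)/4) ~ 2.34
x = np.ones(n) / np.sqrt(n)
T = gamma * np.einsum('i,j,k->ijk', x, x, x) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)

Mat1 = T.reshape(n, n ** (d - 1))      # unfolding along the first mode
u = np.linalg.svd(Mat1)[0][:, 0]       # top left singular vector
align = abs(u @ x)                     # close to 1 well above the threshold
```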

5 Generalization to arbitrary dd-order tensors

We now show that our approach can be generalized straightforwardly to the dd-order spiked tensor model of Eq. (1). Indeed, from Eq. (4), the 2\ell_{2}-singular value λ\lambda and vectors 𝒖(1),,𝒖(d)𝕊n11××𝕊nd1{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}, corresponding to the best rank-one approximation λ𝒖(1)𝒖(d)\lambda{\bm{u}}^{(1)}\otimes\ldots\otimes{\bm{u}}^{(d)} of the dd-order tensor 𝑻{\bm{\mathsfit{T}}} in Eq. (1), satisfy the identities

{𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d))=λ𝒖(i),λ=𝑻(𝒖(1),,𝒖(d))=i1,,idui1(1)uid(d)Ti1,,id.\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})=\lambda{\bm{u}}^{(i)},\\ \lambda={\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})=\sum_{i_{1},\ldots,i_{d}}u_{i_{1}}^{(1)}\ldots u_{i_{d}}^{(d)}T_{i_{1},\ldots,i_{d}}.\end{cases} (26)

5.1 Associated random matrix ensemble

Let 𝑻ij𝕄ni,nj{\bm{\mathsfit{T}}}^{ij}\in{\mathbb{M}}_{n_{i},n_{j}} denote the matrix obtained by contracting the tensor 𝑻{\bm{\mathsfit{T}}} with the singular vectors in {𝒖(1),,𝒖(d)}{𝒖(i),𝒖(j)}\{{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\}\setminus\{{\bm{u}}^{(i)},{\bm{u}}^{(j)}\}, i.e.,

𝑻ij𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(j1),:,𝒖(j+1),,𝒖(d)).\displaystyle{\bm{\mathsfit{T}}}^{ij}\equiv{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(j-1)},:,{\bm{u}}^{(j+1)},\ldots,{\bm{u}}^{(d)}). (27)

As in the order 33 case, from Eq. (26), the derivatives of the singular vectors 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} with respect to the entry Xi1,,idX_{i_{1},\ldots,i_{d}} of the noise tensor 𝑿{\bm{\mathsfit{X}}} express as

[𝒖(1)Xi1,,id𝒖(d)Xi1,,id]=1N([𝟎n1×n1𝑻12𝑻13𝑻1d(𝑻12)𝟎n2×n2𝑻23𝑻2d(𝑻13)(𝑻23)𝟎n3×n3𝑻3d(𝑻1d)(𝑻2d)(𝑻3d)𝟎nd×nd]λ𝑰N)1[{2,,d}ui()(𝒆i1n1ui1(i1)𝒖(i1)){1,,d1}ui()(𝒆idnduid(id)𝒖(id))]\displaystyle\begin{bmatrix}\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}}\\ \vdots\\ \frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\end{bmatrix}=-\frac{1}{\sqrt{N}}\left(\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{\mathsfit{T}}}^{12}&{\bm{\mathsfit{T}}}^{13}&\cdots&{\bm{\mathsfit{T}}}^{1d}\\ ({\bm{\mathsfit{T}}}^{12})^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&{\bm{\mathsfit{T}}}^{23}&\cdots&{\bm{\mathsfit{T}}}^{2d}\\ ({\bm{\mathsfit{T}}}^{13})^{\top}&({\bm{\mathsfit{T}}}^{23})^{\top}&{\bm{0}}_{n_{3}\times n_{3}}&\ldots&{\bm{\mathsfit{T}}}^{3d}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ ({\bm{\mathsfit{T}}}^{1d})^{\top}&({\bm{\mathsfit{T}}}^{2d})^{\top}&({\bm{\mathsfit{T}}}^{3d})^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix}-\lambda{\bm{I}}_{N}\right)^{-1}\begin{bmatrix}\prod_{\ell\in\{2,\ldots,d\}}u_{i_{\ell}}^{(\ell)}({\bm{e}}_{i_{1}}^{n_{1}}-u_{i_{1}}^{(i_{1})}{\bm{u}}^{(i_{1})})\\ \vdots\\ \prod_{\ell\in\{1,\ldots,d-1\}}u_{i_{\ell}}^{(\ell)}({\bm{e}}_{i_{d}}^{n_{d}}-u_{i_{d}}^{(i_{d})}{\bm{u}}^{(i_{d})})\end{bmatrix} (28)

where N=i[d]niN=\sum_{i\in[d]}n_{i}, and the derivative of λ\lambda w.r.t. Xi1,,idX_{i_{1},\ldots,i_{d}} writes as

λXi1,,id=1N[d]ui().\displaystyle\frac{\partial\lambda}{\partial X_{i_{1},\ldots,i_{d}}}=\frac{1}{\sqrt{N}}\prod_{\ell\in[d]}u_{i_{\ell}}^{(\ell)}. (29)

As such, the associated random matrix model of 𝑻{\bm{\mathsfit{T}}} is the matrix appearing in the resolvent in Eq. (28). More generally, the dd-order block-wise tensor contraction ensemble d(𝑿)\mathcal{B}_{d}({\bm{\mathsfit{X}}}) for 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) is defined as

d(𝑿){Φd(𝑿,𝒂(1),,𝒂(d))|(𝒂(1),,𝒂(d))𝕊n11××𝕊nd1}\displaystyle\mathcal{B}_{d}({\bm{\mathsfit{X}}})\equiv\left\{\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\,\,\big{|}\,\,({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}\right\} (30)

where Φd\Phi_{d} is the mapping

Φd:𝕋n1,,nd×𝕊n11××𝕊nd1𝕄ini(𝑿,𝒂(1),,𝒂(d))[𝟎n1×n1𝑿12𝑿13𝑿1d(𝑿12)𝟎n2×n2𝑿23𝑿2d(𝑿13)(𝑿23)𝟎n3×n3𝑿3d(𝑿1d)(𝑿2d)(𝑿3d)𝟎nd×nd],\begin{split}\Phi_{d}:{\mathbb{T}}_{n_{1},\ldots,n_{d}}\times{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}&\longrightarrow{\mathbb{M}}_{\sum_{i}n_{i}}\\ ({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})&\longmapsto\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{\mathsfit{X}}}^{12}&{\bm{\mathsfit{X}}}^{13}&\cdots&{\bm{\mathsfit{X}}}^{1d}\\ ({\bm{\mathsfit{X}}}^{12})^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&{\bm{\mathsfit{X}}}^{23}&\cdots&{\bm{\mathsfit{X}}}^{2d}\\ ({\bm{\mathsfit{X}}}^{13})^{\top}&({\bm{\mathsfit{X}}}^{23})^{\top}&{\bm{0}}_{n_{3}\times n_{3}}&\ldots&{\bm{\mathsfit{X}}}^{3d}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ ({\bm{\mathsfit{X}}}^{1d})^{\top}&({\bm{\mathsfit{X}}}^{2d})^{\top}&({\bm{\mathsfit{X}}}^{3d})^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix},\end{split} (31)

with 𝑿ij𝑿(𝒂(1),,𝒂(i1),:,𝒂(i+1),,𝒂(j1),:,𝒂(j+1),,𝒂(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(i-1)},:,{\bm{a}}^{(i+1)},\ldots,{\bm{a}}^{(j-1)},:,{\bm{a}}^{(j+1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}.

Remark 10.

As in the order 33 case, to ensure the existence of the matrix inverse in Eq. (28), we need to suppose that there exists a tuple (λ,𝐮(1),,𝐮(d))(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) verifying the identities in Eq. (26) such that λ\lambda_{*} is not an eigenvalue of Φd(𝑻,𝐮(1),,𝐮(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}).

5.2 Limiting spectral measure of block-wise dd-order tensor contractions

In this section, we characterize the limiting spectral measure of the ensemble d(𝑿)\mathcal{B}_{d}({\bm{\mathsfit{X}}}) for 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) in the limit when all tensor dimensions grow as per the following assumption.

Assumption 3.

For all i[d]i\in[d], assume that nin_{i}\to\infty with nijnjci(0,1)\frac{n_{i}}{\sum_{j}n_{j}}\to c_{i}\in(0,1).

We thus have the following result which characterizes the spectrum of 1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) for any deterministic unit norm vectors 𝒂(1),,𝒂(d){\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}.

Theorem 6.

Let 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) be a sequence of random asymmetric Gaussian tensors and (𝐚(1),,𝐚(d))𝕊n11××𝕊nd1({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} a sequence of deterministic vectors of increasing dimensions, following Assumption 3. Then the empirical spectral measure of 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[d]i\in[d] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.9. ∎

Algorithm 1 Fixed point equation to compute the Stieltjes transform in Theorem 6
Input: z𝒮(ν)z\in{\mathbb{R}}\setminus\mathcal{S}(\nu) and tensor dimension ratios 𝒄=[c1,,cd](0,1)d{\bm{c}}=[c_{1},\ldots,c_{d}]^{\top}\in(0,1)^{d}
Output: g,𝒈=[g1,,gd]dg\in{\mathbb{R}},\quad{\bm{g}}=[g_{1},\ldots,g_{d}]^{\top}\in{\mathbb{R}}^{d}
Repeat
𝒈g+z24𝒄+(g+z)22\quad{\bm{g}}\leftarrow\frac{g+z}{2}-\frac{\sqrt{4{\bm{c}}+(g+z)^{2}}}{2} \triangleright Element-wise vector operation.
gi=1dgi\quad g\leftarrow\sum_{i=1}^{d}g_{i}
until convergence of gg
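A minimal Python implementation of Algorithm 1 (for real z to the right of the support, with a fixed iteration budget instead of a convergence test for simplicity), checked against the closed-form semi-circle transform of Corollary 6 in the equal-ratio case:

```python
import numpy as np

def stieltjes(z, c, n_iter=200):
    # Fixed point of Theorem 6: g = sum_i g_i with
    # g_i = (g + z)/2 - sqrt(4 c_i + (g + z)^2)/2 (element-wise).
    c = np.asarray(c, dtype=float)
    g = 0.0
    for _ in range(n_iter):
        gi = (g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2
        g = gi.sum()
    return g, gi

# Equal ratios c_i = 1/d: the result should match Corollary 6.
d, z = 3, 3.0
g, _ = stieltjes(z, [1 / d] * d)
g_semicircle = d * (-z + np.sqrt(z ** 2 - 4 * (d - 1) / d)) / (2 * (d - 1))
```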

Algorithm 1 provides a pseudo-code to compute the Stieltjes transform g(z)g(z) in Theorem 6 through an iterative solution to the fixed point equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z). In particular, as for the order 33 case, when all the tensor dimensions are equal (i.e., ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d]), the spectral measure of 1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges to a semi-circle law. We have the following corollary of Theorem 6.

Corollary 6.

In the setting of Theorem 6, under Assumption 3 with ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], the empirical spectral measure of 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) converges weakly almost surely to the semi-circle distribution with support 𝒮(ν)[2d1d,2d1d]\mathcal{S}(\nu)\equiv\left[-2\sqrt{\frac{d-1}{d}},2\sqrt{\frac{d-1}{d}}\right], whose density and Stieltjes transform write respectively as

ν(dx)=d2(d1)π(4(d1)dx2)+,g(z)zd+dz24(d1)d2(d1),wherez𝒮(ν).\displaystyle\nu(dx)=\frac{d}{2(d-1)\pi}\sqrt{\left(\frac{4(d-1)}{d}-x^{2}\right)^{+}},\quad g(z)\equiv\frac{-zd+d\sqrt{z^{2}-\frac{4(d-1)}{d}}}{2(d-1)},\quad\text{where}\quad z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).
Proof.

See Appendix B.10. ∎

Figure 9: Spectrum of 1NΦ4(𝑿,𝒂,𝒃,𝒄,𝒅)\frac{1}{\sqrt{N}}\Phi_{4}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}}) with 𝑿𝕋n1,,n4(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{4}}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄,𝒅{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}} independently sampled from the unit spheres 𝕊n11,,𝕊n41{\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{4}-1} respectively. In black the semi-circle law as per Theorem 6.

Figure 9 provides an illustration of the convergence in law (when the dimensions nin_{i} grow large) of the spectrum of 1NΦ4(𝑿,𝒂,𝒃,𝒄,𝒅)\frac{1}{\sqrt{N}}\Phi_{4}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}}) with a 44th-order tensor 𝑿𝕋n1,,n4(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{4}}(\mathcal{N}(0,1)) and 𝒂,𝒃,𝒄,𝒅{\bm{a}},{\bm{b}},{\bm{c}},{\bm{d}} independently sampled from the unit spheres 𝕊n11,,𝕊n41{\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{4}-1} respectively.

Remark 11.

1NΦd(𝑿,𝒂(1),,𝒂(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) is almost surely of rank imin(ni,jinj)\sum_{i}\min(n_{i},\sum_{j\neq i}n_{j}). As in the order 33 case, when 1NΦd(𝑿,𝐚(1),,𝐚(d))\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}) is almost surely full rank, its limiting spectral measure has connected support, while in general the limiting spectral measure has disconnected support.

The result of Theorem 6 still holds for Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) where 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} stand for the singular vectors of 𝑻{\bm{\mathsfit{T}}} defined through Eq. (26). Indeed, the statistical dependencies between these singular vectors and the noise tensor 𝑿{\bm{\mathsfit{X}}} do not affect the convergence of the spectrum of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) to the limiting measure described by Theorem 6. We further need the following assumption.

Assumption 4.

We assume that there exists a sequence of critical points (λ,𝐮(1),,𝐮(d))(\lambda_{*},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) satisfying Eq. (26) such that λa.s.λ(β)\lambda_{*}\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta), |𝐮(i),𝐱(i)|a.s.a𝐱(i)(β)|\langle{\bm{u}}_{*}^{(i)},{\bm{x}}^{(i)}\rangle|\operatorname{\,\xrightarrow{\text{a.s.}}\,}a_{{\bm{x}}^{(i)}}^{\infty}(\beta) with λ(β)𝒮(ν)\lambda^{\infty}(\beta)\notin\mathcal{S}(\nu) and a𝐱(i)(β)>0a_{{\bm{x}}^{(i)}}^{\infty}(\beta)>0.

Theorem 7.

Let 𝑻{\bm{\mathsfit{T}}} be a sequence of spiked random tensors defined as in Eq. (1). Under Assumptions 3 and 4, the empirical spectral measure of Φd(𝑻,𝐮(1),,𝐮(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}_{*}^{(1)},\ldots,{\bm{u}}_{*}^{(d)}) converges weakly almost surely to a deterministic measure ν\nu whose Stieltjes transform g(z)g(z) is defined as the solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) such that [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0 where, for i[d]i\in[d] gi(z)g_{i}(z) satisfies gi2(z)(g(z)+z)gi(z)ci=0g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0 for z𝒮(ν)z\in{\mathbb{C}}\setminus\mathcal{S}(\nu).

Proof.

See Appendix B.12. ∎

Figure 10: Spectrum of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) at iterations 0,50,5 and at convergence of the power method (see Algorithm 3) applied to 𝑻{\bm{\mathsfit{T}}}, for n1=n2=n3=n4=50n_{1}=n_{2}=n_{3}=n_{4}=50 and β=0\beta=0.

Figure 10 depicts the spectrum of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) for an order 44 tensor 𝑻{\bm{\mathsfit{T}}} with β=0\beta=0. As we saw previously, an isolated eigenvalue pops out from the continuous bulk of Φ4(𝑻,𝒖(1),,𝒖(4))\Phi_{4}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}) because of the statistical dependencies between the tensor noise 𝑿{\bm{\mathsfit{X}}} and the singular vectors 𝒖(1),,𝒖(4){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(4)}. More generally, for an order-dd tensor 𝑻{\bm{\mathsfit{T}}}, the spectrum of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) admits an isolated spike at the value (d1)λ(d-1)\lambda independently of the signal strength β\beta. Indeed, (d1)λ(d-1)\lambda is an eigenvalue of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}) with corresponding eigenvector the concatenation of the singular vectors 𝒖(1),,𝒖(d){\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}, i.e.,

Φd(𝑻,𝒖(1),,𝒖(d))[𝒖(1)𝒖(d)]=[i1𝑻(:,𝒖(2),,𝒖(i1),:,𝒖(i+1),,𝒖(d))𝒖(i)id𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d1),:)𝒖(i)]=(d1)λ[𝒖(1)𝒖(d)].\displaystyle\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\begin{bmatrix}{\bm{u}}^{(1)}\\ \vdots\\ {\bm{u}}^{(d)}\end{bmatrix}=\begin{bmatrix}\sum_{i\neq 1}{\bm{\mathsfit{T}}}(:,{\bm{u}}^{(2)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)}){\bm{u}}^{(i)}\\ \vdots\\ \sum_{i\neq d}{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d-1)},:)^{\top}{\bm{u}}^{(i)}\end{bmatrix}=(d-1)\lambda\begin{bmatrix}{\bm{u}}^{(1)}\\ \vdots\\ {\bm{u}}^{(d)}\end{bmatrix}. (32)
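The identity in Eq. (32) can be verified numerically for d = 3, where the isolated eigenvalue is (d−1)λ = 2λ. The sketch below (dimensions, seed and the plain alternating power iteration are illustrative choices) first iterates to a critical point of Eq. (26), then checks the eigenvector relation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
T = rng.standard_normal((n, n, n))  # pure-noise case beta = 0 (scale irrelevant)
u = v = w = np.ones(n) / np.sqrt(n)
for _ in range(5000):  # alternating power iteration towards Eq. (26)
    u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
    v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
    w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
lam = np.einsum('ijk,i,j,k->', T, u, v, w)

T12 = np.einsum('ijk,k->ij', T, w)
T13 = np.einsum('ijk,j->ik', T, v)
T23 = np.einsum('ijk,i->jk', T, u)
Phi = np.block([[np.zeros((n, n)), T12, T13],
                [T12.T, np.zeros((n, n)), T23],
                [T13.T, T23.T, np.zeros((n, n))]])
s = np.concatenate([u, v, w])
resid = np.max(np.abs(Phi @ s - 2 * lam * s))  # small at a critical point
```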

5.3 Asymptotic singular value and alignments of hyper-rectangular tensors

Similarly to the order 33 case studied previously, when the dimensions of 𝑻{\bm{\mathsfit{T}}} grow large at the same rate, its singular value λ\lambda and the corresponding alignments 𝒖(i),𝒙(i)\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle for i[d]i\in[d] concentrate almost surely around some deterministic quantities which we denote λ(β)limNλ\lambda^{\infty}(\beta)\equiv\lim_{N\to\infty}\lambda and a𝒙(i)(β)limN|𝒖(i),𝒙(i)|a_{{\bm{x}}^{(i)}}^{\infty}(\beta)\equiv\lim_{N\to\infty}|\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle| respectively. Applying again Stein’s Lemma (Lemma 1) to the identities in Eq. (26), we obtain the following theorem which characterizes the aforementioned deterministic limits.

Theorem 8.

Recall the notations in Theorem 7. Under Assumptions 3 and 4, for d3d\geq 3, there exists βs>0\beta_{s}>0 such that for β>βs\beta>\beta_{s},

{λa.s.λ(β),|𝒙(i),𝒖(i)|a.s.qi(λ(β)),\displaystyle\begin{cases}\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}(\beta),\\ \left|\langle{\bm{x}}^{(i)},{\bm{u}}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}q_{i}\left(\lambda^{\infty}(\beta)\right),\end{cases}

where qi(z)q_{i}(z) is given by qi(z)=1gi2(z)ciq_{i}(z)=\sqrt{1-\frac{g_{i}^{2}(z)}{c_{i}}} and λ(β)\lambda^{\infty}(\beta) satisfies f(λ(β),β)=0f(\lambda^{\infty}(\beta),\beta)=0 with f(z,β)=z+g(z)βi=1dqi(z)f(z,\beta)=z+g(z)-\beta\prod_{i=1}^{d}q_{i}(z). Besides, for β[0,βs]\beta\in[0,\beta_{s}], λ\lambda^{\infty} is bounded (in particular when ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], λa.s.λ2d1d\lambda\operatorname{\,\xrightarrow{\text{a.s.}}\,}\lambda^{\infty}\leq 2\sqrt{\frac{d-1}{d}}) and |𝐱(i),𝐮(i)|a.s.0\left|\langle{\bm{x}}^{(i)},{\bm{u}}^{(i)}\rangle\right|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0.

Proof.

See Appendix B.13. ∎

Remark 12.

As in the order 33 case, note that the inverse formula expressing β\beta in terms of λ\lambda^{\infty} is explicit. Specifically, we have β(λ)=λ+g(λ)i=1dqi(λ)\beta(\lambda^{\infty})=\frac{\lambda^{\infty}+g(\lambda^{\infty})}{\prod_{i=1}^{d}q_{i}(\lambda^{\infty})}. In particular, this inverse formula provides an estimator for the SNR β\beta given the largest singular value λ\lambda of 𝑻{\bm{\mathsfit{T}}}. Algorithm 2 provides a pseudo-code to compute the asymptotic alignments.

Algorithm 2 Compute alignments as per Theorem 8
Input: SNR β+\beta\in{\mathbb{R}}_{+} and tensor dimension ratios 𝒄=[c1,,cd](0,1)d{\bm{c}}=[c_{1},\ldots,c_{d}]^{\top}\in(0,1)^{d}
Output: Asymptotic singular value λ\lambda^{\infty} and corresponding alignments 𝒂=[a𝒙(1),,a𝒙(d)][0,1]d{\bm{a}}=\left[a_{{\bm{x}}^{(1)}}^{\infty},\ldots,a_{{\bm{x}}^{(d)}}^{\infty}\right]^{\top}\in[0,1]^{d}
Set λ\lambda^{\infty} as the solution of f(z,β)=0f(z,\beta)=0 in zz.
Set the alignments as a𝒙(i)1gi2(λ)cia_{{\bm{x}}^{(i)}}^{\infty}\leftarrow\sqrt{1-\frac{g_{i}^{2}(\lambda^{\infty})}{c_{i}}} with gi(z)g_{i}(z) computed by Algorithm 1 for z=λz=\lambda^{\infty}.
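The explicit inverse formula of Remark 12 is easy to implement on top of the fixed point of Theorem 6; the sketch below (re-implementing the fixed point with a fixed iteration budget, an illustrative choice) performs a round trip for a cubic order-3 tensor, recovering β = 2 from the closed-form λ∞(β) of Corollary 3:

```python
import numpy as np

def stieltjes_parts(z, c, n_iter=500):
    # Fixed point of Theorem 6 (real z to the right of the support).
    c = np.asarray(c, dtype=float)
    g = 0.0
    for _ in range(n_iter):
        gi = (g + z) / 2 - np.sqrt(4 * c + (g + z) ** 2) / 2
        g = gi.sum()
    return g, gi

def beta_of_lambda(lam, c):
    # Remark 12: beta = (lam + g(lam)) / prod_i q_i(lam).
    g, gi = stieltjes_parts(lam, c)
    q = np.sqrt(1 - gi ** 2 / np.asarray(c, dtype=float))
    return (lam + g) / np.prod(q)

beta = 2.0  # above the cubic threshold beta_s ~ 1.154
lam = np.sqrt(beta ** 2 / 2 + 2
              + np.sqrt(3) * np.sqrt((3 * beta ** 2 - 4) ** 3) / (18 * beta))
beta_rec = beta_of_lambda(lam, [1 / 3, 1 / 3, 1 / 3])
```

This also illustrates the SNR estimator mentioned in the remark: given an observed largest singular value, `beta_of_lambda` returns the corresponding β.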
Figure 11: Asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} as predicted by Theorem 8 for c1=110,c2=16,c3=14c_{1}=\frac{1}{10},c_{2}=\frac{1}{6},c_{3}=\frac{1}{4} and c4=1(c1+c2+c3)c_{4}=1-(c_{1}+c_{2}+c_{3}).

Figure 11 depicts the asymptotic singular value and alignments for an order 44 tensor as per Theorem 8. As discussed previously, the predicted alignments are discontinuous, and a strictly positive correlation between the singular vectors and the underlying signals is possible for the considered local optimum of the ML estimator only above the minimal signal strength (βs1.234\beta_{s}\approx 1.234 in the shown example). In the case of hyper-cubic tensors, i.e., when all the dimensions nin_{i} are equal, Figure 12 depicts the minimal SNR values in terms of the tensor order dd and the corresponding asymptotic singular value and alignments. As the tensor order increases, the minimal SNR and singular value converge respectively to 1.6\approx 1.6 and 22, while the corresponding alignment (at the minimal theoretical SNR βs\beta_{s}) gets closer to 11. The expressions of βs\beta_{s} and a(βs)a^{\infty}(\beta_{s}) for hyper-cubic tensors of order dd are explicitly given as

βs=d1d(d2d1)1d2,limββsa(β)=d2d1.\displaystyle\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\beta_{s}=\sqrt{\frac{d-1}{d}}\left(\frac{d-2}{d-1}\right)^{1-\frac{d}{2}},\quad\lim_{\beta\to\beta_{s}}a^{\infty}(\beta)=\sqrt{\frac{d-2}{d-1}}. (33)
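A short numerical sketch of Eq. (33) (illustrative, with the large-d limit √e stated as a consistency check rather than taken from the text): d = 3 recovers the cubic threshold 2√3/3 and alignment √2/2 of Corollary 3, and βs(d) approaches √e ≈ 1.649 as d grows, consistent with the ≈1.6 limit discussed above.

```python
import numpy as np

def beta_s(d):
    # Minimal SNR for hyper-cubic order-d tensors, Eq. (33).
    return np.sqrt((d - 1) / d) * ((d - 2) / (d - 1)) ** (1 - d / 2)

def align_s(d):
    # Corresponding alignment at beta_s, Eq. (33).
    return np.sqrt((d - 2) / (d - 1))

vals = [beta_s(d) for d in (3, 4, 5, 10, 100)]  # increasing towards sqrt(e)
```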
Figure 12: Minimal signal strength βs\beta_{s} and the corresponding singular value and alignments in terms of the tensor order dd, when all the tensor dimensions are equal.

6 Generalization to rank rr tensor with orthogonal components

In this section we discuss the generalization of our previous findings to the rank-rr spiked tensor model with r>1r>1, of the form

𝑻==1rβ𝒙(1)𝒙(d)+1N𝑿,\displaystyle{\bm{\mathsfit{T}}}=\sum_{\ell=1}^{r}\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}, (34)

where β1>>βr\beta_{1}>\cdots>\beta_{r} are the signal strengths, (𝒙(1),,𝒙(d))𝕊n11××𝕊nd1({\bm{x}}^{(1)}_{\ell},\ldots,{\bm{x}}^{(d)}_{\ell})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} for [r]\ell\in[r], 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) and N=i=1dniN=\sum_{i=1}^{d}n_{i}. Supposing that the components 𝒙1(i),,𝒙r(i){\bm{x}}^{(i)}_{1},\ldots,{\bm{x}}^{(i)}_{r} are mutually orthogonal, i.e., 𝒙(i),𝒙(i)=0\langle{\bm{x}}^{(i)}_{\ell},{\bm{x}}^{(i)}_{\ell^{\prime}}\rangle=0 for \ell\neq\ell^{\prime}, the above rank-rr spiked tensor model can be treated through equivalent rank-one tensors, one for each component β𝒙(1)𝒙(d)\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell} independently, by applying the rank-one results established in the previous section. Precisely, the best rank-rr approximation of 𝑻{\bm{\mathsfit{T}}} corresponds to

argminλ,(𝒖(1),,𝒖(d))𝕊n11××𝕊nd1𝑻=1rλ𝒖(1)𝒖(d)F2,\displaystyle\operatorname*{arg\,min}_{\lambda_{\ell}\,\in\,{\mathbb{R}},\,({\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\sum_{\ell=1}^{r}\lambda_{\ell}\,{\bm{u}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{u}}^{(d)}_{\ell}\|_{\text{F}}^{2}, (35)

and the 𝒖(i){\bm{u}}^{(i)}_{\ell}’s correlate with the 𝒙(i){\bm{x}}^{(i)}_{\ell}’s as a result of the uniqueness of the orthogonal tensor decomposition (see Theorem 4.1 in [1]). Therefore, the study of 𝑻{\bm{\mathsfit{T}}} boils down to the study of the random matrices Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell}) for [r]\ell\in[r], each of which behaves as in the rank-one case treated previously, with signal strength β\beta_{\ell} respectively. Indeed, since 𝒖(i)𝕊ni1{\bm{u}}^{(i)}_{\ell}\in{\mathbb{S}}^{n_{i}-1} by definition and given the orthogonality condition on the 𝒙(i){\bm{x}}^{(i)}_{\ell}’s, for \ell^{\prime}\neq\ell the inner product 𝒖(i),𝒙(i)a.s.0\langle{\bm{u}}^{(i)}_{\ell},{\bm{x}}^{(i)}_{\ell^{\prime}}\rangle\operatorname{\,\xrightarrow{\text{a.s.}}\,}0 in high dimension; as such

Φd(𝑻,𝒖(1),,𝒖(d))Φd(𝑻,𝒖(1),,𝒖(d))a.s.0,\displaystyle\|\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})-\Phi_{d}({\bm{\mathsfit{T}}}_{\ell},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell})\|\operatorname{\,\xrightarrow{\text{a.s.}}\,}0, (36)

where 𝑻β𝒙(1)𝒙(d)+1N𝑿{\bm{\mathsfit{T}}}_{\ell}\equiv\beta_{\ell}\,{\bm{x}}^{(1)}_{\ell}\otimes\cdots\otimes{\bm{x}}^{(d)}_{\ell}+\frac{1}{\sqrt{N}}{\bm{\mathsfit{X}}}. In other words, the study of Φd(𝑻,𝒖(1),,𝒖(d))\Phi_{d}({\bm{\mathsfit{T}}},{\bm{u}}^{(1)}_{\ell},\ldots,{\bm{u}}^{(d)}_{\ell}) reduces to that of the rank-one spiked tensor model 𝑻{\bm{\mathsfit{T}}}_{\ell}.

7 Discussion

In this work, we characterized the asymptotic behavior of spiked asymmetric tensors by mapping them to equivalent (in a spectral sense) random matrices. Our starting point is mainly the identities in Eq. (26), which are verified by all critical points of the ML problem. Quite surprisingly, and as also discussed in [8] for symmetric tensors, we found that our asymptotic equations precisely describe the maximum of the ML problem which correlates with the true spike. Extrapolating the findings from [12, 8] in the symmetric case, we conjecture the existence of an order-one threshold βc\beta_{c} above which our equations describe the behavior of the global maximum of the ML problem. Unfortunately, characterizing such a βc\beta_{c} with our present approach remains an open question. The same holds for the characterization of the algorithmic threshold βa\beta_{a}, which is more interesting from a practical standpoint since computing the ML solution is NP-hard for β<βa\beta<\beta_{a}.

In the present work, our results were derived under a Gaussian assumption on the tensor noise components. We believe that the derived formulas are universal, in the sense that they extend to other distributions provided that the fourth-order moment is finite (as assumed by [3] for long random matrices). Other extensions concern the generalization to higher ranks with arbitrary components and possibly correlated noise components, since the present RMT tools are more flexible than tools from statistical physics.

{acks}

[Acknowledgments] This work was supported by the MIAI LargeDATA Chair at University Grenoble Alpes led by R. Couillet and the UGA-HUAWEI LarDist project led by M. Guillaud. We would like to thank Henrique Goulart, Pierre Comon and Gérard Ben Arous for valuable discussions on the topic of random tensors.

Appendix A Simulations

In this section we provide simulations to support our findings.

A.1 Matrix case

We start by considering the spiked random matrix model of the form

𝑴=β𝒙𝒚+1m+n𝑿,with𝑿𝕄m,n(𝒩(0,1)),\displaystyle{\bm{M}}=\beta{\bm{x}}{\bm{y}}^{\top}+\frac{1}{\sqrt{m+n}}{\bm{X}},\quad\text{with}\quad{\bm{X}}\sim{\mathbb{M}}_{m,n}(\mathcal{N}(0,1)), (37)

and 𝒙,𝒚{\bm{x}},{\bm{y}} are unit vectors of dimensions mm and nn respectively. Figure 13 depicts the spectrum of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} for β=0\beta=0, for different values of the dimensions m,nm,n, along with the limiting spectral measure predicted by Corollary 4. Figure 14 compares the asymptotic singular value and alignments obtained in Corollary 5 with their simulated counterparts, computed through an SVD of 𝑴{\bm{M}}. Figure 15 further compares theory and simulations in the case of long random matrices (m=200,n=m32m=200,n=m^{\frac{3}{2}}), which corresponds to tensor unfolding as per Remark 9; a perfect match between theory and simulations is again observed.
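The experiment behind these figures can be sketched in a few lines of NumPy. The snippet below uses illustrative values (m=200, n=300, β=2, random spike directions), not those of the figures: it builds the spiked model of Eq. (37), forms the symmetrized block matrix, and extracts the dominant singular value and alignments.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 300
N = m + n
beta = 2.0

# unit spike vectors x, y (random directions, for illustration)
x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)

# spiked model M = beta x y^T + X / sqrt(N), as in Eq. (37)
M = beta * np.outer(x, y) + rng.standard_normal((m, n)) / np.sqrt(N)

# symmetrized block matrix: its spectrum is +/- the singular values of M
# (padded with |m - n| zeros), hence symmetric about the origin
S = np.block([[np.zeros((m, m)), M], [M.T, np.zeros((n, n))]])
eigs = np.linalg.eigvalsh(S)

# dominant singular value and alignments with the spike components
U, s, Vt = np.linalg.svd(M)
lam = s[0]
align_x = abs(U[:, 0] @ x)
align_y = abs(Vt[0] @ y)
```

Since β=2 is above the detectability threshold here, the top singular value detaches from the bulk and the alignments are close to one.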

Refer to caption
Figure 13: Spectrum of [𝟎m×m𝑴𝑴𝟎n×n]\begin{bmatrix}{\bm{0}}_{m\times m}&{\bm{M}}\\ {\bm{M}}^{\top}&{\bm{0}}_{n\times n}\end{bmatrix} for β=0\beta=0 with different values of the dimensions m,nm,n and the limiting spectral measure in Corollary 4.
Refer to caption
Figure 14: Asymptotic singular value and alignments of the spiked random matrix in Eq. (37) as per Corollary 5, and their simulated counterparts for different dimensions m,nm,n. Simulations are performed through SVD applied to 𝑴{\bm{M}}.
Refer to caption
Figure 15: Asymptotic singular value and alignments of the spiked random matrix in Eq. (37) as per Corollary 5, and their simulated counterparts in the long matrix regime (m=200,n=m32m=200,n=m^{\frac{3}{2}}) which simulates tensor unfolding as in Remark 9. Simulations are performed through SVD applied to 𝑴{\bm{M}}.

A.2 Order 33 tensors

We now consider an order-33 random tensor model of the form

𝑻=β𝒙𝒚𝒛+1m+n+p𝑿,with𝑿𝕋m,n,p(𝒩(0,1)),\displaystyle{\bm{\mathsfit{T}}}=\beta{\bm{x}}\otimes{\bm{y}}\otimes{\bm{z}}+\frac{1}{\sqrt{m+n+p}}{\bm{\mathsfit{X}}},\quad\text{with}\quad{\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)), (38)

and 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}} are unit vectors of dimensions m,nm,n and pp respectively. In our simulations, the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} of 𝑻{\bm{\mathsfit{T}}} are estimated using the power iteration method described in Algorithm 3. We consider three initialization strategies for Algorithm 3:

  • (i) Random initialization: sample 𝒖0,𝒗0,𝒘0{\bm{u}}_{0},{\bm{v}}_{0},{\bm{w}}_{0} uniformly from the unit spheres 𝕊m1,𝕊n1{\mathbb{S}}^{m-1},{\mathbb{S}}^{n-1} and 𝕊p1{\mathbb{S}}^{p-1} respectively. We refer to this strategy in the figure legends as “Random init.”.

  • (ii) Initialization with the true components 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}}. We refer to this strategy in the figure legends as “Init. with 𝒙,𝒚,𝒛{\bm{x}},{\bm{y}},{\bm{z}}”.

  • (iii) Run Algorithm 3 with strategy (i) for a large SNR β1\beta\gg 1, then progressively decrease β\beta, initializing Algorithm 3 with the components obtained for the previous value of β\beta. We refer to this strategy in the figure legends as “Init. with 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}}”.

Algorithm 3 Tensor power method [1]
Input: an order-dd tensor 𝑻𝕋n1,,nd{\bm{\mathsfit{T}}}\in{\mathbb{T}}_{n_{1},\ldots,n_{d}} and initial components (𝒖0(1),,𝒖0(d))𝕊n11××𝕊nd1({\bm{u}}^{(1)}_{0},\ldots,{\bm{u}}^{(d)}_{0})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}
Output: estimate of argminλ,(𝒖(1),,𝒖(d))𝕊n11××𝕊nd1𝑻λ𝒖(1)𝒖(d)F2\operatorname*{arg\,min}_{\lambda\,\in\,{\mathbb{R}},\,({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\,\in\,{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}}\|{\bm{\mathsfit{T}}}-\lambda\,{\bm{u}}^{(1)}\otimes\cdots\otimes{\bm{u}}^{(d)}\|_{\text{F}}^{2}
Repeat
for i[d]i\in[d] do
     𝒖(i)𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d))𝑻(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(d)){\bm{u}}^{(i)}\leftarrow\frac{{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})}{\|{\bm{\mathsfit{T}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(d)})\|} \triangleright Contract 𝑻{\bm{\mathsfit{T}}} on all the 𝒖(j){\bm{u}}^{(j)}’s for jij\neq i
end for
until convergence of the 𝒖(i){\bm{u}}^{(i)}’s
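As a complement, a minimal NumPy sketch of Algorithm 3 for d=3 might read as follows (an illustrative sketch; the dimensions and SNR are our own arbitrary choices, and the initialization shown corresponds to strategy (ii)).

```python
import numpy as np

def tensor_power_method(T, u, v, w, n_iter=200):
    # alternating contractions of Algorithm 3 specialized to d = 3
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, u, v, w)  # singular value estimate
    return lam, u, v, w

rng = np.random.default_rng(1)
m = n = p = 30
N = m + n + p
beta = 3.0

x = rng.standard_normal(m); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y /= np.linalg.norm(y)
z = rng.standard_normal(p); z /= np.linalg.norm(z)

# spiked model of Eq. (38)
T = beta * np.einsum('i,j,k->ijk', x, y, z) \
    + rng.standard_normal((m, n, p)) / np.sqrt(N)

# strategy (ii): initialize with the true components
lam, u, v, w = tensor_power_method(T, x, y, z)
```

Each alternating update maximizes the contraction over one mode, so the objective is nondecreasing and the iterates converge to a critical point of the ML problem.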

Figure 16 compares the asymptotic singular value and alignments of 𝑻{\bm{\mathsfit{T}}} (obtained from Corollary 3) with their simulated counterparts, where the singular vectors are estimated by Algorithm 3 under initialization strategy (i) (yellow dots) and strategy (ii) (green dots). As the tensor dimensions grow, the numerical estimates approach their asymptotic counterparts for (ii). In contrast, the random initialization (i) yields poor convergence for β\beta around its minimal value βs\beta_{s} when the tensor dimension is large enough (see m=n=p=150m=n=p=150). This phenomenon is related to the algorithmic time complexity of the power method, which is known to recover the underlying signal in polynomial time provided that βnd24\beta\gtrsim n^{\frac{d-2}{4}} [17, 6]; this is also noticeable in our simulations (the algorithmic phase transition seems to grow with the tensor dimensions). Figure 17 further compares theory and simulations when Algorithm 3 is initialized following strategy (iii), which makes it possible to follow the trajectory of the global maximum of the ML problem; a good match between the asymptotic curves and their simulated counterparts can again be observed.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 16: Asymptotic singular value and alignments of the order 33 tensor in Eq. (38) having equal dimensions as per Corollary 3, and their simulated counterparts for different dimensions {10,30,50,100,150}\{10,30,50,100,150\}. The yellow dots correspond to a random initialization of the power iteration method (strategy (i)), while the green dots correspond to an initialization with the true spike components 𝒙,𝒚{\bm{x}},{\bm{y}} and 𝒛{\bm{z}} (strategy (ii)).
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 17: Asymptotic singular value and alignments of the order 33 tensor in Eq. (38) having equal dimensions as per Corollary 3, and their simulated counterparts for different dimensions {10,30,50,100,150}\{10,30,50,100,150\}. The green dots correspond to initialization with strategy (iii).

Appendix B Proofs

B.1 Derivative of tensor singular value and vectors

Differentiating the identities in Eq. (11) and Eq. (12) w.r.t. the entry XijkX_{ijk} of the tensor noise 𝑿{\bm{\mathsfit{X}}}, we obtain the following set of equations

{𝑻(𝒘)𝒗Xijk+𝑻(𝒗)𝒘Xijk+1Nvjwk𝒆im=λXijk𝒖+λ𝒖Xijk,𝑻(𝒘)𝒖Xijk+𝑻(𝒖)𝒘Xijk+1Nuiwk𝒆jn=λXijk𝒗+λ𝒗Xijk,𝑻(𝒗)𝒖Xijk+𝑻(𝒖)𝒗Xijk+1Nuivj𝒆kp=λXijk𝒘+λ𝒘Xijk,λXijk=𝑻(𝒖Xijk,𝒗,𝒘)+𝑻(𝒖,𝒗Xijk,𝒘)+𝑻(𝒖,𝒗,𝒘Xijk)+1Nuivjwk.\displaystyle\begin{cases}{\bm{\mathsfit{T}}}({\bm{w}})\frac{\partial{\bm{v}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{v}})\frac{\partial{\bm{w}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}v_{j}w_{k}{\bm{e}}_{i}^{m}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{u}}+\lambda\frac{\partial{\bm{u}}}{\partial X_{ijk}},\\ {\bm{\mathsfit{T}}}({\bm{w}})^{\top}\frac{\partial{\bm{u}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{u}})\frac{\partial{\bm{w}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}u_{i}w_{k}{\bm{e}}_{j}^{n}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{v}}+\lambda\frac{\partial{\bm{v}}}{\partial X_{ijk}},\\ {\bm{\mathsfit{T}}}({\bm{v}})^{\top}\frac{\partial{\bm{u}}}{\partial X_{ijk}}+{\bm{\mathsfit{T}}}({\bm{u}})^{\top}\frac{\partial{\bm{v}}}{\partial X_{ijk}}+\frac{1}{\sqrt{N}}u_{i}v_{j}{\bm{e}}_{k}^{p}=\frac{\partial\lambda}{\partial X_{ijk}}{\bm{w}}+\lambda\frac{\partial{\bm{w}}}{\partial X_{ijk}},\\ \frac{\partial\lambda}{\partial X_{ijk}}={\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)+{\bm{\mathsfit{T}}}\left({\bm{u}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},{\bm{w}}\right)+{\bm{\mathsfit{T}}}\left({\bm{u}},{\bm{v}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)+\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.\end{cases}

Writing 𝑻(𝒖Xijk,𝒗,𝒘){\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right) as 𝑻(𝒖Xijk,𝒗,𝒘)=(𝒖Xijk)𝑻(𝒗)𝒘{\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)=\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}, we can apply again the identities in Eq. (11) which results in 𝑻(𝒖Xijk,𝒗,𝒘)=λ(𝒖Xijk)𝒖{\bm{\mathsfit{T}}}\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}},{\bm{v}},{\bm{w}}\right)=\lambda\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}. Doing similarly with 𝑻(𝒖,𝒗Xijk,𝒘){\bm{\mathsfit{T}}}\left({\bm{u}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},{\bm{w}}\right) and 𝑻(𝒖,𝒗,𝒘Xijk){\bm{\mathsfit{T}}}\left({\bm{u}},{\bm{v}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right), we have

λXijk=λ((𝒖Xijk)𝒖+(𝒗Xijk)𝒗+(𝒘Xijk)𝒘)+1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\lambda\left(\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}+\left(\frac{\partial{\bm{v}}}{\partial X_{ijk}}\right)^{\top}{\bm{v}}+\left(\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)^{\top}{\bm{w}}\right)+\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.

Furthermore, since 𝒖𝒖=𝒗𝒗=𝒘𝒘=1{\bm{u}}^{\top}{\bm{u}}={\bm{v}}^{\top}{\bm{v}}={\bm{w}}^{\top}{\bm{w}}=1, we have

(𝒖Xijk)𝒖=(𝒗Xijk)𝒗=(𝒘Xijk)𝒘=0.\displaystyle\left(\frac{\partial{\bm{u}}}{\partial X_{ijk}}\right)^{\top}{\bm{u}}=\left(\frac{\partial{\bm{v}}}{\partial X_{ijk}}\right)^{\top}{\bm{v}}=\left(\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right)^{\top}{\bm{w}}=0.

Thus the derivative of λ\lambda writes simply as

λXijk=1Nuivjwk.\displaystyle\frac{\partial\lambda}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}u_{i}v_{j}w_{k}.

Hence, we find that

λ[𝒖Xijk𝒗Xijk𝒘Xijk]=1N[vjwk(𝒆imui𝒖)uiwk(𝒆jnvj𝒗)uivj(𝒆kpwk𝒘)]+Φ3(𝑻,𝒖,𝒗,𝒘)[𝒖Xijk𝒗Xijk𝒘Xijk].\displaystyle\lambda\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}=\frac{1}{\sqrt{N}}\begin{bmatrix}v_{j}w_{k}({\bm{e}}_{i}^{m}-u_{i}{\bm{u}})\\ u_{i}w_{k}({\bm{e}}_{j}^{n}-v_{j}{\bm{v}})\\ u_{i}v_{j}({\bm{e}}_{k}^{p}-w_{k}{\bm{w}})\end{bmatrix}+\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})\begin{bmatrix}\frac{\partial{\bm{u}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{v}}}{\partial X_{ijk}}\\ \frac{\partial{\bm{w}}}{\partial X_{ijk}}\end{bmatrix}.

This yields the expression in Eq. (14). The same calculations apply to the more general order-dd tensor case, yielding the identity in Eq. (28).
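The closed-form derivative of the singular value can be checked by finite differences. The sketch below is an illustrative NumPy check (small arbitrary dimensions, a large SNR so that the alternating power iteration converges to a stable critical point): it perturbs a single noise entry and compares the numerical derivative of λ with the formula u_i v_j w_k / √N derived above.

```python
import numpy as np

def rank1_critical(T, u, v, w, n_iter=300):
    # alternating contractions: converges to a critical point of Eq. (11)
    for _ in range(n_iter):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', T, u, v, w)
    return lam, u, v, w

rng = np.random.default_rng(0)
m = n = p = 10
N = m + n + p
beta = 5.0  # large SNR so the critical point is stable under perturbation

x, y, z = [t / np.linalg.norm(t) for t in rng.standard_normal((3, m))]
T = beta * np.einsum('i,j,k->ijk', x, y, z) \
    + rng.standard_normal((m, n, p)) / np.sqrt(N)
lam, u, v, w = rank1_critical(T, x, y, z)

# perturbing X_ijk by eps shifts T_ijk by eps / sqrt(N)
i, j, k, eps = 1, 2, 3, 1e-6
E = np.zeros_like(T); E[i, j, k] = eps / np.sqrt(N)
lam_p, *_ = rank1_critical(T + E, u, v, w)
lam_m, *_ = rank1_critical(T - E, u, v, w)

fd = (lam_p - lam_m) / (2 * eps)        # central finite difference
pred = u[i] * v[j] * w[k] / np.sqrt(N)  # closed-form derivative
```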

B.2 Proof of Theorem 3

Denote the matrix model as

𝑵1NΦ3(𝑿,𝒂,𝒃,𝒄),\displaystyle{\bm{N}}\equiv\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{a}},{\bm{b}},{\bm{c}}),

where we recall 𝑿𝕋m,n,p(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{m,n,p}(\mathcal{N}(0,1)) and (𝒂,𝒃,𝒄)𝕊m1×𝕊n1×𝕊p1({\bm{a}},{\bm{b}},{\bm{c}})\in{\mathbb{S}}^{m-1}\times{\mathbb{S}}^{n-1}\times{\mathbb{S}}^{p-1} are independent of 𝑿{\bm{\mathsfit{X}}}. We further denote the resolvent matrix of 𝑵{\bm{N}} as

𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸13(z)𝑸23(z)𝑸33(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)\end{bmatrix}.

In order to characterize the limiting Stieltjes transform g(z)g(z) of 𝑵{\bm{N}}, we need to estimate the quantity 1Ntr𝑸(z)a.s.g(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g(z) (as a consequence of Theorem 2). We further introduce the following limits

1Ntr𝑸11(z)a.s.g1(z),1Ntr𝑸22(z)a.s.g2(z),1Ntr𝑸33(z)a.s.g3(z).\displaystyle\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{1}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(z),\quad\frac{1}{N}\operatorname{tr}{\bm{Q}}^{33}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{3}(z).

From the identity in Eq. (7), we have

𝑵𝑸(z)z𝑸(z)=𝑰N,\displaystyle{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N},

from which we particularly have

1N[𝑿(𝒄)𝑸12(z)]ii+1N[𝑿(𝒃)𝑸13(z)]iizQii11(z)=1,\displaystyle\frac{1}{\sqrt{N}}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{\sqrt{N}}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-zQ^{11}_{ii}(z)=1,

or

1NNi=1m[𝑿(𝒄)𝑸12(z)]ii+1NNi=1m[𝑿(𝒃)𝑸13(z)]iizNtr𝑸11(z)=mN.\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{m}{N}. (39)

We thus need to compute the expectations of 1NNi=1m[𝑿(𝒄)𝑸12(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii} and 1NNi=1m[𝑿(𝒃)𝑸13(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}, which develop as

1NNi=1m𝔼[𝑿(𝒄)𝑸12(z)]ii=1NNi=1mj=1nk=1pck𝔼[XijkQij12]=1NNi=1mj=1nk=1pck𝔼[Qij12Xijk],\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[X_{ijk}Q^{12}_{ij}\right]=\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right],

where the last equality is obtained by applying Stein’s lemma (Lemma 1). For continuing the derivations, we need to express the derivative of the resolvent 𝑸(z){\bm{Q}}(z) with respect to an entry XijkX_{ijk} of the tensor noise 𝑿{\bm{\mathsfit{X}}}. Indeed, since 𝑵𝑸(z)z𝑸(z)=𝑰N{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, we have

𝑵Xijk𝑸(z)+𝑵𝑸(z)Xijkz𝑸(z)Xijk=𝟎N×N(𝑵z𝑰N)𝑸(z)Xijk=𝑵Xijk𝑸(z),\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z)+{\bm{N}}\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}-z\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}={\bm{0}}_{N\times N}\quad\Rightarrow\quad({\bm{N}}-z{\bm{I}}_{N})\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z),

from which we get

𝑸(z)Xijk=𝑸(z)𝑵Xijk𝑸(z),\displaystyle\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z),

where

𝑵Xijk=1N[𝟎m×mck𝒆im(𝒆jn)bj𝒆im(𝒆kp)ck𝒆jn(𝒆im)𝟎n×nai𝒆jn(𝒆kp)bj𝒆kp(𝒆im)ai𝒆kp(𝒆jn)𝟎p×p],\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}\begin{bmatrix}{\bm{0}}_{m\times m}&c_{k}{\bm{e}}_{i}^{m}({\bm{e}}_{j}^{n})^{\top}&b_{j}{\bm{e}}_{i}^{m}({\bm{e}}_{k}^{p})^{\top}\\ c_{k}{\bm{e}}_{j}^{n}({\bm{e}}_{i}^{m})^{\top}&{\bm{0}}_{n\times n}&a_{i}{\bm{e}}_{j}^{n}({\bm{e}}_{k}^{p})^{\top}\\ b_{j}{\bm{e}}_{k}^{p}({\bm{e}}_{i}^{m})^{\top}&a_{i}{\bm{e}}_{k}^{p}({\bm{e}}_{j}^{n})^{\top}&{\bm{0}}_{p\times p}\end{bmatrix},

and we finally obtain the following derivatives

Qab11Xijk\displaystyle\frac{\partial Q_{ab}^{11}}{\partial X_{ijk}} =1N[ai(Qaj12Qbk13+Qak13Qbj12)+bj(Qai11Qbk13+Qak13Qib11)+ck(Qai11Qbj12+Qaj12Qib11)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{bk}^{13}+Q_{ak}^{13}Q_{bj}^{12})+b_{j}(Q_{ai}^{11}Q_{bk}^{13}+Q_{ak}^{13}Q_{ib}^{11})+c_{k}(Q_{ai}^{11}Q_{bj}^{12}+Q_{aj}^{12}Q_{ib}^{11})\right],
Qab12Xijk\displaystyle\frac{\partial Q_{ab}^{12}}{\partial X_{ijk}} =1N[ai(Qaj12Qbk23+Qak13Qjb22)+bj(Qai11Qbk23+Qak13Qib12)+ck(Qai11Qjb22+Qaj12Qib12)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{bk}^{23}+Q_{ak}^{13}Q_{jb}^{22})+b_{j}(Q_{ai}^{11}Q_{bk}^{23}+Q_{ak}^{13}Q_{ib}^{12})+c_{k}(Q_{ai}^{11}Q_{jb}^{22}+Q_{aj}^{12}Q_{ib}^{12})\right],
Qab13Xijk\displaystyle\frac{\partial Q_{ab}^{13}}{\partial X_{ijk}} =1N[ai(Qaj12Qkb33+Qak13Qjb23)+bj(Qai11Qkb33+Qak13Qib13)+ck(Qai11Qjb23+Qaj12Qib13)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{12}Q_{kb}^{33}+Q_{ak}^{13}Q_{jb}^{23})+b_{j}(Q_{ai}^{11}Q_{kb}^{33}+Q_{ak}^{13}Q_{ib}^{13})+c_{k}(Q_{ai}^{11}Q_{jb}^{23}+Q_{aj}^{12}Q_{ib}^{13})\right],
Qab22Xijk\displaystyle\frac{\partial Q_{ab}^{22}}{\partial X_{ijk}} =1N[ai(Qaj22Qbk23+Qak23Qjb22)+bj(Qia12Qbk23+Qak23Qib12)+ck(Qia12Qjb22+Qaj22Qib12)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{22}Q_{bk}^{23}+Q_{ak}^{23}Q_{jb}^{22})+b_{j}(Q_{ia}^{12}Q_{bk}^{23}+Q_{ak}^{23}Q_{ib}^{12})+c_{k}(Q_{ia}^{12}Q_{jb}^{22}+Q_{aj}^{22}Q_{ib}^{12})\right],
Qab23Xijk\displaystyle\frac{\partial Q_{ab}^{23}}{\partial X_{ijk}} =1N[ai(Qaj22Qkb33+Qak23Qjb23)+bj(Qia12Qkb33+Qak23Qib13)+ck(Qia12Qjb23+Qaj22Qib13)],\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{aj}^{22}Q_{kb}^{33}+Q_{ak}^{23}Q_{jb}^{23})+b_{j}(Q_{ia}^{12}Q_{kb}^{33}+Q_{ak}^{23}Q_{ib}^{13})+c_{k}(Q_{ia}^{12}Q_{jb}^{23}+Q_{aj}^{22}Q_{ib}^{13})\right],
Qab33Xijk\displaystyle\frac{\partial Q_{ab}^{33}}{\partial X_{ijk}} =1N[ai(Qja23Qkb33+Qak33Qjb23)+bj(Qia13Qkb33+Qak33Qib13)+ck(Qia13Qjb23+Qja23Qib13)].\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{ja}^{23}Q_{kb}^{33}+Q_{ak}^{33}Q_{jb}^{23})+b_{j}(Q_{ia}^{13}Q_{kb}^{33}+Q_{ak}^{33}Q_{ib}^{13})+c_{k}(Q_{ia}^{13}Q_{jb}^{23}+Q_{ja}^{23}Q_{ib}^{13})\right].

In particular,

Qij12Xijk\displaystyle\frac{\partial Q_{ij}^{12}}{\partial X_{ijk}} =1N[ai(Qij12Qjk23+Qik13Qjj22)+bj(Qii11Qjk23+Qik13Qij12)+ck(Qii11Qjj22+Qij12Qij12)].\displaystyle=-\frac{1}{\sqrt{N}}\left[a_{i}(Q_{ij}^{12}Q_{jk}^{23}+Q_{ik}^{13}Q_{jj}^{22})+b_{j}(Q_{ii}^{11}Q_{jk}^{23}+Q_{ik}^{13}Q_{ij}^{12})+c_{k}(Q_{ii}^{11}Q_{jj}^{22}+Q_{ij}^{12}Q_{ij}^{12})\right].
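The master identity ∂𝑸(z)/∂X_ijk = −𝑸(z) (∂𝑵/∂X_ijk) 𝑸(z) behind all these derivatives is purely algebraic and holds for any perturbation of a matrix resolvent; the following illustrative NumPy check (a generic symmetric matrix, an arbitrary entry and spectral argument of our own choosing) confirms it numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 8
A = rng.standard_normal((n_dim, n_dim))
A = (A + A.T) / 2          # a generic symmetric matrix standing in for N
z = 0.3 + 1.0j
I = np.eye(n_dim)
Q = np.linalg.inv(A - z * I)

# perturb a single entry and compare with the identity dQ = -Q (dA) Q
i, j, eps = 1, 4, 1e-7
dA = np.zeros((n_dim, n_dim)); dA[i, j] = 1.0
fd = (np.linalg.inv(A + eps * dA - z * I) - Q) / eps
pred = -Q @ dA @ Q
```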

Going back to the computation of A1NNi=1mj=1nk=1pck𝔼[Qij12Xijk]A\equiv\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{p}c_{k}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right], we will see that the only contributing term in the derivative Qij12Xijk\frac{\partial Q_{ij}^{12}}{\partial X_{ijk}} is 1NckQii11Qjj22-\frac{1}{\sqrt{N}}c_{k}Q_{ii}^{11}Q_{jj}^{22}. Indeed,

A=1N2ijkck𝔼[ai(Qij12Qjk23+Qik13Qjj22)+bj(Qii11Qjk23+Qik13Qij12)+ck(Qii11Qjj22+Qij12Qij12)],\displaystyle A=-\frac{1}{N^{2}}\sum_{ijk}c_{k}\operatorname{\mathbb{E}}\left[a_{i}(Q_{ij}^{12}Q_{jk}^{23}+Q_{ik}^{13}Q_{jj}^{22})+b_{j}(Q_{ii}^{11}Q_{jk}^{23}+Q_{ik}^{13}Q_{ij}^{12})+c_{k}(Q_{ii}^{11}Q_{jj}^{22}+Q_{ij}^{12}Q_{ij}^{12})\right],
=1N2𝔼[𝒂𝑸12𝑸23𝒄+𝒂𝑸13𝒄tr𝑸22+tr𝑸11𝒃𝑸23𝒄+𝒃(𝑸12)𝑸13𝒄+tr𝑸11tr𝑸22+tr(𝑸12(𝑸12))].\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{a}}^{\top}{\bm{Q}}^{12}{\bm{Q}}^{23}{\bm{c}}+{\bm{a}}^{\top}{\bm{Q}}^{13}{\bm{c}}\operatorname{tr}{\bm{Q}}^{22}+\operatorname{tr}{\bm{Q}}^{11}{\bm{b}}^{\top}{\bm{Q}}^{23}{\bm{c}}+{\bm{b}}^{\top}({\bm{Q}}^{12})^{\top}{\bm{Q}}^{13}{\bm{c}}+\operatorname{tr}{\bm{Q}}^{11}\operatorname{tr}{\bm{Q}}^{22}+\operatorname{tr}({\bm{Q}}^{12}({\bm{Q}}^{12})^{\top})\right].

Now, since the vectors 𝒂,𝒃,𝒄{\bm{a}},{\bm{b}},{\bm{c}} are of bounded norms and assuming 𝑸(z){\bm{Q}}(z) is of bounded spectral norm (see condition in Eq. (8)), under Assumption 1 (as NN\to\infty), the terms 1N2𝒂𝑸12𝑸23𝒄\frac{1}{N^{2}}{\bm{a}}^{\top}{\bm{Q}}^{12}{\bm{Q}}^{23}{\bm{c}}, 1N2𝒂𝑸13𝒄tr𝑸22\frac{1}{N^{2}}{\bm{a}}^{\top}{\bm{Q}}^{13}{\bm{c}}\operatorname{tr}{\bm{Q}}^{22}, 1N2tr𝑸11𝒃𝑸23𝒄\frac{1}{N^{2}}\operatorname{tr}{\bm{Q}}^{11}{\bm{b}}^{\top}{\bm{Q}}^{23}{\bm{c}}, 1N2𝒃(𝑸12)𝑸13𝒄\frac{1}{N^{2}}{\bm{b}}^{\top}({\bm{Q}}^{12})^{\top}{\bm{Q}}^{13}{\bm{c}} and 1N2tr(𝑸12(𝑸12))\frac{1}{N^{2}}\operatorname{tr}({\bm{Q}}^{12}({\bm{Q}}^{12})^{\top}) are vanishing almost surely. As such, we find that

1NNi=1m[𝑿(𝒄)𝑸12(z)]ii=1Ntr𝑸11(z)1Ntr𝑸22(z)+𝒪(N1)a.s.g1(z)g2(z)+𝒪(N1).\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{c}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z)+\mathcal{O}(N^{-1})\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{2}(z)+\mathcal{O}(N^{-1}).

Similarly, we find that

1NNi=1m[𝑿(𝒃)𝑸13(z)]ii=1Ntr𝑸11(z)1Ntr𝑸33(z)+𝒪(N1)a.s.g1(z)g3(z)+𝒪(N1).\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{b}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{33}(z)+\mathcal{O}(N^{-1})\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{3}(z)+\mathcal{O}(N^{-1}).

From Eq. (39), g1(z)g_{1}(z) satisfies

g1(z)(g2(z)+g3(z))zg1(z)=c1,\displaystyle-g_{1}(z)\left(g_{2}(z)+g_{3}(z)\right)-zg_{1}(z)=c_{1},

where we recall c1=limmNc_{1}=\lim\frac{m}{N}. Similarly, g2(z)g_{2}(z) and g3(z)g_{3}(z) satisfy

{g2(z)(g1(z)+g3(z))zg2(z)=c2,g3(z)(g1(z)+g2(z))zg3(z)=c3,\displaystyle\begin{cases}-g_{2}(z)(g_{1}(z)+g_{3}(z))-zg_{2}(z)=c_{2},\\ -g_{3}(z)(g_{1}(z)+g_{2}(z))-zg_{3}(z)=c_{3},\end{cases}

where we recall again c2=limnNc_{2}=\lim\frac{n}{N} and c3=limpNc_{3}=\lim\frac{p}{N}. Moreover, by definition, g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z), thus we have for each i[3]i\in[3]

gi(z)(g(z)gi(z))+zgi(z)+ci=0,\displaystyle g_{i}(z)(g(z)-g_{i}(z))+zg_{i}(z)+c_{i}=0,

yielding

gi(z)=g(z)+z24ci+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

with g(z)g(z) solution of the equation g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z) satisfying [g(z)]>0\Im[g(z)]>0 for zz\in{\mathbb{C}} with [z]>0\Im[z]>0 (see Property 2 of the Stieltjes transform in Subsection 2.2).
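These coupled equations can be solved by fixed-point iteration and compared against the empirical normalized trace of the resolvent. The sketch below is an illustrative NumPy check (arbitrary dimensions, a spectral argument z = 2i away from the limiting support, and the principal square-root branch, which is the branch consistent with ℑ[g(z)]>0 at this z).

```python
import numpy as np

def g_fixed_point(z, cs, n_iter=500):
    # iterate g -> sum_i g_i(g) with the principal square-root branch
    g = -1.0 / z
    for _ in range(n_iter):
        g = sum((g + z) / 2 - np.sqrt(4 * ci + (g + z) ** 2) / 2 for ci in cs)
    return g

rng = np.random.default_rng(0)
m, n, p = 120, 150, 180
N = m + n + p
X = rng.standard_normal((m, n, p))
a = rng.standard_normal(m); a /= np.linalg.norm(a)
b = rng.standard_normal(n); b /= np.linalg.norm(b)
cv = rng.standard_normal(p); cv /= np.linalg.norm(cv)

# Phi_3(X, a, b, c) / sqrt(N): symmetric block matrix of mode contractions
Xc = np.einsum('ijk,k->ij', X, cv)  # m x n
Xb = np.einsum('ijk,j->ik', X, b)   # m x p
Xa = np.einsum('ijk,i->jk', X, a)   # n x p
S = np.block([
    [np.zeros((m, m)), Xc, Xb],
    [Xc.T, np.zeros((n, n)), Xa],
    [Xb.T, Xa.T, np.zeros((p, p))],
]) / np.sqrt(N)

z = 2.0j  # spectral argument away from the limiting support
g_emp = np.mean(1.0 / (np.linalg.eigvalsh(S) - z))
g_th = g_fixed_point(z, (m / N, n / N, p / N))
```

The empirical trace matches the fixed-point solution up to finite-size fluctuations.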

B.3 Proof of Corollary 1

Given the result of Theorem 3, setting c1=c2=c3=13c_{1}=c_{2}=c_{3}=\frac{1}{3}, we have for all i[3]i\in[3]

gi(z)=g(z)+z243+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{\frac{4}{3}+(g(z)+z)^{2}}}{2},

where g(z)g(z) satisfies g(z)=i=13gi(z)g(z)=\sum_{i=1}^{3}g_{i}(z), thus g(z)g(z) is the solution to

z+g(z)+z2343+(g(z)+z)22=0,\displaystyle z+\frac{g(z)+z}{2}-\frac{3\sqrt{\frac{4}{3}+(g(z)+z)^{2}}}{2}=0,

and solving for g(z)g(z) yields

g(z){3z433z284,3z4+33z284},\displaystyle g(z)\in\left\{-\frac{3z}{4}-\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4},-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}\right\},

and the limiting Stieltjes transform (with [g(z)]>0\Im[g(z)]>0 for zz with (z)>0\Im(z)>0 by property 2 of the Stieltjes transform, see Subsection 2.2) is therefore

g(z)=3z4+33z284.\displaystyle g(z)=-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}.
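By the inverse Stieltjes transform, ρ(x) = (1/π) lim_{ε→0⁺} ℑ[g(x+iε)], this closed form corresponds to the semicircle-type density ρ(x) = (√3/(4π))√(8−3x²) supported on [−√(8/3), √(8/3)]. A quick numerical sanity check (illustrative NumPy, trapezoidal quadrature) confirms that this density integrates to one:

```python
import numpy as np

# density recovered from the closed-form Stieltjes transform:
# rho(x) = sqrt(3) * sqrt(8 - 3 x^2) / (4 pi) on [-sqrt(8/3), sqrt(8/3)]
edge = np.sqrt(8 / 3)
xs = np.linspace(-edge, edge, 200001)
rho = np.sqrt(3) * np.sqrt(np.clip(8 - 3 * xs**2, 0, None)) / (4 * np.pi)

# trapezoidal rule: total mass should be 1
mass = np.sum((rho[1:] + rho[:-1]) / 2 * np.diff(xs))
```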

B.4 Proof of Theorem 4

Given the random tensor model in Eq. (10) and its singular vectors characterized by Eq. (11), we denote the associated random matrix model as

𝑻Φ3(𝑻,𝒖,𝒗,𝒘)=β𝑽𝑩𝑽+𝑵,\displaystyle{\bm{T}}\equiv\Phi_{3}({\bm{\mathsfit{T}}},{\bm{u}},{\bm{v}},{\bm{w}})=\beta{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+{\bm{N}},

where

𝑵=1NΦ3(𝑿,𝒖,𝒗,𝒘),𝑩[0𝒛,𝒘𝒚,𝒗𝒛,𝒘0𝒙,𝒖𝒚,𝒗𝒙,𝒖0]𝕄3,𝑽[𝒙𝟎m𝟎m𝟎n𝒚𝟎n𝟎p𝟎p𝒛]𝕄N,3.\displaystyle{\bm{N}}=\frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}),\quad{\bm{B}}\equiv\begin{bmatrix}0&\langle{\bm{z}},{\bm{w}}\rangle&\langle{\bm{y}},{\bm{v}}\rangle\\ \langle{\bm{z}},{\bm{w}}\rangle&0&\langle{\bm{x}},{\bm{u}}\rangle\\ \langle{\bm{y}},{\bm{v}}\rangle&\langle{\bm{x}},{\bm{u}}\rangle&0\end{bmatrix}\in{\mathbb{M}}_{3},\quad{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}&{\bm{0}}_{m}&{\bm{0}}_{m}\\ {\bm{0}}_{n}&{\bm{y}}&{\bm{0}}_{n}\\ {\bm{0}}_{p}&{\bm{0}}_{p}&{\bm{z}}\end{bmatrix}\in{\mathbb{M}}_{N,3}.

We further denote the resolvents of 𝑻{\bm{T}} and 𝑵{\bm{N}} respectively as

𝑹(z)\displaystyle{\bm{R}}(z) =(𝑻z𝑰N)1=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹13(z)𝑹23(z)𝑹33(z)],\displaystyle=\left({\bm{T}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)\end{bmatrix},
𝑸(z)\displaystyle{\bm{Q}}(z) =(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸13(z)𝑸23(z)𝑸33(z)].\displaystyle=\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)\end{bmatrix}.

By Woodbury matrix identity (Lemma 5), we have

𝑹(z)=𝑸(z)𝑸(z)𝑽(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸(z),\displaystyle{\bm{R}}(z)={\bm{Q}}(z)-{\bm{Q}}(z){\bm{V}}\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}(z), (40)

In particular, taking the normalized trace operator, we get

1Ntr𝑹(z)\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}(z) =1Ntr𝑸(z)1Ntr[(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽],\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)-\frac{1}{N}\operatorname{tr}\left[\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}}\right],
=1Ntr𝑸(z)+𝒪(N1),\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)+\mathcal{O}(N^{-1}),

since the matrix (1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}} is of bounded spectral norm (assuming 𝑸(z)\|{\bm{Q}}(z)\| is bounded, see the condition in Eq. (8)) and of finite size (a 3×33\times 3 matrix). As such, the asymptotic spectral measure of 𝑻{\bm{T}} is the same as that of 𝑵{\bm{N}}, which can be estimated through 1Ntr𝑸(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}(z). In contrast to the setting of Appendix B.2, the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} now depend statistically on the tensor noise 𝑿{\bm{\mathsfit{X}}}, and this dependence must be handled.
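The Woodbury step leading to Eq. (40) is purely algebraic and can be verified on generic matrices; the following illustrative NumPy snippet (a random symmetric matrix standing in for 𝑵, a generic invertible 3×3 matrix standing in for 𝑩, arbitrary β and z of our own choosing) checks the identity to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, r = 12, 3
A = rng.standard_normal((n_dim, n_dim)); A = (A + A.T) / 2  # stands in for N
V = np.linalg.qr(rng.standard_normal((n_dim, r)))[0]        # orthonormal columns
B = rng.standard_normal((r, r)); B = B @ B.T + np.eye(r)    # invertible 3x3
beta, z = 1.5, 0.2 + 1.0j
I = np.eye(n_dim)

Q = np.linalg.inv(A - z * I)
# direct inversion of the perturbed resolvent ...
R_direct = np.linalg.inv(beta * V @ B @ V.T + A - z * I)
# ... versus the Woodbury identity of Eq. (40)
R_woodbury = Q - Q @ V @ np.linalg.inv(
    np.linalg.inv(B) / beta + V.T @ Q @ V) @ V.T @ Q
```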

From the identity Eq. (7), we have 𝑵𝑸(z)z𝑸(z)=𝑰N{\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, from which

1NNi=1m[𝑿(𝒘)𝑸12(z)]ii+1NNi=1m[𝑿(𝒗)𝑸13(z)]iizNtr𝑸11(z)=mN.\displaystyle\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}+\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{v}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{m}{N}. (41)

We thus need to compute the expectations of 1NNi=1m[𝑿(𝒘)𝑸12(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii} and 1NNi=1m[𝑿(𝒗)𝑸13(z)]ii\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\left[{\bm{\mathsfit{X}}}({\bm{v}}){\bm{Q}}^{13}(z)^{\top}\right]_{ii}. In particular,

A1NNi=1m𝔼[𝑿(𝒘)𝑸12(z)]ii=1NNijk𝔼[XijkwkQij12]=1NNijk𝔼[(wkQij12)Xijk],\displaystyle A\equiv\frac{1}{N\sqrt{N}}\sum_{i=1}^{m}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}({\bm{w}}){\bm{Q}}^{12}(z)^{\top}\right]_{ii}=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[X_{ijk}w_{k}Q^{12}_{ij}\right]=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[\frac{\partial(w_{k}Q^{12}_{ij})}{\partial X_{ijk}}\right],

where the last equality is obtained by Stein’s lemma (Lemma 1). Due to the statistical dependency between 𝒘{\bm{w}} and 𝑿{\bm{\mathsfit{X}}}, the above sum decomposes into the following two terms

A=1NNijk𝔼[wkQij12Xijk]+1NNijk𝔼[wkXijkQij12]=A1+A2,\displaystyle A=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[w_{k}\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right]+\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[\frac{\partial w_{k}}{\partial X_{ijk}}Q^{12}_{ij}\right]=A_{1}+A_{2},

where the first term A1A_{1} has already been handled in the previous subsection (replacing 𝒄{\bm{c}} with 𝒘{\bm{w}}). We now show that the second term A2A_{2} vanishes asymptotically under Assumption 1. Indeed, by Eq. (14), we have

wkXijk\displaystyle\frac{\partial w_{k}}{\partial X_{ijk}} =1N(vjwk(Rik13(λ)ui𝒖𝑹:,k13(λ)))\displaystyle=-\frac{1}{\sqrt{N}}\left(v_{j}w_{k}(R_{ik}^{13}(\lambda)-u_{i}{\bm{u}}^{\top}{\bm{R}}_{:,k}^{13}(\lambda))\right)
1N(uiwk(Rjk23(λ)vj𝒗𝑹:,k23(λ)))1N(uivj(Rkk33(λ)wk𝒘𝑹:,k33(λ))).\displaystyle-\frac{1}{\sqrt{N}}\left(u_{i}w_{k}(R_{jk}^{23}(\lambda)-v_{j}{\bm{v}}^{\top}{\bm{R}}_{:,k}^{23}(\lambda))\right)-\frac{1}{\sqrt{N}}\left(u_{i}v_{j}(R_{kk}^{33}(\lambda)-w_{k}{\bm{w}}^{\top}{\bm{R}}_{:,k}^{33}(\lambda))\right).

As such, A2A_{2} decomposes into three terms A2=A21+A22+A23A_{2}=A_{21}+A_{22}+A_{23}, where

A21\displaystyle A_{21} =1N2ijk𝔼[vjwkRik13(λ)Qij12]+1N2ijkl𝔼[uivjwkulRlk13(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[v_{j}w_{k}R_{ik}^{13}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}u_{l}R_{lk}^{13}(\lambda)Q_{ij}^{12}\right],
=1N2𝔼[𝒗𝑸12(z)𝑹13(λ)𝒘]+1N2𝔼[𝒖𝑸12(z)𝒗𝒖𝑹13(λ)𝒘]0,\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{v}}^{\top}{\bm{Q}}^{12}(z)^{\top}{\bm{R}}^{13}(\lambda){\bm{w}}\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{u}}^{\top}{\bm{R}}^{13}(\lambda){\bm{w}}\right]\to 0,

as NN\to\infty, since the singular vectors 𝒖,𝒗,𝒘{\bm{u}},{\bm{v}},{\bm{w}} are of bounded norms and assuming the resolvents 𝑸(z){\bm{Q}}(z) and 𝑹(λ){\bm{R}}(\lambda) are of bounded spectral norms (𝑹(λ){\bm{R}}(\lambda) has bounded spectral norm by Assumption 2 and Eq. (8)). Similarly, we further have

A22\displaystyle A_{22} =1N2ijk𝔼[uiwkRjk23(λ)Qij12]+1N2ijkl𝔼[uivjwkvlRlk23(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[u_{i}w_{k}R^{23}_{jk}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}v_{l}R^{23}_{lk}(\lambda)Q_{ij}^{12}\right],
=1N2𝔼[𝒖𝑸12(z)𝑹23(λ)𝒘]+1N2𝔼[𝒖𝑸12(z)𝒗𝒗𝑹23(λ)𝒘]0.\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{R}}^{23}(\lambda){\bm{w}}\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]\to 0.

And finally,

A23\displaystyle A_{23} =1N2ijk𝔼[uivjRkk33(λ)Qij12]+1N2ijkl𝔼[uivjwkwlRlk33(λ)Qij12],\displaystyle=-\frac{1}{N^{2}}\sum_{ijk}\operatorname{\mathbb{E}}\left[u_{i}v_{j}R^{33}_{kk}(\lambda)Q_{ij}^{12}\right]+\frac{1}{N^{2}}\sum_{ijkl}\operatorname{\mathbb{E}}\left[u_{i}v_{j}w_{k}w_{l}R^{33}_{lk}(\lambda)Q_{ij}^{12}\right],
=1N𝔼[𝒖𝑸12(z)𝒗1Ntr𝑹33(λ)]+1N2𝔼[𝒖𝑸12(z)𝒗𝒘𝑹33(λ)𝒘]0.\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}\frac{1}{N}\operatorname{tr}{\bm{R}}^{33}(\lambda)\right]+\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[{\bm{u}}^{\top}{\bm{Q}}^{12}(z){\bm{v}}{\bm{w}}^{\top}{\bm{R}}^{33}(\lambda){\bm{w}}\right]\to 0.

Therefore,

A=1NNijk𝔼[wkQij12Xijk]+𝒪(N1).\displaystyle A=\frac{1}{N\sqrt{N}}\sum_{ijk}\operatorname{\mathbb{E}}\left[w_{k}\frac{\partial Q^{12}_{ij}}{\partial X_{ijk}}\right]+\mathcal{O}(N^{-1}).

As in the previous subsection, the derivative of 𝑸(z){\bm{Q}}(z) w.r.t. the entry XijkX_{ijk} expresses as 𝑸(z)Xijk=𝑸(z)𝑵Xijk𝑸(z)\frac{\partial{\bm{Q}}(z)}{\partial X_{ijk}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{ijk}}{\bm{Q}}(z) but now with

𝑵Xijk=1N[𝟎m×mwk𝒆im(𝒆jn)vj𝒆im(𝒆kp)wk𝒆jn(𝒆im)𝟎n×nui𝒆jn(𝒆kp)vj𝒆kp(𝒆im)ui𝒆kp(𝒆jn)𝟎p×p]+1NΦ3(𝑿,𝒖Xijk,𝒗Xijk,𝒘Xijk),\displaystyle\frac{\partial{\bm{N}}}{\partial X_{ijk}}=\frac{1}{\sqrt{N}}\begin{bmatrix}{\bm{0}}_{m\times m}&w_{k}{\bm{e}}_{i}^{m}({\bm{e}}_{j}^{n})^{\top}&v_{j}{\bm{e}}_{i}^{m}({\bm{e}}_{k}^{p})^{\top}\\ w_{k}{\bm{e}}_{j}^{n}({\bm{e}}_{i}^{m})^{\top}&{\bm{0}}_{n\times n}&u_{i}{\bm{e}}_{j}^{n}({\bm{e}}_{k}^{p})^{\top}\\ v_{j}{\bm{e}}_{k}^{p}({\bm{e}}_{i}^{m})^{\top}&u_{i}{\bm{e}}_{k}^{p}({\bm{e}}_{j}^{n})^{\top}&{\bm{0}}_{p\times p}\end{bmatrix}+\frac{1}{\sqrt{N}}\Phi_{3}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}}{\partial X_{ijk}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right),

where 𝑶1NΦ3(𝑿,𝒖Xijk,𝒗Xijk,𝒘Xijk){\bm{O}}\equiv\frac{1}{\sqrt{N}}\Phi_{3}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}}{\partial X_{ijk}},\frac{\partial{\bm{v}}}{\partial X_{ijk}},\frac{\partial{\bm{w}}}{\partial X_{ijk}}\right) is of vanishing spectral norm. Indeed, from Eq. (14), there exists C>0C>0 independent of NN such that 𝒖Xijk,𝒗Xijk,𝒘XijkCN\|\frac{\partial{\bm{u}}}{\partial X_{ijk}}\|,\|\frac{\partial{\bm{v}}}{\partial X_{ijk}}\|,\|\frac{\partial{\bm{w}}}{\partial X_{ijk}}\|\leq\frac{C}{\sqrt{N}} yielding that the spectral norm of 𝑶{\bm{O}} is bounded by CN32\frac{C^{\prime}}{N^{\frac{3}{2}}} for some constant C>0C^{\prime}>0 independent of NN. Therefore, we find that Ag1(z)g2(z)+𝒪(N1)A\to-g_{1}(z)g_{2}(z)+\mathcal{O}(N^{-1}) (with g1(z),g2(z)g_{1}(z),g_{2}(z) the almost sure limits of 1Ntr𝑸11(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z) and 1Ntr𝑸22(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{22}(z) respectively), thus yielding the same limiting Stieltjes transform as the one obtained in Appendix B.2.

B.5 Proof of Theorem 5

Given the identities in Eq. (11), we have

λ𝒖,𝒙=𝒙𝑻(𝒗)𝒘=β𝒗,𝒚𝒘,𝒛+1N𝒙𝑿(𝒗)𝒘,\displaystyle\lambda\langle{\bm{u}},{\bm{x}}\rangle={\bm{x}}^{\top}{\bm{\mathsfit{T}}}({\bm{v}}){\bm{w}}=\beta\langle{\bm{v}},{\bm{y}}\rangle\langle{\bm{w}},{\bm{z}}\rangle+\frac{1}{\sqrt{N}}{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}, (42)

with λ,𝒖,𝒙,𝒗,𝒚\lambda,\langle{\bm{u}},{\bm{x}}\rangle,\langle{\bm{v}},{\bm{y}}\rangle and 𝒘,𝒛\langle{\bm{w}},{\bm{z}}\rangle converging almost surely to their asymptotic limits λ(β),a𝒙(β),a𝒚(β)\lambda^{\infty}(\beta),a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) respectively given the concentration properties in Subsection 3.4. To characterize such limits we need to evaluate the expectation of 1N𝒙𝑿(𝒗)𝒘\frac{1}{\sqrt{N}}{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}. Indeed,

1N𝔼[𝒙𝑿(𝒗)𝒘]=1Nijkxi𝔼[vjwkXijk]=1Nijkxi𝔼[vjXijkwk]+1Nijkxi𝔼[vjwkXijk]\displaystyle\frac{1}{\sqrt{N}}\operatorname{\mathbb{E}}\left[{\bm{x}}^{\top}{\bm{\mathsfit{X}}}({\bm{v}}){\bm{w}}\right]=\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}w_{k}X_{ijk}\right]=\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[\frac{\partial v_{j}}{\partial X_{ijk}}w_{k}\right]+\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]

where the last equality is obtained by Stein’s lemma (Lemma 1). From Eq. (14), we have

vjXijk\displaystyle\frac{\partial v_{j}}{\partial X_{ijk}} =1N(vjwk(Rij12(λ)ui𝒖𝑹:,j12(λ)))\displaystyle=-\frac{1}{\sqrt{N}}\left(v_{j}w_{k}(R_{ij}^{12}(\lambda)-u_{i}{\bm{u}}^{\top}{\bm{R}}_{:,j}^{12}(\lambda))\right)
1N(uiwk(Rjj22(λ)vj𝒗𝑹:,j22(λ)))1N(uivj(Rjk23(λ)wk𝒘𝑹:,j32(λ))).\displaystyle-\frac{1}{\sqrt{N}}\left(u_{i}w_{k}(R_{jj}^{22}(\lambda)-v_{j}{\bm{v}}^{\top}{\bm{R}}_{:,j}^{22}(\lambda))\right)-\frac{1}{\sqrt{N}}\left(u_{i}v_{j}(R_{jk}^{23}(\lambda)-w_{k}{\bm{w}}^{\top}{\bm{R}}_{:,j}^{32}(\lambda))\right).

Hence A\equiv\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[\frac{\partial v_{j}}{\partial X_{ijk}}w_{k}\right]=A_{1}+A_{2}+A_{3} decomposes into three terms A_{1},A_{2} and A_{3}. The terms A_{1} and A_{3} vanish asymptotically, and only A_{2} contributes a non-vanishing term. Indeed,

A1\displaystyle A_{1} =1Nijk𝔼[xiwkvjwkRij12(λ)]+1Nijkl𝔼[xiwkvjwkuiulRlj12(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}v_{j}w_{k}R^{12}_{ij}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}v_{j}w_{k}u_{i}u_{l}R^{12}_{lj}(\lambda)\right],
=1N𝔼[𝒙𝑹12(λ)𝒗]+1N𝔼[𝒙,𝒖𝒖𝑹12(λ)𝒗]0,\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[{\bm{x}}^{\top}{\bm{R}}^{12}(\lambda){\bm{v}}\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{u}}^{\top}{\bm{R}}^{12}(\lambda){\bm{v}}\right]\to 0,

as N\to\infty, since {\bm{x}},{\bm{u}} and {\bm{v}} have bounded norms and {\bm{R}}(\lambda) has bounded spectral norm for \lambda outside the support of \frac{1}{\sqrt{N}}\Phi_{3}({\bm{\mathsfit{X}}},{\bm{u}},{\bm{v}},{\bm{w}}) (through the identity in Eq. (40)). Similarly, for A_{3} we have

A3\displaystyle A_{3} =1Nijk𝔼[xiwkuivjRjk23(λ)]+1Nijkl𝔼[xiwkuivjwkwlRjl23(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}v_{j}R^{23}_{jk}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}v_{j}w_{k}w_{l}R_{jl}^{23}(\lambda)\right],
=1N𝔼[𝒙,𝒖𝒗𝑹23(λ)𝒘]+1N𝔼[𝒙,𝒖𝒗𝑹23(λ)𝒘]0.\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{23}(\lambda){\bm{w}}\right]\to 0.

The term A_{2}, in contrast, does not vanish; precisely,

A2\displaystyle A_{2} =1Nijk𝔼[xiwkuiwkRjj22(λ)]+1Nijkl𝔼[xiwkuiwkvjvlRjl22(λ)],\displaystyle=-\frac{1}{N}\sum_{ijk}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}w_{k}R^{22}_{jj}(\lambda)\right]+\frac{1}{N}\sum_{ijkl}\operatorname{\mathbb{E}}\left[x_{i}w_{k}u_{i}w_{k}v_{j}v_{l}R^{22}_{jl}(\lambda)\right],
=1N𝔼[𝒙,𝒖tr𝑹22(λ)]+1N𝔼[𝒙,𝒖𝒗𝑹22(λ)𝒗],\displaystyle=-\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\operatorname{tr}{\bm{R}}^{22}(\lambda)\right]+\frac{1}{N}\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle{\bm{v}}^{\top}{\bm{R}}^{22}(\lambda){\bm{v}}\right],
g2(λ)𝔼[𝒙,𝒖]+𝒪(N1),\displaystyle\to-g_{2}(\lambda)\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\right]+\mathcal{O}(N^{-1}),

where the last line results from the fact that 1Ntr𝑹22(λ)a.s.g2(λ)\frac{1}{N}\operatorname{tr}{\bm{R}}^{22}(\lambda)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{2}(\lambda) as we saw in the previous subsection. Similarly, we find that

B1Nijkxi𝔼[vjwkXijk]g3(λ)𝔼[𝒙,𝒖]+𝒪(N1).\displaystyle B\equiv\frac{1}{\sqrt{N}}\sum_{ijk}x_{i}\operatorname{\mathbb{E}}\left[v_{j}\frac{\partial w_{k}}{\partial X_{ijk}}\right]\to-g_{3}(\lambda)\operatorname{\mathbb{E}}\left[\langle{\bm{x}},{\bm{u}}\rangle\right]+\mathcal{O}(N^{-1}).

Therefore, by Eq. (42), the almost sure limits λ(β),a𝒙(β),a𝒚(β)\lambda^{\infty}(\beta),a_{\bm{x}}^{\infty}(\beta),a_{\bm{y}}^{\infty}(\beta) and a𝒛(β)a_{\bm{z}}^{\infty}(\beta) satisfy the equation

λ(β)a𝒙(β)=βa𝒚(β)a𝒛(β)[g2(λ(β))+g3(λ(β))]a𝒙(β).\displaystyle\lambda^{\infty}(\beta)a_{\bm{x}}^{\infty}(\beta)=\beta a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta)-[g_{2}(\lambda^{\infty}(\beta))+g_{3}(\lambda^{\infty}(\beta))]a_{\bm{x}}^{\infty}(\beta).

Hence,

a𝒙(β)=α1(λ(β))a𝒚(β)a𝒛(β),\displaystyle a_{\bm{x}}^{\infty}(\beta)=\alpha_{1}(\lambda^{\infty}(\beta))a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),

with α1(z)βz+g(z)g1(z)\alpha_{1}(z)\equiv\frac{\beta}{z+g(z)-g_{1}(z)}. Similarly, we find that

{a𝒚(β)=α2(λ(β))a𝒙(β)a𝒛(β),a𝒛(β)=α3(λ(β))a𝒙(β)a𝒚(β),\displaystyle\begin{cases}a_{\bm{y}}^{\infty}(\beta)=\alpha_{2}(\lambda^{\infty}(\beta))a_{\bm{x}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta),\\ a_{\bm{z}}^{\infty}(\beta)=\alpha_{3}(\lambda^{\infty}(\beta))a_{\bm{x}}^{\infty}(\beta)a_{\bm{y}}^{\infty}(\beta),\end{cases}

with \alpha_{i}(z)\equiv\frac{\beta}{z+g(z)-g_{i}(z)}. Solving the above system of equations provides the final asymptotic alignments. Proceeding similarly with Eq. (12), we obtain the asymptotic singular value \lambda^{\infty}(\beta), thereby ending the proof.
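For completeness, a short sketch of how the system can be solved (valid for non-vanishing alignments, i.e., beyond the phase transition): multiplying the three equations together, then substituting back into a_{\bm{x}}^{\infty}(\beta)=\alpha_{1}a_{\bm{y}}^{\infty}(\beta)a_{\bm{z}}^{\infty}(\beta), gives

```latex
a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}
  =\alpha_{1}\alpha_{2}\alpha_{3}\left(a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}\right)^{2}
  \quad\Longrightarrow\quad
  a_{\bm{x}}^{\infty}a_{\bm{y}}^{\infty}a_{\bm{z}}^{\infty}=\frac{1}{\alpha_{1}\alpha_{2}\alpha_{3}},
\qquad
\left(a_{\bm{x}}^{\infty}\right)^{2}=\frac{1}{\alpha_{2}\alpha_{3}},\quad
\left(a_{\bm{y}}^{\infty}\right)^{2}=\frac{1}{\alpha_{1}\alpha_{3}},\quad
\left(a_{\bm{z}}^{\infty}\right)^{2}=\frac{1}{\alpha_{1}\alpha_{2}},
```

where each \alpha_{i} is evaluated at \lambda^{\infty}(\beta).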

B.6 Proof of Corollary 3

By Corollary 1, the limiting Stieltjes transform is given by

g(z)=3z4+33z284,\displaystyle g(z)=-\frac{3z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4},

and since g(z)=\sum_{i=1}^{3}g_{i}(z) with all g_{i}(z) equal, we have g_{i}(z)=\frac{g(z)}{3} for all i\in[3]. Hence, for each i\in[3], \alpha_{i}(z) defined in Theorem 5 reads

αi(z)=βz2+33z286,\displaystyle\alpha_{i}(z)=\frac{\beta}{\frac{z}{2}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{6}},

and λ(β)\lambda^{\infty}(\beta) is solution to the equation

z4+33z284(z2+33z286)3β2=0.\displaystyle\frac{z}{4}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{4}-\frac{\left(\frac{z}{2}+\frac{\sqrt{3}\sqrt{3z^{2}-8}}{6}\right)^{3}}{\beta^{2}}=0.

We first compute the critical value of \beta by solving the above equation for \beta and letting z tend to the right edge of the semi-circle law (i.e., z\to 2\sqrt{\frac{2}{3}}^{+}). Solving for \beta yields

β(z)=2z3+2z29z22434z49z2249z+33z28,\displaystyle\beta(z)=\sqrt{\frac{2z^{3}+\frac{2z^{2}\sqrt{9z^{2}-24}}{3}-4z-\frac{4\sqrt{9z^{2}-24}}{9}}{z+\sqrt{3}\sqrt{3z^{2}-8}}},

hence

βs=limz223+β(z)=233.\displaystyle\beta_{s}=\lim_{z\to 2\sqrt{\frac{2}{3}}^{+}}\beta(z)=\frac{2\sqrt{3}}{3}.

Now, to express \lambda^{\infty}(\beta) in terms of \beta, we solve the equation \beta-\beta(z)=0 for z and choose the unique positive solution that is non-decreasing in \beta, which yields

λ(β)=β22+2+3(3β24)318β.\displaystyle\lambda^{\infty}(\beta)=\sqrt{\frac{\beta^{2}}{2}+2+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{18\beta}}.

Plugging the above expression of λ(β)\lambda^{\infty}(\beta) into the expressions of the asymptotic alignments in Theorem 5, we obtain for all i[3]i\in[3]

αi(λ(β))=62β9β212+3(3β24)3β+9β2+36+3(3β24)3β,\displaystyle\alpha_{i}(\lambda^{\infty}(\beta))=\frac{6\sqrt{2}\beta}{\sqrt{9\beta^{2}-12+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}+\sqrt{9\beta^{2}+36+\frac{\sqrt{3}\sqrt{\left(3\beta^{2}-4\right)^{3}}}{\beta}}},

yielding the final result.
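As a purely numerical sanity check (not part of the proof; the test value \beta=3/2 and the tolerances are arbitrary), one can verify that the closed form of \lambda^{\infty}(\beta) solves the scalar equation above, and that \lambda^{\infty}(\beta_{s}) coincides with the right edge 2\sqrt{2/3} of the limiting semi-circle law:

```python
from math import sqrt

def lam(beta):
    # closed-form asymptotic singular value (cubic case, c_i = 1/3)
    t = max(3 * beta**2 - 4, 0.0)  # clamp tiny negative rounding at beta = beta_s
    return sqrt(beta**2 / 2 + 2 + sqrt(3) * sqrt(t**3) / (18 * beta))

def f(z, beta):
    # z/4 + sqrt(3(3z^2-8))/4 - (z/2 + sqrt(3(3z^2-8))/6)^3 / beta^2
    s = sqrt(3) * sqrt(3 * z**2 - 8)
    return z / 4 + s / 4 - (z / 2 + s / 6)**3 / beta**2

beta_s = 2 * sqrt(3) / 3
assert abs(f(lam(1.5), 1.5)) < 1e-9                # lam solves the defining equation
assert abs(lam(beta_s) - 2 * sqrt(2 / 3)) < 1e-10  # transition hits the bulk edge
```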

B.7 Proof of Corollary 4

Setting c1=c,c2=1cc_{1}=c,c_{2}=1-c and c3=0c_{3}=0, we get

{g1(z)=g(z)+z24c+(g(z)+z)22,g2(z)=g(z)+z24(1c)+(g(z)+z)22,g3(z)=0.\displaystyle\begin{cases}g_{1}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c+(g(z)+z)^{2}}}{2},\\ g_{2}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4(1-c)+(g(z)+z)^{2}}}{2},\\ g_{3}(z)=0.\end{cases}

Since g(z)=\sum_{i=1}^{3}g_{i}(z), g(z) satisfies the equation

z4c+(g(z)+z)224(1c)+(g(z)+z)22=0,\displaystyle z-\frac{\sqrt{4c+(g(z)+z)^{2}}}{2}-\frac{\sqrt{4(1-c)+(g(z)+z)^{2}}}{2}=0,

the solution of which belongs to

g(z){z4c(c1)+(z21)2z,z+4c(c1)+(z21)2z},\displaystyle g(z)\in\left\{-z-\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z},-z+\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z}\right\},

thus the limiting Stieltjes transform (having [g(z)]>0\Im[g(z)]>0) is given by

g(z)=z+4c(c1)+(z21)2z.\displaystyle g(z)=-z+\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{z}.

In particular, the edges of the support of the corresponding limiting distribution ν\nu are the roots of 4c(c1)+(z21)24c(c-1)+(z^{2}-1)^{2}, yielding

𝒮(ν)=[1+2c(1c),12c(1c)][12c(1c),1+2c(1c)].\displaystyle\mathcal{S}(\nu)=\left[-\sqrt{1+2\sqrt{c(1-c)}},-\sqrt{1-2\sqrt{c(1-c)}}\right]\cup\left[\sqrt{1-2\sqrt{c(1-c)}},\sqrt{1+2\sqrt{c(1-c)}}\right].
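A hedged numerical illustration (with an arbitrary ratio c=0.3 and the naive principal branch of the complex square root, whence the absolute value on the imaginary part): the fixed-point equation and the support \mathcal{S}(\nu) can be checked directly from g(z):

```python
import cmath
from math import sqrt, pi

def g(z, c):
    # limiting Stieltjes transform of Corollary 4 (principal branch)
    return -z + cmath.sqrt(4 * c * (c - 1) + (z**2 - 1)**2) / z

c, eps = 0.3, 1e-9
inner = sqrt(1 - 2 * sqrt(c * (1 - c)))  # inner edge of the positive bulk
outer = sqrt(1 + 2 * sqrt(c * (1 - c)))  # outer edge
assert inner < 1.0 < outer < 2.0

# fixed-point check of the defining equation at z = 2 (outside the support)
h = (g(2.0, c) + 2.0).real  # g(z) + z
assert abs(2.0 - sqrt(4 * c + h**2) / 2 - sqrt(4 * (1 - c) + h**2) / 2) < 1e-10

dens_in = abs(g(1.0 + 1j * eps, c).imag) / pi   # x = 1 lies inside the bulk
dens_out = abs(g(2.0 + 1j * eps, c).imag) / pi  # x = 2 lies outside
assert dens_in > 0.1 and dens_out < 1e-6
```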

The density of \nu is obtained by computing the limit \frac{1}{\pi}\lim_{\varepsilon\to 0}|\Im[g(x+i\varepsilon)]|, yielding

ν(dx)=1πxsin(arctan2(0,qc(x))2)|qc(x)|sign(sin(arctan2(0,qc(x))2)x)dx+(12min(c,1c))δ(x),\displaystyle\nu(dx)=\frac{1}{\pi x}\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)\sqrt{|q_{c}(x)|}\operatorname{sign}\left(\frac{\sin\left(\frac{\arctan_{2}(0,q_{c}(x))}{2}\right)}{x}\right)dx+\left(1-2\min(c,1-c)\right)\delta(x),

where q_{c}(x)=(x^{2}-1)^{2}+4c(c-1). The Dirac mass \delta(x) in the above expression reflects the fact that the underlying matrix model has rank 2\min(m,n), so that a fraction 1-2\min(c,1-c) of its eigenvalues is exactly zero.

B.8 Proof of Corollary 5

From Appendix B.7, plugging the expression of the limiting Stieltjes transform g(z) into the expressions defining g_{1}(z) and g_{2}(z) yields

{g1(z)=4c(c1)+(z21)22z4c(c1+z2)+(z21)22z,g2(z)=4c(c1)+(z21)22z4c(c1z2)+(z2+1)22z.\displaystyle\begin{cases}g_{1}(z)=\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{2z}-\frac{\sqrt{4c(c-1+z^{2})+(z^{2}-1)^{2}}}{2z},\\ g_{2}(z)=\frac{\sqrt{4c(c-1)+(z^{2}-1)^{2}}}{2z}-\frac{\sqrt{4c(c-1-z^{2})+(z^{2}+1)^{2}}}{2z}.\end{cases}

Thus, by Theorem 5, we have

{α1(z)=β4c24c+z42z2+12z+4c2+4cz24c+z42z2+12zα2(z)=β4c24c+z42z2+12z+4c24cz24c+z4+2z2+12zα3(z)=βz4c24c+z42z2+1.\displaystyle\begin{cases}\alpha_{1}(z)=\frac{\beta}{\frac{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}{2z}+\frac{\sqrt{4c^{2}+4cz^{2}-4c+z^{4}-2z^{2}+1}}{2z}}\\ \alpha_{2}(z)=\frac{\beta}{\frac{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}{2z}+\frac{\sqrt{4c^{2}-4cz^{2}-4c+z^{4}+2z^{2}+1}}{2z}}\\ \alpha_{3}(z)=\frac{\beta z}{\sqrt{4c^{2}-4c+z^{4}-2z^{2}+1}}.\end{cases}

Then, solving the equation z+g(z)-\frac{\beta}{\alpha_{1}(z)\alpha_{2}(z)\alpha_{3}(z)}=0 for z provides the almost sure limit of \lambda in terms of \beta. Specifically,

λ(β)=β2+1+c(1c)β2.\displaystyle\lambda^{\infty}(\beta)=\sqrt{\beta^{2}+1+\frac{c(1-c)}{\beta^{2}}}.

Plugging the above expression of \lambda^{\infty}(\beta) into \frac{1}{\sqrt{\alpha_{2}(z)\alpha_{3}(z)}} provides the asymptotic alignment a_{\bm{x}}^{\infty}(\beta) as \frac{1}{\kappa(\beta,c)}, where \kappa(\beta,c)=\sqrt{\alpha_{2}(\lambda^{\infty}(\beta))\alpha_{3}(\lambda^{\infty}(\beta))} is given by

κ(β,c)=2β2(β2+1)c2+cβ2(β2(β2+1)c(c1))4β4c(c1)+(β4c2+c)2β4β2(β2+1)c(c1)β2(4β2c(β2(β2c+2)c(c1))+(β2(β2+2)c(c1))2β4+4β4c(c1)+(β4c(c1))2β4).\displaystyle\kappa(\beta,c)=\sqrt{\frac{2\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c^{2}+c}{\beta^{2}}}\left(\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)\right)}{\sqrt{\frac{4\beta^{4}c\left(c-1\right)+\left(\beta^{4}-c^{2}+c\right)^{2}}{\beta^{4}}}\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{\beta^{2}}}\left(\sqrt{\frac{-4\beta^{2}c\left(\beta^{2}\left(\beta^{2}-c+2\right)-c\left(c-1\right)\right)+\left(\beta^{2}\left(\beta^{2}+2\right)-c\left(c-1\right)\right)^{2}}{\beta^{4}}}+\sqrt{\frac{4\beta^{4}c\left(c-1\right)+\left(\beta^{4}-c\left(c-1\right)\right)^{2}}{\beta^{4}}}\right)}}.

From the identities

4β4c(c1)+(β4c(c1))2=(β4+c(c1))2\displaystyle 4\beta^{4}c(c-1)+(\beta^{4}-c(c-1))^{2}=(\beta^{4}+c(c-1))^{2}
4β2c(β2(β2c+2)c(c1))+(β2(β2+2)c(c1))2=(β4+β2(22c)c2+c)2\displaystyle-4\beta^{2}c\left(\beta^{2}\left(\beta^{2}-c+2\right)-c\left(c-1\right)\right)+\left(\beta^{2}\left(\beta^{2}+2\right)-c\left(c-1\right)\right)^{2}=\left(\beta^{4}+\beta^{2}\left(2-2c\right)-c^{2}+c\right)^{2}

\kappa(\beta,c) simplifies to

κ(β,c)=ββ2(β2+1)c(c1)(β4+c(c1))(β2+1c).\displaystyle\kappa(\beta,c)=\beta\sqrt{\frac{\beta^{2}\left(\beta^{2}+1\right)-c\left(c-1\right)}{(\beta^{4}+c(c-1))\left(\beta^{2}+1-c\right)}}.
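A quick numerical check (a sketch with arbitrary test values \beta=1.5 and c=0.3) that the two identities above indeed collapse the long expression for \kappa(\beta,c) to this compact form:

```python
from math import sqrt

def kappa_long(b, c):
    # the unsimplified expression of kappa(beta, c)
    num = 2 * sqrt((b**2 * (b**2 + 1) - c**2 + c) / b**2) * (b**2 * (b**2 + 1) - c * (c - 1))
    d1 = sqrt((4 * b**4 * c * (c - 1) + (b**4 - c**2 + c)**2) / b**4)
    d2 = sqrt((b**2 * (b**2 + 1) - c * (c - 1)) / b**2)
    d3 = sqrt((-4 * b**2 * c * (b**2 * (b**2 - c + 2) - c * (c - 1))
               + (b**2 * (b**2 + 2) - c * (c - 1))**2) / b**4)
    d4 = sqrt((4 * b**4 * c * (c - 1) + (b**4 - c * (c - 1))**2) / b**4)
    return sqrt(num / (d1 * d2 * (d3 + d4)))

def kappa_simple(b, c):
    # the simplified closed form
    return b * sqrt((b**2 * (b**2 + 1) - c * (c - 1))
                    / ((b**4 + c * (c - 1)) * (b**2 + 1 - c)))

assert abs(kappa_long(1.5, 0.3) - kappa_simple(1.5, 0.3)) < 1e-10
```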

The asymptotic alignment a_{\bm{y}}^{\infty}(\beta) is given by \frac{1}{\kappa(\beta,1-c)} since the dimension ratio of the component {\bm{y}} is c_{2}=1-c. Moreover, the critical value of \beta is obtained by solving the equation \lambda^{\infty}(\beta_{s})=\sqrt{1+2\sqrt{c(1-c)}}, i.e., the value at which the limiting singular value reaches the right edge of the support of the corresponding limiting spectral measure (see Corollary 4). Finally, proceeding as above, one can check that \sqrt{\alpha_{1}(\lambda^{\infty}(\beta))\alpha_{2}(\lambda^{\infty}(\beta))} equals 1 for \beta\geq\beta_{s}, i.e., the alignment along the third dimension is 1 (corresponding to the component {\bm{z}} in Eq. (10)).

B.9 Proof of Theorem 6

Denote the matrix model as

𝑵1NΦd(𝑿,𝒂(1),,𝒂(d)),\displaystyle{\bm{N}}\equiv\frac{1}{\sqrt{N}}\Phi_{d}({\bm{\mathsfit{X}}},{\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)}),

with 𝑿𝕋n1,,nd(𝒩(0,1)){\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1)) and (𝒂(1),,𝒂(d))𝕊n11××𝕊nd1({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1} are independent of 𝑿{\bm{\mathsfit{X}}}. We further denote the resolvent of 𝑵{\bm{N}} as

𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸1d(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸2d(z)𝑸13(z)𝑸23(z)𝑸33(z)𝑸3d(z)𝑸1d(z)𝑸2d(z)𝑸3d(z)𝑸dd(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)&\cdots&{\bm{Q}}^{1d}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)&\cdots&{\bm{Q}}^{2d}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)&\ldots&{\bm{Q}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{Q}}^{1d}(z)^{\top}&{\bm{Q}}^{2d}(z)^{\top}&{\bm{Q}}^{3d}(z)^{\top}&\cdots&{\bm{Q}}^{dd}(z)\end{bmatrix}.

By the Borel--Cantelli lemma, we have \frac{1}{N}\operatorname{tr}{\bm{Q}}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g(z) and, for all i\in[d], \frac{1}{N}\operatorname{tr}{\bm{Q}}^{ii}(z)\operatorname{\,\xrightarrow{\text{a.s.}}\,}g_{i}(z). Applying the identity in Eq. (7) to the symmetric matrix {\bm{N}}, we get {\bm{N}}{\bm{Q}}(z)-z{\bm{Q}}(z)={\bm{I}}_{N}, from which we get in particular

1Nj=2d[𝑿1j𝑸1j(z)]i1i1zQi1i111(z)=1,\displaystyle\frac{1}{\sqrt{N}}\sum_{j=2}^{d}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-zQ_{i_{1}i_{1}}^{11}(z)=1,

or

\displaystyle\frac{1}{N\sqrt{N}}\sum_{j=2}^{d}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{n_{1}}{N}, (43)

where we recall that 𝑿ij𝑿(𝒂(1),,𝒂(i1),:,𝒂(i+1),,𝒂(j1),:,𝒂(j+1),,𝒂(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{a}}^{(1)},\ldots,{\bm{a}}^{(i-1)},:,{\bm{a}}^{(i+1)},\ldots,{\bm{a}}^{(j-1)},:,{\bm{a}}^{(j+1)},\ldots,{\bm{a}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}.

We thus need to compute the expectation of 1NNi1=1n1[𝑿1j𝑸1j(z)]i1i1\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}} which develops as

Aj1NNi1=1n1𝔼[𝑿1j𝑸1j(z)]i1i1=1NNi1ij𝔼[[𝑿1j]i1ijQi1ij1j]=1NNi1,,idk1,kjaik(k)𝔼[Qi1ij1jXi1,,id]\displaystyle A_{j}\equiv\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\operatorname{\mathbb{E}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}=\frac{1}{N\sqrt{N}}\sum_{i_{1}i_{j}}\operatorname{\mathbb{E}}\left[\left[{\bm{\mathsfit{X}}}^{1j}\right]_{i_{1}i_{j}}Q^{1j}_{i_{1}i_{j}}\right]=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\right]

where the last equality follows from Stein’s lemma (Lemma 1). In particular, as in Appendix B.2 for the order-3 case, it turns out that the only contributing term in the derivative \frac{\partial Q_{i_{1}i_{j}}^{1j}}{\partial X_{i_{1},\ldots,i_{d}}} is -\frac{1}{\sqrt{N}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}, the other terms yielding quantities of order \mathcal{O}(N^{-1}). Therefore, we find that

Aj\displaystyle A_{j} =1NNi1,,idk1,kjaik(k)𝔼[Qi1ij1jXi1,,id]=1N2i1,,idk1,kj(aik(k))2𝔼[Qi1i111Qijijjj]+𝒪(N1)\displaystyle=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}a_{i_{k}}^{(k)}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\right]=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}\left(a_{i_{k}}^{(k)}\right)^{2}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}\right]+\mathcal{O}(N^{-1})
=1N2i1ij𝔼[Qi1i111Qijijjj]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1}i_{j}}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{1}}^{11}Q_{i_{j}i_{j}}^{jj}\right]+\mathcal{O}(N^{-1})
=1Ntr𝑸11(z)1Ntr𝑸jj(z)+𝒪(N1)\displaystyle=-\frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z)\frac{1}{N}\operatorname{tr}{\bm{Q}}^{jj}(z)+\mathcal{O}(N^{-1})
a.s.g1(z)gj(z)+𝒪(N1).\displaystyle\operatorname{\,\xrightarrow{\text{a.s.}}\,}-g_{1}(z)g_{j}(z)+\mathcal{O}(N^{-1}).

From Eq. (43), g_{1}(z) satisfies -g_{1}(z)\sum_{j\neq 1}g_{j}(z)-zg_{1}(z)=c_{1} with c_{1}=\lim\frac{n_{1}}{N}. Similarly, for all i\in[d], g_{i}(z) satisfies -g_{i}(z)\sum_{j\neq i}g_{j}(z)-zg_{i}(z)=c_{i} with c_{i}=\lim\frac{n_{i}}{N}. Since g(z)=\sum_{i=1}^{d}g_{i}(z), we have for each i\in[d], g_{i}(z)(g(z)-g_{i}(z))+zg_{i}(z)+c_{i}=0, yielding

gi(z)=g(z)+z24ci+(g(z)+z)22,\displaystyle g_{i}(z)=\frac{g(z)+z}{2}-\frac{\sqrt{4c_{i}+(g(z)+z)^{2}}}{2},

with g(z)g(z) solution to the equation g(z)=i=1dgi(z)g(z)=\sum_{i=1}^{d}g_{i}(z) satisfying [g(z)]>0\Im[g(z)]>0 for [z]>0\Im[z]>0.
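Under the stated assumptions, the coupled system for the g_{i}(z) can be solved numerically by fixed-point iteration; the following minimal sketch (illustrative values d=3, (c_{1},c_{2},c_{3})=(0.5,0.3,0.2) and z=3 taken outside the bulk) checks the self-consistency g(z)=\sum_{i}g_{i}(z) together with the quadratic relations above:

```python
from math import sqrt

c, z = [0.5, 0.3, 0.2], 3.0

g = 0.0
for _ in range(500):  # fixed-point iteration on g = sum_i g_i(g)
    g = sum((g + z) / 2 - sqrt(4 * ci + (g + z)**2) / 2 for ci in c)

gi = [(g + z) / 2 - sqrt(4 * ci + (g + z)**2) / 2 for ci in c]
assert abs(g - sum(gi)) < 1e-12          # self-consistency g = sum_i g_i
for ci, gv in zip(c, gi):
    # each g_i solves g_i (g - g_i) + z g_i + c_i = 0
    assert abs(gv * (g - gv) + z * gv + ci) < 1e-12
```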

B.10 Proof of Corollary 6

Given Theorem 6 and setting ci=1dc_{i}=\frac{1}{d} for all i[d]i\in[d], we have gi(z)=g(z)dg_{i}(z)=\frac{g(z)}{d}, thus g(z)g(z) satisfies the equation

g(z)d=g(z)+z24d+(g(z)+z)22.\displaystyle\frac{g(z)}{d}=\frac{g(z)+z}{2}-\frac{\sqrt{\frac{4}{d}+(g(z)+z)^{2}}}{2}.

Solving in g(z)g(z) yields

g(z){dz2(d1)d(dz24d+4)2(d1),dz2(d1)+d(dz24d+4)2(d1)},\displaystyle g(z)\in\left\{-\frac{dz}{2\left(d-1\right)}-\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)},-\frac{dz}{2\left(d-1\right)}+\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)}\right\},

and the limiting Stieltjes transform with [g(z)]0\Im[g(z)]\geq 0 is

g(z)=dz2(d1)+d(dz24d+4)2(d1).\displaystyle g(z)=-\frac{dz}{2\left(d-1\right)}+\frac{\sqrt{d\left(dz^{2}-4d+4\right)}}{2\left(d-1\right)}.
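As a small check, at d=4 and z=2 the closed form above gives g(2)=-2/3, and one can verify numerically that it solves the scalar equation of the previous display:

```python
from math import sqrt

def g_equal(z, d):
    # limiting Stieltjes transform for c_i = 1/d (evaluated outside the bulk)
    return -d * z / (2 * (d - 1)) + sqrt(d * (d * z**2 - 4 * d + 4)) / (2 * (d - 1))

d, z = 4, 2.0
g = g_equal(z, d)
lhs = g / d
rhs = (g + z) / 2 - sqrt(4 / d + (g + z)**2) / 2
assert abs(g + 2 / 3) < 1e-12       # g(2) = -2/3 at d = 4
assert abs(lhs - rhs) < 1e-12       # g solves g/d = (g+z)/2 - sqrt(4/d+(g+z)^2)/2
```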

B.11 Existence of g(z)g(z)

Define the following function for (g,z)×+(g,z)\in{\mathbb{R}}\times{\mathbb{R}}_{+} and d3d\geq 3

k(g,z)=gg+z2d+12i=1d4ci+(g+z)2,\displaystyle k(g,z)=g-\frac{g+z}{2}d+\frac{1}{2}\sum_{i=1}^{d}\sqrt{4c_{i}+(g+z)^{2}},

with i=1dci=1\sum_{i=1}^{d}c_{i}=1. By concavity of the function \sqrt{\cdot} we have i=1d4ci+x2d4d+x2\sum_{i=1}^{d}\sqrt{4c_{i}+x^{2}}\leq d\sqrt{\frac{4}{d}+x^{2}} for all xx\in{\mathbb{R}}. Therefore, k(g,z)k(g,z) is bounded as

k(g,z)k¯(g,z)gg+z2d+d24d+(g+z)2.\displaystyle k(g,z)\leq\bar{k}(g,z)\equiv g-\frac{g+z}{2}d+\frac{d}{2}\sqrt{\frac{4}{d}+(g+z)^{2}}.

From Appendix B.10, for any z>2\sqrt{\frac{d-1}{d}} there exists g_{*}\in{\mathbb{R}} such that \bar{k}(g_{*},z)=0, and therefore k(g_{*},z)\leq\bar{k}(g_{*},z)=0. Besides, we have \lim_{g\to-\infty}k(g,z)=\lim_{g\to+\infty}k(g,z)=+\infty; hence, by continuity of k(\cdot,z), there exists g(z) such that k(g(z),z)=0 for z large enough (e.g., z>2\sqrt{\frac{d-1}{d}}).

B.12 Proof of Theorem 7

Given the random tensor model in Eq. (1) and its singular vectors characterized by Eq. (26), we denote the associated random matrix model as

𝑻Φd(𝑻,𝒖(1),,𝒖(d))=β𝑽𝑩𝑽+𝑵,\displaystyle{\bm{T}}\equiv\Phi_{d}\left({\bm{\mathsfit{T}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\right)=\beta{\bm{V}}{\bm{B}}{\bm{V}}^{\top}+{\bm{N}},

where {\bm{N}}=\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},{\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}\right), {\bm{B}}\in{\mathbb{M}}_{d} with entries B_{ij}=(1-\delta_{ij})\prod_{k\neq i,j}\langle{\bm{x}}^{(k)},{\bm{u}}^{(k)}\rangle, and

\displaystyle{\bm{V}}\equiv\begin{bmatrix}{\bm{x}}^{(1)}&{\bm{0}}_{n_{1}}&\cdots&{\bm{0}}_{n_{1}}\\ {\bm{0}}_{n_{2}}&{\bm{x}}^{(2)}&\cdots&{\bm{0}}_{n_{2}}\\ \vdots&\vdots&\ddots&\vdots\\ {\bm{0}}_{n_{d}}&{\bm{0}}_{n_{d}}&\cdots&{\bm{x}}^{(d)}\end{bmatrix}\in{\mathbb{M}}_{N,d}.

We further denote the resolvent of 𝑻{\bm{T}} and 𝑵{\bm{N}} respectively as

𝑹(z)(𝑻z𝑰N)1=[𝑹11(z)𝑹12(z)𝑹13(z)𝑹1d(z)𝑹12(z)𝑹22(z)𝑹23(z)𝑹2d(z)𝑹13(z)𝑹23(z)𝑹33(z)𝑹3d(z)𝑹1d(z)𝑹2d(z)𝑹3d(z)𝑹dd(z)].\displaystyle{\bm{R}}(z)\equiv\left({\bm{T}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{R}}^{11}(z)&{\bm{R}}^{12}(z)&{\bm{R}}^{13}(z)&\cdots&{\bm{R}}^{1d}(z)\\ {\bm{R}}^{12}(z)^{\top}&{\bm{R}}^{22}(z)&{\bm{R}}^{23}(z)&\cdots&{\bm{R}}^{2d}(z)\\ {\bm{R}}^{13}(z)^{\top}&{\bm{R}}^{23}(z)^{\top}&{\bm{R}}^{33}(z)&\ldots&{\bm{R}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{R}}^{1d}(z)^{\top}&{\bm{R}}^{2d}(z)^{\top}&{\bm{R}}^{3d}(z)^{\top}&\cdots&{\bm{R}}^{dd}(z)\end{bmatrix}.
𝑸(z)(𝑵z𝑰N)1=[𝑸11(z)𝑸12(z)𝑸13(z)𝑸1d(z)𝑸12(z)𝑸22(z)𝑸23(z)𝑸2d(z)𝑸13(z)𝑸23(z)𝑸33(z)𝑸3d(z)𝑸1d(z)𝑸2d(z)𝑸3d(z)𝑸dd(z)].\displaystyle{\bm{Q}}(z)\equiv\left({\bm{N}}-z{\bm{I}}_{N}\right)^{-1}=\begin{bmatrix}{\bm{Q}}^{11}(z)&{\bm{Q}}^{12}(z)&{\bm{Q}}^{13}(z)&\cdots&{\bm{Q}}^{1d}(z)\\ {\bm{Q}}^{12}(z)^{\top}&{\bm{Q}}^{22}(z)&{\bm{Q}}^{23}(z)&\cdots&{\bm{Q}}^{2d}(z)\\ {\bm{Q}}^{13}(z)^{\top}&{\bm{Q}}^{23}(z)^{\top}&{\bm{Q}}^{33}(z)&\ldots&{\bm{Q}}^{3d}(z)\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ {\bm{Q}}^{1d}(z)^{\top}&{\bm{Q}}^{2d}(z)^{\top}&{\bm{Q}}^{3d}(z)^{\top}&\cdots&{\bm{Q}}^{dd}(z)\end{bmatrix}.

As in the order-3 case, by the Woodbury matrix identity (Lemma 5), we have

1Ntr𝑹(z)\displaystyle\frac{1}{N}\operatorname{tr}{\bm{R}}(z) =1Ntr𝑸(z)1Ntr[(1β𝑩1+𝑽𝑸(z)𝑽)1𝑽𝑸2(z)𝑽],\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)-\frac{1}{N}\operatorname{tr}\left[\left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}}\right],
=1Ntr𝑸(z)+𝒪(N1),\displaystyle=\frac{1}{N}\operatorname{tr}{\bm{Q}}(z)+\mathcal{O}(N^{-1}),

since the perturbation matrix \left(\frac{1}{\beta}{\bm{B}}^{-1}+{\bm{V}}^{\top}{\bm{Q}}(z){\bm{V}}\right)^{-1}{\bm{V}}^{\top}{\bm{Q}}^{2}(z){\bm{V}} is of bounded spectral norm (if \|{\bm{Q}}(z)\| is bounded, see the condition in Eq. (8)) and of finite size (a d\times d matrix). Therefore, the characterization of the spectrum of {\bm{T}} boils down to the estimation of \frac{1}{N}\operatorname{tr}{\bm{Q}}(z). It remains to handle the statistical dependence between the tensor noise {\bm{\mathsfit{X}}} and the singular vectors of {\bm{\mathsfit{T}}}. Recalling the proof of Appendix B.9, we have again

\displaystyle\frac{1}{N\sqrt{N}}\sum_{j=2}^{d}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}}-\frac{z}{N}\operatorname{tr}{\bm{Q}}^{11}(z)=\frac{n_{1}}{N},

with 𝑿ij𝑿(𝒖(1),,𝒖(i1),:,𝒖(i+1),,𝒖(j1),:,𝒖(j+1),,𝒖(d))𝕄ni,nj{\bm{\mathsfit{X}}}^{ij}\equiv{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(i-1)},:,{\bm{u}}^{(i+1)},\ldots,{\bm{u}}^{(j-1)},:,{\bm{u}}^{(j+1)},\ldots,{\bm{u}}^{(d)})\in{\mathbb{M}}_{n_{i},n_{j}}. Taking the expectation of 1NNi1=1n1[𝑿1j𝑸1j(z)]i1i1\frac{1}{N\sqrt{N}}\sum_{i_{1}=1}^{n_{1}}\left[{\bm{\mathsfit{X}}}^{1j}{\bm{Q}}^{1j}(z)^{\top}\right]_{i_{1}i_{1}} yields

Aj=1NNi1,,id𝔼[Qi1ij1jXi1,,idk1,kjuik(k)]+1NNi1,,id𝔼[Qi1ij1jk1,kjuik(k)Xi1,,id]=Aj1+Aj2,\displaystyle A_{j}=\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[\frac{\partial Q^{1j}_{i_{1}i_{j}}}{\partial X_{i_{1},\ldots,i_{d}}}\prod_{k\neq 1,k\neq j}u_{i_{k}}^{(k)}\right]+\frac{1}{N\sqrt{N}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[Q^{1j}_{i_{1}i_{j}}\prod_{k\neq 1,k\neq j}\frac{\partial u_{i_{k}}^{(k)}}{\partial X_{i_{1},\ldots,i_{d}}}\right]=A_{j1}+A_{j2},

where we already computed the first term (A_{j1}) in Appendix B.9. We now show that A_{j2} vanishes asymptotically under Assumption 3. Indeed, by Eq. (28), the higher-order terms arise from the term -\frac{1}{\sqrt{N}}\prod_{\ell\neq k}u^{(\ell)}_{i_{\ell}}R^{kk}_{i_{k}i_{k}}(\lambda); we thus only need to show that the contribution of this term is also vanishing. Precisely,

Aj2\displaystyle A_{j2} =1N2i1,,id𝔼[Qi1ij1jk1,kjkui()Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[Q_{i_{1}i_{j}}^{1j}\prod_{k\neq 1,k\neq j}\prod_{\ell\neq k}u^{(\ell)}_{i_{\ell}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})
=1N2i1,,id𝔼[(ui1(1))d2Qi1ij1j(uij(j))d2k1,kjuik(k)Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\sum_{i_{1},\ldots,i_{d}}\operatorname{\mathbb{E}}\left[\left(u_{i_{1}}^{(1)}\right)^{d-2}Q_{i_{1}i_{j}}^{1j}\left(u_{i_{j}}^{(j)}\right)^{d-2}\prod_{k\neq 1,k\neq j}u^{(k)}_{i_{k}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})
=1N2𝔼[((𝒖(1))d2)𝑸1j(z)(𝒖(j))d2i2,,ij1,ij+1,,idk1,kjuik(k)Rikikkk(λ)]+𝒪(N1)\displaystyle=-\frac{1}{N^{2}}\operatorname{\mathbb{E}}\left[\left(({\bm{u}}^{(1)})^{\odot d-2}\right)^{\top}{\bm{Q}}^{1j}(z)\left({\bm{u}}^{(j)}\right)^{\odot d-2}\sum_{i_{2},\ldots,i_{j-1},i_{j+1},\ldots,i_{d}}\prod_{k\neq 1,k\neq j}u^{(k)}_{i_{k}}R^{kk}_{i_{k}i_{k}}(\lambda)\right]+\mathcal{O}(N^{-1})

where {\bm{a}}^{\odot q} denotes the vector with entries a_{i}^{q}. As such, A_{j2} vanishes asymptotically since the singular vectors {\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)} have unit norm and their entries are bounded by 1.

We finally need to check that the derivative of 𝑸(z){\bm{Q}}(z) w.r.t. the entry Xi1,,idX_{i_{1},\ldots,i_{d}} has the same expression asymptotically as the one in the independent case of Appendix B.9. Indeed, we have 𝑸(z)Xi1,,id=𝑸(z)𝑵Xi1,,id𝑸(z)\frac{\partial{\bm{Q}}(z)}{\partial X_{i_{1},\ldots,i_{d}}}=-{\bm{Q}}(z)\frac{\partial{\bm{N}}}{\partial X_{i_{1},\ldots,i_{d}}}{\bm{Q}}(z) with

\displaystyle\frac{\partial{\bm{N}}}{\partial X_{i_{1},\ldots,i_{d}}}=\begin{bmatrix}{\bm{0}}_{n_{1}\times n_{1}}&{\bm{C}}_{12}&\cdots&{\bm{C}}_{1d}\\ {\bm{C}}_{12}^{\top}&{\bm{0}}_{n_{2}\times n_{2}}&\cdots&{\bm{C}}_{2d}\\ \vdots&\vdots&\ddots&\vdots\\ {\bm{C}}_{1d}^{\top}&{\bm{C}}_{2d}^{\top}&\cdots&{\bm{0}}_{n_{d}\times n_{d}}\end{bmatrix}+\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}},\ldots,\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\right),

where {\bm{C}}_{ij}=\prod_{k\neq i,k\neq j}u^{(k)}_{i_{k}}{\bm{e}}_{i_{i}}^{n_{i}}({\bm{e}}_{i_{j}}^{n_{j}})^{\top} and {\bm{O}}=\frac{1}{\sqrt{N}}\Phi_{d}\left({\bm{\mathsfit{X}}},\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}},\ldots,\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\right) is of vanishing norm. Indeed, by Eq. (28), there exists C>0 independent of N such that \|\frac{\partial{\bm{u}}^{(1)}}{\partial X_{i_{1},\ldots,i_{d}}}\|,\ldots,\|\frac{\partial{\bm{u}}^{(d)}}{\partial X_{i_{1},\ldots,i_{d}}}\|\leq\frac{C}{\sqrt{N}}; therefore, the spectral norm of {\bm{O}} is bounded by \frac{C^{\prime}}{N^{\frac{d}{2}}} for some constant C^{\prime}>0 independent of N. Finally, A_{j}=-g_{1}(z)g_{j}(z)+\mathcal{O}(N^{-1}) (with g_{1}(z),g_{j}(z) the almost sure limits of \frac{1}{N}\operatorname{tr}{\bm{Q}}^{11}(z) and \frac{1}{N}\operatorname{tr}{\bm{Q}}^{jj}(z) respectively), hence yielding the same limiting Stieltjes transform as the one obtained in Appendix B.9.

B.13 Proof of Theorem 8

Given the identities in Eq. (26), we have for all i[d]i\in[d]

1Nj1,,jdxji(i)kiujk(k)Xj1,,jd+βki𝒖(k),𝒙(k)=λ𝒖(i),𝒙(i),\displaystyle\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\prod_{k\neq i}u_{j_{k}}^{(k)}X_{j_{1},\ldots,j_{d}}+\beta\prod_{k\neq i}\langle{\bm{u}}^{(k)},{\bm{x}}^{(k)}\rangle=\lambda\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle,

where \lambda and \langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle concentrate almost surely around their asymptotic limits, denoted \lambda^{\infty}(\beta) and a_{{\bm{x}}^{(i)}}^{\infty}(\beta) respectively. Taking the expectation of the first term and applying Stein’s lemma (Lemma 1), we get

Ai=1Nj1,,jdxji(i)𝔼[(kiujk(k))Xj1,,jd]=1Nj1,,jdxji(i)ki𝔼[ujk(k)Xj1,,jdk,iuj()],\displaystyle A_{i}=\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\operatorname{\mathbb{E}}\left[\frac{\partial\left(\prod_{k\neq i}u_{j_{k}}^{(k)}\right)}{\partial X_{j_{1},\ldots,j_{d}}}\right]=\frac{1}{\sqrt{N}}\sum_{j_{1},\ldots,j_{d}}x_{j_{i}}^{(i)}\sum_{k\neq i}\operatorname{\mathbb{E}}\left[\frac{\partial u_{j_{k}}^{(k)}}{\partial X_{j_{1},\ldots,j_{d}}}\prod_{\ell\neq k,\ell\neq i}u_{j_{\ell}}^{(\ell)}\right],

where the only contributing term in the expression of $\frac{\partial u_{j_{k}}^{(k)}}{\partial X_{j_{1},\ldots,j_{d}}}$ from Eq. (28) is $-\frac{1}{\sqrt{N}}\prod_{\ell\neq k}u_{j_{\ell}}^{(\ell)}R_{j_{k}j_{k}}^{kk}(\lambda)$, which yields

\displaystyle A_{i} =-\frac{1}{N}\sum_{j_{1},\ldots,j_{d}}x^{(i)}_{j_{i}}\sum_{k\neq i}\operatorname{\mathbb{E}}\left[R_{j_{k}j_{k}}^{kk}(\lambda)\prod_{\ell\neq k}u_{j_{\ell}}^{(\ell)}\prod_{\ell\neq k,\ell\neq i}u_{j_{\ell}}^{(\ell)}\right]+\mathcal{O}(N^{-1})
\displaystyle =-\operatorname{\mathbb{E}}\left[\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle\sum_{k\neq i}\frac{1}{N}\operatorname{tr}{\bm{R}}^{kk}(\lambda)\right]+\mathcal{O}(N^{-1})
\displaystyle \to-\operatorname{\mathbb{E}}\left[\langle{\bm{u}}^{(i)},{\bm{x}}^{(i)}\rangle\right]\sum_{k\neq i}g_{k}(\lambda).

Therefore, the almost sure limits $\lambda^{\infty}$ and $a_{{\bm{x}}^{(i)}}^{\infty}$ for each $i\in[d]$ satisfy

\displaystyle\lambda^{\infty}a_{{\bm{x}}^{(i)}}^{\infty}=\beta\prod_{k\neq i}a_{{\bm{x}}^{(k)}}^{\infty}-a_{{\bm{x}}^{(i)}}^{\infty}\sum_{k\neq i}g_{k}(\lambda^{\infty}),

therefore

\displaystyle a_{{\bm{x}}^{(i)}}^{\infty}=\alpha_{i}(\lambda^{\infty})\prod_{k\neq i}a_{{\bm{x}}^{(k)}}^{\infty},\quad\text{with}\quad\alpha_{i}(z)=\frac{\beta}{z+g(z)-g_{i}(z)},

since $g(z)=\sum_{k=1}^{d}g_{k}(z)$. To solve the above system, we simply write $x_{i}=a_{{\bm{x}}^{(i)}}^{\infty}$ and $\alpha_{i}=\alpha_{i}(z)$, omitting the dependence on $z$. We therefore have

\displaystyle x_{i}=\alpha_{i}\prod_{k\neq i}x_{k}\quad\Rightarrow\quad x_{i}=\alpha_{i}x_{j}\prod_{k\neq i,k\neq j}x_{k}=\alpha_{i}x_{j}\prod_{k\neq i,k\neq j}\left(\alpha_{k}\prod_{\ell\neq k}x_{\ell}\right),

from which we have

\displaystyle x_{i}=x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k}x_{\ell}\right)=x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-2},

thus

\displaystyle x_{j}\left(\prod_{k\neq j}\alpha_{k}\right)\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-3}=1,

and we remark that $\left(\prod_{k\neq i,k\neq j}\prod_{\ell\neq k,\ell\neq i}x_{\ell}\right)x_{i}^{d-3}=\left(\frac{x_{j}}{\alpha_{j}}\right)^{d-3}x_{j}^{d-2}$, hence $x_{j}$ is given by

\displaystyle x_{j}=\left(\frac{\alpha_{j}^{d-3}}{\prod_{k\neq j}\alpha_{k}}\right)^{\frac{1}{2d-4}},

which ends the proof.
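As a quick numerical sanity check (not part of the proof), one can verify that this closed form indeed solves the fixed-point system $x_{i}=\alpha_{i}\prod_{k\neq i}x_{k}$; the dimension $d$ and the positive values of $\alpha_{i}$ below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
alpha = rng.uniform(0.5, 2.0, size=d)  # arbitrary positive values of alpha_i(z)

# Closed form: x_j = (alpha_j^{d-3} / prod_{k != j} alpha_k)^{1/(2d-4)}
#                  = (alpha_j^{d-2} / prod_k alpha_k)^{1/(2d-4)}
x = (alpha ** (d - 2) / np.prod(alpha)) ** (1.0 / (2 * d - 4))

# Residual of the fixed-point equations x_i = alpha_i * prod_{k != i} x_k
residual = x - alpha * (np.prod(x) / x)
print(np.max(np.abs(residual)))  # numerically zero
```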

Alternative expression of $q_{i}(z)$

From the above, we have $\lambda^{\infty}+g(\lambda^{\infty})=\beta\prod_{i=1}^{d}x_{i}$ and

\displaystyle(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))x_{i}=\beta\prod_{j\neq i}x_{j}\quad\Rightarrow\quad(\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty}))x_{i}^{2}=\beta\prod_{i=1}^{d}x_{i}.

Therefore,

\displaystyle x_{i}=\sqrt{\frac{\lambda^{\infty}+g(\lambda^{\infty})}{\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty})}}=\sqrt{1+\frac{g_{i}(\lambda^{\infty})}{\lambda^{\infty}+g(\lambda^{\infty})-g_{i}(\lambda^{\infty})}},

with $z+g(z)-g_{i}(z)=\frac{-c_{i}}{g_{i}(z)}$ since $g_{i}(z)$ satisfies $g_{i}^{2}(z)-(g(z)+z)g_{i}(z)-c_{i}=0$. Hence, we find

\displaystyle x_{i}=\sqrt{1-\frac{g_{i}^{2}(\lambda^{\infty})}{c_{i}}}.
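This last algebraic step can be checked numerically: for any values satisfying $c_{i}=g_{i}^{2}-(g+z)g_{i}$ (i.e., such that $g_{i}$ solves the quadratic), the two expressions for $x_{i}^{2}$ coincide. A small sketch with arbitrary (hypothetical) values:

```python
import numpy as np

rng = np.random.default_rng(1)
z = 3.0
gi = -rng.uniform(0.1, 0.5, size=3)  # hypothetical values of g_i(z), negative
g = gi.sum()                         # g(z) = sum_k g_k(z)
ci = gi**2 - (g + z) * gi            # so that g_i solves g_i^2-(g+z)g_i-c_i=0

lhs = (z + g) / (z + g - gi)         # x_i^2 written as (z+g)/(z+g-g_i)
rhs = 1.0 - gi**2 / ci               # x_i^2 written as 1 - g_i^2/c_i
print(np.max(np.abs(lhs - rhs)))     # numerically zero
```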

B.14 Additional lemmas

Lemma 4 (Spectral norm of random Gaussian tensors).

Let ${\bm{\mathsfit{X}}}\sim{\mathbb{T}}_{n_{1},\ldots,n_{d}}(\mathcal{N}(0,1))$. Then the spectral norm $\|{\bm{\mathsfit{X}}}\|$ can be bounded, with probability at least $1-\delta$ for any $\delta>0$, as

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq\sqrt{8\left[\left(\sum_{i=1}^{d}n_{i}\right)\log\left(\frac{2d}{\log(3/2)}\right)+\log\left(\frac{2}{\delta}\right)\right]}.
Proof.

By definition, the spectral norm of ${\bm{\mathsfit{X}}}$ is given as

\displaystyle\|{\bm{\mathsfit{X}}}\|=\sup_{{\bm{u}}^{(i)}\in{\mathbb{S}}^{n_{i}-1},\,i\in[d]}{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)}). (44)

For some $\varepsilon>0$, let ${\mathcal{E}}_{1},\ldots,{\mathcal{E}}_{d}$ be $\varepsilon$-nets of ${\mathbb{S}}^{n_{1}-1},\ldots,{\mathbb{S}}^{n_{d}-1}$ respectively. Since ${\mathbb{S}}^{n_{1}-1}\times\cdots\times{\mathbb{S}}^{n_{d}-1}$ is compact, the supremum in Eq. (44) is attained, and by the $\varepsilon$-net argument there exists ${\bm{e}}^{(i)}\in{\mathcal{E}}_{i}$ for each $i\in[d]$ such that

\displaystyle\|{\bm{\mathsfit{X}}}\|={\bm{\mathsfit{X}}}({\bm{e}}^{(1)}+{\bm{\delta}}^{(1)},\ldots,{\bm{e}}^{(d)}+{\bm{\delta}}^{(d)}),

with $\|{\bm{\delta}}^{(i)}\|\leq\varepsilon$ for $i\in[d]$. Therefore, by multilinearity, one has

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)})+\left(\sum_{i=1}^{d}\varepsilon^{i}\binom{d}{i}\right)\|{\bm{\mathsfit{X}}}\|.

For $\varepsilon=\frac{\log(3/2)}{d}$, one has $\sum_{i=1}^{d}\varepsilon^{i}\binom{d}{i}\leq\sum_{i=1}^{d}\frac{(\varepsilon d)^{i}}{i!}\leq e^{\varepsilon d}-1=\frac{1}{2}$. As such, the spectral norm of ${\bm{\mathsfit{X}}}$ can be bounded as

\displaystyle\|{\bm{\mathsfit{X}}}\|\leq 2\max_{{\bm{e}}^{(i)}\in{\mathcal{E}}_{i},\,i\in[d]}{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)}). (45)

Since the entries of ${\bm{\mathsfit{X}}}$ are i.i.d. standard Gaussian random variables, we have $\operatorname{\mathbb{E}}\left[e^{tX_{i_{1}\ldots i_{d}}}\right]\leq e^{t^{2}/2}$; hence, by a Chernoff bound, we have for any ${\bm{u}}^{(i)}\in{\mathbb{S}}^{n_{i}-1}$ with $i\in[d]$

\displaystyle{\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\geq t\right\}={\mathbb{P}}\left\{e^{s{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})}\geq e^{st}\right\}\leq e^{-st}\operatorname{\mathbb{E}}\left[e^{s{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})}\right]
\displaystyle\leq\exp\left\{-st+\frac{s^{2}}{2}\sum_{i_{1},\ldots,i_{d}}(u_{i_{1}}^{(1)})^{2}\cdots(u_{i_{d}}^{(d)})^{2}\right\}=\exp\left\{-st+\frac{s^{2}}{2}\right\}.

Minimizing the right-hand side over $s$ (optimally at $s=t$) yields ${\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\geq t\right\}\leq e^{-t^{2}/2}$. By the same argument, we further have ${\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})\leq-t\right\}\leq e^{-t^{2}/2}$, and a union bound over the two cases yields

\displaystyle{\mathbb{P}}\left\{|{\bm{\mathsfit{X}}}({\bm{u}}^{(1)},\ldots,{\bm{u}}^{(d)})|\geq t\right\}\leq 2e^{-t^{2}/2}.
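The exponent in this tail bound relies on the fact that $\sum_{i_{1},\ldots,i_{d}}(u_{i_{1}}^{(1)})^{2}\cdots(u_{i_{d}}^{(d)})^{2}=\prod_{i}\|{\bm{u}}^{(i)}\|^{2}=1$ for unit vectors. A quick numerical sketch of this factorization, with arbitrary small dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
us = [rng.standard_normal(n) for n in (3, 4, 5)]  # arbitrary dimensions n_i
us = [u / np.linalg.norm(u) for u in us]          # unit vectors u^{(i)}

# Sum of squared entries of the rank-1 tensor u^(1) (x) u^(2) (x) u^(3);
# it factorizes as prod_i ||u^{(i)}||^2 = 1
T = np.einsum('i,j,k->ijk', *us)
print(np.sum(T**2))  # 1.0 up to rounding
```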

Back to Eq. (45), since $|{\mathcal{E}}_{i}|\leq\left(\frac{2}{\varepsilon}\right)^{n_{i}}$, applying the union bound gives

\displaystyle{\mathbb{P}}\left\{\|{\bm{\mathsfit{X}}}\|\geq t\right\}\leq\sum_{{\bm{e}}^{(1)}\in{\mathcal{E}}_{1},\ldots,{\bm{e}}^{(d)}\in{\mathcal{E}}_{d}}{\mathbb{P}}\left\{{\bm{\mathsfit{X}}}({\bm{e}}^{(1)},\ldots,{\bm{e}}^{(d)})\geq\frac{t}{2}\right\}
\displaystyle\leq\left(\frac{2d}{\log(3/2)}\right)^{\sum_{i=1}^{d}n_{i}}2\exp\left(-\frac{t^{2}}{8}\right),

which yields the final bound for an appropriate choice of $t$. ∎
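For $d=2$, the tensor spectral norm reduces to the matrix operator norm, so the lemma can be sanity-checked numerically on a single draw (a sketch, not a proof; the sizes are arbitrary). Here $t$ is chosen so that the right-hand side of the last display equals $\delta$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n1, n2, delta = 2, 50, 50, 0.01
X = rng.standard_normal((n1, n2))

# t such that (2d/log(3/2))^(n1+n2) * 2*exp(-t^2/8) = delta
t = np.sqrt(8 * ((n1 + n2) * np.log(2 * d / np.log(1.5))
                 + np.log(2 / delta)))
spec = np.linalg.norm(X, 2)  # largest singular value of X
print(spec <= t)             # True on this draw (holds w.p. >= 1 - delta)
```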

Lemma 5 (Woodbury matrix identity).

Let ${\bm{A}}\in{\mathbb{M}}_{n}$ and ${\bm{B}}\in{\mathbb{M}}_{k}$ be invertible, and let ${\bm{U}}\in{\mathbb{M}}_{n,k}$, ${\bm{V}}\in{\mathbb{M}}_{k,n}$. Then

\displaystyle\left({\bm{A}}+{\bm{U}}{\bm{B}}{\bm{V}}\right)^{-1}={\bm{A}}^{-1}-{\bm{A}}^{-1}{\bm{U}}\left({\bm{B}}^{-1}+{\bm{V}}{\bm{A}}^{-1}{\bm{U}}\right)^{-1}{\bm{V}}{\bm{A}}^{-1}.
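A minimal numerical check of the identity, with arbitrary matrices (${\bm{A}}$ and ${\bm{B}}$ are shifted by multiples of the identity so that they are well conditioned):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted to be well conditioned
B = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + U @ B @ V)
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(B) + V @ Ainv @ U) @ V @ Ainv
print(np.max(np.abs(lhs - rhs)))  # numerically zero
```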

References

  • [1] Anandkumar, A., Ge, R., Hsu, D., Kakade, S. M. and Telgarsky, M. (2014). Tensor decompositions for learning latent variable models. Journal of Machine Learning Research 15, 2773–2832.
  • [2] Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability 33, 1643–1697.
  • [3] Ben Arous, G., Huang, D. Z. and Huang, J. (2021). Long random matrices and tensor unfolding. arXiv preprint arXiv:2110.10210.
  • [4] Ben Arous, G., Mei, S., Montanari, A. and Nica, M. (2019). The landscape of the spiked tensor model. Communications on Pure and Applied Mathematics 72, 2282–2330.
  • [5] Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics 227, 494–521.
  • [6] Biroli, G., Cammarota, C. and Ricci-Tersenghi, F. (2020). How to iron out rough landscapes and get optimal performances: averaged gradient descent and its application to tensor PCA. Journal of Physics A: Mathematical and Theoretical 53, 174003.
  • [7] Capitaine, M., Donati-Martin, C. and Féral, D. (2009). The largest eigenvalues of finite rank deformation of large Wigner matrices: convergence and nonuniversality of the fluctuations. The Annals of Probability 37, 1–47.
  • [8] Goulart, J. H. de M., Couillet, R. and Comon, P. (2021). A random matrix perspective on random tensors. arXiv preprint arXiv:2108.00774.
  • [9] Handschy, M. C. (2019). Phase Transition in Random Tensors with Multiple Spikes. PhD thesis, University of Minnesota.
  • [10] Hitchcock, F. L. (1927). The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics 6, 164–189.
  • [11] Huang, J., Huang, D. Z., Yang, Q. and Cheng, G. (2020). Power iteration for tensor PCA. arXiv preprint arXiv:2012.13669.
  • [12] Jagannath, A., Lopatto, P. and Miolane, L. (2020). Statistical thresholds for tensor PCA. The Annals of Applied Probability 30, 1910–1933.
  • [13] Landsberg, J. M. (2012). Tensors: geometry and applications. Representation theory 381, 3.
  • [14] Lesieur, T., Miolane, L., Lelarge, M., Krzakala, F. and Zdeborová, L. (2017). Statistical and computational phase transitions in spiked tensor estimation. In Proc. IEEE International Symposium on Information Theory (ISIT), 511–515.
  • [15] Lim, L.-H. (2005). Singular values and eigenvalues of tensors: a variational approach. In Proc. IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, 129–132.
  • [16] Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Mathematics of the USSR-Sbornik 1, 457.
  • [17] Montanari, A. and Richard, E. (2014). A statistical model for tensor PCA. arXiv preprint arXiv:1411.1076.
  • [18] Nocedal, J. and Wright, S. (2006). Numerical Optimization. Springer Science & Business Media.
  • [19] Péché, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields 134, 127–173.
  • [20] Perry, A., Wein, A. S. and Bandeira, A. S. (2020). Statistical limits of spiked tensor models. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques 56, 230–264.
  • [21] Rabanser, S., Shchur, O. and Günnemann, S. (2017). Introduction to tensor decompositions and their applications in machine learning. arXiv preprint arXiv:1711.10781.
  • [22] Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 1135–1151.
  • [23] Sun, W. W., Hao, B. and Li, L. (2014). Tensors in modern statistical learning. Wiley StatsRef: Statistics Reference Online, 1–25.
  • [24] Tao, T. (2012). Topics in Random Matrix Theory, vol. 132. American Mathematical Society.
  • [25] Zare, A., Ozdemir, A., Iwen, M. A. and Aviyente, S. (2018). Extension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA. Proceedings of the IEEE 106, 1341–1358.
  • [25] {barticle}[author] \bauthor\bsnmZare, \bfnmAli\binitsA., \bauthor\bsnmOzdemir, \bfnmAlp\binitsA., \bauthor\bsnmIwen, \bfnmMark A\binitsM. A. and \bauthor\bsnmAviyente, \bfnmSelin\binitsS. (\byear2018). \btitleExtension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA. \bjournalProceedings of the IEEE \bvolume106 \bpages1341–1358. \endbibitem