Outage Probability and Finite-SNR DMT Analysis for IRS-aided MIMO Systems: How Large IRSs Need to Be?

Xin Zhang, Xianghao Yu, and S.H. Song. The authors are with the Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong (e-mail: xzhangfe@connect.ust.hk, {eexyu, eeshsong}@ust.hk).

Abstract

Intelligent reflecting surfaces (IRSs) are promising enablers for high-capacity wireless communication systems by constructing favorable channels between the transmitter and receiver. However, general, accurate, and tractable outage probability analysis for IRS-aided multiple-input-multiple-output (MIMO) systems is not available in the literature. In this paper, we first characterize the distribution of the mutual information (MI) for IRS-aided MIMO systems by capitalizing on large random matrix theory (RMT). Based on this result, a closed-form approximation for the outage probability is derived and a gradient-based algorithm is proposed to minimize the outage probability with statistical channel state information (CSI). We also investigate the diversity-multiplexing tradeoff (DMT) with finite signal-to-noise ratio (SNR). Based on these theoretical results, we further study the impact of the IRS size on system performance. In the high SNR regime, we provide closed-form expressions for the ergodic mutual information (EMI) and outage probability as a function of the IRS size, which analytically reveal that the benefit of increasing the IRS size saturates quickly. Simulation results validate the accuracy of the theoretical analysis and confirm the increasing cost to improve system performance by deploying larger IRSs. For example, for an IRS-aided MIMO system with 20 antennas at both the transmitter and receiver, we need to double the size of the IRS to increase the throughput from 90% to 95% of its maximum value.

Index Terms:

Intelligent reflecting surface (IRS), multiple-input-multiple-output (MIMO), outage probability, random matrix theory (RMT).

I Introduction

Intelligent reflecting surfaces (IRSs) have attracted extensive interest from both academia and industry and are considered one of the promising solutions for future high-capacity communication systems [1]. By designing the controllable phase shifts, IRSs can customize favorable channels between the transceivers [2], thus increasing the throughput and the reliability of wireless links [3]. In addition, IRSs are energy-efficient due to their passive nature. Inspired by these advantages, IRSs have been applied to various wireless systems such as massive multiple-input-multiple-output (MIMO) [4], simultaneous wireless information and power transfer (SWIPT) [5], and non-orthogonal multiple access (NOMA) [6] systems.

There have been works investigating the design and performance analysis of IRS-aided systems [7], [8], [9]. To this end, capacity (throughput) and outage probability (reliability) are two important performance measures. The ergodic mutual information (EMI) (average throughput) has been well studied for single-input-single-output (SISO) [10] and multiple-input-single-output (MISO) systems [9]. The EMI of IRS-aided MIMO systems over Rician channels was investigated by random matrix theory (RMT) with an accurate and closed-form approximation [11].

The outage probability of IRS-aided SISO systems was evaluated in previous works. In [6], a closed-form expression for the outage probability of NOMA SISO networks was obtained under Nakagami fading using the central limit theorem (CLT). The outage probability of a SISO system with multiple IRSs over Rician channels was analyzed in [8]. In [12], the outage probability of IRS-aided vehicular communication systems was evaluated using the CLT. Atapattu et al. derived a closed-form expression for the outage probability and provided the optimal phase shift design [13]. It was also shown that the decay rate of the outage probability is related to the size of the IRS.

The outage probability of IRS-aided MISO systems was also studied. In [14], considering reflection pattern modulation (RPM), a closed-form approximation for the asymptotic outage probability over Rician channels was obtained using Gamma approximation. In [15], with maximum-ratio transmission (MRT), the expression of the outage probability and its asymptotically-optimal form were given, while the phase shifts were optimized to minimize the outage probability. In [16] and [17], the robust design of IRS-aided MISO systems was investigated with imperfect channel state information (CSI). In [18], the conventional CLT was used to obtain a closed-form expression of the outage probability.

The analyses for SISO and MISO systems rely on scalar- and vector-based methods (the conventional CLT), which are not applicable to MIMO systems. In fact, characterizing the cascaded channel of IRS-aided MIMO systems involves the investigation of the spectral distribution of the product of two random matrices, which is a challenging RMT problem [19]. To the best of the authors' knowledge, the outage probability of IRS-aided MIMO systems has only been investigated in [20], where the Mellin transform and finite-regime RMT were leveraged to derive the outage probability over Rayleigh fading channels with channel correlation at one side of the transceivers. In other words, a generic and tractable outage probability characterization for IRS-aided MIMO systems is not available in the literature.

In this paper, we first characterize the distribution of the MI for IRS-aided MIMO systems and utilize it to evaluate the outage probability. We then propose a gradient descent algorithm to minimize the outage probability by optimizing the phase shifts. The results about the outage probability are then utilized to investigate the diversity-multiplexing tradeoff (DMT). The SNR-asymptotic DMT was proposed in [21] to characterize the trade-off between diversity gain (reliability) and multiplexing gain (spectral efficiency) [19], which however is not accurate in the finite SNR regime. In this paper, we will investigate the finite-SNR DMT [22] [23] of IRS-aided MIMO systems. Finally, the impact of the IRS size on the system performance is studied to answer the question: How large IRSs need to be?

I-A Contributions

The main contributions of this paper are listed as follows.

1) The distribution of the MI for IRS-aided MIMO systems is first derived. Based on the result, an approximation of the outage probability over general correlated channels is obtained with only statistical CSI. To the best of the authors’ knowledge, this is the first analytical result regarding the outage probability in IRS-aided MIMO systems over general correlated channels. Numerical results validate the accuracy of the proposed method.

2) With the derived outage probability, a gradient descent algorithm is proposed to minimize the outage probability by optimizing the phase shifts at the IRS, assuming only statistical CSI. Numerical results show that the algorithm can efficiently decrease the outage probability.

3) A closed-form expression is obtained for the finite-SNR DMT of IRS-aided MIMO systems, which is not available in the literature. An interesting observation is that the finite-SNR DMT is highly related to the ratio between the mean and the standard deviation of the MI. The accuracy of the expression is validated by numerical results.

4) The impact of the size of the IRS on system performance is investigated. To this end, we first propose the concept of IRS efficiency to measure the efficiency of increasing the IRS size in achieving the maximum throughput. The expression of the outage probability with respect to the IRS size over uncorrelated channels is explicitly given in the high SNR regime. Based on the theoretical analysis and simulation results, we have two key observations. First, when the size of the IRS is infinitely large, the performance of the two-hop IRS system is the same as that of the single-hop link. Second, the benefit of increasing the size of the IRS saturates quickly. For example, for an IRS-aided system with 20 antennas at the transceivers over independent channels, we need to double the size of the IRS to increase the throughput from 90% to 95% of its maximum value.

I-B Organization

The rest of this paper is organized as follows. Section II presents the system model for the IRS-aided MIMO system. Section III introduces the main results including the characterization of the distribution for the MI. The analysis and optimization of the outage probability and the finite-SNR DMT are given in Section IV. Section V discusses the effect of the IRS size on system performance. The theoretical results are validated in Section VI by numerical simulations. Finally, Section VII concludes the paper.

Notations: Bold upper case letters and bold lower case letters represent matrices and vectors, respectively. $\mathrm{Re}\left\{\cdot\right\}$ denotes the real part of a complex number. $\mathbb{P}(\cdot)$ is the probability measure. $\mathbb{C}^{N}$ and $\mathbb{C}^{M\times N}$ represent the space of $N$-dimensional vectors and the space of $M$-by-$N$ matrices, respectively. $\mathbb{A}^{H}$ represents the conjugate transpose of $\mathbb{A}$. $[\mathbb{A}]_{i,j}$ represents the $(i,j)$-th entry of $\mathbb{A}$. $\otimes$ denotes the element-wise product of matrices. $\operatorname{Tr}\mathbb{A}$ and $\|\mathbb{A}\|$ represent the trace and the spectral norm of $\mathbb{A}$, respectively. $\mathbb{E}$ represents the expectation operator. $\Phi(x)$ is the cumulative distribution function (CDF) of the standard Gaussian distribution and $Q(\cdot)$ is the $Q$-function, where $Q(x)=1-\Phi(x)$. $\mathcal{CN}$ and $\mathcal{N}$ represent the circularly symmetric complex Gaussian and real Gaussian distributions, respectively. $\xrightarrow[N\rightarrow\infty]{\mathcal{D}}$, $\xrightarrow[N\rightarrow\infty]{\mathcal{P}}$, and $\xrightarrow[N\rightarrow\infty]{{a.s.}}$ denote convergence in distribution, convergence in probability, and almost sure convergence, respectively. $O(\cdot)$, $o(\cdot)$, and $\Theta(\cdot)$ represent the Big-O, the Little-o, and the Big-Theta notations, respectively. Specifically, $f(n)\in O(g(n))$ if and only if there exist a positive constant $c$ and a nonnegative integer $n_{0}$ such that $f(n)\leq cg(n)$ for all $n\geq n_{0}$. $f(n)\in o(g(n))$ if and only if, for every positive constant $c$, there exists a nonnegative integer $n_{0}$ such that $f(n)\leq cg(n)$ for all $n\geq n_{0}$. $f(n)\in\Theta(g(n))$ if and only if there exist positive constants $c_{1}$ and $c_{2}$ and a nonnegative integer $n_{0}$ such that $c_{1}g(n)\leq f(n)\leq c_{2}g(n)$ for all $n\geq n_{0}$ [24].

II System Model

II-A System Model

Consider a point-to-point downlink MIMO communication system, where an IRS is deployed to establish favorable communication links for the user equipment (UE) that would otherwise be blocked. There are $M$ antennas at the base station (BS) and $N$ antennas at the UE, and the number of elements at the IRS is $L$. Given that the line-of-sight (LoS) path is blocked, the received signal $\mathbb{y}\in\mathbb{C}^{N}$ is given by

\mathbb{y}=\mathbb{H}_{1}\mathbb{\Psi}\mathbb{H}_{2}\mathbb{s}+\mathbb{n},

(1)

where $\mathbb{s}\in\mathbb{C}^{M}$ denotes the transmitted signal with unit average transmit power, i.e., $\mathbb{E}|s_{i}|^{2}=1,i=1,2,...,M$ , and $\mathbb{n}\in\mathbb{C}^{N}\sim\mathcal{CN}(0,\sigma^{2}\mathbb{I}_{N})$ represents the additive white Gaussian noise (AWGN) with variance $\sigma^{2}$ . $\mathbb{H}_{2}\in\mathbb{C}^{L\times M}$ and $\mathbb{H}_{1}\in\mathbb{C}^{N\times L}$ represent the channel matrices from the BS to the IRS and from the IRS to the UE, respectively. $\mathbb{\Psi}\in\mathbb{C}^{L\times L}=\mathrm{diag}(\psi_{1},\psi_{2},...,\psi_{L})=\mathrm{diag}(e^{\jmath{\theta_{1}}},e^{\jmath{\theta_{2}}},...,e^{\jmath{\theta_{L}}})$ with $\theta_{i}\in[0,2\pi),~{}i=1,2,...,L$ , denotes the phase shifts imposed by the IRS, and we define $\bm{\theta}=(\theta_{1},\theta_{2},...,\theta_{L})$ . In this paper, we consider the general Rayleigh model with

\mathbb{H}_{1}=\mathbb{R}_{1}^{\frac{1}{2}}\mathbb{X}\mathbb{T}^{\frac{1}{2}}_{1},~{}\mathbb{H}_{2}=\mathbb{R}_{2}^{\frac{1}{2}}\mathbb{Y}\mathbb{T}^{\frac{1}{2}}_{2},

(2)

where $\mathbb{R}_{i}$ and $\mathbb{T}_{i},~{}i=1,2$ , are four positive semi-definite correlation matrices. $\mathbb{R}_{1}$ and $\mathbb{T}_{2}$ denote the correlation matrices of receive and transmit antennas, respectively. $\mathbb{T}_{1}$ and $\mathbb{R}_{2}$ denote the transmit and receive correlation matrices of the IRS, respectively. $\mathbb{X}\in\mathbb{C}^{N\times L}$ and $\mathbb{Y}\in\mathbb{C}^{L\times M}$ are two independent and identically distributed (i.i.d.) Gaussian random matrices, whose entries follow $\mathcal{CN}(0,\frac{1}{L})$ and $\mathcal{CN}(0,\frac{1}{M})$ , respectively. We assume that statistical CSI, i.e., correlation matrices of the channel, is available. To obtain the statistical CSI, the samples of the separate channels are needed. In practice, we can estimate the IRS-UE channel and the BS-IRS channel separately by the methods proposed in [25], [26] and then estimate the corresponding channel covariance matrices based on the techniques proposed in [27], [28].

II-B Mutual Information and Outage Probability

The MI of the IRS-aided MIMO system is given by

I(\rho)=\log\det(\mathbb{I}_{N}+\rho\mathbb{H}_{1}\mathbb{\Psi}\mathbb{H}_{2}\mathbb{H}_{2}^{H}\mathbb{\Psi}^{H}\mathbb{H}_{1}^{H}),

(3)

where $\rho=\frac{P}{M\sigma^{2}}$ with $P$ denoting the total transmit power. Based on the distribution of the MI, we can evaluate the performance of IRS-aided MIMO systems with different metrics. For example, the average throughput can be determined by the EMI as $\mathbb{E}I(\rho)$ . On the other hand, the reliability of the system can be measured by the outage probability, which, for a preset transmission rate $R$ , can be written as

P_{out}(R)=\mathbb{P}(I(\rho)<R).

(4)

The EMI of IRS-aided MIMO systems has been obtained in the literature [11]. In the following, we first investigate the distribution of $I(\rho)$ and then utilize the result to analyze the outage probability and the finite-SNR DMT of IRS-aided MIMO systems.
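The definitions (1)-(4) can be illustrated with a small Monte Carlo experiment. The following is a minimal sketch, assuming NumPy is available; the function names (`psd_sqrt`, `crandn`, `sample_mi`, `empirical_outage`) are illustrative and not part of the paper, and the MI is measured in nats (natural logarithm). It draws the channels according to (2), evaluates (3), and estimates (4) as an empirical frequency.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian square root of a positive semi-definite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def crandn(rows, cols, var, rng):
    """i.i.d. circularly symmetric complex Gaussian entries with variance `var`."""
    z = rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))
    return np.sqrt(var / 2.0) * z

def sample_mi(R1h, T1h, R2h, T2h, theta, rho, rng):
    """One draw of I(rho) in (3); the inputs are square roots of the matrices in (2)."""
    N, L, M = R1h.shape[0], T1h.shape[0], T2h.shape[0]
    H1 = R1h @ crandn(N, L, 1.0 / L, rng) @ T1h      # IRS-to-UE channel
    H2 = R2h @ crandn(L, M, 1.0 / M, rng) @ T2h      # BS-to-IRS channel
    H = H1 @ np.diag(np.exp(1j * theta)) @ H2        # cascaded channel
    _, logdet = np.linalg.slogdet(np.eye(N) + rho * (H @ H.conj().T))
    return logdet

def empirical_outage(rate, R1, T1, R2, T2, theta, rho, trials=2000, seed=0):
    """Empirical estimate of (4), plus the sample mean and variance of the MI."""
    rng = np.random.default_rng(seed)
    roots = [psd_sqrt(A) for A in (R1, T1, R2, T2)]
    mi = np.array([sample_mi(*roots, theta, rho, rng) for _ in range(trials)])
    return float(np.mean(mi < rate)), float(mi.mean()), float(mi.var())
```

Here `R1` is $N\times N$, `T1` and `R2` are $L\times L$, and `T2` is $M\times M$; passing identity matrices reproduces the uncorrelated setting.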

III Characterization of the MI for IRS-aided MIMO Systems

Before introducing our main results, we first present some preliminary results including the approximation of the EMI. The analysis is based on RMT, which has been shown to be efficient in analyzing MIMO systems [29], [30].

III-A Assumptions and Existing Results on EMI

The results of this paper are developed based on the following assumptions.

Assumption 1. $0<\lim\inf\limits_{M\geq 1}\frac{M}{L}\leq\frac{M}{L}\leq\lim\sup\limits_{M\geq 1}\frac{M}{L}<\infty$ , $0<\lim\inf\limits_{M\geq 1}\frac{M}{N}\leq\frac{M}{N}\leq\lim\sup\limits_{M\geq 1}\frac{M}{N}<\infty$ .

Assumption 2. $\lim\sup\limits_{M\geq 1}\|\mathbb{R}_{i}\|<\infty$ , $\lim\sup\limits_{M\geq 1}\|\mathbb{T}_{i}\|<\infty$ , $i=1,2$ [29] [30].

Assumption 3. $\inf\limits_{M\geq 1}\frac{1}{M}\operatorname{Tr}\mathbb{R}_{1}>0$ , $\inf\limits_{M\geq 1}\frac{1}{M}\operatorname{Tr}\mathbb{T}_{2}>0$ , $\inf\limits_{M\geq 1}\frac{1}{M}\operatorname{Tr}\mathbb{T}_{1}\mathbb{\Psi}\mathbb{R}_{2}\mathbb{\Psi}^{H}>0$ [31] [32].

A.1 is the asymptotic regime considered for the large-scale system, where the dimensions of the system ( $M$ , $N$ , and $L$ ) grow to infinity at the same pace. A.2 and A.3 restrict the rank of the correlation matrices so that the extremely low-rank case, i.e., the ranks of the correlation matrices do not increase with the number of antennas, will not occur.

Given the eigenvalue decompositions $\mathbb{R}=\mathbb{U}_{R}\mathbb{R}_{1}\mathbb{U}_{R}^{H}$ , $\mathbb{S}=\mathbb{U}_{S}\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{\Psi}\mathbb{R}_{2}\mathbb{\Psi}^{H}\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{U}_{S}^{H}$ , $\mathbb{T}=\mathbb{U}_{T}\mathbb{T}_{2}\mathbb{U}_{T}^{H}$ , where $\mathbb{R}$ , $\mathbb{S}$ , and $\mathbb{T}$ are diagonal matrices, and the singular value decomposition (SVD) $\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{\Psi}\mathbb{R}_{2}^{\frac{1}{2}}=\mathbb{U}_{S}^{H}\mathbb{S}^{\frac{1}{2}}\mathbb{V}_{S}$ , the MI in (3) can be written as

\begin{aligned} I(\rho)&\overset{a}{=}\log\det(\mathbb{I}_{N}+\rho\mathbb{R}_{1}^{\frac{1}{2}}\mathbb{X}\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{\Psi}\mathbb{R}_{2}^{\frac{1}{2}}\mathbb{Y}\mathbb{T}_{2}\mathbb{Y}^{H}\mathbb{R}_{2}^{\frac{1}{2}}\mathbb{\Psi}^{H}\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{X}^{H}\mathbb{R}_{1}^{\frac{1}{2}})\\ &\overset{b}{=}\log\det(\mathbb{I}_{N}+\rho\mathbb{U}_{R}^{H}\mathbb{R}^{\frac{1}{2}}\mathbb{U}_{R}\mathbb{X}\mathbb{U}_{S}^{H}\mathbb{S}^{\frac{1}{2}}\mathbb{V}_{S}\mathbb{Y}\mathbb{U}_{T}^{H}\mathbb{T}\mathbb{U}_{T}\mathbb{Y}^{H}\mathbb{V}_{S}^{H}\mathbb{S}^{\frac{1}{2}}\mathbb{U}_{S}\mathbb{X}^{H}\mathbb{U}_{R}^{H}\mathbb{R}^{\frac{1}{2}}\mathbb{U}_{R})\\ &\overset{c}{=}\log\det(\mathbb{I}_{N}+\rho\mathbb{R}^{\frac{1}{2}}\mathbb{X}^{\prime}\mathbb{S}^{\frac{1}{2}}\mathbb{Y}^{\prime}\mathbb{T}\mathbb{Y}^{\prime H}\mathbb{S}^{\frac{1}{2}}\mathbb{X}^{\prime H}\mathbb{R}^{\frac{1}{2}}),\end{aligned}

(5)

where step $a$ follows by plugging (2) into (3). Step $b$ is obtained by plugging in the eigenvalue decompositions. Step $c$ follows from $\mathbb{X}^{\prime}=\mathbb{U}_{R}\mathbb{X}\mathbb{U}_{S}^{H}$ , $\mathbb{Y}^{\prime}=\mathbb{V}_{S}\mathbb{Y}\mathbb{U}_{T}^{H}$ , and $\det(\mathbb{I}+\mathbb{A}\mathbb{B})=\det(\mathbb{I}+\mathbb{B}\mathbb{A})$ . Due to the unitary invariance of Gaussian random matrices, a Gaussian matrix $\mathbb{G}$ is statistically equivalent to $\mathbb{G}^{\prime}=\mathbb{U}\mathbb{G}\mathbb{V}$ , where $\mathbb{U}$ and $\mathbb{V}$ are any unitary matrices. Therefore, $\mathbb{X}$ ( $\mathbb{Y}$ ) and $\mathbb{X}^{\prime}$ ( $\mathbb{Y}^{\prime}$ ) are statistically equivalent, and we will not differentiate them in the following. Thus, the equivalent channel matrix can be given by

{\mathbb{H}}=\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}\mathbb{Y}\mathbb{T}^{\frac{1}{2}}.

(6)

The following theorem gives an approximation for the EMI.

Theorem 1.

([33, Theorem 2], [34, Theorem 5] and [11, Corollary 1]) With the channel matrix $\mathbb{H}$ given in (6), if A.1 and A.2 are satisfied, it holds true that, for general random matrices $\mathbb{X}$ and $\mathbb{Y}$ ,

\frac{1}{N}\mathbb{E}I(\rho)\xrightarrow{N\rightarrow\infty}\frac{1}{N}\overline{I}(\rho).

(7)

When $\mathbb{X},\mathbb{Y}$ are Gaussian random matrices [11], it further holds true that

\mathbb{E}I(\rho)\xrightarrow{N\rightarrow\infty}\overline{I}(\rho),

(8)

where $\overline{I}(\rho)$ is given by

\begin{aligned}\overline{I}(\rho)=&\log\det(\mathbb{I}_{N}+\frac{\rho Mg\overline{g}}{L\delta}\mathbb{R})+\log\det(\mathbb{I}_{L}+\delta\overline{g}\mathbb{S})\\ &+\log\det(\mathbb{I}_{M}+g\mathbb{T})-2Mg\overline{g}.\end{aligned}

(9)

Here, $(\delta,g,\overline{g})$ is the unique positive solution of the following system of equations

\delta=\frac{1}{L}\operatorname{Tr}\mathbb{R}\mathbb{Q}_{R},~{}g=\frac{1}{M}\operatorname{Tr}\mathbb{S}\mathbb{Q}_{S},~{}\overline{g}=\frac{1}{M}\operatorname{Tr}\mathbb{T}\mathbb{Q}_{T},

(10)

where

\begin{aligned}\mathbb{Q}_{R}&=\left(\frac{1}{\rho}\mathbb{I}_{N}+\frac{Mg\overline{g}}{L\delta}\mathbb{R}\right)^{-1},~{}\mathbb{Q}_{S}=\left(\frac{1}{\delta}\mathbb{I}_{L}+\overline{g}\mathbb{S}\right)^{-1},\\ \mathbb{Q}_{T}&=\left(\mathbb{I}_{M}+g\mathbb{T}\right)^{-1}.\end{aligned}

(11)

Remark 1.

This theorem indicates that $\overline{I}(\rho)$ is a good approximation for $\mathbb{E}I(\rho)$ . For (7) to hold, where the EMI is normalized by $N$ , the random matrices $\mathbb{X}$ and $\mathbb{Y}$ are not necessarily Gaussian. (8) indicates that when $\mathbb{X}$ and $\mathbb{Y}$ are Gaussian, the convergence holds even if we get rid of the factor $\frac{1}{N}$ , which may not hold true for non-Gaussian matrices as discussed in [35], [31]. For IRS-aided MIMO systems, the matrix $\mathbb{S}$ is related to the phase shift matrix $\mathbb{\Psi}$ , which indicates that the solution of (10) is also related to the phase shifts. The convergence in Theorem 1 involves the expectation of the MI. On the other hand, in the asymptotic regime A.1, the MI normalized by the number of receive antennas converges almost surely to a deterministic quantity when the numbers of antennas go to infinity at the same pace [33, Theorem 2], which implies the occurrence of channel hardening [18], [36], [37].
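The deterministic approximation in Theorem 1 can be evaluated numerically. The following is a minimal sketch under several assumptions: $\mathbb{R}$, $\mathbb{S}$, and $\mathbb{T}$ are supplied through their eigenvalue lists `r`, `s`, `t` (they are diagonal in (6)); the third equation in (10) is read as $\overline{g}=\frac{1}{M}\operatorname{Tr}\mathbb{T}\mathbb{Q}_{T}$; and a plain fixed-point iteration started from $(1,1,1)$ converges, which is assumed here rather than proved. The function names are illustrative.

```python
import math

def solve_fixed_point(r, s, t, rho, tol=1e-12, max_iter=100000):
    """Iterate (10)-(11) for diagonal R, S, T given by eigenvalue lists r, s, t."""
    L, M = len(s), len(t)
    delta, g, gbar = 1.0, 1.0, 1.0
    for _ in range(max_iter):
        # gbar = (1/M) Tr T Q_T with Q_T = (I + g T)^{-1}
        gbar_new = sum(x / (1.0 + g * x) for x in t) / M
        # g = (1/M) Tr S Q_S with Q_S = (I/delta + gbar S)^{-1}
        g_new = sum(x / (1.0 / delta + gbar_new * x) for x in s) / M
        # delta = (1/L) Tr R Q_R with Q_R = (I/rho + (M g gbar / (L delta)) R)^{-1}
        c = M * g_new * gbar_new / (L * delta)
        delta_new = sum(x / (1.0 / rho + c * x) for x in r) / L
        change = max(abs(delta_new - delta), abs(g_new - g), abs(gbar_new - gbar))
        delta, g, gbar = delta_new, g_new, gbar_new
        if change < tol:
            return delta, g, gbar
    raise RuntimeError("fixed-point iteration did not converge")

def emi_approx(r, s, t, rho):
    """Deterministic approximation (9) of the EMI, in nats."""
    L, M = len(s), len(t)
    delta, g, gbar = solve_fixed_point(r, s, t, rho)
    c = rho * M * g * gbar / (L * delta)
    return (sum(math.log1p(c * x) for x in r)
            + sum(math.log1p(delta * gbar * x) for x in s)
            + sum(math.log1p(g * x) for x in t)
            - 2.0 * M * g * gbar)
```

Because the phase shifts enter only through the eigenvalues of $\mathbb{S}$, re-evaluating the approximation for a new $\mathbb{\Psi}$ amounts to recomputing `s` and calling `emi_approx` again.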

Next, we investigate the fluctuation of $I(\rho)$ . The challenge arises from the fact that the effective channel is the product of two random matrices. There are some related results in the literature. The CLT of linear spectral statistics for $F$ -matrices was given in [38]. In [39], the CLT of linear spectral statistics with general non-Gaussian entries was given for the product of two i.i.d. random matrices, which was also investigated in [19] by a free probability approach for Gaussian random matrices. The authors of [19] gave the CLT for the MI of double-Rayleigh channels when $M=N$ . However, the CLT for the MI over correlated channels has not been considered and is one of the main contributions of this paper. Based on the CLT, we also investigate the outage probability for IRS-aided MIMO systems using large RMT.

TABLE I: List of Expressions.

Symbols	Expression	Symbols	Expression	Symbols	Expression	Symbols	Expression
$\gamma_{R}$	$\frac{1}{L}\operatorname{Tr}\mathbb{R}^{2}\mathbb{Q}_{R}^{2}$	$\gamma_{R,I}$	$\frac{1}{L}\operatorname{Tr}\mathbb{R}\mathbb{Q}_{R}^{2}$	$\gamma_{S}$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}^{2}\mathbb{Q}_{S}^{2}$	$\gamma_{S,I}$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}\mathbb{Q}_{S}^{2}$
$\gamma_{T}$	$\frac{1}{M}\operatorname{Tr}\mathbb{T}^{2}\mathbb{Q}_{T}^{2}$	$\gamma_{T,I}$	$\frac{1}{M}\operatorname{Tr}\mathbb{T}\mathbb{Q}_{T}^{2}$	$\eta_{R}$	$\frac{1}{L}\operatorname{Tr}\mathbb{R}^{3}\mathbb{Q}_{R}^{3}$	$\eta_{R,I}$	$\frac{1}{L}\operatorname{Tr}\mathbb{R}^{2}\mathbb{Q}_{R}^{3}$
$\eta_{S}$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}^{3}\mathbb{Q}_{S}^{3}$	$\eta_{S,I}$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}^{2}\mathbb{Q}_{S}^{3}$	$\eta_{T}$	$\frac{1}{M}\operatorname{Tr}\mathbb{T}^{3}\mathbb{Q}_{T}^{3}$	$\eta_{T,I}$	$\frac{1}{M}\operatorname{Tr}\mathbb{T}^{2}\mathbb{Q}_{T}^{3}$
$\Delta_{Y}$	$1-\gamma_{S}\gamma_{T}$	$\Gamma$	$\frac{M}{L\delta^{2}}(\frac{\gamma_{T,I}^{2}\gamma_{S}}{\Delta_{Y}}+g^{2}\gamma_{T})$	$\Delta_{X}$	$1-\gamma_{R}\Gamma$	$g(\mathbb{F})$	$\frac{1}{M}\operatorname{Tr}\mathbb{F}\mathbb{Q}_{S}$
$\psi_{T}$	$\frac{1}{M}\operatorname{Tr}\mathbb{T}^{2}\mathbb{Q}_{T}^{4}$	$\Gamma_{L}$	$\Gamma-\frac{\gamma_{S}\psi_{T}}{L\delta^{2}\Delta_{Y}}$	$\gamma_{S}(\mathbb{F})$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}\mathbb{Q}_{S}\mathbb{F}\mathbb{Q}_{S}$	$\eta_{S}(\mathbb{F})$	$\frac{1}{M}\operatorname{Tr}\mathbb{S}\mathbb{Q}_{S}\mathbb{F}\mathbb{Q}_{S}\mathbb{S}\mathbb{Q}_{S}$

III-B Asymptotic Gaussianity of the MI

To characterize the distribution of the MI, we first prove the Gaussianity of the MI.

Theorem 2.

(CLT for the MI) If A.1 - A.3 are satisfied, it holds true that

\frac{I(\rho)-\overline{I}(\rho)}{\sqrt{V(\rho)}}\xrightarrow[N\rightarrow\infty]{\mathcal{D}}\mathcal{N}(0,1),

(12)

where $\overline{I}(\rho)$ is given in (9). The asymptotic variance $V(\rho)$ is given by

V(\rho)=-\log(1-\gamma_{R}\Gamma_{L})-\log(1-\gamma_{S}\gamma_{T}),

(13)

where $\gamma_{R}$ , $\gamma_{S}$ , $\gamma_{T}$ , and $\Gamma_{L}$ are listed in Table I.

Proof.

The proof of Theorem 2 is given in Appendix A. ∎

Remark 2.

Theorem 2 indicates the asymptotic Gaussianity of the MI. Note that the variance consists of two terms caused by $\mathbb{X}$ and $\mathbb{Y}$ , respectively. Different from the entry-based CLT used in SISO systems [6], [12], this CLT is developed based on RMT. From the proof of Theorem 2, we can observe that the term $-\frac{\gamma_{S}\psi_{T}}{L\delta^{2}\Delta_{Y}}$ in $\Gamma_{L}$ can be omitted if $L$ is large, where $\psi_{T}$ is given in Table I. In this case, we can use $\Gamma$ , listed in Table I, to replace $\Gamma_{L}$ and the variance is given by

V(\rho)=-\log(1-\gamma_{R}\Gamma)-\log(1-\gamma_{S}\gamma_{T}).

(14)

The large $L$ case will be used in the following analysis as we will investigate the asymptotic performance when $L$ goes to infinity.
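The variance in (14) only involves normalized traces of diagonal matrices, so it can be evaluated directly from the eigenvalues. The following is a minimal sketch, assuming that the eigenvalue lists `r`, `s`, `t` of $\mathbb{R}$, $\mathbb{S}$, $\mathbb{T}$ and a solution $(\delta,g,\overline{g})$ of (10) are already available; the function name and argument order are illustrative.

```python
import math

def asymptotic_variance(r, s, t, rho, delta, g, gbar):
    """V(rho) in (14), built from the Table I quantities (large-L form with Gamma)."""
    L, M = len(s), len(t)
    c = M * g * gbar / (L * delta)
    q_r = [1.0 / (1.0 / rho + c * x) for x in r]         # diagonal of Q_R
    q_s = [1.0 / (1.0 / delta + gbar * x) for x in s]    # diagonal of Q_S
    q_t = [1.0 / (1.0 + g * x) for x in t]               # diagonal of Q_T
    gamma_r = sum((x * q) ** 2 for x, q in zip(r, q_r)) / L
    gamma_s = sum((x * q) ** 2 for x, q in zip(s, q_s)) / M
    gamma_t = sum((x * q) ** 2 for x, q in zip(t, q_t)) / M
    gamma_ti = sum(x * q ** 2 for x, q in zip(t, q_t)) / M
    delta_y = 1.0 - gamma_s * gamma_t
    big_gamma = M / (L * delta ** 2) * (gamma_ti ** 2 * gamma_s / delta_y
                                        + g ** 2 * gamma_t)
    # First term: fluctuation attributed to X; second term: fluctuation attributed to Y.
    return -math.log(1.0 - gamma_r * big_gamma) - math.log(delta_y)
```

Replacing `big_gamma` by $\Gamma_{L}$ from Table I would give the finite-$L$ form (13); that correction is not included in this sketch.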

III-C Uncorrelated Cases

To better illustrate the impact of the size of the IRS, we further consider the special case where the channels are i.i.d., i.e., $\mathbb{R}=\mathbb{I}_{N},\mathbb{S}=\mathbb{I}_{L},\mathbb{T}=\mathbb{I}_{M}$ , which is referred to as the double-Rayleigh model [19]. To derive the CLT, we first determine the EMI.

Proposition 1.

If A.1 holds true, the EMI of IRS-aided MIMO systems over independent channels with $N=M$ is given by

\overline{I}(\rho)=2N\log(1+g)+\frac{N}{\tau}\log\left(1+\frac{\tau\rho}{(1+g)^{2}}\right)-\frac{2Ng}{1+g},

(15)

where $\tau=\frac{M}{L}$ , and $g$ is determined by the following cubic equation

g^{3}+2g^{2}+(1+\rho\tau-\rho)g-\rho=0,

(16)

with $g>0$ and $1+(1-\tau)g>0$ .

Proof.

Proposition 1 can be obtained from Theorem 1 by setting $\mathbb{R}=\mathbb{I}_{N},\mathbb{S}=\mathbb{I}_{L},\mathbb{T}=\mathbb{I}_{M}$ with $N=M$ , so we omit the proof. ∎

Remark 3.

In this case, the parameters in Theorem 1 degenerate to one parameter $g$ , which is described by a cubic equation. Thus, for a given SNR, the performance normalized by $N$ depends only on the ratio between the number of antennas ( $M=N$ ) and the size of the IRS ( $L$ ), i.e., $\tau$ .

Proposition 2.

If A.1 holds true, $N=M$ , and $L$ is comparable with $M$ , the CLT for the MI of IRS-aided MIMO systems over independent channels is given by

\frac{I(\rho)-\overline{I}(\rho)}{\sqrt{V(\rho)}}\xrightarrow[N\rightarrow\infty]{\mathcal{D}}\mathcal{N}(0,1),

(17)

where

V(\rho)=\log[\rho(1+g)^{2}]-\log(\rho+2g^{3}+2g^{2}),

(18)

and $g$ is identical to that in Proposition 1.

Proof.

The proof of Proposition 2 is given in Appendix B. ∎

Remark 4.

Proposition 2 was also obtained in [19, Proposition 3] by a free probability approach, while in this paper it is shown as a special case of Theorem 2. The different signs of $g^{2}$ and $\rho$ in (18) originate from the fact that the signs of the Cauchy transform and the Stieltjes transform are opposite.

Proposition 1 and Proposition 2 indicate that the mean and variance of the MI are determined by the root $g$ of the cubic equation (16), whose coefficients are related to $\tau$ and SNR $\rho$ .
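Propositions 1 and 2 reduce the uncorrelated case to scalar computations. The following is a minimal sketch: the cubic (16) has exactly one positive root (its coefficient sequence changes sign once), and it is located here by bisection on $[0,\rho+1]$, which is a choice of this sketch rather than a method stated in the paper. The function names are illustrative and the MI is in nats.

```python
import math

def solve_g(rho, tau, iters=200):
    """Positive root of the cubic (16), found by bisection."""
    def f(g):
        return g ** 3 + 2.0 * g ** 2 + (1.0 + rho * tau - rho) * g - rho
    lo, hi = 0.0, rho + 1.0          # f(lo) = -rho < 0 and f(hi) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def emi_uncorrelated(N, rho, tau):
    """Mean of the MI from (15), with tau = M / L and N = M."""
    g = solve_g(rho, tau)
    return (2.0 * N * math.log1p(g)
            + N / tau * math.log1p(tau * rho / (1.0 + g) ** 2)
            - 2.0 * N * g / (1.0 + g))

def variance_uncorrelated(rho, tau):
    """Asymptotic variance of the MI from (18)."""
    g = solve_g(rho, tau)
    return math.log(rho * (1.0 + g) ** 2) - math.log(rho + 2.0 * g ** 3 + 2.0 * g ** 2)
```

Sweeping `tau` towards zero for a fixed `rho` gives a quick way to inspect how the mean and variance behave as the IRS grows relative to the antenna arrays.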

IV Outage Probability and Finite-SNR DMT Analysis

IV-A Outage Probability of General Correlated IRS-aided MIMO Systems

As a direct result of Theorem 2, the approximation of the outage probability is given by the following theorem.

Theorem 3.

(A closed-form approximation for the outage probability of IRS-aided MIMO systems) Given a preset transmission rate $R$ , the outage probability can be approximated by

P_{out}(R)\approx\Phi\left(\frac{R-\overline{I}(\rho)}{\sqrt{V(\rho)}}\right).

(19)

On the other hand, given the outage probability $p_{out}$ , the outage rate can be approximated by $R\approx\overline{I}(\rho)+\sqrt{V(\rho)}\Phi^{-1}(p_{out})$ .

The above result is different from [20] in the sense that neither $\mathbb{R}_{1}$ nor $\mathbb{T}_{2}$ is restricted to be an identity matrix, indicating that the proposed method can handle general correlated channels. In fact, the result in Theorem 3 is also applicable to double-scattering channels [33] [40].
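Theorem 3 only requires the Gaussian CDF and its inverse. The following is a minimal sketch using the Python standard library, assuming that the mean $\overline{I}(\rho)$ and variance $V(\rho)$ have already been computed and are passed in as `mean_mi` and `var_mi`; the function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and standard deviation 1

def outage_probability(rate, mean_mi, var_mi):
    """Gaussian approximation (19) of P_out(R)."""
    return _STD_NORMAL.cdf((rate - mean_mi) / var_mi ** 0.5)

def outage_rate(p_out, mean_mi, var_mi):
    """Rate whose approximate outage probability equals p_out, with 0 < p_out < 1."""
    return mean_mi + var_mi ** 0.5 * _STD_NORMAL.inv_cdf(p_out)
```

The two functions are inverses of each other for a fixed mean and variance, so `outage_probability(outage_rate(p, m, v), m, v)` returns `p` up to floating-point error.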

IV-B Outage Probability Optimization

Given statistical CSI, the optimization problem for the outage probability can be formulated as

\begin{aligned}\mathcal{P}1:~{}&\min_{\mathbb{\Psi}}~{}P_{out}(R),\\ \mathrm{s.t.}~{}&\mathbb{\Psi}=\mathrm{diag}\left(\psi_{1},\psi_{2},...,\psi_{L}\right),\\ &|\psi_{l}|=1,~{}l=1,2,...,L,\end{aligned}

(20)

where $\psi_{l}=\exp(\jmath\theta_{l})$ . The challenges arise from two aspects: (1) the evaluation of the objective function; (2) the non-convexity of the problem caused by the unit-modulus constraints on $\psi_{l}$ . The first one can be resolved by approximating the outage probability using (19) in Theorem 3. Given the SNR $\rho$ , the mean $\overline{I}(\rho)$ and variance $V(\rho)$ are functions of the phase shifts $\psi_{l},l=1,2,...,L$ , so we use the notations $\overline{I}(\mathbb{\Psi})$ and $V(\mathbb{\Psi})$ instead. Thus, $\mathcal{P}1$ can be rewritten as

\begin{aligned}\mathcal{P}2:~{}&\min_{\mathbb{\Psi}}~{}G(\mathbb{\Psi})=\Phi\left(\frac{R-\overline{I}(\mathbb{\Psi})}{\sqrt{V(\mathbb{\Psi})}}\right),\\ \mathrm{s.t.}~{}&\mathbb{\Psi}=\mathrm{diag}\left(\psi_{1},\psi_{2},...,\psi_{L}\right),\\ &|\psi_{l}|=1,~{}l=1,2,...,L.\end{aligned}

(21)

Although the parameters $\delta$ , $g$ , and $\overline{g}$ are coupled with $\theta_{l}$ as explained in Remark 1, the partial derivatives of the objective function with respect to $\theta_{l}$ can be given in a closed form. Therefore, $\mathcal{P}2$ can be solved using the gradient descent method. Specifically, in each iteration, the update of $\theta_{l}$ is obtained by searching in the negative gradient direction, until the value of the objective function converges to a stationary point.

Next, we compute the partial derivatives with respect to $\theta_{l}$ , $l=1,2,...,L$ , and we use the notation $(\cdot)^{\prime}_{l}=\frac{\partial(\cdot)}{\partial\theta_{l}}$ to represent the partial derivatives. By the chain rule, the partial derivative of $G(\mathbb{\Psi})$ with respect to $\theta_{l}$ is given by

G_{l}^{\prime}(\mathbb{\Psi})=\frac{\exp\left(-\frac{T^{2}(\mathbb{\Psi})}{2}\right)T_{l}^{\prime}(\mathbb{\Psi})}{\sqrt{2\pi}},

(22)

where

T(\mathbb{\Psi})=\frac{R-\overline{I}(\mathbb{\Psi})}{\sqrt{V(\mathbb{\Psi})}},

(23)

\begin{aligned}T_{l}^{\prime}(\mathbb{\Psi})&=\frac{-\overline{I}_{l}^{\prime}(\mathbb{\Psi})V(\mathbb{\Psi})-\frac{1}{2}(R-\overline{I}(\mathbb{\Psi}))V^{\prime}_{l}(\mathbb{\Psi})}{V(\mathbb{\Psi})^{\frac{3}{2}}},\\ \overline{I}_{l}^{\prime}(\mathbb{\Psi})&=\overline{g}\operatorname{Tr}[(\frac{1}{\delta}\mathbb{I}_{L}+\overline{g}\mathbb{T}^{\frac{1}{2}}_{1}\mathbb{\Psi}\mathbb{R}_{2}\mathbb{\Psi}^{H}\mathbb{T}^{\frac{1}{2}}_{1})^{-1}\mathbb{F}_{l}].\end{aligned}

$\mathbb{F}_{l}$ is defined as

\mathbb{F}_{l}=\mathbb{T}_{1}^{\frac{1}{2}}(\mathbb{G}_{l}\otimes\mathbb{R}_{2})\mathbb{T}_{1}^{\frac{1}{2}},\quad l=1,2,...,L,

(24)

where

\left[\mathbb{G}_{l}\right]_{p,q}=\left\{\begin{aligned} &\jmath e^{\jmath(\theta_{l}-\theta_{q})},&p=l,~{}q\neq l,\\ &-\jmath e^{\jmath(\theta_{p}-\theta_{l})},&q=l,~{}p\neq l,\\ &0,&\mathrm{otherwise}.\end{aligned}\right.

(25)

In fact, $\mathbb{F}_{l}=(\mathbb{T}_{1}^{\frac{1}{2}}\mathbb{\Psi}\mathbb{R}_{2}\mathbb{\Psi}^{H}\mathbb{T}_{1}^{\frac{1}{2}})^{\prime}_{l}$ and the term $V^{\prime}_{l}(\mathbb{\Psi})$ can be given by

V^{\prime}_{l}(\mathbb{\Psi})=\frac{\gamma_{S}\gamma_{T,l}^{\prime}+\gamma_{S,l}^{\prime}\gamma_{T}}{\Delta_{Y}}+\frac{\gamma_{R}\Gamma_{l}^{\prime}+\gamma_{R,l}^{\prime}\Gamma}{\Delta_{X}},

(26)

where

\begin{aligned}\Gamma_{l}^{\prime}=&\frac{M}{L\delta^{2}}\Big[\frac{2\gamma_{T,I}\gamma_{T,I,l}^{\prime}\gamma_{S}}{\Delta_{Y}}+\frac{\gamma_{T,I}^{2}(\gamma^{\prime}_{S,l}+\gamma_{S}^{2}\gamma_{T,l}^{\prime})}{\Delta_{Y}^{2}}+2gg_{l}^{\prime}\gamma_{T}+g^{2}\gamma_{T,l}^{\prime}\Big]\\ &-\frac{2M\delta_{l}^{\prime}}{L\delta^{3}}\Big(\frac{\gamma_{S}\gamma_{T,I}^{2}}{\Delta_{Y}}+g^{2}\gamma_{T}\Big).\end{aligned}

(27)

The derivatives of $\gamma$ ’s in (27) are given as

$\displaystyle\gamma_{R,l}^{\prime}$	$\displaystyle=\frac{-2M\eta_{R}(\delta g^{\prime}_{l}\overline{g}+\delta g\overline{g}^{\prime}_{l}-g\overline{g}\delta^{\prime}_{l})}{L\delta^{2}},$	(28)
$\displaystyle\gamma_{T,l}^{\prime}$	$\displaystyle=-2g^{\prime}_{l}\eta_{T},$
$\displaystyle\gamma_{S,l}^{\prime}$	$\displaystyle=-2\overline{g}^{\prime}_{l}\eta_{S}-2\overline{g}\eta_{S}(\mathbb{F}_{l})+\frac{2\delta^{\prime}_{l}\eta_{S,I}}{\delta^{2}}+2\gamma_{S}(\mathbb{F}_{l}),$

where $(\delta^{\prime}_{l},g^{\prime}_{l},\overline{g}^{\prime}_{l})$ can be computed by the following lemma.

Lemma 1.

Given that $(\delta,g,\overline{g})$ is the positive solution of (10) and $\mathbb{p}_{l}=(\delta^{\prime}_{l},g^{\prime}_{l},\overline{g}^{\prime}_{l})^{T}$ , it holds true that

\displaystyle\mathbb{A}\mathbb{p}_{l}=\mathbb{q}_{l},

(29)

and $|\mathbb{A}|>0$ , where $\mathbb{A}$ and $\mathbb{q}_{l}$ are defined by

	$\displaystyle\mathbb{A}$	$\displaystyle=\begin{bmatrix}z\gamma_{R,I}&\frac{M\overline{g}\gamma_{R}}{L}&\frac{Mg\gamma_{R}}{L}\\ -\frac{\gamma_{S,I}}{{\delta^{2}}}&1&\gamma_{S}\\ 0&\gamma_{T}&1\\ \end{bmatrix},$		(30)
	$\displaystyle\mathbb{q}_{l}$	$\displaystyle=\begin{bmatrix}0&g(\mathbb{F}_{l})-2\overline{g}\gamma_{S}(\mathbb{F}_{l})&0\end{bmatrix}^{T}.$		(30)

Therefore, we have

\mathbb{p}_{l}=\mathbb{A}^{-1}\mathbb{q}_{l}.

(31)

Proof.

The proof can be obtained by taking derivatives of $(\delta,g,\overline{g})$ with respect to $\theta_{l}$ at both sides of (10), and the details are omitted here. We have $|\mathbb{A}|>0$ because

|\mathbb{A}|=z\gamma_{R,I}(1-\gamma_{S}\gamma_{T})+\frac{M\gamma_{R}\gamma_{S,I}\gamma_{T,I}}{L\delta^{2}},

(32)

and the two terms at the right-hand side (RHS) are positive, so $\mathbb{A}$ is invertible. ∎
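As a minimal numerical sketch of Lemma 1, the snippet below builds $\mathbb{A}$ as in (30), solves $\mathbb{A}\mathbb{p}_{l}=\mathbb{q}_{l}$, and compares the numerical determinant with a cofactor expansion along the first column. All numerical values are hypothetical placeholders (in the paper they follow from (10) and Table I), and `q_mid` merely stands in for the single nonzero entry of $\mathbb{q}_{l}$. The expansion coincides with (32) under the assumption $\gamma_{T,I}=\overline{g}-g\gamma_{T}$, which is inferred here and not restated in this section.

```python
import numpy as np

# Hypothetical positive placeholders; the true values come from (10) and Table I.
z, M, L = 0.1, 4, 16
delta, g, g_bar = 0.8, 0.6, 0.5
gamma_R, gamma_RI = 0.3, 0.7
gamma_S, gamma_SI = 0.4, 0.9
gamma_T = 0.2
q_mid = 0.05  # placeholder for the only nonzero entry of q_l in (30)

# Matrix A with the structure of (30).
A = np.array([
    [z * gamma_RI, M * g_bar * gamma_R / L, M * g * gamma_R / L],
    [-gamma_SI / delta**2, 1.0, gamma_S],
    [0.0, gamma_T, 1.0],
])
q_l = np.array([0.0, q_mid, 0.0])

# Solve A p_l = q_l directly rather than forming A^{-1} explicitly.
p_l = np.linalg.solve(A, q_l)  # (delta'_l, g'_l, g_bar'_l)

# Cofactor expansion of |A| along the first column.
det_closed = (z * gamma_RI * (1.0 - gamma_S * gamma_T)
              + M * gamma_R * gamma_SI * (g_bar - g * gamma_T) / (L * delta**2))
det_numeric = np.linalg.det(A)

print(p_l, det_numeric, det_closed)
```

Using `np.linalg.solve` avoids an explicit inverse, which is usually preferable numerically when only $\mathbb{p}_{l}$ is needed.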

Next, we present the gradient descent method. We use the Armijo-Goldstein (AG) line search method [41] to select a step size that guarantees a sufficient decrease of the objective function based on the local gradient. The AG condition is given in line 4 of Algorithm 1. The algorithm can be proved to converge to a stationary point by following the proof of Theorem 1 in [42] or Theorem 1 in [43]. The performance of the algorithm will be evaluated in Section VI.

Complexity Analysis: In each iteration, we need to compute $(\delta,g,\overline{g})$ for the updated $\mathbb{\Psi}$ by solving (10). To avoid the computation induced by the matrix inversion, we can replace $\mathbb{R}$ by the diagonal matrix $\mathbb{\Lambda}_{R}=\mathbb{U}\mathbb{R}\mathbb{U}^{H}$ obtained from its eigenvalue decomposition, whose complexity is $O(N^{3})$ [44]. Similar operations can be performed on $\mathbb{S}$ and $\mathbb{T}$ . By the analysis in Appendix E, the complexity of obtaining an $\varepsilon$ -approximation of the solution is $O(N\log^{2}(\frac{1}{\varepsilon}))$ . Assume that the numbers of outer and inner iterations are $N_{outer}$ and $N_{inner}$ , respectively. Then the total complexity of the algorithm is $O(N_{outer}N_{inner}(N^{3}+N\log^{2}(\frac{1}{\varepsilon})))$ , where $N_{outer}$ and $N_{inner}$ are determined by the convergence condition and the second-order properties of $G(\mathbb{\Psi})$ . As a smaller value of the objective function is obtained in each iteration, the algorithm converges to a stationary point. With statistical CSI, the phase shifts do not need to be updated frequently, so the overall complexity of Algorithm 1 is modest in practical systems.

Algorithm 1 Gradient Descent Algorithm for the Phase Shift Matrix $\mathbb{\Psi}$

Input: $\bm{\theta}^{\left(0\right)}$ , initial stepsize $\alpha_{0}$ , scaling factor $0<c<1$ , and control parameter $0<\beta<1$ . Set $t=0$ .

1: repeat

2: Compute the gradient $\nabla_{\bm{\theta}}G(\mathbb{\Psi})=(\frac{\partial G(\mathbb{\Psi})}{\partial\theta_{1}},\frac{\partial G(\mathbb{\Psi})}{\partial\theta_{2}},...,\frac{\partial G(\mathbb{\Psi})}{\partial\theta_{L}})^{T}$ according to (22) and its direction $\mathbb{d}^{(t)}=\frac{\nabla_{\bm{\theta}}G(\mathbb{\Psi})}{\|\nabla_{\bm{\theta}}G(\mathbb{\Psi})\|}$

3: $\alpha\leftarrow\alpha_{0}$

4: while $G(\mathbb{\Psi}^{(t)})-G(\mathrm{diag}[\exp(\jmath\bm{\theta}^{(t)}-\alpha\jmath\mathbb{d}^{(t)})])<\alpha\beta\|\nabla_{\bm{\theta}}G(\mathbb{\Psi}^{(t)})\|$ do

5: $\alpha\leftarrow c\alpha$

6: end while

7: $\bm{\theta}^{(t+1)}\leftarrow\bm{\theta}^{(t)}-\alpha\mathbb{d}^{(t)}$

8: $\mathbb{\Psi}^{(t+1)}\leftarrow\mathrm{diag}[\exp(\jmath\bm{\theta}^{(t+1)})]$

9: $t\leftarrow t+1$

10: until Convergence.

Output: $\bm{\theta}^{(t)},\mathbb{\Psi}^{(t)}$
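The control flow of Algorithm 1 can be sketched as follows. This is only an illustration under stated assumptions: `toy_objective` and `toy_gradient` are hypothetical smooth stand-ins for $G(\mathbb{\Psi})$ and its gradient in (22), the coefficient vector `c` is random, and the hyperparameter defaults are arbitrary. The argument `shrink` plays the role of the scaling factor $c$ in Algorithm 1 (renamed to avoid clashing with the coefficient vector), and the backtracking test is the condition in line 4.

```python
import numpy as np

def toy_objective(theta, c):
    """Hypothetical smooth stand-in for G(Psi); NOT the outage objective of the paper."""
    s = np.sum(c * np.exp(1j * theta))
    return -np.abs(s) ** 2 / len(theta)

def toy_gradient(theta, c):
    """Analytic gradient of toy_objective with respect to the phases theta."""
    s = np.sum(c * np.exp(1j * theta))
    return (2.0 / len(theta)) * np.imag(np.conj(s) * c * np.exp(1j * theta))

def gradient_descent_ag(theta0, c, alpha0=1.0, shrink=0.5, beta=0.5,
                        tol=1e-8, max_iter=500):
    theta = np.array(theta0, dtype=float)
    history = [toy_objective(theta, c)]
    for _ in range(max_iter):
        grad = toy_gradient(theta, c)
        gnorm = np.linalg.norm(grad)
        if gnorm < tol:              # stationary point reached
            break
        d = grad / gnorm             # normalized direction (line 2)
        alpha = alpha0               # line 3
        # Line 4: shrink alpha until the decrease is at least alpha*beta*||grad||.
        while history[-1] - toy_objective(theta - alpha * d, c) < alpha * beta * gnorm:
            alpha *= shrink          # line 5
            if alpha < 1e-16:        # safeguard against endless backtracking
                break
        if alpha < 1e-16:
            break
        theta = theta - alpha * d    # line 7
        history.append(toy_objective(theta, c))
    return theta, np.array(history)

rng = np.random.default_rng(0)
L = 16
c = rng.standard_normal(L) + 1j * rng.standard_normal(L)
theta0 = 2 * np.pi * np.arange(1, L + 1) / L   # phases of psi_i = exp(j*2*pi*i/L)
theta_opt, hist = gradient_descent_ag(theta0, c)
print(hist[0], hist[-1], len(hist))
```

Because every accepted step satisfies the sufficient-decrease condition, the recorded objective values are non-increasing, which mirrors the monotonicity argument used in the complexity analysis.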

IV-C Finite SNR DMT for IRS-aided Systems

In this section, we investigate the finite-SNR DMT of IRS-aided systems. DMT was proposed in [21] to characterize the trade-off between diversity and multiplexing gain, which is typically referred to as the asymptotic-SNR DMT. Here, we investigate the finite-SNR DMT [22], [23], which provides more insightful information for the low and moderate SNR regimes.

By the definition of finite-SNR DMT [23], the multiplexing gain is given by

m=\frac{kR}{\mathbb{E}I(\rho)}\approx\frac{kR}{\overline{I}(\rho)},

(33)

where $k=\min(L,M,N)$ and $R$ is the data rate. By approximating the outage probability using the CLT in Theorem 2, we obtain the following theorem regarding the finite-SNR DMT for IRS-aided systems in Rayleigh channels with equal power allocation among different antennas.

Theorem 4.

The DMT in the finite-SNR regime can be approximated by

\displaystyle d(m,\rho)

\displaystyle=\frac{z(m-k)}{k\sqrt{2\pi}}\frac{\exp(-\frac{(m-k)^{2}H^{2}(z)}{2k^{2}})H^{\prime}(z)}{\Phi(\frac{(m-k)H(z)}{k})},

(34)

where $H(z)=\frac{\overline{I}(z)}{\sqrt{V(z)}}$ . $\overline{I}(z)$ is obtained by replacing $\rho$ in $\overline{I}(\rho)$ with $z=\rho^{-1}$ , and the same holds for $V(z)$ . $H^{\prime}(z)=\frac{\mathrm{d}H(z)}{\mathrm{d}z}$ is given as

\displaystyle H^{\prime}(z)=\frac{\overline{I}^{\prime}(z)V(z)-\frac{1}{2}\overline{I}(z)V^{\prime}(z)}{V(z)^{\frac{3}{2}}},

(35)

where

		$\displaystyle\overline{I}^{\prime}(z)=-\frac{N}{z}+\operatorname{Tr}\mathbb{Q}_{R},$		(36)
		$\displaystyle V^{\prime}(z)=\frac{\gamma_{S}\gamma_{T}^{\prime}+\gamma_{S}^{\prime}\gamma_{T}}{\Delta_{Y}}+\frac{\gamma_{R}\Gamma^{\prime}+\gamma_{R}^{\prime}\Gamma}{\Delta_{X}},$
		$\displaystyle\Gamma^{\prime}=\frac{M}{L\delta^{2}}[\frac{2\gamma_{T,I}\gamma_{T,I}^{\prime}\gamma_{S}}{\Delta_{Y}}+\frac{\gamma_{T,I}^{2}(\gamma^{\prime}_{S}+\gamma_{S}^{2}\gamma_{T}^{\prime})}{\Delta_{Y}^{2}}$
		$\displaystyle+2gg^{\prime}\gamma_{T}+g^{2}\gamma_{T}^{\prime}]-\frac{2M\delta^{\prime}}{L\delta^{3}}(\frac{\gamma_{S}\gamma_{T,I}^{2}}{1-\gamma_{S}\gamma_{T}}+g^{2}\gamma_{T}),$
		$\displaystyle\gamma_{R}^{\prime}=\frac{-2M\eta_{R}(\delta g^{\prime}\overline{g}+\delta g\overline{g}^{\prime}-g\overline{g}\delta^{\prime})}{L\delta^{2}}-2\eta_{R,I},$
		$\displaystyle\gamma_{S}^{\prime}=-2\overline{g}^{\prime}\eta_{S}+\frac{2\delta^{\prime}\eta_{S,I}}{\delta^{2}},$
		$\displaystyle\gamma_{T}^{\prime}=-2g^{\prime}\eta_{T},$
		$\displaystyle[\delta^{\prime},g^{\prime},\overline{g}^{\prime}]^{T}=\mathbb{A}^{-1}[-\delta\gamma_{R,I},0,0]^{T}.$

$\mathbb{A}$ is defined in (30) and other symbols are given in Table I.

Proof.

The proof of Theorem 4 is given in Appendix C. ∎

Remark 5.

Similar to (22), the derivatives in (36) are obtained by the chain rule. It can be observed that $H(z)$ plays an important role in the finite-SNR DMT. The expression can be further simplified by an approximation, which is given in the following proposition.

Proposition 3.

(An approximation for the finite-SNR DMT) The finite-SNR DMT in Theorem 4 can be approximated by

d(m,\rho)\approx-\frac{z(m-k)^{2}H(z)H^{\prime}(z)}{k^{2}}.

(37)

Remark 6.

This approximation is obtained by approximating the Gaussian tail with the bound $Q(x)\leq\frac{e^{-\frac{x^{2}}{2}}}{2},~{}x>0$ [45]. It can be observed that the finite-SNR DMT is closely related to the ratio between the mean and the standard deviation of the MI, i.e., $H(z)$ . This indicates that the effect of the SNR on the finite-SNR DMT enters through the term $-zH(z)H^{\prime}(z)$ .
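To illustrate how the CLT-based expression and the simplified form in (37) relate, the sketch below evaluates both for a toy $H(z)$. It is not the IRS setting: as an assumption, the single-hop Rayleigh mean and variance from (43) are used as stand-ins for $\overline{I}(z)$ and $V(z)$, $k=N$ is chosen arbitrarily, natural logarithms are assumed, and $H^{\prime}(z)$ is approximated by a central difference. The CLT-based expression is written in the form obtained by differentiating $\log\Phi(\frac{(m-k)H(z)}{k})$ with respect to $\log\rho$.

```python
import math

def rayleigh_mean_var(rho, N):
    """Mean and variance as in (43); used here only as a stand-in for the IRS case."""
    delta = (-1.0 + math.sqrt(1.0 + 4.0 * rho)) / 2.0  # positive root of d^2 + d - rho = 0
    mean = N * math.log(delta + 1.0 + rho) - N * delta / (1.0 + delta)
    var = math.log((1.0 + delta) ** 2 / (2.0 * delta + 1.0))
    return mean, var

def H(z, N):
    mean, var = rayleigh_mean_var(1.0 / z, N)
    return mean / math.sqrt(var)

def H_prime(z, N, rel_step=1e-6):
    """Central-difference approximation of dH/dz."""
    step = rel_step * z
    return (H(z + step, N) - H(z - step, N)) / (2.0 * step)

def Phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def dmt_clt(m, z, N, k):
    u = (m - k) * H(z, N) / k
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return z * (m - k) / k * pdf * H_prime(z, N) / Phi(u)

def dmt_simplified(m, z, N, k):
    return -z * (m - k) ** 2 * H(z, N) * H_prime(z, N) / k ** 2

N = k = 4
z = 0.1      # i.e., rho = 10
m = 2.0      # multiplexing gain below k
d_exact = dmt_clt(m, z, N, k)
d_simple = dmt_simplified(m, z, N, k)
ratio = d_simple / d_exact
print(d_exact, d_simple, ratio)
```

The ratio of the two expressions equals $|u|Q(|u|)/\varphi(|u|)$ with $u=\frac{(m-k)H(z)}{k}$ and $\varphi$ the standard Gaussian density, so it lies below one and approaches one as $|u|$ grows.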

V How Large the IRS Needs to Be?

V-A Impact of the Size of IRSs on EMI

We first characterize the relation between the EMI and the size of the IRS. For simplicity, we assume that $M=N$ and focus on the parameter $\tau$ , i.e., the ratio between the number of antennas $N$ and the size of the IRS $L$ . To study the performance gain obtained by increasing the size of the IRS, we focus on the case when $\tau<1$ , which means that the size of the IRS is larger than the numbers of antennas at the transceiver. Results with $\tau>1$ can be handled similarly. Inspired by the concept of massive MIMO efficiency proposed in [46], we define the concept called IRS efficiency to characterize the efficiency of increasing the IRS size to achieve higher EMI.

Definition 1.

For a given SNR $\rho$ , the IRS efficiency $\eta\in(0,1)$ is defined as

\displaystyle\eta=\frac{\overline{I}(\rho)}{\overline{I}_{\infty}(\rho)},

(38)

where $\overline{I}(\rho)$ denotes the EMI achieved by a given size of the IRS and $\overline{I}_{\infty}(\rho)$ represents the maximum throughput with infinitely large IRSs. Specifically, the IRS efficiency $\eta$ represents the percentage of the maximum throughput that is achieved by a given size of IRS.

Given $\eta$ , we can determine the minimum size of the IRS needed to achieve at least the fraction $\eta$ of the asymptotic performance. To better illustrate the impact of the IRS size, we consider independent channels and ignore the optimization of phase shifts. According to Proposition 1, for given $\rho$ and $M$ , $\overline{I}(\rho)$ will be determined only by $\tau$ . Next, we will investigate $\overline{I}_{\infty}(\rho)$ .

V-B Asymptotic Mean and Variance for the MI

We first determine $\overline{I}_{\infty}(\rho)$ for a finite SNR $\rho$ . The roots of the equation in (16) can be given by Cardano’s formula [19] as

\displaystyle g_{m}\!=\!\frac{e^{\frac{2m\pi\jmath}{3}}\omega(\tau,\rho)+e^{-\frac{2m\pi\jmath}{3}}{\omega}^{*}(\tau,\rho)-2}{3},m=1,2,3,

(39)

where ${\omega}(\tau,\rho)$ and ${\omega}^{*}(\tau,\rho)$ are given by

		$\displaystyle\omega(\tau,\rho)=(\sqrt{(3\rho\tau-3\rho-1)^{3}+(1+\frac{9\rho}{2}+9\rho\tau)^{2}}$		(40)
		$\displaystyle+1+\frac{9\rho}{2}+9\rho\tau)^{\frac{1}{3}},$
		$\displaystyle\omega(\tau,\rho)^{*}=\frac{(1-3\rho\tau+3\rho)}{\omega(\tau,\rho)}.$

$(\cdot)^{\frac{1}{3}}$ and $(\cdot)^{\frac{1}{2}}$ denote any cube root and any square root, respectively. It is straightforward to verify that the term under the square root is negative when $\tau=0$ , so that there are three real roots. In fact, by Vieta’s formulas, the only positive root is the one we desire. When the size of the IRS goes to infinity, i.e., $\tau\rightarrow 0$ , the asymptotic solution of the cubic equation is given by

g_{\infty}=\frac{2(\mathrm{Re}\left\{\omega(0,\rho)\right\}-1)}{3},

(41)

where $\omega(0,\rho)$ is chosen as the one whose real and imaginary parts are both positive. In this case, the asymptotic EMI with $\tau\rightarrow 0$ is given by

\displaystyle\overline{I}_{\infty}(\rho)

\displaystyle=2N\log(1+g_{\infty})+\frac{N\rho}{(1+g_{\infty})^{2}}-\frac{2Ng_{\infty}}{1+g_{\infty}}.

(42)
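The Cardano-type expressions in (39) and (40) can be checked numerically with the sketch below. Since (16) is not restated in this section, the cubic $g^{3}+2g^{2}+(1+\rho\tau-\rho)g-\rho=0$ is used as an assumed form: it is the cubic whose depressed form reproduces $\omega(\tau,\rho)$ in (40), and it factors as $(1+g)(g^{2}+g-\rho)$ at $\tau=0$. The phases are taken as the cube roots of unity $e^{\jmath 2m\pi/3}$, and the function names are hypothetical.

```python
import cmath
import numpy as np

def cardano_roots(tau, rho):
    """Roots g_1, g_2, g_3 with the structure of (39)-(40)."""
    s = 1.0 + 4.5 * rho + 9.0 * rho * tau
    radicand = (3.0 * rho * tau - 3.0 * rho - 1.0) ** 3 + s ** 2
    omega = (cmath.sqrt(radicand) + s) ** (1.0 / 3.0)       # principal cube root
    omega_star = (1.0 - 3.0 * rho * tau + 3.0 * rho) / omega
    roots = []
    for m in (1, 2, 3):
        phase = cmath.exp(2j * cmath.pi * m / 3.0)            # cube root of unity
        roots.append((phase * omega + omega_star / phase - 2.0) / 3.0)
    return np.array(roots)

def assumed_cubic(tau, rho):
    """ASSUMED coefficients of (16): g^3 + 2 g^2 + (1 + rho*tau - rho) g - rho."""
    return [1.0, 2.0, 1.0 + rho * tau - rho, -rho]

def max_mismatch(tau, rho):
    """Largest distance from a Cardano root to the nearest numerical root."""
    numeric = np.roots(assumed_cubic(tau, rho))
    closed = cardano_roots(tau, rho)
    return max(np.min(np.abs(numeric - r)) for r in closed)

for tau, rho in [(0.5, 10.0), (0.0, 10.0)]:
    print(tau, rho, cardano_roots(tau, rho), max_mismatch(tau, rho))
```

With the principal cube root, the index $m=3$ gives a unit phase, so that at $\tau=0$ the third root reduces to $\frac{2(\mathrm{Re}\{\omega(0,\rho)\}-1)}{3}$.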

In fact, the asymptotic mean and variance can be obtained by an easier approach as they are equal to those of the MI for a single-hop MIMO system over independent Rayleigh channels, as shown in the following proposition.

Proposition 4.

When $N=M$ , the mean and variance of the MI of a single-hop and independent Rayleigh channel are given by:

	$\displaystyle\overline{I}_{Rayleigh}(\rho)$	$\displaystyle=N\log(\delta+1+\rho)-\frac{N\delta}{1+\delta},$		(43)
	$\displaystyle V_{Rayleigh}(\rho)$	$\displaystyle=\log[\frac{(1+\delta)^{2}}{2\delta+1}],$		(43)

where $\delta$ is the positive solution of $\delta^{2}+\delta-\rho=0$ . It holds true that

\overline{I}_{\infty}(\rho)=\overline{I}_{Rayleigh}(\rho),~{}~{}V_{\infty}(\rho)=V_{Rayleigh}(\rho).

(44)

Proof.

The expression in (43) can be obtained as a special case of Proposition 1 in [32]. Then, we have $g_{\infty}=\delta$ since the cubic equation in (16) degenerates to a quadratic one, i.e., $g(1+g)=\rho$ . Therefore, it follows from (42) and $\rho=g_{\infty}(1+g_{\infty})$ that

$\displaystyle\overline{I}_{\infty}(\rho)$	$\displaystyle=N\log[(1+g_{\infty})^{2}]+\frac{Ng_{\infty}}{1+g_{\infty}}-\frac{2Ng_{\infty}}{1+g_{\infty}}$	(45)
	$\displaystyle=N\log[1+g_{\infty}+g_{\infty}(1+g_{\infty})]-\frac{Ng_{\infty}}{1+g_{\infty}}$
	$\displaystyle=N\log(1+\rho+g_{\infty})-\frac{Ng_{\infty}}{1+g_{\infty}}$
	$\displaystyle=\overline{I}_{Rayleigh}(\rho).$

With similar manipulations, the asymptotic variance can be written as

	$\displaystyle V_{\infty}(\rho)$	$\displaystyle=\log[\rho(1+g_{\infty})^{2}]-\log(\rho+2\rho g_{\infty})$		(46)
		$\displaystyle=\log[\frac{(1+g_{\infty})^{2}}{2g_{\infty}+1}]=V_{Rayleigh}(\rho).$		(46)

∎
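The identities in (44) can be checked numerically with a short sketch. It evaluates $g_{\infty}$ from (41) using the principal cube root (whose real and imaginary parts are positive for $\rho>0$), then compares (42) and the first line of (46) with (43). The function names and test values of $\rho$ are illustrative, and natural logarithms are assumed.

```python
import cmath
import math

def g_inf(rho):
    """g_infinity from (41), with omega(0, rho) taken from (40) at tau = 0."""
    s = 1.0 + 4.5 * rho
    radicand = (-3.0 * rho - 1.0) ** 3 + s ** 2          # negative for rho > 0
    omega = (cmath.sqrt(radicand) + s) ** (1.0 / 3.0)    # principal cube root
    return 2.0 * (omega.real - 1.0) / 3.0

def emi_inf(rho, N):
    """Asymptotic EMI in (42)."""
    g = g_inf(rho)
    return 2 * N * math.log(1 + g) + N * rho / (1 + g) ** 2 - 2 * N * g / (1 + g)

def var_inf(rho):
    """Asymptotic variance, first line of (46)."""
    g = g_inf(rho)
    return math.log(rho * (1 + g) ** 2) - math.log(rho + 2 * rho * g)

def rayleigh_mean_var(rho, N):
    """Single-hop Rayleigh mean and variance in (43)."""
    delta = (-1.0 + math.sqrt(1.0 + 4.0 * rho)) / 2.0    # positive root of d^2 + d - rho = 0
    mean = N * math.log(delta + 1.0 + rho) - N * delta / (1.0 + delta)
    var = math.log((1.0 + delta) ** 2 / (2.0 * delta + 1.0))
    return mean, var

N = 4
for rho in (0.5, 10.0, 1000.0):
    mean_ray, var_ray = rayleigh_mean_var(rho, N)
    print(rho, emi_inf(rho, N), mean_ray, var_inf(rho), var_ray)
```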

Remark 7.

(44) indicates that the maximum throughput and the asymptotic variance of the MI with the two-hop IRS channel are equal to those of a single-hop channel if we do not consider correlations and path loss.

Based on (42), we can determine how large the IRS needs to be to achieve a certain percentage of the asymptotic performance. The results for the general correlated cases can be derived similarly by Theorem 1, but with an implicit expression. The impact of the IRS size on the finite-SNR DMT is omitted here due to complex calculations and approximations. Nevertheless, it can be obtained similarly based on Theorem 4. Specifically, we have obtained the approximation for $g$ , the EMI, and the variance. To derive the finite-SNR DMT over correlated channels, we need to calculate the derivatives of the EMI and the variance with respect to $\rho$ by Theorem 4 and omit the higher order terms.

In the above derivation, we assume that the transmitter and receiver have the same number of antennas. The derivations for unequal cases can also be performed in similar ways. Furthermore, the correlated case can also be investigated numerically using Theorem 1 and Theorem 2.

V-C The Impact of the IRS Size in the High SNR Regime

At high SNRs, the EMI and outage probability can be approximated as a more explicit function of $\tau$ compared with the cubic root form in (39), which is presented by the following theorem.

Theorem 5.

(High SNR case) When $\tau<1$ and $\rho\gg 1$ , the only positive solution of the cubic equation in (16) can be approximated as

	$\displaystyle g$	$\displaystyle=\sqrt{(1-\tau)\rho}+[\frac{1}{2(1-\tau)}-1]+o(1)$		(47)
		$\displaystyle=a\rho^{\frac{1}{2}}+b+o(1).$		(47)

When $N$ is fixed and $L\ll\sqrt{\rho}$ , the mean and variance of the MI can be approximated by

$\displaystyle\overline{I}(\rho)$	$\displaystyle=N[\log(\rho)-2-(\frac{1}{\tau}-1)\log(1-\tau)+\frac{2}{a}\rho^{-\frac{1}{2}}]$	(48)
	$\displaystyle+o(\rho^{-\frac{1}{2}}),$
and
$\displaystyle V(\rho)$	$\displaystyle=\frac{1}{2}\log(\frac{\rho}{4a^{2}})+\frac{2a^{2}(1-b)-1}{2a^{3}}\rho^{-\frac{1}{2}}+o(\rho^{-\frac{1}{2}}),$

respectively. Furthermore, when $N$ is fixed and $L$ is comparable to $\sqrt{\rho}$ or far larger than $\sqrt{\rho}$ , the mean and variance of the MI can be approximated by

	$\displaystyle\overline{I}(\rho)$	$\displaystyle=N[\log(\frac{\rho}{e})-\frac{2N}{L}+2\rho^{-\frac{1}{2}}]+o(\rho^{-\frac{1}{2}}),$		(49)
	$\displaystyle V(\rho)$	$\displaystyle=\frac{1}{2}\log(\frac{\rho}{4})+\frac{N}{2L}+\rho^{-\frac{1}{2}}+o(\rho^{-\frac{1}{2}}).$		(49)

The outage probability can be further approximated by $P_{out}(R)\approx\Phi\left(\frac{R-\overline{I}(\rho)}{\sqrt{V(\rho)}}\right)$ .
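The expansion in (47) can be compared against a numerically computed root, as in the sketch below. The cubic used here, $g^{3}+2g^{2}+(1+\rho\tau-\rho)g-\rho=0$, is an assumed form of (16), chosen because it is consistent with (39), (40), and the $\tau\rightarrow 0$ limit $g(1+g)=\rho$; the function names and parameter values are hypothetical.

```python
import numpy as np

def positive_root(tau, rho):
    """Positive real root of the ASSUMED cubic form of (16)."""
    coeffs = [1.0, 2.0, 1.0 + rho * tau - rho, -rho]
    roots = np.roots(coeffs)
    candidates = [r.real for r in roots
                  if abs(r.imag) < 1e-9 * max(1.0, abs(r)) and r.real > 0]
    return max(candidates)

def high_snr_g(tau, rho):
    """Two-term approximation a*sqrt(rho) + b from (47)."""
    a = np.sqrt(1.0 - tau)
    b = 1.0 / (2.0 * (1.0 - tau)) - 1.0
    return a * np.sqrt(rho) + b

tau = 0.5
errors = {}
for rho in (1e4, 1e6):
    errors[rho] = abs(positive_root(tau, rho) - high_snr_g(tau, rho))
    print(rho, positive_root(tau, rho), high_snr_g(tau, rho), errors[rho])
```

Under this assumed cubic, the residual shrinks as $\rho$ grows, in line with the $o(1)$ term in (47).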

Remark 8.

Asymptotic performance: At high SNRs, as $L$ goes to infinity, $\overline{I}(\rho)$ and $V(\rho)$ converge to asymptotic values as shown in (49), which are dominated by the term $\log(\rho)$ .

Remark 9.

Fast convergence: $\overline{I}(\rho)$ and $V(\rho)$ converge fast to the asymptotic values. When $L$ is large, the rate of convergence to the asymptotic performance is $O(\frac{1}{L})$ . This indicates that the gain of EMI achieved by increasing the size of the IRS decreases when the IRS size is getting larger.

Remark 10.

SNR vs. size of IRS: We can observe from (49) that the impact of the IRS size is very limited compared with that of the SNR as the mean and variance are dominated by the SNR.

Remark 11.

Comparison with the single-hop Rayleigh channel: At high SNRs, the mean and variance of the MI for a single-hop i.i.d. Rayleigh channel are given in [23]:

	$\displaystyle\overline{I}(\rho)$	$\displaystyle=N[\log(\frac{\rho}{e})+2\rho^{-\frac{1}{2}}]+o(\rho^{-\frac{1}{2}}),$		(50)
	$\displaystyle V(\rho)$	$\displaystyle=\frac{1}{2}\log(\frac{\rho}{4})+\rho^{-\frac{1}{2}}+o(\rho^{-\frac{1}{2}}).$		(50)

Comparing the results in (50) and (49), we can observe that the difference induced by the IRS is of order $O(\frac{1}{L})$ , which indicates that these two results converge as the IRS size goes to infinity.
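A small numerical sketch of this comparison is given below. It evaluates (49) and (50) exactly as printed (dropping the $o(\rho^{-\frac{1}{2}})$ terms) and plugs them into the Gaussian outage approximation of Theorem 5. The values of $N$, $\rho$, $R$, and the IRS sizes are hypothetical, natural logarithms are assumed, and $R$ is therefore expressed in nats.

```python
import math

def phi_cdf(x):
    """Standard Gaussian CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def single_hop_stats(rho, N):
    """High-SNR mean and variance in (50)."""
    mean = N * (math.log(rho / math.e) + 2.0 / math.sqrt(rho))
    var = 0.5 * math.log(rho / 4.0) + 1.0 / math.sqrt(rho)
    return mean, var

def irs_stats(rho, N, L):
    """High-SNR mean and variance in (49), written as offsets from (50)."""
    mean0, var0 = single_hop_stats(rho, N)
    return mean0 - 2.0 * N * N / L, var0 + N / (2.0 * L)

def outage(R, mean, var):
    return phi_cdf((R - mean) / math.sqrt(var))

N, rho, R = 4, 1e4, 28.0           # hypothetical setting; sqrt(rho) = 100
sizes = [128, 256, 512, 1024]      # L comparable to or larger than sqrt(rho)
mean0, var0 = single_hop_stats(rho, N)
p_single = outage(R, mean0, var0)
p_irs = [outage(R, *irs_stats(rho, N, L)) for L in sizes]
mean_gap_times_L = [(mean0 - irs_stats(rho, N, L)[0]) * L for L in sizes]
print(p_single, p_irs, mean_gap_times_L)
```

The mean gap multiplied by $L$ stays constant, which is the $O(\frac{1}{L})$ behavior noted above, and the outage probability decreases toward the single-hop value as $L$ grows.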

The analysis in this section revealed the effect of increasing the size of the IRS, which provides guidance on the design of practical systems. In fact, as indicated by the square law [1], [47], the impact of the IRS size is not only related to the physical size but also the passive beamforming gains introduced by designing phase shifts. In our analysis, the impact of the phase shifts is ignored as there are no closed-form optimal phase shifts for the correlated IRS-aided MIMO system.

VI Numerical Results

In this section, simulations are performed to validate the theoretical results. Here we adopt the exponential correlation model. Specifically, the entries of $\mathbb{R}_{i},\mathbb{T}_{i},i=1,2$ , can be represented by $[\mathbb{C}(\mu)]_{i,j}=\mu^{|i-j|}$ . The unit of the preset transmission rate $R$ is bits/s/Hz. We assume the distance from the BS to the IRS and that from the IRS to the UE, denoted by $d_{BS-IRS}$ and $d_{IRS-UE}$ , respectively, are both $10~{}\mathrm{m}$ . (The channel model utilized in this paper is proposed for the far-field region, i.e., the transmission distance is larger than the Rayleigh distance. As the MIMO/IRS size becomes larger, the Rayleigh distance will increase. The simulation setting is adopted from previous works and is for illustration purposes only. Caution should be exercised when the antenna size is very large.) The distance-dependent path loss is given by

L(d)=C_{0}(\frac{d}{D_{0}})^{-\alpha},

(51)

where $C_{0}=-30~{}\mathrm{dB}$ is the path loss at the reference distance $D_{0}=1~{}\mathrm{m}$ [47]. $d$ denotes the distance of the link, and $\alpha$ denotes the path loss exponent. The path loss exponent of the link from the IRS to the UE is $\alpha_{IRS-UE}=3$ and that from the BS to the IRS is $\alpha_{BS-IRS}=2$ [47]. In the following figures, “Ana” and “Sim” will be used to represent the analytical results and Monte-Carlo simulations, respectively.
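The two ingredients of this setup, the exponential correlation matrix $\mathbb{C}(\mu)$ and the path loss in (51), can be sketched as follows. The function names are hypothetical, and the path loss is evaluated in dB, i.e., $C_{0}$ in dB minus $10\alpha\log_{10}(d/D_{0})$.

```python
import numpy as np

def exp_correlation(n, mu):
    """n x n exponential correlation matrix with entries mu^{|i-j|}."""
    idx = np.arange(n)
    return mu ** np.abs(idx[:, None] - idx[None, :])

def path_loss_db(d, alpha, c0_db=-30.0, d0=1.0):
    """Path loss (51) in dB: C_0 [dB] - 10 * alpha * log10(d / D_0)."""
    return c0_db - 10.0 * alpha * np.log10(d / d0)

C = exp_correlation(3, 0.5)                    # C(mu) with mu = 0.5 and size 3
pl_bs_irs = path_loss_db(10.0, alpha=2.0)      # BS-IRS link, 10 m
pl_irs_ue = path_loss_db(10.0, alpha=3.0)      # IRS-UE link, 10 m
print(C)
print(pl_bs_irs, pl_irs_ue)
```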

VI-A Outage Probability Approximation

First, we consider the case when there is no correlation at the transmitter side, for which both the proposed method and the method based on the Mellin transform [20] can be applied. The parameters are set to be $M=L=N=3$ , $\mathbb{R}_{1}=\mathbb{R}_{2}=\mathbb{T}_{1}=\mathbb{C}(\mu)$ with $\mu=0.5$ , and $\mathbb{T}_{2}=\mathbb{I}_{M}$ . The noise power is $\sigma^{2}=-114.7~{}\mathrm{dBm}$ . The phase shift matrix is set to be $\mathbb{\Psi}=\mathbb{I}_{L}$ . The Monte-Carlo simulation values are computed over $10^{8}$ realizations, and we use the variance in (13) because $L$ is small. It can be observed from Fig. 1 that the proposed method performs almost identically to the one based on the Mellin transform while having a simpler expression.

In Fig. 2, we consider the correlated case with $\mathbb{T}_{2}=\mathbb{C}(0.5)$ and plot the outage probability with respect to the rate threshold when $L=3,16,32$ , respectively. Markers represent the simulation results. It can be observed that the proposed method works well when there are correlations at both the transmitter and receiver sides.

VI-B Effectiveness of the Proposed Optimization Algorithm

Figure 1: Performance comparison with the Mellin transform-based method in [20].

Fig. 3 verifies the effectiveness of Algorithm 1 in minimizing the outage probability, where $M=N=4$ and $\mu=0.8$ is used for all correlation matrices. The noise power is $\sigma^{2}=-116~{}\mathrm{dBm}$ . The initial phase shifts are set to be ${\psi}_{i}=e^{\frac{\jmath 2\pi i}{L}},i=1,2,...,L$ . The initial step size is set as $\alpha_{0}=0.0005$ and the scale factor $\beta=0.5$ . It can be observed that the outage probability is effectively reduced by Algorithm 1. Fig. 4 shows the outage probability in each outer iteration when $L=32$ , from which we can observe that the outage probability converges.

VI-C Impact of Correlations at the Transceiver

In Fig. 5, we investigate the impact of the correlations at the transceiver. We set the correlation coefficients at the IRS to be $0.5$ with $L=16$ , while the coefficients of the correlation matrices at the transceiver, i.e., $\mathbb{R}_{1}=\mathbb{T}_{2}$ , are set to range from 0 to $0.9$ with $M=N=4$ and $\sigma^{2}=-116~{}\mathrm{dBm}$ . The outage probability is computed according to the phase shifts optimized by Algorithm 1. It can be observed that the correlation at the transceiver has a negative effect on the outage performance. Similar observations have been reported in [20], [37].

VI-D Finite SNR DMT

Fig. 6 validates the accuracy of the closed-form expression for the finite SNR DMT as shown in (34). The parameters are set as $N=M=4$ , $L=2$ , $\mu=0.5$ , $\mathbb{\Psi}=\mathbb{I}_{L}$ , and $\sigma^{2}=-116~{}\mathrm{dBm}$ . It can be observed that the closed-form expression is very accurate. Furthermore, the finite-SNR DMT is more accurate at low SNRs compared with the asymptotic SNR results in [48].

VI-E Impact of the Size of the IRS

Asymptotic Performance and Convergence: In Fig. 7, we show how the EMI changes when the IRS size increases. The parameters are set as $M=N=20$ with independent channels. The size of the IRS ranges from $4$ to $100$ . It can be observed that the EMI increases and approaches the asymptotic performance as the IRS size becomes larger. However, the rate of increase of the EMI decreases, which indicates that the performance gain from enlarging the IRS vanishes for larger IRSs. This agrees with Remarks 8 and 9. Similar comparisons are performed for the variance of the MI with $M=N=4$ , $R=4$ , and $\sigma^{2}=-116~{}\mathrm{dBm}$ . It can be observed from Fig. 8 that the variance decreases and approaches the asymptotic value when the size of the IRS becomes larger, and that the approximations for the variance are accurate.

For the correlated case, we use the numerical results to show the bottleneck for the performance improvement induced by increasing the size of the IRS. In Fig. 9 and Fig. 10, the EMI and the variance with the phase shifts optimized by Algorithm 1 are given. We have similar observations, i.e., both the EMI and variance will approach the limit as the size of the IRS increases.

IRS Efficiency: In Fig. 11, the size of the IRS needed to achieve $\eta$ of the asymptotic performance is plotted when $M=N=20$ . It is worth noting that the required size for $\eta=0.95$ is almost twice that for $\eta=0.9$ , which means that the $5\%$ performance increment requires a huge increase in the size of the IRS. This agrees with Remark 9. In practice, we need to decide whether it is worth the cost of deploying larger IRSs to obtain the additional performance.
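The factor of about two can be reproduced with a back-of-the-envelope sketch. It assumes, as a simplification of the $O(\frac{1}{L})$ convergence in Remark 9, that the efficiency deficit is exactly $1-\eta(L)=c/L$ for some constant $c$; both this model and the value of `c` below are hypothetical and only meant to show the scaling.

```python
import math

def required_size(eta, c):
    """Real-valued IRS size L solving 1 - c/L = eta under the ASSUMED model eta(L) = 1 - c/L."""
    return c / (1.0 - eta)

c = 12.0                              # hypothetical constant; cancels in the ratio
size_90 = required_size(0.90, c)
size_95 = required_size(0.95, c)
ratio = size_95 / size_90             # (1 - 0.90) / (1 - 0.95)
print(size_90, size_95, ratio)
```

Under this model, halving the deficit from $10\%$ to $5\%$ doubles the required size regardless of $c$, which is consistent with the observation from Fig. 11.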

High SNR Regime: In Fig. 12, the accuracy of the outage probability approximation in Theorem 5 is validated in the high SNR regime with $M=N=4$ . The IRS size ranges from $8$ to $256$ and the transmit power is set to be $50$ dBm. It can be observed that the approximation in (48) is accurate in the high SNR regime.

SNR vs. IRS Size: We can observe from Figs. 7 and 12 that, when the size of the IRS increases, the performance (EMI and outage probability) improves but the rate of change decreases, leading to the convergence to the asymptotic value. Furthermore, the EMI gain caused by the increment of SNR is far larger than that caused by the increment of the size of IRS, which agrees with Remark 10.

VII Conclusion

In this paper, we investigated the outage probability and finite-SNR DMT of IRS-aided MIMO systems assuming statistical CSI. By leveraging RMT, the CLT of the MI was derived and the outage probability was approximated in closed form. A gradient descent algorithm was proposed to minimize the outage probability by optimizing the phase shifts, which was shown to be effective by numerical results. To better characterize the trade-off between the outage probability and the throughput in an operational SNR regime, a closed-form finite-SNR DMT was presented. Finally, the impact of the IRS size on the EMI and outage probability was studied. Both theoretical and simulation results showed that larger IRSs provide better performance but the gain saturates quickly as the size of the IRS increases. For example, to improve the throughput from 90% to 95% of its maximum value, the IRS size, and hence the hardware cost, needs to be roughly doubled. The results in this work not only provide an accurate characterization of the MI and outage probability of IRS-aided systems, which was not previously available in the literature, but also reveal the impact of the IRS size on system performance, offering valuable guidance for practical system deployment. Furthermore, the results in this work can be extended to the case where a direct link exists. This extension requires additional effort because one more random matrix (the channel matrix of the direct link) needs to be handled in an iterative approach, and more complex computations are required for calculating the asymptotic variance.

Appendix A Proof of Theorem 2

Proof.

Recall that $\mathbb{R},\mathbb{S},\mathbb{T}$ can be replaced by the diagonal matrices of their eigenvalues. Therefore, $\mathbb{R},\mathbb{S},\mathbb{T}$ are assumed to be diagonal without loss of generality, i.e., $\mathbb{R}=\mathrm{diag}(r_{1},r_{2},...,r_{N})$ , where $r_{1},r_{2},...,r_{N}$ are the eigenvalues of $\mathbb{R}$ . Similarly, we have $\mathbb{S}=\mathrm{diag}(s_{1},s_{2},...,s_{L})$ and $\mathbb{T}=\mathrm{diag}(t_{1},t_{2},...,t_{M})$ . As the channel matrix $\mathbb{H}$ in (6) is the product of two random matrices, i.e., $\mathbb{X}$ and $\mathbb{Y}$ , we use the martingale method to show the CLT [38]. First, we rewrite the process $I(\rho)-\mathbb{E}I(\rho)$ as a summation of two processes:

[I(\rho)-\mathbb{E}(I(\rho)|\mathbb{Y})]+[\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)],

(52)

whose variances are $V_{1}$ and $V_{2}$ respectively. Then, we show that the two processes are asymptotically Gaussian so that $I(\rho)-\mathbb{E}I(\rho)$ will be Gaussian. In the following, we first provide the outline of the proof in steps and then give the details for each step.

Step 1: Consider the first process $I(\rho)-\mathbb{E}(I(\rho)|\mathbb{Y})$ given $\mathbb{Y}=\left\{all~{}\mathbb{Y}\right\}$ and show that it converges to a Gaussian process. Furthermore, it is proved that the asymptotic mean and variance are independent of the condition $\mathbb{Y}$ , which indicates that the asymptotic distribution of the first process is independent of that for the second process $\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)$ because its mean and variance are deterministic.

Step 2: Show the asymptotic distribution of $\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)$ is Gaussian.

Step 3: The sum of two independent Gaussian random processes will result in a Gaussian random process and we will compute the expression of the asymptotic variance in this step.

Step 1: Denote $\mathbb{W}=\mathbb{S}^{\frac{1}{2}}\mathbb{Y}\mathbb{T}^{\frac{1}{2}}$ and it follows from (6) that the channel matrix can be given by $\mathbb{H}=\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{W}$ . $\|\mathbb{W}\mathbb{W}^{H}\|\leq\|\mathbb{S}\|\|\mathbb{T}\|(1+\sqrt{\frac{L}{M}})<\infty$ and $\lim\inf\frac{1}{L}\operatorname{Tr}\mathbb{Y}\mathbb{Y}^{H}>0$ hold true with probability one. Therefore, the condition of Theorem 1 in [32] is satisfied with probability one, from which we can conclude that $I(\rho)-\mathbb{E}(I(\rho)|\mathbb{Y})$ converges to a zero-mean Gaussian process and the asymptotic variance $V_{1}$ is given by

V_{1}=\mathrm{Var}(I(\rho)|\mathbb{Y})\xrightarrow[M\rightarrow\infty]{\mathcal{P}}-\log(1-\gamma_{\mathbb{Y}}\widetilde{\gamma}_{\mathbb{Y}}),

(53)

with

		$\displaystyle\gamma_{\mathbb{Y}}=\frac{1}{L}\operatorname{Tr}\mathbb{R}\mathbb{C}\mathbb{R}\mathbb{C},\widetilde{\gamma}_{\mathbb{Y}}=\frac{1}{L}\operatorname{Tr}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}},$		(54)
		$\displaystyle\mathbb{C}=(z\mathbb{I}_{N}+\widetilde{f}\mathbb{R})^{-1},~{}\widetilde{\mathbb{C}}=(\mathbb{I}_{L}+f\mathbb{W}\mathbb{W}^{H})^{-1},$
		$\displaystyle f=\frac{1}{L}\operatorname{Tr}\mathbb{R}\mathbb{C},~{}\widetilde{f}=\frac{1}{L}\operatorname{Tr}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}.$

The RHS of (53) can be further evaluated by the following lemma.

Lemma 2.

Assuming that A.1-A.3 hold true and following the notations in (54), it holds true that

		$\displaystyle-\log(1-\gamma_{\mathbb{Y}}\widetilde{\gamma}_{\mathbb{Y}})\xrightarrow[M\rightarrow\infty]{\mathcal{P}}$		(55)
		$\displaystyle-\log[1-\frac{M\gamma_{R}}{L\delta^{2}}(\frac{\gamma_{S}(\gamma_{T,I}^{2}-\frac{1}{M}\psi_{T})}{\Delta_{Y}}\!+\!\gamma_{T}g^{2})]$
		$\displaystyle=-\log(1-\gamma_{R}\Gamma_{L}),$

where $\gamma_{R}$ and $\Gamma_{L}$ are given in Table I.

Proof.

The proof of Lemma 2 is given in Appendix D-A. ∎

By Lemma 2, we can obtain that

V_{1}\xrightarrow[M\rightarrow\infty]{\mathcal{P}}-\log(1-\gamma_{R}\Gamma_{L}),

(56)

where the RHS does not depend on $\mathbb{Y}$ . According to the asymptotic regime and assumptions, we can conclude that $\delta,\Delta_{Y},\gamma_{S},\psi_{T}$ are all of order $\Theta(1)$ . For example, it can be verified that $0<\frac{\operatorname{Tr}\mathbb{R}}{L(z+\|\mathbb{R}\|\|\mathbb{S}\|\|\mathbb{T}\|)}\leq\delta\leq\frac{N\|\mathbb{R}\|}{Lz}<\infty$ and $\Delta_{Y},\gamma_{S},\psi_{T}$ can be verified similarly. Therefore, the term $-\frac{\gamma_{S}\psi_{T}}{L\delta^{2}\Delta_{Y}}$ will vanish as $L$ goes to infinity and $\Gamma_{L}\xrightarrow[]{L\rightarrow\infty}\Gamma$ . For a better approximation of the variance, we will use $\Gamma_{L}$ for the cases with small $L$ and $\Gamma$ for the cases with large $L$ , respectively.

Similar analysis can be performed on the asymptotic mean to show that the asymptotic mean also does not depend on $\mathbb{Y}$ , which indicates that the asymptotic distribution of the first process is independent of that for the second process as the asymptotic mean and variance are deterministic.

Step 2: In this step, we will investigate $\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)$ , which is a random variable with respect to $\mathbb{Y}$ . We will show the asymptotic Gaussianity of $\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)$ using the martingale method (CLT for martingales) in [49]. Let $\mathbb{Z}=\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}$ , then $\mathbb{H}=\mathbb{Z}\mathbb{Y}\mathbb{T}^{\frac{1}{2}}$ . By the bookkeeping approach in [35] and [50], we have

		$\displaystyle\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}_{\mathbb{Y}}\mathbb{E}(I(\rho)|\mathbb{Y})$		(57)
		$\displaystyle=\sum_{m=1}^{M}(\mathbb{E}_{m}-\mathbb{E}_{m-1})\log(1+\Lambda_{m}),$		(57)

where

	$\displaystyle\Lambda_{m}$	$\displaystyle=\frac{\mathbb{y}_{m}^{H}\mathbb{Z}^{H}\mathbb{Q}_{m}\mathbb{Z}\mathbb{y}_{m}-\frac{{t}_{m}}{M}\operatorname{Tr}\mathbb{Z}^{H}\mathbb{Z}\mathbb{Q}_{m}}{1+\frac{{t}_{m}}{M}\operatorname{Tr}\mathbb{Z}\mathbb{Z}^{H}\mathbb{Q}_{m}}$		(58)
		$\displaystyle=\frac{e_{m}}{1+\frac{{t}_{m}}{M}\operatorname{Tr}\mathbb{Z}\mathbb{Z}^{H}\mathbb{Q}_{m}}.$		(58)

Here $\mathbb{E}_{m}(\cdot)=\mathbb{E}(\cdot|\mathbb{y}_{m},\mathbb{y}_{m+1},...,\mathbb{y}_{M})$ , $\mathbb{E}_{M+1}(\cdot)=\mathbb{E}(\cdot)$ , and $\mathbb{y}_{i}$ denotes the $i$ -th column of $\mathbb{Y}$ .

We skip the verification of Lyapunov's condition and turn to the computation of the asymptotic variance of the second process, $V_{2}$ . Following a method similar to that in [35], [50], we have

$\displaystyle V_{2}$	$\displaystyle\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}}\sum_{m=1}^{M}\mathbb{E}_{m-1}(\mathbb{E}_{m}\Lambda_{m})^{2}$	(59)
	$\displaystyle=\sum_{m=1}^{M}[\mathbb{Q}_{T}]_{m,m}^{2}\mathbb{E}_{m-1}(\mathbb{E}_{m}(e_{m})^{2})$
	$\displaystyle\overset{(a)}{=}\sum_{m=1}^{M}[\mathbb{Q}_{T}]_{m,m}^{2}\mathbb{E}_{m-1}(\frac{t_{m}^{2}}{M^{2}}\mathbb{E}_{m}\operatorname{Tr}[\mathbb{E}_{m}(\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z})$
	$\displaystyle\times\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}])+o(1),$

where step $(a)$ follows from the rank-one lemma [35, Lemma 3.1].

Lemma 3.

Let $\mathbb{B}$ and $\mathbb{D}$ be deterministic matrices with bounded norm. For $\mathbb{Q}=(\mathbb{Z}\mathbb{D}\mathbb{Z}^{H}+z\mathbb{I}_{N})^{-1}$ , where $\mathbb{Z}=\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}$ is defined in Appendix A, it holds true that

		$\displaystyle\frac{1}{M}\mathbb{E}\operatorname{Tr}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}=\frac{1}{M}\mathbb{E}\operatorname{Tr}\mathbb{S}\mathbb{B}$		(60)
		$\displaystyle\times((\frac{1}{L}\operatorname{Tr}\mathbb{E}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{D})^{-1}+O(\frac{\|\mathbb{D}\|^{2}}{M}).$		(60)

By using Lemma 3 in Appendix D-B twice and similar computations in [35], we have

		$\displaystyle\mathbb{E}_{m}\frac{1}{M}\operatorname{Tr}[\mathbb{E}_{m}(\mathbb{Z}^{{H}}\mathbb{Q}\mathbb{Z})\mathbb{Z}^{{H}}\mathbb{Q}\mathbb{Z}]\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}~{}(a)}$		(61)
		$\displaystyle\frac{1}{M}\mathbb{E}_{m}\operatorname{Tr}[\mathbb{E}_{m}(((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1})$
		$\displaystyle\times\mathbb{S}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}]\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}~{}(b)}$
		$\displaystyle\frac{1}{M}\mathbb{E}_{m}\operatorname{Tr}[\mathbb{E}_{m}(((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1})$
		$\displaystyle\times\mathbb{S}((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1}\mathbb{S}],$

where $\mathbb{E}_{\mathbb{X}}$ represents the expectation with respect to $\mathbb{X}$ . Step $(a)$ follows by taking $\mathbb{D}={\mathbb{Y}}\mathbb{T}\mathbb{Y}^{H}$ and $\mathbb{B}=\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}$ in Lemma 3 as shown in Appendix D-B and the convergence follows from

\mathbb{E}|\mathrm{LHS}-\mathrm{RHS}|\leq\frac{K}{M^{2}}\mathbb{E}(\|\mathbb{Y}\mathbb{T}\mathbb{Y}^{H}\|^{2})=O(M^{-2})

according to Lemma 2 in [51] and Markov’s inequality, where $\mathrm{LHS}$ and $\mathrm{RHS}$ denote the left-hand and right-hand sides of step $(a)$ . Similarly, step $(b)$ follows by taking $\mathbb{B}=\mathbb{E}_{m}(((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1})\mathbb{S}$ . Therefore, we have

		$\displaystyle\frac{1}{M}\mathbb{E}_{m}\operatorname{Tr}\mathbb{E}_{m}(((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1})$		(62)
		$\displaystyle\mathbb{S}((\frac{1}{L}\operatorname{Tr}\mathbb{E}_{\mathbb{X}}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1}\mathbb{S}\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}~{}(a)}$
		$\displaystyle\frac{1}{M}\mathbb{E}_{m}\operatorname{Tr}\mathbb{E}_{m}((\delta^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1})\mathbb{S}$
		$\displaystyle\times(\delta^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{Y}\mathbb{T}\mathbb{Y}^{H})^{-1}\mathbb{S}$
		$\displaystyle\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}~{}(b)}\frac{1}{M}\mathbb{E}\operatorname{Tr}\mathbb{S}\mathbb{E}_{m}[\widetilde{\mathbb{C}}(\delta^{-1})]\mathbb{S}\widetilde{\mathbb{C}}(\delta^{-1})\overset{\Delta}{=}\mathbb{\Omega}_{m},$

where steps $(a)$ and $(b)$ follow from $\mathrm{Var}(\operatorname{Tr}\mathbb{R}\mathbb{Q})=O(1)$ and $\mathrm{Var}(\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}(\delta^{-1}))=O(1)$ , respectively, which can be obtained by variance control using the Poincaré-Nash inequality [32], [40]. According to (59) and [35, Section 5], it holds true that

\displaystyle\mathbb{\Omega}_{m}=\frac{\gamma_{S}}{1-\gamma_{S}\gamma_{T}^{(m)}}+o(1),

(63)

where $\gamma_{T}^{(m)}=\frac{1}{M}\sum\limits_{l=1}^{m}[\mathbb{Q}_{T}]_{l,l}^{2}t_{l}^{2}$ . Also, we have $\gamma_{T}^{(m)}-\gamma_{T}^{(m-1)}=\frac{1}{M}[\mathbb{Q}_{T}]_{m,m}^{2}t_{m}^{2}$ . Therefore,

$\displaystyle V_{2}$	$\displaystyle\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}}\sum_{m=1}^{M}\mathbb{E}_{m-1}(\mathbb{E}_{m}\Lambda_{m})^{2}\xlongrightarrow[M\longrightarrow\infty]{\mathcal{P}}$	(64)
	$\displaystyle\sum_{m=1}^{M}\frac{\gamma_{S}(\gamma_{T}^{(m)}-\gamma_{T}^{(m-1)})}{1-\gamma_{S}\gamma_{T}^{(m)}}\xlongrightarrow[]{M\longrightarrow\infty}$
	$\displaystyle-\log(1-\gamma_{T}\gamma_{S})=-\log\Delta_{Y}.$
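The last limit in (64) is a Riemann-sum argument: each increment $\gamma_{T}^{(m)}-\gamma_{T}^{(m-1)}$ is $O(\frac{1}{M})$ , so the sum approaches $\int_{0}^{\gamma_{T}}\frac{\gamma_{S}}{1-\gamma_{S}x}\mathrm{d}x=-\log(1-\gamma_{S}\gamma_{T})$ . A minimal numerical sketch of this step is given below. The eigenvalues $t_{m}$ , the value of $\gamma_{S}$ , and the diagonal form $[\mathbb{Q}_{T}]_{m,m}=1/(1+gt_{m})$ with $g=0.8$ are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def v2_partial_sum(t, q_t, gamma_s):
    """Evaluate the finite sum in (64) and the final value gamma_T^(M).

    t       : eigenvalues t_m of T (length M)
    q_t     : diagonal entries [Q_T]_{m,m} (length M)
    gamma_s : scalar gamma_S
    """
    M = len(t)
    # gamma_T^(m) - gamma_T^(m-1) = [Q_T]_{m,m}^2 t_m^2 / M
    increments = (q_t ** 2) * (t ** 2) / M
    gamma_t_m = np.cumsum(increments)
    total = np.sum(gamma_s * increments / (1.0 - gamma_s * gamma_t_m))
    return total, gamma_t_m[-1]

# Hypothetical inputs, chosen only so that gamma_S * gamma_T < 1.
rng = np.random.default_rng(0)
M = 20000
t = rng.uniform(0.5, 1.5, size=M)
q_t = 1.0 / (1.0 + 0.8 * t)      # assumed diagonal of Q_T
gamma_s = 0.6

total, gamma_t = v2_partial_sum(t, q_t, gamma_s)
limit = -np.log(1.0 - gamma_s * gamma_t)
print(total, limit)
```

For large $M$ the two printed numbers should agree to several digits, and the gap shrinks at rate $O(\frac{1}{M})$ .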

Step 3: Since $I(\rho)-\mathbb{E}(I(\rho)|\mathbb{Y})$ and $\mathbb{E}(I(\rho)|\mathbb{Y})-\mathbb{E}I(\rho)$ are asymptotically independent Gaussian random processes, the asymptotic distribution of their sum is Gaussian and the variance is given by

\displaystyle V=V_{1}+V_{2}.

(65)

This completes the proof. ∎

Appendix B Proof of Proposition 2

Proof.

As this proposition is a special case of Theorem 2, in which the Gaussianity has been proved, we only need to compute the variance. By (10), we have

\displaystyle\overline{g}=\frac{1}{{g}+1},~{}g=\frac{1}{\tau(\frac{1}{\delta}+\overline{g})},~{}z\delta=\tau\overline{g},

(66)

and

\displaystyle\gamma_{R}=\frac{\delta^{2}}{\tau},~{}\gamma_{S}=\tau g^{2},~{}\gamma_{T}=\overline{g}^{2}.

(67)

Then, the variance can be written as

		$\displaystyle V(\rho)=-\log[(1+g)^{2}-\tau g^{2}-\tau g^{2}\overline{g}^{2}-g^{2}(1-\tau g^{2}\overline{g}^{2})]$		(68)
		$\displaystyle-\log(\overline{g}^{2})=-\log(T_{1})-\log(T_{2}),$		(68)

where

$\displaystyle T_{1}$	$\displaystyle=1+2g-\tau g^{2}+\tau g^{2}(g-1)\overline{g}$	(69)
	$\displaystyle=1+2g-2\tau g^{2}\overline{g}$
	$\displaystyle\overset{(a)}{=}1+2zg^{3}+2zg^{2},$

and $(a)$ follows from $g^{3}+2g^{2}+(1+\tau/z-1/z)g-1/z=0$ . The conclusion follows by noticing that $T_{2}=(1+g)^{-2}$ . ∎
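The chain of equalities in (69) can be checked numerically. The sketch below solves the cubic quoted above for its positive root $g$ , sets $\overline{g}=\frac{1}{1+g}$ as in (66), and evaluates the three expressions for $T_{1}$ . The function names and the test values of $z$ and $\tau$ are hypothetical.

```python
import numpy as np

def positive_root_g(z, tau):
    """Positive root of g^3 + 2 g^2 + (1 + tau/z - 1/z) g - 1/z = 0."""
    coeffs = [1.0, 2.0, 1.0 + tau / z - 1.0 / z, -1.0 / z]
    roots = np.roots(coeffs)
    # The coefficient signs change exactly once, so there is one positive root.
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    return real_pos[0]

def t1_forms(z, tau):
    """Return the three expressions for T_1 appearing in (69)."""
    g = positive_root_g(z, tau)
    g_bar = 1.0 / (1.0 + g)
    first = 1 + 2 * g - tau * g**2 + tau * g**2 * (g - 1) * g_bar
    second = 1 + 2 * g - 2 * tau * g**2 * g_bar
    third = 1 + 2 * z * g**3 + 2 * z * g**2
    return first, second, third

for z, tau in [(0.1, 0.5), (1.0, 1.0), (2.0, 3.0)]:
    print(z, tau, t1_forms(z, tau))
```

The first equality is purely algebraic, while the second one holds only at the root of the cubic, which is why the root is computed first.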

Appendix C Proof of Theorem 4

Proof.

By the definition of multiplexing gain in [23], we can obtain the rate $R$ as

R=\frac{m\overline{I}(\rho)}{k}.

(70)

For ease of manipulation, we use $\overline{I}(z)$ and $V(z)$ , which are obtained by letting $z=\frac{1}{\rho}$ in $\overline{I}(\rho)$ and $V(\rho)$ , respectively. Then, by the definition of finite-SNR DMT in [23], we have

$\displaystyle d(m,\rho)$	$\displaystyle=-\frac{\partial\log(P_{out}(\frac{(m-k)\overline{I}(z)}{k\sqrt{V(z)}}))}{\partial\log(\rho)}$	(71)
	$\displaystyle=\frac{\partial\log(P_{out}(\frac{(m-k)H(z)}{k}))}{\partial z}z$
	$\displaystyle=\frac{z(m-k)}{k\sqrt{2\pi}}\frac{\exp(-\frac{(m-k)^{2}H^{2}(z)}{2k^{2}})H^{\prime}(z)}{\Phi(\frac{(m-k)H(z)}{k})}.$

$H^{\prime}(z)$ can be obtained by the chain rule. ∎
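The last line of (71) can be compared against a finite-difference evaluation of $z\frac{\partial\log P_{out}}{\partial z}$ . The sketch below assumes that $P_{out}$ is the standard normal CDF $\Phi$ , as the last line of (71) indicates, and uses a toy function $H(z)=3\log(1+\frac{1}{z})$ that is purely hypothetical and stands in for the paper's $H(z)$ .

```python
import math

def std_normal_cdf(x):
    # Phi(x) written with erfc to stay accurate in the lower tail.
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def finite_snr_dmt(m, k, z, H, dH):
    """Closed form from the last line of (71)."""
    c = (m - k) / k
    x = c * H(z)
    density_term = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return z * c * density_term * dH(z) / std_normal_cdf(x)

def finite_snr_dmt_numeric(m, k, z, H, rel_step=1e-6):
    """Central difference of z * d/dz log Phi((m - k) H(z) / k)."""
    c = (m - k) / k
    h = rel_step * z
    log_p = lambda u: math.log(std_normal_cdf(c * H(u)))
    return z * (log_p(z + h) - log_p(z - h)) / (2.0 * h)

# Hypothetical stand-in for H(z) and its derivative.
def toy_H(z):
    return 3.0 * math.log(1.0 + 1.0 / z)

def toy_dH(z):
    return -3.0 / (z * (z + 1.0))

for z in (0.1, 0.5, 2.0):
    print(z, finite_snr_dmt(1, 4, z, toy_H, toy_dH),
          finite_snr_dmt_numeric(1, 4, z, toy_H))
```

With $m<k$ both factors $(m-k)$ and $H^{\prime}(z)$ are negative in this toy setting, so the resulting diversity gain is positive.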

Appendix D Proof of Preliminary Results

D-A Proof of Lemma 2

Proof.

Next, we will show that, when $M\rightarrow\infty$ , $V_{1}$ converges to a constant in probability. By the following equation derived from the Sherman-Morrison-Woodbury formula [52],

\displaystyle\mathbb{q}^{H}\left(\mathbb{B}+\mu\mathbb{q}\mathbb{q}^{H}\right)^{-1}=\frac{1}{1+\mu\mathbb{q}^{H}\mathbb{B}^{-1}\mathbb{q}}\mathbb{q}^{H}\mathbb{B}^{-1}

(72)

we have

		$\displaystyle\widetilde{\gamma}_{\mathbb{Y}}=\frac{1}{L}\operatorname{Tr}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}=\frac{1}{L}\sum_{i=1}^{L}\frac{t_{i}\mathbb{y}^{H}_{i}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}}{1+t_{i}f\mathbb{y}^{H}_{i}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}}=\frac{1}{L}\sum_{i=1}^{L}\frac{t_{i}\mathbb{y}^{H}_{i}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{W}\mathbb{W}^{H}\widetilde{\mathbb{C}}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}}{1+\frac{t_{i}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}}+o(1)$		(73)
		$\displaystyle=\frac{1}{L}\sum_{i=1}^{L}\frac{t_{i}\mathbb{y}^{H}_{i}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{W}_{i}\mathbb{W}_{i}^{H}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}}{(1+\frac{t_{i}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}})^{2}}+\frac{t_{i}^{2}\mathbb{y}^{H}_{i}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}\mathbb{y}_{i}^{H}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}}{(1+\frac{t_{i}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}})^{2}}+o(1)=\frac{1}{L}\sum_{i=1}^{L}\frac{\frac{t_{i}}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}_{i}\mathbb{W}_{i}\mathbb{W}_{i}^{H}\widetilde{\mathbb{C}}_{i}}{(1+\frac{t_{i}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}})^{2}}$
		$\displaystyle+\frac{t_{i}^{2}(\frac{1}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}})^{2}}{(1+\frac{t_{i}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}})^{2}}+o(1)=Y_{1}+Y_{2}+o(1).$
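Identity (72), which drives the first equality in (73), is easy to sanity-check numerically. The sketch below draws a random Hermitian positive definite $\mathbb{B}$ and a random vector $\mathbb{q}$ and compares both sides of (72). The dimension, the value of $\mu$ , and the random seed are arbitrary illustrative choices.

```python
import numpy as np

def sherman_morrison_sides(n=5, mu=0.7, seed=1):
    """Return both sides of q^H (B + mu q q^H)^{-1} = q^H B^{-1} / (1 + mu q^H B^{-1} q)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    B = A @ A.conj().T + np.eye(n)          # Hermitian positive definite
    q = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    q_h = q.conj()                          # row vector q^H
    lhs = q_h @ np.linalg.inv(B + mu * np.outer(q, q_h))
    B_inv = np.linalg.inv(B)
    rhs = (q_h @ B_inv) / (1.0 + mu * (q_h @ B_inv @ q))
    return lhs, rhs

lhs, rhs = sherman_morrison_sides()
print(np.allclose(lhs, rhs))
```

In (73) the same identity is applied with the rank-one term contributed by the $i$ -th column, which is what produces the denominators $1+t_{i}f\mathbb{y}_{i}^{H}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{i}$ .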

Next, we evaluate the two terms $Y_{1}$ and $Y_{2}$ in (73). The numerator of $Y_{1}$ can be given by

		$\displaystyle\frac{1}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}_{i}\mathbb{W}_{i}\mathbb{W}_{i}^{H}\widetilde{\mathbb{C}}_{i}=\sum_{j\neq i}^{M}\frac{t_{j}}{M}\mathbb{y}^{H}_{j}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{i}\mathbb{S}\widetilde{\mathbb{C}}_{i}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{j}$		(74)
		$\displaystyle\overset{a.s.}{\longrightarrow}\sum_{j\neq i}^{M}\frac{t_{j}\mathbb{y}^{H}_{j}\mathbb{S}^{\frac{1}{2}}\widetilde{\mathbb{C}}_{ij}\mathbb{S}\widetilde{\mathbb{C}}_{ij}\mathbb{S}^{\frac{1}{2}}\mathbb{y}_{j}}{M(1+\frac{t_{j}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}_{ij})^{2}}$
		$\displaystyle\overset{a.s.}{\longrightarrow}\sum_{j\neq i}^{M}\frac{\frac{t_{j}}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}_{ij}\mathbb{S}\widetilde{\mathbb{C}}_{ij}}{M(1+\frac{t_{j}f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}_{ij})^{2}},$

where $\mathbb{W}_{i}$ is obtained by removing the $i$ -th column from $\mathbb{W}$ . These two almost sure convergences follow from the convergence of the quadratic form [53, Lemma 3]. Following the steps in Appendix E of [34], we can prove that for any matrix $\mathbb{U}$ with a bounded norm, it holds true that

\frac{1}{M}\operatorname{Tr}\mathbb{U}\widetilde{\mathbb{C}}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{1}{M}\operatorname{Tr}\mathbb{U}{\mathbb{Q}}_{S}.

(75)

Therefore, by arguments similar to those in [33], we have

		$\displaystyle f=\frac{1}{M}\operatorname{Tr}\mathbb{R}(z\mathbb{I}_{N}+\widetilde{f}\mathbb{R})^{-1}\xrightarrow[M\rightarrow\infty]{a.s.}$		(76)
		$\displaystyle\frac{1}{M}\operatorname{Tr}\mathbb{R}(z\mathbb{I}_{N}+\frac{g\overline{g}}{\delta}\mathbb{R})^{-1}=\delta,$
		$\displaystyle\frac{f}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{1}{M}\operatorname{Tr}\mathbb{S}{\mathbb{Q}}_{S}=g,$
		$\displaystyle\frac{1}{M}\operatorname{Tr}\mathbb{T}(\mathbb{W}^{H}\mathbb{W}+\mathbb{I}_{M})^{-1}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{1}{M}\operatorname{Tr}\mathbb{T}{\mathbb{Q}}_{T}=\overline{g}.$

From [29, Theorem 1], [30, Theorem 1], or [40, Lemma 3], we can show that if $(h,\widetilde{h})$ is the solution of

	$\displaystyle h$	$\displaystyle=\frac{1}{M}\operatorname{Tr}\delta\mathbb{S}\left(\alpha\mathbb{I}_{L}+l\mathbb{D}+\widetilde{h}\delta\mathbb{S}\right)^{-1},$		(77)
	$\displaystyle\widetilde{h}$	$\displaystyle=\frac{1}{M}\operatorname{Tr}\mathbb{T}\left(\mathbb{I}_{M}+h\mathbb{T}\right)^{-1},$		(77)

it holds true that

		$\displaystyle\frac{f}{M}\operatorname{Tr}\mathbb{S}\left(\alpha\mathbb{I}_{L}+l\mathbb{D}+f\mathbb{W}\mathbb{W}^{H}\right)^{-1}$		(78)
		$\displaystyle\xrightarrow[M\rightarrow\infty]{a.s.}\frac{\delta}{M}\operatorname{Tr}\mathbb{S}\left(\alpha\mathbb{I}_{L}+l\mathbb{D}+\widetilde{h}\delta\mathbb{S}\right)^{-1}.$		(78)

Then by letting $\alpha=1$ and $l=0$ , we have $h=g$ and $\widetilde{h}=\overline{g}$ . Taking the derivative of $h$ and $\widetilde{h}$ with respect to $l$ , we have

		$\displaystyle\frac{\mathrm{d}\frac{f}{M}\operatorname{Tr}\mathbb{S}(\alpha\mathbb{I}_{L}+l\mathbb{D}+f\mathbb{W}\mathbb{W}^{H})^{-1}}{\mathrm{d}l}|_{l=0}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{\mathrm{d}{h}}{\mathrm{d}l}|_{l=0}$		(79)
		$\displaystyle=\frac{-1}{M}\operatorname{Tr}\mathbb{S}\left(\delta^{-1}\mathbb{I}_{L}+\widetilde{h}\mathbb{S}\right)^{-1}\mathbb{S}\left(\delta^{-1}\mathbb{I}_{L}+\widetilde{h}\mathbb{S}\right)^{-1}\frac{\mathrm{d}\widetilde{h}}{\mathrm{d}l}$
		$\displaystyle+\frac{-1}{M\delta}\operatorname{Tr}\mathbb{S}\left(\delta^{-1}\mathbb{I}_{L}+\widetilde{h}\mathbb{S}\right)^{-1}\mathbb{D}\left(\delta^{-1}\mathbb{I}_{L}+\widetilde{h}\mathbb{S}\right)^{-1},$
		$\displaystyle\frac{\mathrm{d}\widetilde{h}}{\mathrm{d}l}|_{l=0}=\frac{-1}{M}\operatorname{Tr}\mathbb{T}\left(\mathbb{I}_{M}+h\mathbb{T}\right)^{-1}\mathbb{T}\left(\mathbb{I}_{M}+h\mathbb{T}\right)^{-1}\frac{\mathrm{d}{h}}{\mathrm{d}l}.$

Therefore, we have the following system of equations

\begin{bmatrix}1&\gamma_{S}\\ \gamma_{T}&1\end{bmatrix}\begin{bmatrix}\frac{\mathrm{d}{h}}{\mathrm{d}l}\\ \frac{\mathrm{d}\widetilde{h}}{\mathrm{d}l}\end{bmatrix}=\begin{bmatrix}-\frac{\gamma_{S}(\mathbb{D})}{\delta}\\ 0\end{bmatrix}+o(1).

(80)

According to (76) and letting $\mathbb{D}=\mathbb{S}$ , we have

\frac{1}{M}\operatorname{Tr}\mathbb{S}\widetilde{\mathbb{C}}\mathbb{S}\widetilde{\mathbb{C}}=-\frac{1}{f}\frac{\mathrm{d}{h}}{\mathrm{d}l}|_{l=0}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{1}{\delta^{2}}\frac{\gamma_{S}}{1-\gamma_{S}\gamma_{T}}.

(81)
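The step from the linear system (80) to the limit in (81) amounts to solving a $2\times 2$ system for $\frac{\mathrm{d}h}{\mathrm{d}l}$ . A minimal sketch, with hypothetical scalar values for $\gamma_{S}$ , $\gamma_{T}$ , and $\delta$ and with $\gamma_{S}(\mathbb{D})=\gamma_{S}$ (the case $\mathbb{D}=\mathbb{S}$ ), is:

```python
import numpy as np

def solve_derivatives(gamma_s, gamma_t, gamma_s_d, delta):
    """Solve the system (80) for (dh/dl, d h_tilde/dl), ignoring the o(1) term."""
    A = np.array([[1.0, gamma_s],
                  [gamma_t, 1.0]])
    b = np.array([-gamma_s_d / delta, 0.0])
    dh, dh_tilde = np.linalg.solve(A, b)
    return dh, dh_tilde

# Hypothetical values with gamma_S * gamma_T < 1.
gamma_s, gamma_t, delta = 0.6, 0.3, 0.7
dh, dh_tilde = solve_derivatives(gamma_s, gamma_t, gamma_s, delta)

lhs = -dh / delta                                       # -(1/f) dh/dl with f -> delta
rhs = gamma_s / (delta**2 * (1.0 - gamma_s * gamma_t))  # right-hand side of (81)
print(lhs, rhs)
```

The determinant of the system is $1-\gamma_{S}\gamma_{T}=\Delta_{Y}$ , which is where the factor $\Delta_{Y}$ in the denominators of (82) and (83) originates.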

Hence, we have

\displaystyle Y_{1}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{M\gamma_{S}(\gamma_{T,I}^{2}-\frac{1}{M}\psi_{T})}{L\delta^{2}\Delta_{Y}},~{}Y_{2}\xrightarrow[M\rightarrow\infty]{a.s.}\frac{M\gamma_{T}g^{2}}{L\delta^{2}},

(82)

where $\psi_{T}$ is given in Table I. By applying the continuous mapping theorem [49] to the logarithm function, the asymptotic variance is given by

		$\displaystyle V_{1}\xrightarrow[M\rightarrow\infty]{\mathcal{P}}\!-\!\log[1\!-\!\frac{M\gamma_{R}}{L\delta^{2}}(\frac{\gamma_{S}(\gamma_{T,I}^{2}-\frac{1}{M}\psi_{T})}{\Delta_{Y}}\!+\!\gamma_{T}g^{2})]$		(83)
		$\displaystyle=-\log(1-\gamma_{R}\Gamma_{L}).$		(83)

∎

D-B Proof of Lemma 3

Proof.

Here we omit the condition on $\mathbb{Y}$ . With the integration-by-parts formula [32], we have

		$\displaystyle\mathbb{E}[\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}]_{i,j}=\mathbb{E}\mathbb{z}_{i}^{H}\mathbb{Q}\mathbb{Z}\mathbb{b}_{j}=\mathbb{E}s_{i}^{\frac{1}{2}}\mathbb{x}_{i}^{H}\mathbb{R}^{\frac{H}{2}}\mathbb{Q}\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}\mathbb{b}_{j}$		(84)
		$\displaystyle=\frac{1}{L}\mathbb{E}s^{\frac{1}{2}}_{i}\sum_{m,n}{x}_{m,i}^{*}[\mathbb{R}^{\frac{H}{2}}\mathbb{Q}\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}]_{m,n}{b}_{n,j}$
		$\displaystyle=\frac{1}{L}\mathbb{E}\sum_{m}-[\mathbb{R}^{\frac{H}{2}}\mathbb{Q}\mathbb{R}^{\frac{1}{2}}]_{m,m}[\mathbb{S}\mathbb{D}\mathbb{S}^{\frac{1}{2}}\mathbb{X}^{H}\mathbb{R}^{\frac{H}{2}}\mathbb{Q}\mathbb{R}^{\frac{1}{2}}\mathbb{X}\mathbb{S}^{\frac{1}{2}}\mathbb{B}]_{i,j}$
		$\displaystyle+\frac{1}{L}\mathbb{E}\sum_{m}[\mathbb{R}^{\frac{H}{2}}\mathbb{Q}\mathbb{R}^{\frac{1}{2}}]_{m,m}[\mathbb{S}\mathbb{B}]_{i,j}$
		$\displaystyle=-\frac{1}{L}\mathbb{E}\operatorname{Tr}\mathbb{R}\mathbb{Q}[\mathbb{S}\mathbb{D}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}]_{i,j}+\frac{1}{L}\mathbb{E}\operatorname{Tr}\mathbb{R}\mathbb{Q}[\mathbb{S}\mathbb{B}]_{i,j}.$

Therefore, by moving the first term of the RHS to the LHS, we have

		$\displaystyle\frac{1}{M}\mathbb{E}\operatorname{Tr}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}$		(85)
		$\displaystyle=\frac{1}{M}\mathbb{E}\operatorname{Tr}((\frac{1}{L}\mathbb{E}\operatorname{Tr}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{D})^{-1}\mathbb{S}\mathbb{B}+\varepsilon_{z},$		(85)

where

$\displaystyle\|\varepsilon_{z}\|$	$\displaystyle=\|\frac{1}{M}\mathbb{E}\operatorname{Tr}((\frac{1}{L}\operatorname{Tr}\mathbb{E}\mathbb{R}\mathbb{Q})^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{D})^{-1}$	(86)
	$\displaystyle\times\mathbb{S}\mathbb{D}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}(\frac{1}{L}\operatorname{Tr}(\mathbb{R}\mathbb{Q})-\frac{1}{L}\mathbb{E}\operatorname{Tr}(\mathbb{R}\mathbb{Q}))\|$
	$\displaystyle\leq\frac{K}{ML}\mathrm{Var}^{\frac{1}{2}}(\operatorname{Tr}(\mathbb{R}\mathbb{Q}))=O(\frac{\|\mathbb{D}\|^{2}}{M}),$

with $K$ being a constant. The conclusion follows from the variance control in Appendix B of [51]. ∎

Appendix E Complexity Analysis for Solving (10)

To evaluate the complexity of Algorithm 1, we first investigate the complexity to obtain an $\varepsilon$ -approximation of the solution for the canonical equation (10). Starting from a simpler case, we first investigate the $\varepsilon$ -approximation for $g$ and $\overline{g}$ given $\delta$ , which is given as the following lemma.

Lemma 4.

Given $z$ , consider the function $f(\overline{g})=\frac{1}{M}\operatorname{Tr}\mathbb{T}(\mathbb{I}+g\mathbb{T})^{-1}$ with $g=\frac{1}{M}\operatorname{Tr}\mathbb{S}(z\mathbb{I}+\overline{g}\mathbb{S})^{-1}$ . An $\varepsilon$ -solution for the equation $\overline{g}=f(\overline{g})$ can be given in $O(\log(\frac{1}{\varepsilon}))$ iterations.

Proof.

We define the iteration $\overline{g}^{(t+1)}=f(\overline{g}^{(t)})$ . First, we have the following bounds for $g$ and $\overline{g}$

	$\displaystyle\frac{\operatorname{Tr}\mathbb{S}}{M(z+s_{max}t_{max})}<g<\frac{Ls_{max}}{Mz},$		(87)
	$\displaystyle\frac{\operatorname{Tr}\mathbb{T}}{M(1+\frac{Ls_{max}t_{max}}{Nz})}<\overline{g}<t_{max},$		(87)

which indicates that if we can obtain an $\varepsilon$ -approximation for $g$ , we could obtain an $\varepsilon$ -approximation for $\overline{g}$ and vice versa. $f(\overline{g})$ is monotonically increasing since

f^{\prime}(\overline{g})=\frac{\operatorname{Tr}\mathbb{T}^{2}(\mathbb{I}+g\mathbb{T})^{-2}}{M}\frac{\operatorname{Tr}\mathbb{S}^{2}(z\mathbb{I}+\overline{g}\mathbb{S})^{-2}}{M}>0.

(88)

Meanwhile, it is easy to verify that $\frac{f(\overline{g})}{\overline{g}}$ is monotonically decreasing, so we have $\frac{f(\overline{g})}{\overline{g}}<1$ when $\overline{g}>\overline{g}^{*}$ , where $\overline{g}^{*}$ denotes the solution of $\overline{g}=f(\overline{g})$ . Therefore, if we start from $\overline{g}^{(0)}=t_{max}$ , we have $\overline{g}^{(t+1)}<...<\overline{g}^{(1)}<\overline{g}^{(0)}$ , indicating that the iterates decrease monotonically to the solution $\overline{g}^{*}$ . Meanwhile, we have

		$\displaystyle 1=\frac{f(\overline{g})}{f(\overline{g})}=\frac{\frac{\operatorname{Tr}\mathbb{T}(\mathbb{I}+g\mathbb{T})^{-2}}{M}+\frac{g\operatorname{Tr}\mathbb{T}^{2}(\mathbb{I}+g\mathbb{T})^{-2}}{M}}{f(\overline{g})}$		(89)
		$\displaystyle>\frac{(\frac{z\operatorname{Tr}\mathbb{S}(z\mathbb{I}+\overline{g}\mathbb{S})^{-2}}{M}+\frac{\overline{g}\operatorname{Tr}\mathbb{S}^{2}(z\mathbb{I}+\overline{g}\mathbb{S})^{-2}}{M})\frac{\operatorname{Tr}\mathbb{T}^{2}(\mathbb{I}+g\mathbb{T})^{-2}}{M}}{f(\overline{g})},$		(89)

where the inequality follows by dropping the first term of the numerator and substituting $g=\frac{1}{M}\operatorname{Tr}\mathbb{S}(z\mathbb{I}+\overline{g}\mathbb{S})^{-1}$ into the second. By (88) and (89), we have

		$\displaystyle 0<f^{\prime}(\overline{g})<\frac{f(\overline{g})}{\overline{g}}-\frac{z\operatorname{Tr}\mathbb{S}(z\mathbb{I}+\overline{g}\mathbb{S})^{-2}}{\overline{g}M}\frac{\operatorname{Tr}\mathbb{T}^{2}(\mathbb{I}+g\mathbb{T})^{-2}}{M}$		(90)
		$\displaystyle<1-\frac{z\operatorname{Tr}\mathbb{S}(z\mathbb{I}+\overline{g}\mathbb{S})^{-2}}{\overline{g}M}\frac{\operatorname{Tr}\mathbb{T}^{2}(\mathbb{I}+g\mathbb{T})^{-2}}{M}:=\beta_{g}<1.$		(90)

Therefore, by the mean value theorem, we have

		$\displaystyle\|\overline{g}^{(t+1)}-\overline{g}^{*}\|=\|f(\overline{g}^{(t)})-f(\overline{g}^{*})\|=\|f^{\prime}(\psi)(\overline{g}^{(t)}-\overline{g}^{*})\|$		(91)
		$\displaystyle<\beta_{g}\|\overline{g}^{(t)}-\overline{g}^{*}\|<\beta_{g}^{t+1}\|\overline{g}^{(0)}\|,$		(91)

which indicates that an $\varepsilon$ -solution can be obtained in $O(\log(\frac{1}{\varepsilon}))$ iterations. ∎
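The iteration analysed in Lemma 4 can be sketched as follows for diagonal (or diagonalized) $\mathbb{S}$ and $\mathbb{T}$ , where only the eigenvalues matter. The eigenvalue ranges, the dimensions, and the value of $z$ are illustrative assumptions.

```python
import numpy as np

def iterate_g_bar(z, s, t, n_iter):
    """Run g_bar^(k+1) = f(g_bar^(k)) starting from g_bar^(0) = t_max.

    s : eigenvalues of S (length L), t : eigenvalues of T (length M).
    Returns the whole sequence of iterates.
    """
    M = len(t)
    g_bar = float(np.max(t))
    history = [g_bar]
    for _ in range(n_iter):
        g = np.sum(s / (z + g_bar * s)) / M        # g = Tr S (zI + g_bar S)^{-1} / M
        g_bar = np.sum(t / (1.0 + g * t)) / M      # f(g_bar) = Tr T (I + g T)^{-1} / M
        history.append(g_bar)
    return np.array(history)

# Hypothetical spectra.
rng = np.random.default_rng(0)
s_eigs = rng.uniform(0.5, 1.5, size=16)
t_eigs = rng.uniform(0.5, 1.5, size=8)
history = iterate_g_bar(z=1.0, s=s_eigs, t=t_eigs, n_iter=60)
print(history[:6], history[-1])
```

The first iterates decrease monotonically, and the gap between consecutive iterates shrinks geometrically, in line with the contraction factor $\beta_{g}$ in (90).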

Now we turn to investigate the $\varepsilon$ -approximation for $\delta$ , which is given in the following lemma.

Lemma 5.

Consider the function $h(\delta)=\frac{1}{L}\operatorname{Tr}\mathbb{R}(z\mathbb{I}+\frac{g\overline{g}}{\delta}\mathbb{R})^{-1}$ with $\overline{g}=\frac{1}{M}\operatorname{Tr}\mathbb{T}(\mathbb{I}+g\mathbb{T})^{-1}$ and $g=\frac{1}{M}\operatorname{Tr}\mathbb{S}(\frac{1}{\delta}\mathbb{I}+\overline{g}\mathbb{S})^{-1}$ . An $\varepsilon$ -solution for the equation $\delta=h(\delta)$ can be given in $O(\log(\frac{1}{\varepsilon}))$ iterations.

Proof.

The proof is similar to that of Lemma 4. First, we can obtain the bounds

		$\displaystyle\delta_{L}=\frac{\operatorname{Tr}\mathbb{R}}{L(z+s_{max}t_{max}r_{max})}\leq\delta\leq\frac{Nr_{max}}{Lz}=\delta_{U},$		(92)
		$\displaystyle g_{L}=\frac{\operatorname{Tr}\mathbb{S}}{M(\frac{1}{\delta_{L}}+s_{max}t_{max})}\leq g\leq\frac{Ns_{max}r_{max}}{Mz}=g_{U},$
		$\displaystyle\overline{g}_{L}=\frac{\operatorname{Tr}\mathbb{T}}{M(1+\frac{Nr_{max}t_{max}}{Lz})}\leq\overline{g}\leq t_{max}=\overline{g}_{U}.$

Similar to the proof of Lemma 4, we can verify that the convergence process is

\displaystyle\delta^{(0)}>\delta^{(1)}>...>\delta^{(n)}>...>\delta^{*},

(93)

which means that $\delta$ decreases and converges to the solution. Also, $\frac{h(\delta)}{\delta}<1$ holds true when $\delta\in(\delta^{*},\infty)$ . Next, we will bound the derivative $h^{\prime}(\delta)$ . Since

\displaystyle 1=h(\delta)/h(\delta)=\frac{\delta}{h(\delta)}(\frac{z}{\delta}\gamma_{R,I}+\frac{Mg\overline{g}}{L\delta^{2}}\gamma_{R}),

(94)

it holds true that

$\displaystyle h^{\prime}(\delta)$	$\displaystyle=\frac{Mg\overline{g}\gamma_{R}}{L\delta^{2}}-\frac{M(g-\overline{g}\gamma_{S})(\overline{g}-g\gamma_{T})\gamma_{R}}{L\delta^{2}\Delta_{T}}$	(95)
	$\displaystyle=\frac{h(\delta)}{\delta}-\frac{z\gamma_{R,I}}{\delta}-\frac{M(g-\overline{g}\gamma_{S})(\overline{g}-g\gamma_{T})\gamma_{R}}{L\delta^{2}\Delta_{T}}$
	$\displaystyle<1-\frac{1}{(1+s_{max}t_{max}r_{max}/z)^{2}}:=\beta_{\delta}<1.$

Therefore, we can show that

		$\displaystyle\|\delta^{(t+1)}-\delta^{*}\|=\|h(\delta^{(t)})-h(\delta^{*})\|=\|h^{\prime}(\psi)(\delta^{(t)}-\delta^{*})\|$		(96)
		$\displaystyle<\beta_{\delta}\|\delta^{(t)}-\delta^{*}\|<\beta_{\delta}^{t}\|\delta^{(0)}\|,$		(96)

which indicates that an $\varepsilon$ -solution can be obtained in $O(\log(\frac{1}{\varepsilon}))$ iterations. ∎

However, the conclusion in Lemma 5 holds only when the exact solution of $f(\overline{g})=\overline{g}$ can be obtained in each iteration. In practice, only an approximation of this solution is available. By Lemma 4, given $\delta$ , we can find $\hat{g}(\delta)$ such that $g-\varepsilon_{inner}<\hat{g}<g$ , and we use $\hat{h}(\delta)$ to denote the value computed from $\hat{g}$ and $\hat{\overline{g}}$ . We now evaluate the gap between $\hat{h}$ and ${h}$ . First, we have

		$\displaystyle 0<\hat{h}(\delta)-h(\delta)=\frac{M}{L^{2}}\operatorname{Tr}({\mathbb{R}}^{2}\left(z\mathbb{I}+\frac{M\hat{g}\hat{\overline{g}}}{L\delta}\mathbb{R}\right)^{-1}$		(97)
		$\displaystyle\times\left(z\mathbb{I}+\frac{Mg\overline{g}}{L\delta}\mathbb{R}\right)^{-1})\frac{g\overline{g}-\hat{g}\hat{\overline{g}}}{\delta}$
		$\displaystyle\leq\frac{MNr_{max}^{2}}{z^{2}L^{2}\delta_{L}}(g-\hat{g})<\frac{MNr_{max}^{2}\varepsilon_{inner}}{z^{2}L^{2}\delta_{L}}:=\varepsilon_{f}.$

Next, denoting by $\hat{\delta}^{(t)}$ the iterate obtained with the inexact inner solution, the error accumulated over the iterations satisfies

		$\displaystyle\|h(\delta^{(t)})-\hat{h}(\hat{\delta}^{(t)})\|\leq\|h(\delta^{(t)})-h(\hat{\delta}^{(t)})\|$		(98)
		$\displaystyle+\|h(\hat{\delta}^{(t)})-\hat{h}(\hat{\delta}^{(t)})\|\leq\beta_{\delta}\|h(\delta^{(t-1)})-\hat{h}(\hat{\delta}^{(t-1)})\|$
		$\displaystyle+\varepsilon_{f}\leq\beta_{\delta}^{2}\|h(\delta^{(t-2)})-\hat{h}(\hat{\delta}^{(t-2)})\|+\beta_{\delta}\varepsilon_{f}+\varepsilon_{f}$
		$\displaystyle\leq...<\frac{\varepsilon_{f}}{1-\beta_{\delta}}.$

Finally, by the triangle inequality together with (96) and (98), we have

		$\displaystyle\|\hat{\delta}^{(t+1)}-\delta^{*}\|=\|\hat{h}(\hat{\delta}^{(t)})-h(\delta^{*})\|\leq\|\hat{h}(\hat{\delta}^{(t)})-{h}({\delta}^{(t)})\|$		(99)
		$\displaystyle+\|{h}({\delta}^{(t)})-{h}({\delta}^{*})\|<\frac{\varepsilon_{f}}{1-\beta_{\delta}}+\beta_{\delta}^{t}\delta^{(0)}.$		(99)

If we let $\varepsilon_{inner}=\frac{z^{2}L^{2}\delta_{L}(1-\beta_{\delta})\varepsilon}{2MNr_{max}^{2}}$ , it holds true that $\frac{\varepsilon_{f}}{1-\beta_{\delta}}\leq\frac{\varepsilon}{2}$ . Meanwhile, by letting $t\geq\lceil\frac{\log(\frac{\varepsilon}{2\delta^{(0)}})}{\log(\beta_{\delta})}\rceil$ , we have $|\hat{\delta}^{(t+1)}-\delta^{*}|\leq\varepsilon$ . Therefore, the complexity of obtaining an $\varepsilon$ -approximation of $\delta^{*}$ is $O(N\log^{2}(\frac{1}{\varepsilon}))$ , where the factor $N$ comes from the calculation of the trace. The procedure is summarized in Algorithm 2, where $N_{1,max}$ and $N_{2,max}$ denote the numbers of outer and inner iterations needed to reach accuracy $\varepsilon$ , which follow from the above discussion.
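The tolerance and iteration count chosen above can be evaluated directly from the bounds in (92) and (95). The sketch below is only an illustration: the function name, its argument list, and the numerical inputs are hypothetical, while the formulas for $\beta_{\delta}$ , $\delta_{L}$ , $\varepsilon_{inner}$ , and the bound on $t$ are those stated in the text.

```python
import math

def iteration_budget(eps, z, L, M, N, trace_R, r_max, s_max, t_max, delta0):
    """Return (beta_delta, delta_L, eps_inner, n_outer) for a target accuracy eps."""
    c = s_max * t_max * r_max
    beta_delta = 1.0 - 1.0 / (1.0 + c / z) ** 2          # contraction bound in (95)
    delta_L = trace_R / (L * (z + c))                    # lower bound in (92)
    eps_inner = (z**2 * L**2 * delta_L * (1.0 - beta_delta) * eps
                 / (2.0 * M * N * r_max**2))
    n_outer = math.ceil(math.log(eps / (2.0 * delta0)) / math.log(beta_delta))
    return beta_delta, delta_L, eps_inner, n_outer

# Hypothetical setting: identity correlation matrices of size 8, z = 1.
beta_delta, delta_L, eps_inner, n_outer = iteration_budget(
    eps=1e-6, z=1.0, L=8, M=8, N=8, trace_R=8.0,
    r_max=1.0, s_max=1.0, t_max=1.0, delta0=2.0)
print(beta_delta, delta_L, eps_inner, n_outer)
```

Both $\varepsilon_{inner}$ and the outer iteration count depend on $\varepsilon$ only through a linear factor and $\log(\frac{1}{\varepsilon})$ , respectively, which is the source of the $\log^{2}(\frac{1}{\varepsilon})$ term in the overall complexity.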

Algorithm 2 Algorithm for obtaining the $\varepsilon$ -solution of the Canonical Equation (10).

Input: $z$ , ${\mathbb{R}}$ , ${\mathbb{S}}$ , ${\mathbb{T}}$ , $N_{1,max}$ , $N_{2,max}$ .
1: Set $\delta^{(0)}>\delta^{U}$ and $t_{1}=0$ .
2: repeat
3:   Set $\overline{g}^{(0)}=\overline{g}_{U}$ and $t_{2}=1$ .
4:   repeat
5:     $g^{(t_{2})}=\frac{1}{M}\operatorname{Tr}\mathbb{S}(\frac{1}{\delta^{(t_{1})}}\mathbb{I}+\overline{g}^{(t_{2}-1)}\mathbb{S})^{-1}$
6:     $\overline{g}^{(t_{2})}=\frac{1}{M}\operatorname{Tr}\mathbb{T}(\mathbb{I}+{g}^{(t_{2})}\mathbb{T})^{-1}$
7:     $t_{2}=t_{2}+1$
8:   until $t_{2}>N_{2,max}$
9:   $\delta^{t_{1}}=\frac{1}{L}\operatorname{Tr}\mathbb{R}(z\mathbb{I}+\frac{Mg^{(t_{2})}\overline{g}^{(t_{2})}}{L\delta^{(t_{1}-1)}}{\mathbb{R}})^{-1}$
10:  $t_{1}=t_{1}+1$
11: until $t_{1}>N_{1,max}$
Output: $\delta$ , $g$ , $\overline{g}$ .
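The nested fixed-point iteration of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name and default iteration counts are hypothetical, the correlation matrices are assumed Hermitian so that every trace can be computed from eigenvalues, the current value of $\delta$ is used in both the inner loop and the outer update, and the starting point $2\delta_{U}$ is one arbitrary choice above $\delta_{U}$ .

```python
import numpy as np

def solve_canonical(z, R, S, T, n_outer=200, n_inner=200):
    """Approximate (delta, g, g_bar) by the nested iteration of Algorithm 2."""
    # Traces of the form Tr A (aI + bA)^{-1} depend only on the eigenvalues of A.
    r = np.linalg.eigvalsh(R)
    s = np.linalg.eigvalsh(S)
    t = np.linalg.eigvalsh(T)
    N, L, M = len(r), len(s), len(t)

    delta = 2.0 * N * r.max() / (L * z)        # start above delta_U = N r_max / (L z)
    g, g_bar = 0.0, float(t.max())
    for _ in range(n_outer):
        g_bar = float(t.max())                 # inner loop restarts from g_bar_U = t_max
        for _ in range(n_inner):
            g = np.sum(s / (1.0 / delta + g_bar * s)) / M
            g_bar = np.sum(t / (1.0 + g * t)) / M
        # Outer update of delta with the inner solution held fixed.
        delta = np.sum(r / (z + (M * g * g_bar / (L * delta)) * r)) / L
    return delta, g, g_bar

# Sanity check with identity correlation matrices and z = 1.
delta_id, g_id, g_bar_id = solve_canonical(1.0, np.eye(4), np.eye(4), np.eye(4))
print(delta_id, g_id, g_bar_id)
```

Fixed iteration counts are used here to mirror $N_{1,max}$ and $N_{2,max}$ ; a practical variant could instead stop each loop once successive iterates differ by less than a tolerance.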

References

[1] Q. Wu and R. Zhang, “Towards smart and reconfigurable environment: Intelligent reflecting surface aided wireless network,” IEEE Commun. Mag., vol. 58, no. 1, pp. 106–112, Jan. 2019.
[2] X. Yu, V. Jamali, D. Xu, D. W. K. Ng, and R. Schober, “Smart and reconfigurable wireless communications: From IRS modeling to algorithm design,” IEEE Wireless Commun. Mag., vol. 28, no. 6, pp. 118–125, Dec. 2021.
[3] B. Matthiesen, E. Björnson, E. De Carvalho, and P. Popovski, “Intelligent reflecting surface operation under predictable receiver mobility: A continuous time propagation model,” IEEE Wireless Commun. Lett., vol. 10, no. 2, pp. 216–220, Feb. 2020.
[4] Z. Wang, L. Liu, S. Zhang, and S. Cui, “Massive MIMO communication with intelligent reflecting surface,” arXiv preprint arXiv:2107.04255, 2021.
[5] D. Xu, V. Jamali, X. Yu, D. W. K. Ng, and R. Schober, “Optimal resource allocation design for large IRS-assisted SWIPT systems: A scalable optimization framework,” arXiv preprint arXiv:2104.03346, 2021.
[6] Y. Cheng, K. H. Li, Y. Liu, K. C. Teh, and H. V. Poor, “Downlink and uplink intelligent reflecting surface aided networks: NOMA and OMA,” IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3988–4000, Jun. 2021.
[7] W. Mei and R. Zhang, “Performance analysis and user association optimization for wireless network aided by multiple intelligent reflecting surfaces,” IEEE Trans. Commun., vol. 9, no. 9, pp. 6296–6312, Sep. 2021.
[8] Z. Zhang, Y. Cui, F. Yang, and L. Ding, “Analysis and optimization of outage probability in multi-intelligent reflecting surface-assisted systems,” arXiv preprint arXiv:1909.02193, 2019.
[9] Y. Han, W. Tang, S. Jin, C.-K. Wen, and X. Ma, “Large intelligent surface-assisted wireless communication exploiting statistical CSI,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 8238–8242, Sep. 2019.
[10] T. Van Chien, L. T. Tu, S. Chatzinotas, and B. Ottersten, “Coverage probability and ergodic capacity of intelligent reflecting surface-enhanced communication systems,” IEEE Commun. Lett., vol. 25, no. 1, pp. 69–73, Jan. 2020.
[11] J. Zhang, J. Liu, S. Ma, C.-K. Wen, and S. Jin, “Large system achievable rate analysis of RIS-assisted MIMO wireless communication with statistical CSIT,” IEEE Trans. Wireless Commun., vol. 20, no. 9, pp. 5572–5585, Sep. 2021.
[12] J. Wang, W. Zhang, X. Bao, T. Song, and C. Pan, “Outage analysis for intelligent reflecting surface assisted vehicular communication networks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), Taipei, Taiwan, Dec. 2020, pp. 1–6.
[13] S. Atapattu, R. Fan, P. Dharmawansa, G. Wang, J. Evans, and T. A. Tsiftsis, “Reconfigurable intelligent surface assisted two–way communications: Performance analysis and optimization,” IEEE Trans. Commun., vol. 68, no. 10, pp. 6552–6567, 2020.
[14] S. Lin, B. Zheng, G. C. Alexandropoulos, M. Wen, M. Di Renzo, and F. Chen, “Reconfigurable intelligent surfaces with reflection pattern modulation: Beamforming design and performance analysis,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 741–754, Feb. 2020.
[15] C. Guo, Y. Cui, F. Yang, and L. Ding, “Outage probability analysis and minimization in intelligent reflecting surface-assisted MISO systems,” IEEE Commun. Lett., vol. 24, no. 7, pp. 1563–1567, Jul. 2020.
[16] G. Zhou, C. Pan, H. Ren, K. Wang, and A. Nallanathan, “A framework of robust transmission design for IRS-aided MISO communications with imperfect cascaded channels,” IEEE Trans. Signal Process., vol. 68, pp. 5092–5106, Aug. 2020.
[17] S. Hong, C. Pan, H. Ren, K. Wang, K. K. Chai, and A. Nallanathan, “Robust transmission design for intelligent reflecting surface-aided secure communication systems with imperfect cascaded CSI,” IEEE Trans. Commun., vol. 20, no. 4, pp. 2487–2501, Apr. 2020.
[18] A. Bereyhi, S. Asaad, C. Ouyang, R. R. Müller, R. F. Schaefer, and H. V. Poor, “Channel hardening of IRS-Aided multi-antenna systems: How should IRSs scale?” arXiv preprint arXiv:2203.11592, Mar. 2022.
[19] Z. Zheng, L. Wei, R. Speicher, R. R. Müller, J. Hämäläinen, and J. Corander, “Asymptotic analysis of Rayleigh product channels: A free probability approach,” IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1731–1745, Mar. 2016.
[20] Z. Shi, H. Wang, Y. Fu, G. Yang, S. Ma, and F. Gao, “Outage analysis of reconfigurable intelligent surface aided MIMO communications with statistical CSI,” IEEE Trans. Wireless Commun., vol. 21, no. 2, pp. 823–839, Feb. 2022.
[21] L. Zheng and D. N. C. Tse, “Diversity and multiplexing: A fundamental tradeoff in multiple-antenna channels,” IEEE Trans. Inf. Theory, vol. 49, no. 5, pp. 1073–1096, May 2003.
[22] R. Narasimhan, “Finite-SNR diversity–multiplexing tradeoff for correlated Rayleigh and Rician MIMO channels,” IEEE Trans. Inf. Theory, vol. 52, no. 9, pp. 3965–3979, Sep. 2006.
[23] S. Loyka and G. Levin, “Finite-SNR diversity-multiplexing tradeoff via asymptotic analysis of large MIMO systems,” IEEE Trans. Inf. Theory, vol. 56, no. 10, pp. 4781–4792, Oct. 2010.
[24] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2009.
[25] H. Liu, X. Yuan, and Y.-J. A. Zhang, “Matrix-calibration-based cascaded channel estimation for reconfigurable intelligent surface assisted multiuser MIMO,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2621–2636, Nov. 2020.
[26] C. Hu, L. Dai, S. Han, and X. Wang, “Two-timescale channel estimation for reconfigurable intelligent surface aided wireless communications,” IEEE Trans. Commun., vol. 69, no. 11, pp. 7736–7747, Nov. 2021.
[27] Y.-C. Liang and F. P. S. Chin, “Downlink channel covariance matrix (DCCM) estimation and its applications in wireless DS-CDMA systems,” IEEE J. Sel. Areas Commun., vol. 19, no. 2, pp. 222–232, Feb. 2001.
[28] Y. Chen, A. Wiesel, Y. C. Eldar, and A. O. Hero, “Shrinkage algorithms for MMSE covariance estimation,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5016–5029, Oct. 2010.
[29] R. Couillet, M. Debbah, and J. W. Silverstein, “A deterministic equivalent for the analysis of correlated MIMO multiple access channels,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3493–3514, Jan. 2011.
[30] C.-K. Wen, G. Pan, K.-K. Wong, M. Guo, and J.-C. Chen, “A deterministic equivalent for the analysis of non-Gaussian correlated MIMO multiple access channels,” IEEE Trans. Inf. Theory, vol. 59, no. 1, pp. 329–352, Sep. 2012.
[31] X. Zhang and S. H. Song, “Bias for the trace of the resolvent and its application on non-Gaussian and non-centered MIMO channels,” IEEE Trans. Inf. Theory, vol. 68, no. 5, pp. 2857–2876, May 2022.
[32] W. Hachem, O. Khorunzhiy, P. Loubaton, J. Najim, and L. Pastur, “A new approach for mutual information analysis of large dimensional multi-antenna channels,” IEEE Trans. Inf. Theory, vol. 54, no. 9, pp. 3987–4004, Sep. 2008.
[33] J. Hoydis, R. Couillet, and M. Debbah, “Asymptotic analysis of double-scattering channels,” in Proc. Conf. Rec. 45th Asilomar Conf. Signals, Syst. Comput. (ASILOMAR), Pacific Grove, CA, USA, Nov. 2011, pp. 1935–1939.
[34] ——, “Iterative deterministic equivalents for the performance analysis of communication systems,” arXiv preprint arXiv:1112.4167, 2011.
[35] W. Hachem, M. Kharouf, J. Najim, and J. W. Silverstein, “A CLT for information-theoretic statistics of non-centered Gram random matrices,” Random Matrices: Theory and Applications, vol. 1, no. 02, p. 1150010, 2012.
[36] M. Jung, W. Saad, Y. Jang, G. Kong, and S. Choi, “Performance analysis of large intelligent surfaces (LISs): Asymptotic data rate and channel hardening effects,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2052–2065, Mar. 2020.
[37] H. Zhang, S. Ma, Z. Shi, X. Zhao, and G. Yang, “Sum-rate maximization of RIS-aided multi-user MIMO systems with statistical CSI,” arXiv preprint arXiv:2112.11936, Dec. 2021.
[38] S. Zheng, “Central limit theorems for linear spectral statistics of large dimensional F-matrices,” in Annales de l’IHP Probabilités et statistiques, vol. 48, no. 2, 2012, pp. 444–476.
[39] F. Götze, A. Naumov, and A. Tikhomirov, “Distribution of linear statistics of singular values of the product of random matrices,” Bernoulli, vol. 23, no. 4B, pp. 3067–3113, 2017.
[40] A. Kammoun, M. Debbah, M.-S. Alouini et al., “Asymptotic analysis of RZF over double scattering channels with MMSE estimation,” IEEE Trans. Wireless Commun., vol. 18, no. 5, pp. 2509–2526, May 2019.
[41] L. Armijo, “Minimization of functions having Lipschitz continuous first partial derivatives,” Pacific Journal of Mathematics, vol. 16, no. 1, pp. 1–3, Jan. 1966.
[42] Y. Yang, M. Pesavento, Z.-Q. Luo, and B. Ottersten, “Inexact block coordinate descent algorithms for nonsmooth nonconvex optimization,” IEEE Trans. Signal Process., vol. 68, pp. 947–961, Dec. 2019.
[43] Y. Ma, Y. Shen, X. Yu, J. Zhang, S. H. Song, and K. B. Letaief, “A low-complexity algorithmic framework for large-scale IRS-assisted wireless systems,” in Proc. IEEE Global Commun. Conf. Wkshps. (GLOBECOM Wkshps), Taipei, Taiwan, Dec. 2020, pp. 1–6.
[44] V. Y. Pan and Z. Q. Chen, “The complexity of the matrix eigenproblem,” in Proceedings of the Thirty-First Annual ACM Symposium on Theory of Computing, 1999, pp. 507–516.
[45] M. Chiani, D. Dardari, and M. K. Simon, “New exponential bounds and approximations for the computation of error probability in fading channels,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 840–845, Jul. 2003.
[46] J. Hoydis, S. Ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE J. Sel. Areas Commun., vol. 31, no. 2, pp. 160–171, Feb. 2013.
[47] Q. Wu and R. Zhang, “Intelligent reflecting surface enhanced wireless network via joint active and passive beamforming,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5394–5409, Nov. 2019.
[48] S. Yang and J.-C. Belfiore, “Diversity-multiplexing tradeoff of double scattering MIMO channels,” IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2027–2034, Apr. 2011.
[49] P. Billingsley, Probability and Measure. John Wiley & Sons, 2008.
[50] W. Hachem, P. Loubaton, and J. Najim, “A CLT for information-theoretic statistics of Gram random matrices with a given variance profile,” The Annals of Applied Probability, vol. 18, no. 6, pp. 2071–2130, 2008.
[51] F. Rubio, X. Mestre, and W. Hachem, “A CLT on the SNR of diagonally loaded MVDR filters,” IEEE Trans. Signal Process., vol. 60, no. 8, pp. 4178–4195, Aug. 2012.
[52] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 2012.
[53] A. Mueller, A. Kammoun, E. Björnson, and M. Debbah, “Linear precoding based on polynomial expansion: Reducing complexity in massive MIMO,” EURASIP Journal on Wireless Communications and Networking, vol. 2016, no. 1, pp. 1–22, Feb. 2016.

$$\begin{aligned}
\|\varepsilon_{z}\| &= \Big\|\frac{1}{M}\mathbb{E}\operatorname{Tr}\Big[\Big(\big(\tfrac{1}{L}\operatorname{Tr}\mathbb{E}\mathbb{R}\mathbb{Q}\big)^{-1}\mathbb{I}_{L}+\mathbb{S}\mathbb{D}\Big)^{-1} \\
&\qquad\times\mathbb{S}\mathbb{D}\mathbb{Z}^{H}\mathbb{Q}\mathbb{Z}\mathbb{B}\Big(\frac{1}{L}\operatorname{Tr}(\mathbb{R}\mathbb{Q})-\frac{1}{L}\mathbb{E}\operatorname{Tr}(\mathbb{R}\mathbb{Q})\Big)\Big]\Big\| \\
&\leq \frac{K}{ML}\mathrm{Var}^{\frac{1}{2}}\big(\operatorname{Tr}(\mathbb{R}\mathbb{Q})\big)=O\Big(\frac{\|\mathbb{D}\|^{2}}{M}\Big),
\end{aligned}\tag{86}$$

$$\begin{aligned}
\|h(\delta^{(t)})-\hat{h}(\hat{\delta}^{(t)})\| &\leq \|h(\delta^{(t)})-h(\hat{\delta}^{(t)})\|+\|f(\hat{\delta}^{(t)})-\hat{f}(\hat{\delta}^{(t)})\| \\
&\leq \beta_{\delta}\|f(\delta^{(t-1)})-\hat{f}(\hat{\delta}^{(t-1)})\|+\varepsilon_{f} \\
&\leq \beta_{\delta}^{2}\|f(\delta^{(t-2)})-\hat{f}(\hat{\delta}^{(t-2)})\|+\beta_{\delta}\varepsilon_{f}+\varepsilon_{f} \\
&\leq \cdots < \frac{\varepsilon_{f}}{1-\beta_{\delta}}.
\end{aligned}\tag{98}$$
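
The final step of (98) is a geometric-series bound. As a sketch only, let $e_{t}$ denote the error on the left-hand side of (98) at iteration $t$ (a shorthand introduced here, not used in the original), and assume $0<\beta_{\delta}<1$, $\varepsilon_{f}>0$, and an initial error satisfying $e_{0}\leq\varepsilon_{f}$ (an assumption made for this illustration). Unrolling the recursion $e_{t}\leq\beta_{\delta}e_{t-1}+\varepsilon_{f}$ then gives

$$e_{t}\leq\beta_{\delta}^{t}e_{0}+\varepsilon_{f}\sum_{k=0}^{t-1}\beta_{\delta}^{k}\leq\varepsilon_{f}\sum_{k=0}^{t}\beta_{\delta}^{k}=\varepsilon_{f}\,\frac{1-\beta_{\delta}^{t+1}}{1-\beta_{\delta}}<\frac{\varepsilon_{f}}{1-\beta_{\delta}}.$$

Under these assumptions the bound is uniform in $t$: the accumulated error stays below $\varepsilon_{f}/(1-\beta_{\delta})$ no matter how many iterations are run.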