
Interleaved Training for Massive MIMO Downlink via Exploring Spatial Correlation

Cheng Zhang,  Chang Liu,  Yindi Jing, 
Minjie Ding,  Yongming Huang
This work was supported in part by the National Natural Science Foundation of China under Grants 62271140 and 62225107, the Fundamental Research Funds for the Central Universities under Grant 2242022k60002, the Natural Science Foundation on Frontier Leading Technology Basic Research Project of Jiangsu under Grant BK20222001, and the Major Key Project of PCL. (Corresponding authors: Y. Huang, C. Zhang.) C. Zhang, C. Liu, M. Ding and Y. Huang are with the National Mobile Communication Research Laboratory, Southeast University, Nanjing 210096, China, and also with the Purple Mountain Laboratories, Nanjing 211111, China (e-mail: zhangcheng_seu, 220210844, 220200816, huangym@seu.edu.cn). Y. Jing is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton T6G 1H9, Canada (e-mail: yindi@ualberta.ca).
Abstract

Interleaved training has been studied for single-user and multi-user massive MIMO downlink with either fully-digital or hybrid beamforming. However, the impact of channel correlation on its average training overhead is rarely addressed. In this paper, we explore the channel correlation to improve the interleaved training for single-user massive MIMO downlink. For the beam-domain interleaved training, we propose a modified scheme by optimizing the beam training codebook. The basic antenna-domain interleaved training is also improved by dynamically adjusting the training order of the base station (BS) antennas during the training process based on the values of the already trained channels. Exact and simplified approximate expressions of the average training length are derived in closed-form for the basic and modified beam-domain schemes and the basic antenna-domain scheme in correlated channels. For the modified antenna-domain scheme, a deep neural network (DNN)-based approximation is provided for fast performance evaluation. Analytical results and simulations verify the accuracy of our derived training length expressions and explicitly reveal the impact of system parameters on the average training length. In addition, the modified beam/antenna-domain schemes are shown to have a shorter average training length compared to the basic schemes.

Index Terms:
Massive MIMO, interleaved training, spatial correlation, conditional distribution, training overhead.

I Introduction

By exploiting the large number of spatial degrees of freedom provided by large-scale antenna arrays, massive MIMO systems can achieve significant performance improvements over conventional MIMO systems [1, 2]. One crucial practical issue for the massive MIMO downlink is the acquisition of channel state information (CSI) at the base station (BS), especially for frequency-division-duplexing (FDD) systems with no uplink-downlink channel reciprocity [3]. Traditional downlink training and channel estimation schemes incur prohibitive training overhead due to the massive number of channel coefficients to be estimated [4].

Existing studies on the downlink CSI acquisition of massive MIMO can be divided into the following categories. In [5, 6, 7, 8], the channel statistics, e.g., spatial and/or temporal correlation, are utilized to conduct beamformed channel estimation. In [9, 10, 11, 12, 13, 14], compressive sensing algorithms are designed to exploit the channel sparsity in the angular domain and/or the common sparsity among users and/or subcarriers. In [15, 16, 17, 18], either the partial reciprocity between the uplink and downlink channels, e.g., with similar angle and delay of propagation paths, or their implicit relationships, e.g., both being the functions of user location, are used for channel training designs.

These aforementioned schemes aim to obtain the complete antenna-domain CSI with the smallest possible pilot overhead before data transmission; the training design and the data transmission design are thus decoupled, which limits the achievable tradeoff between training overhead and performance. Further, only the throughput or diversity gain has been considered in these existing works; the quality-of-service (QoS) provided by the obtained CSI is not taken into account during the training process. The training length or pilot overhead is therefore fixed and does not adapt to specific channel realizations. For massive MIMO systems, it is possible to use partial CSI to design the beamforming scheme for the data transmission period, especially under the outage probability performance measure. A new training method, namely interleaved training, has thus been proposed to dynamically adjust the training overhead according to the required QoS and the specific channel realization.

The idea of interleaved training was proposed in [19, 20] for the downlink of single-user full-digital massive antenna systems with independent and identically distributed (i.i.d.) channels, where the channels of different antennas are trained sequentially and the estimated CSI or indicator is fed back at the end of each training step. With each feedback, the BS decides to conduct the training of another antenna’s channel or to terminate the training process based on whether an outage occurs. Compared to traditional schemes, the interleaved training can achieve a significant reduction in training overhead with no degradation of outage performance.

The work in [21] applied the idea of interleaved training to beam-domain transmission, where joint beam-based interleaved training and data transmission schemes are proposed for massive MIMO systems with single and multiple users. In [22], an improved codebook is further designed for interleaved training in the millimeter-wave hybrid massive MIMO downlink. In [23], a joint interleaved training and transmission design is proposed for large intelligent surface (LIS) assisted systems under i.i.d. Rayleigh fading channels. Recently, an interleaved training design was proposed for the multi-user massive MIMO downlink in [24, 25], in which analytical results on the training length and the transmission success rate are provided for maximum-ratio transmission (MRT) precoding. Different from the single-user scheme, the multi-user scheme needs to judge whether the signal-to-interference-plus-noise ratio (SINR) requirements of all users can be satisfied with partial CSI during the interleaved training procedure. The advantages of interleaved training are clearly demonstrated in the above studies.

Different from previous papers on interleaved training [19, 20, 21, 22, 23, 24, 25], in this paper, we focus on exploiting channel statistics to further improve the performance of interleaved training for single-user massive MIMO downlink systems. Both beam-domain and antenna-domain modified interleaved training designs are proposed, and we further provide analytical results on the average training length of the proposed schemes and their comparison with those of the basic schemes. Detailed contributions are summarized as follows.

  • In the modified beam-domain scheme, both the beam direction and the beam training order are optimized based on the channel correlation information. In the modified antenna-domain scheme, the BS antenna training order is dynamically adjusted during the training process according to the channel values of the already trained antennas. To this end, we derive the conditional distribution of the untrained BS channels given the values of the trained channels under general correlated channels. For exponentially correlated channels, we further demonstrate that the conditional distribution of any untrained antenna's channel depends only on the channels of its nearest trained antennas on each side, which significantly reduces the complexity of the modified antenna-domain scheme.

  • Closed-form expressions of the average training length are derived for the basic and modified beam-domain interleaved training schemes with general correlated channels. For exponentially correlated channels, we further provide a simplified approximation of the average training length for the modified scheme when the number of BS antennas is large.

  • A closed-form average-training-length expression is also derived for the basic antenna-domain interleaved training with general correlated channels, and its simple approximation is given for exponentially correlated channels. For the average training length of the modified antenna-domain interleaved training, we propose a deep neural network (DNN)-based approximation to achieve fast performance evaluation.

  • Simulation results verify our derived analytical expressions and the theoretical analysis on the impact of system parameters, e.g., the channel correlation, the antenna number, and the signal-to-noise ratio (SNR) requirement, on the average training length of the basic and modified antenna/beam-domain interleaved training schemes under two typical correlated channels. In addition, simulations demonstrate that the proposed modified antenna-domain and beam-domain interleaved training schemes both outperform the basic interleaved training schemes.

The remainder of this paper is organized as follows. In Section II, we introduce the single-user massive MIMO downlink system with two typical channel correlation models and the basic antenna/beam-domain interleaved training scheme. In Section III, the modified beam-domain interleaved training is proposed along with related analytical results. In Section IV, we propose the modified antenna-domain interleaved training and conduct theoretical analysis. Simulations are provided in Section V. Section VI summarizes this work. Some proofs are included in the appendix.

Notation: Bold upper-case and bold lower-case letters denote matrices and vectors, respectively. $\mathbb{C}^{m\times n}$ denotes the $m$ by $n$ dimensional complex space. $\mathbf{I}_{n}$ denotes the $n$-dimensional identity matrix. The conjugate transpose, transpose, determinant, adjugate matrix, rank, and inverse of $\mathbf{A}$ are denoted by $\mathbf{A}^{\rm H}$, $\mathbf{A}^{\rm T}$, ${\rm det}(\mathbf{A})$, ${\rm adj}(\mathbf{A})$, ${\rm rank}(\mathbf{A})$, and $\mathbf{A}^{-1}$. The vector $\mathbf{a}_{i}$ denotes the $i$-th column of the matrix $\mathbf{A}$. For a vector $\mathbf{a}$, $a_{n}$ is the $n$-th element of $\mathbf{a}$, and $\mathbf{a}_{\mathbb{S}}$ is the sub-vector composed of the $s$-th elements of $\mathbf{a}$ for $s\in\mathbb{S}$ when the subscript is an index set $\mathbb{S}$. Similarly, $[\mathbf{A}]_{m,n}$ is the $(m,n)$-th element of $\mathbf{A}$ and $[\mathbf{A}]_{\mathbb{S},\mathbb{T}}$ is the sub-matrix composed of the $(s,t)$-th elements of $\mathbf{A}$ for $s\in\mathbb{S}$ and $t\in\mathbb{T}$. Define $[m:n]$ as the set $\{m,m+1,\dots,n\}$. $\Pr(\mathcal{A})$ represents the probability of event $\mathcal{A}$. $\lfloor\cdot\rfloor$ represents the floor function. $\|\mathbf{a}\|$ denotes the Euclidean norm of $\mathbf{a}$. ${\rm diag}(\mathbf{a})$ is the diagonal matrix whose diagonal entries are the elements of $\mathbf{a}$. $f_{X}(\cdot)$ denotes the probability density function (PDF) of a random variable (RV) $X$. $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ denotes the circularly symmetric complex Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. $\chi^{2}(k)$ denotes the chi-squared distribution with $k$ degrees of freedom. $\chi^{2}(k,\lambda)$ denotes the noncentral chi-squared distribution with $k$ degrees of freedom and non-centrality parameter $\lambda$. $Q_{1}(a,b)$ is the first-order Marcum Q-function. $\cong$ denotes equality in distribution.

II System Model

We consider a massive MIMO downlink system with an $M$-antenna BS and a single-antenna user equipment (UE). The downlink BS-UE channel, denoted as $\mathbf{h}$, is modeled as a circularly symmetric complex Gaussian vector following the distribution $\mathcal{CN}(\mathbf{0},\mathbf{R}_{\mathbf{h}})$, where $\mathbf{R}_{\mathbf{h}}$ is the channel covariance matrix. One typical correlation model is the one-ring correlation model [5], [26], which is expressed as

$$\mathbf{R}_{\mathbf{h}}=\int_{\Theta_{\rm min}}^{\Theta_{\rm max}}g(\theta)\boldsymbol{\alpha}(\theta)\boldsymbol{\alpha}^{\rm H}(\theta)\,d\theta, \qquad (1)$$

where $[\Theta_{\rm min},\Theta_{\rm max}]$ is the angle interval of the channel power seen at the BS, $g(\cdot)$ represents the power angle spectrum (PAS), satisfying $\int_{\Theta_{\rm min}}^{\Theta_{\rm max}}g(\theta)d\theta=1$, and $\boldsymbol{\alpha}(\theta)\in\mathbb{C}^{M\times 1}$ is the BS array response vector. For a uniform linear array (ULA), $\boldsymbol{\alpha}(\theta)=\left[1,\dots,e^{-j2\pi D\sin(\theta)(M-1)}\right]^{\rm T}$, where $D$ is the antenna spacing ratio. Another typical correlation model is the exponential one [27], i.e.,

$$[\mathbf{R}_{\mathbf{h}}]_{m,n}=\rho^{m-n},\ \forall m\geq n,\ m,n=1,\dots,M, \qquad (2)$$

where $\rho$, satisfying $r=|\rho|<1$, is the channel correlation between adjacent antennas. This is a simple single-parameter model commonly used for many communication problems, and it is also physically reasonable in the sense that the correlation decreases with increasing distance between antennas, e.g., in the ULA.
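As a concrete illustration, the following minimal Python/NumPy sketch builds $\mathbf{R}_{\mathbf{h}}$ for both models; the angle interval, PAS, and $\rho$ below are illustrative assumptions, not values from the paper, and the one-ring integral in Eq. (1) is simply discretized over an angle grid.

    import numpy as np

    M, D = 64, 0.5                                     # assumed: antennas, spacing ratio
    # One-ring model, Eq. (1): discretize the integral over [Theta_min, Theta_max].
    theta = np.linspace(-np.pi/6, np.pi/6, 512)        # assumed angle interval
    g = np.full(theta.size, 1.0/(theta[-1] - theta[0]))  # uniform PAS, integrates to 1
    steer = np.exp(-2j*np.pi*D*np.outer(np.arange(M), np.sin(theta)))  # alpha(theta) columns
    R_onering = (steer * g) @ steer.conj().T * (theta[1] - theta[0])

    # Exponential model, Eq. (2): [R]_{m,n} = rho^{m-n} for m >= n, Hermitian overall.
    rho = 0.7*np.exp(1j*0.3)                           # assumed complex correlation, |rho| < 1
    idx = np.subtract.outer(np.arange(M), np.arange(M))
    R_exp = np.where(idx >= 0, rho**idx, np.conj(rho)**(-idx))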

The downlink transmission can be represented as

$$y=\sqrt{P}\,\mathbf{h}^{\rm H}\mathbf{w}s+n, \qquad (3)$$

where $y$ is the received signal at the user, $\mathbf{w}\in\mathbb{C}^{M}$ is the antenna-domain beamformer at the BS with unit norm, i.e., $\|\mathbf{w}\|=1$, $s$ is the transmitted symbol with unit average power, $P$ is the transmit power, and $n$ is the normalized receive noise at the UE, which follows $\mathcal{CN}(0,1)$. The received SNR can be written as

$${\rm SNR}=P\left|\mathbf{h}^{\rm H}\mathbf{w}\right|^{2}. \qquad (4)$$

If beam-domain transmission is conducted, $\mathbf{w}$ can be decomposed into two parts: $\mathbf{w}=\mathbf{W}_{\rm O}\mathbf{w}_{\rm I}$, where $\mathbf{W}_{\rm O}\in\mathbb{C}^{M\times B}$ ($B\leq M$) is the external beamforming matrix, $\mathbf{w}_{\rm I}\in\mathbb{C}^{B}$ is the beam-domain beamformer, and $B$ is the number of beams. One typical $\mathbf{W}_{\rm O}$ is the normalized discrete Fourier transform (DFT) matrix $\mathbf{D}\in\mathbb{C}^{M\times M}$ with $[\mathbf{D}]_{m,n}=e^{j2\pi\frac{(m-1)(n-1)}{M}}/\sqrt{M}$. Define the $B$-dimensional beam-domain channel as $\bar{\mathbf{h}}=\mathbf{W}_{\rm O}^{\rm H}\mathbf{h}$. We have $\bar{\mathbf{h}}\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_{\bar{\mathbf{h}}})$ with $\mathbf{R}_{\bar{\mathbf{h}}}=\mathbf{W}_{\rm O}^{\rm H}\mathbf{R}_{\mathbf{h}}\mathbf{W}_{\rm O}$. Eq. (4) can then be written as

$${\rm SNR}=P\left|\bar{\mathbf{h}}^{\rm H}\mathbf{w}_{\rm I}\right|^{2}. \qquad (5)$$

For a given target data transmission rate $R_{\rm th}$, an outage event occurs if $\log_{2}(1+{\rm SNR})<R_{\rm th}$, or equivalently if ${\rm SNR}<P\alpha_{\rm th}$, where $\alpha_{\rm th}=(2^{R_{\rm th}}-1)/P$ is the normalized receive SNR threshold.

II-A General Framework of Interleaved Training and the Basic Training Scheme

In this subsection, we introduce the general antenna-domain and beam-domain interleaved training and give the basic interleaved training algorithms proposed in [20, 21] as the baseline of our study. For a unified representation, we define

$$\left(\tilde{\mathbf{h}},\tilde{\mathbf{w}},L\right)=\begin{cases}\left(\mathbf{h},\mathbf{w},M\right),&\text{for antenna-domain training}\\ \left(\bar{\mathbf{h}},\mathbf{w}_{\rm I},B\right),&\text{for beam-domain training}\end{cases}. \qquad (6)$$

In the general antenna/beam-domain interleaved training scheme, the BS trains the channel of one antenna/beam in each step, and the order of the antennas/beams during the training is determined according to a predefined criterion. After the $l$-th training step, the UE knows $\tilde{\mathbf{h}}_{\mathbb{A}_{l}}$, with $\mathbb{A}_{l}$ denoting the set of indices of the already trained BS antennas/beams. (The main purpose of interleaved training is to reduce the training overhead. To focus on the theoretical analysis and provide more insights, we do not consider errors resulting from channel estimation and feedback quantization in our study.) To maximize the receive SNR based on this currently acquired CSI, the BS can conduct the following downlink beamforming:

$$\tilde{w}_{n}=\begin{cases}\frac{\tilde{h}_{n}}{\left\|\tilde{\mathbf{h}}_{\mathbb{A}_{l}}\right\|},&\text{if }n\in\mathbb{A}_{l}\\ 0,&\text{if }n\notin\mathbb{A}_{l}\end{cases}. \qquad (7)$$

The receive SNR of this beamformer is thus ${\rm SNR}=P\|\tilde{\mathbf{h}}_{\mathbb{A}_{l}}\|^{2}$. Based on whether an outage occurs, i.e., whether $\|\tilde{\mathbf{h}}_{\mathbb{A}_{l}}\|^{2}<\alpha_{\rm th}$, the UE either notifies the BS to continue training with one bit 0, or feeds back one bit 1 together with the channel values of the already trained antennas/beams to the BS for transmission beamforming.
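As a small illustration (a sketch, not the paper's implementation), the beamformer in Eq. (7) and the resulting stopping test can be written as follows; it reuses the NumPy import from the model sketch above.

    def partial_mrt(h_tilde, trained):
        # Eq. (7): match the trained entries, zero elsewhere; unit norm overall.
        w = np.zeros_like(h_tilde)
        hA = h_tilde[list(trained)]
        w[list(trained)] = hA / np.linalg.norm(hA)
        return w

    # With this w, h^H w = ||h_A||, so SNR = P*||h_A||^2 and the UE's outage
    # test after step l reduces to checking ||h_A||^2 < alpha_th.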

In the basic antenna/beam-domain interleaved training scheme shown in Algorithm 1, the channel of one BS antenna or one DFT beam is trained in each step, and the antennas/beams are trained sequentially following their original indices, i.e., after the $l$-th training step, the index set of the already trained antennas/beams is $[1:l]$.

Algorithm 1 Basic antenna/beam-domain interleaved training scheme [20, 21]
1: Initialization: $\mathbb{A}_{1}=\{1\}$; $l=1$; the BS sends a pilot for the UE to acquire $\tilde{h}_{1}$;
2: While $\|\tilde{\mathbf{h}}_{\mathbb{A}_{l}}\|^{2}<\alpha_{\rm th}$ & $l<L$ do
3:     The UE sends one bit 0 to the BS;
4:     The BS sends a pilot for the UE to acquire $\tilde{h}_{l+1}$;
5:     $l=l+1$; $\mathbb{A}_{l}=\{\mathbb{A}_{l-1},l\}$;
6: end
7: if $\|\tilde{\mathbf{h}}_{\mathbb{A}_{l}}\|^{2}\geq\alpha_{\rm th}$
8:     The UE feeds back one bit 1 and $\tilde{\mathbf{h}}_{\mathbb{A}_{l}}$ to the BS;
9:     The BS conducts downlink beamforming according to Eq. (7);
10: else
11:     The UE feeds back one bit 0 to the BS;
12: end
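A runnable sketch of Algorithm 1 for one channel realization follows (error-free per-step CSI is assumed, as in the analysis; the channel draw reuses R_exp from the model sketch above, and alpha_th is an assumed value):

    def basic_interleaved(h_tilde, alpha_th):
        # Train entries in their original index order; stop once the accumulated
        # power meets the threshold (no outage) or all L entries are trained.
        power = 0.0
        for l, hl in enumerate(h_tilde, start=1):
            power += abs(hl)**2
            if power >= alpha_th:
                return l, False            # (training length, outage flag)
        return len(h_tilde), True

    rng = np.random.default_rng(0)
    C = np.linalg.cholesky(R_exp + 1e-12*np.eye(M))   # an R^(1/2) factor
    h = C @ (rng.standard_normal(M) + 1j*rng.standard_normal(M))/np.sqrt(2)
    print(basic_interleaved(h, alpha_th=4.0))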

III Modified Beam-Domain Interleaved Training and Performance Analysis

In the basic beam-domain interleaved training scheme [21], the adopted DFT beams are not guaranteed to align with the effective propagation paths, nor are beams with stronger average power given a higher training priority. In the following, we exploit the channel covariance matrix to improve the beam-domain interleaved training by addressing the above issues. In addition, we analyze the average training length of the modified beam-domain interleaved training and compare it with that of the basic beam-domain interleaved training to reveal the advantages of the modified design. Methods for acquiring the channel covariance matrix at the BS in massive MIMO systems can be found in [28, 29, 30].

III-A Modified Beam-Domain Training Design

Recall that the channel covariance matrix $\mathbf{R}_{\mathbf{h}}$ is positive semi-definite; denote its rank as $r_{M}$. We consider the compact eigenvalue decomposition of $\mathbf{R}_{\mathbf{h}}$: $\mathbf{R}_{\mathbf{h}}=\mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^{\rm H}$, where $\mathbf{U}$ is an $M\times r_{M}$ semi-unitary matrix and $\boldsymbol{\Sigma}={\rm diag}\{\delta_{1},\dots,\delta_{r_{M}}\}$ with $\delta_{1}\geq\delta_{2}\geq\cdots\geq\delta_{r_{M}}>0$. With the knowledge of $\mathbf{U}$ and $\boldsymbol{\Sigma}$, the BS can set $\mathbf{W}_{\rm O}=\mathbf{U}$, implying that $B=r_{M}$, so that $\bar{\mathbf{h}}=\mathbf{U}^{\rm H}\mathbf{h}$ is the $B$-dimensional vector of beam-domain channel coefficients. In the modified scheme, the BS trains the $B$ effective beams $\mathbf{u}_{1},\mathbf{u}_{2},\dots,\mathbf{u}_{B}$ in turn, such that the beams are trained in order of decreasing average power. After $b$ steps of beam training, the BS obtains the beam-domain channels $\bar{\mathbf{h}}_{\mathbb{A}_{b}}$, where $\mathbb{A}_{b}=\{1,\dots,b\}$, and conducts the beam-domain precoding $\mathbf{w}_{\rm I}\in\mathbb{C}^{B}$ according to Eq. (7). An outage occurs if $\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}<\alpha_{\rm th}$. With this beam ordering, the specific process of the modified beam-domain training scheme follows Algorithm 1. The modified scheme achieves both beam alignment and beam ordering through the eigenmatrix $\mathbf{U}$ of the channel covariance matrix, which is its major difference from the basic one.

From the Toeplitz eigen-subspace approximation result in [31], the eigenvectors of the one-ring covariance matrix in Eq. (1) and those of the exponential covariance matrix in Eq. (2) can both be well approximated by the columns of a DFT matrix for $M\gg 1$. As $M$ grows to infinity, both the modified scheme and the basic scheme use the DFT codebook for beam training, and their difference then lies only in the order in which the beams are trained. In this case, since the modified scheme trains the beams in order of decreasing average power, it has a shorter average training length.
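A sketch of the modified codebook construction (reusing R_onering, h, and basic_interleaved from the earlier sketches): the BS takes $\mathbf{W}_{\rm O}=\mathbf{U}$ with beams sorted by decreasing eigenvalue, so running Algorithm 1 on $\bar{\mathbf{h}}$ trains the strongest beams first.

    eigval, eigvec = np.linalg.eigh(R_onering)        # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                  # strongest beams first
    eigval, eigvec = eigval[order], eigvec[:, order]
    B = int(np.sum(eigval > 1e-10*eigval[0]))         # numerical rank r_M
    W_O = eigvec[:, :B]                               # modified codebook U
    hbar = W_O.conj().T @ h                           # beam-domain channel
    print(basic_interleaved(hbar, alpha_th=4.0))      # modified beam-domain run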

III-B Average Training Length Analysis

Since the basic beam-domain interleaved training and the modified one differ only in the training beam codebook, we first analyze the average training length of the beam-domain interleaved training scheme with an arbitrary beam codebook. Based on this, the average training length of the modified scheme and its comparison with that of the basic scheme are subsequently given.

III-B1 Analysis for the General Beam-Domain Scheme

Recall that the receive SNR after the $b$-th training step is ${\rm SNR}=P\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}$. From Algorithm 1, the training stops after the $b$-th training step with probability $\Pr\left(|\bar{\mathbf{h}}_{\mathbb{A}_{1}}|^{2}\geq\alpha_{\rm th}\right)$ for $b=1$, and with probability $\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}\geq\alpha_{\rm th}\right)-\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b-1}}\|^{2}\geq\alpha_{\rm th}\right)$ for $b=2,\dots,B-1$. The training stops after the $B$-th training step with probability $1-\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{B-1}}\|^{2}\geq\alpha_{\rm th}\right)$. The average training length of the beam-domain interleaved training scheme can thus be expressed as

$$L_{t}=\Pr\left(|\bar{\mathbf{h}}_{\mathbb{A}_{1}}|^{2}\geq\alpha_{\rm th}\right)+\sum_{b=2}^{B-1}b\left[\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}\geq\alpha_{\rm th}\right)-\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b-1}}\|^{2}\geq\alpha_{\rm th}\right)\right]+B\left[1-\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{B-1}}\|^{2}\geq\alpha_{\rm th}\right)\right]=1+\sum_{b=1}^{B-1}\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}<\alpha_{\rm th}\right). \qquad (8)$$
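Since Eq. (8) only needs the per-step CDF values, it can be cross-checked by Monte Carlo; the sketch below (assumed parameters, e.g., avg_length_mc(R_exp, 4.0)) draws $\bar{\mathbf{h}}\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_{\bar{\mathbf{h}}})$ and accumulates beam powers in training order.

    def avg_length_mc(R, alpha_th, n_draws=20000, seed=1):
        # Estimate L_t = 1 + sum_{b=1}^{B-1} Pr(||hbar_{A_b}||^2 < alpha_th).
        L = R.shape[0]
        rng = np.random.default_rng(seed)
        C = np.linalg.cholesky(R + 1e-12*np.eye(L))
        Z = (rng.standard_normal((L, n_draws))
             + 1j*rng.standard_normal((L, n_draws)))/np.sqrt(2)
        cum = np.cumsum(np.abs(C @ Z)**2, axis=0)   # ||hbar_{A_b}||^2, b = 1..L
        return 1 + (cum[:L-1, :] < alpha_th).mean(axis=1).sum()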
Theorem 1.

For an arbitrary beam codebook $\mathbf{W}_{\rm O}$, recall that $\mathbf{R}_{\bar{\mathbf{h}}}=\mathbf{W}_{\rm O}^{\rm H}\mathbf{R}_{\mathbf{h}}\mathbf{W}_{\rm O}$ and define $\tilde{\mathbf{R}}_{b}=\left[\mathbf{R}_{\bar{\mathbf{h}}}^{1/2}\right]_{\mathbb{A}_{b},[1:M]}$. Consider the compact eigenvalue decomposition $\tilde{\mathbf{R}}_{b}^{\rm H}\tilde{\mathbf{R}}_{b}=\mathbf{U}_{b}\boldsymbol{\Sigma}_{b}\mathbf{U}_{b}^{\rm H}$, where $\mathbf{U}_{b}$ is an $M\times r_{b}$ semi-unitary matrix, $\boldsymbol{\Sigma}_{b}={\rm diag}\{\delta_{b,1},\dots,\delta_{b,r_{b}}\}$, and $r_{b}$ is the rank of $\tilde{\mathbf{R}}_{b}^{\rm H}\tilde{\mathbf{R}}_{b}$. Suppose that there are $T_{b}$ different eigenvalues with values $\bar{\delta}_{b,t}$ and multiplicities $r_{b,t}$ for $t=1,\dots,T_{b}$. Define $\mathbf{r}_{b}=[r_{b,1},\dots,r_{b,T_{b}}]^{\rm T}$. The average training length of the general beam-domain interleaved training scheme under correlated channels can be expressed as

$$L_{t}=1+\sum_{b=1}^{B-1}\prod_{t=1}^{T_{b}}\left(\frac{1}{\bar{\delta}_{b,t}}\right)^{r_{b,t}}\sum_{k=1}^{T_{b}}\sum_{s=1}^{r_{b,k}}(-1)^{r_{b,k}-s}\bar{\delta}_{b,k}^{\,r_{b,k}-s+1}\Psi_{b,k,s,\mathbf{r}_{b}}\left[1-e^{-\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}}\sum_{u=0}^{r_{b,k}-s}\frac{\left(\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}\right)^{u}}{u!}\right], \qquad (9)$$

where $\Psi_{b,k,s,\mathbf{r}_{b}}=(-1)^{r_{b,k}-1}\sum_{\mathbf{i}\in\Omega_{b,k,s}}\prod_{n\neq k}\binom{i_{n}+r_{b,n}-1}{i_{n}}\left(\frac{1}{\bar{\delta}_{b,n}}-\frac{1}{\bar{\delta}_{b,k}}\right)^{-(i_{n}+r_{b,n})}$, $\mathbf{i}=[i_{1},\dots,i_{T_{b}}]^{\rm T}$, and $\Omega_{b,k,s}=\left\{[i_{1},\dots,i_{T_{b}}]\in\mathbb{Z}^{T_{b}};\ \sum_{j=1}^{T_{b}}i_{j}=s-1,\ i_{k}=0,\ i_{j}\geq 0\text{ for all }j\right\}$.

Proof.

Recall that $\bar{\mathbf{h}}=\mathbf{W}_{\rm O}^{\rm H}\mathbf{h}\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_{\bar{\mathbf{h}}})$. We have $\bar{\mathbf{h}}_{\mathbb{A}_{b}}\cong\tilde{\mathbf{R}}_{b}\mathbf{h}_{\rm iid}$ with $\mathbf{h}_{\rm iid}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{M})$, and $\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}=\mathbf{h}_{\rm iid}^{\rm H}\mathbf{U}_{b}\boldsymbol{\Sigma}_{b}\mathbf{U}_{b}^{\rm H}\mathbf{h}_{\rm iid}\cong\tilde{\mathbf{h}}_{\rm iid}^{b,{\rm H}}\boldsymbol{\Sigma}_{b}\tilde{\mathbf{h}}_{\rm iid}^{b}$ with $\tilde{\mathbf{h}}_{\rm iid}^{b}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{r_{b}})$. Therefore,

$$\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}=\sum_{j=1}^{r_{b}}\delta_{b,j}\left|\tilde{h}_{{\rm iid},j}^{b}\right|^{2}\cong\sum_{t=1}^{T_{b}}\frac{1}{2}\bar{\delta}_{b,t}Q_{b,t}, \qquad (10)$$

where $Q_{b,t}\sim\chi^{2}(2r_{b,t})$. By using the results in [32] on the sum of independent chi-square random variables, the PDF of $\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}$ is

$$f\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}=x;\mathbf{r}_{b},\bar{\delta}_{b,1},\dots,\bar{\delta}_{b,T_{b}}\right)=\prod_{t=1}^{T_{b}}\left(\frac{1}{\bar{\delta}_{b,t}}\right)^{r_{b,t}}\sum_{k=1}^{T_{b}}\sum_{s=1}^{r_{b,k}}\frac{\Psi_{b,k,s,\mathbf{r}_{b}}}{(r_{b,k}-s)!}(-x)^{r_{b,k}-s}e^{-\frac{x}{\bar{\delta}_{b,k}}}. \qquad (11)$$

Therefore, we have

$$\Pr\left(\|\bar{\mathbf{h}}_{\mathbb{A}_{b}}\|^{2}<\alpha_{\rm th}\right)=\prod_{t=1}^{T_{b}}\left(\frac{1}{\bar{\delta}_{b,t}}\right)^{r_{b,t}}\sum_{k=1}^{T_{b}}\sum_{s=1}^{r_{b,k}}\frac{\Psi_{b,k,s,\mathbf{r}_{b}}}{(r_{b,k}-s)!}\int_{0}^{\alpha_{\rm th}}(-x)^{r_{b,k}-s}e^{-\frac{x}{\bar{\delta}_{b,k}}}dx=\prod_{t=1}^{T_{b}}\left(\frac{1}{\bar{\delta}_{b,t}}\right)^{r_{b,t}}\sum_{k=1}^{T_{b}}\sum_{s=1}^{r_{b,k}}(-1)^{r_{b,k}-s}\bar{\delta}_{b,k}^{\,r_{b,k}-s+1}\Psi_{b,k,s,\mathbf{r}_{b}}\left[1-e^{-\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}}\sum_{u=0}^{r_{b,k}-s}\frac{\left(\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}\right)^{u}}{u!}\right]. \qquad (12)$$

Substituting Eq. (12) into Eq. (8) yields Eq. (9). ∎

III-B2 Analysis for the Modified Beam-Domain Scheme

The average training length of the modified beam-domain interleaved training scheme follows from Theorem 1 by setting the beam codebook as $\mathbf{W}_{\rm O}=\mathbf{U}$. Therefore, $\mathbf{R}_{\bar{\mathbf{h}}}=\mathbf{U}^{\rm H}\mathbf{R}_{\mathbf{h}}\mathbf{U}=\boldsymbol{\Sigma}$ and $\tilde{\mathbf{R}}_{b}=\left[\boldsymbol{\Sigma}^{\frac{1}{2}}\right]_{[1:b],[1:M]}$, which leads to $\delta_{b,j}=\delta_{j}$ for $b=1,\dots,B$, $j=1,\dots,b$. It is noteworthy that since the modified scheme uses the eigenvectors of the channel covariance matrix as the beam codebook, $\delta_{b,j}$ is independent of the training step index $b$. Suppose that among the first $b$ eigenvalues $\delta_{j}$, $j=1,\dots,b$, of $\mathbf{R}_{\mathbf{h}}$ there are $\bar{T}_{b}$ different values $\bar{\delta}_{b,t}$ with multiplicities $\bar{r}_{b,t}$ for $t=1,\dots,\bar{T}_{b}$. Define $\bar{\mathbf{r}}_{b}=[\bar{r}_{b,1},\dots,\bar{r}_{b,\bar{T}_{b}}]^{\rm T}$.

Corollary 1.

The average training length of the modified beam-domain interleaved training scheme can be expressed as

$$L_{t}=1+\sum_{b=1}^{B-1}\prod_{t=1}^{\bar{T}_{b}}\left(\frac{1}{\bar{\delta}_{b,t}}\right)^{\bar{r}_{b,t}}\sum_{k=1}^{\bar{T}_{b}}\sum_{s=1}^{\bar{r}_{b,k}}(-1)^{\bar{r}_{b,k}-s}\bar{\delta}_{b,k}^{\,\bar{r}_{b,k}-s+1}\Psi_{b,k,s,\bar{\mathbf{r}}_{b}}\left[1-e^{-\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}}\sum_{u=0}^{\bar{r}_{b,k}-s}\frac{\left(\frac{\alpha_{\rm th}}{\bar{\delta}_{b,k}}\right)^{u}}{u!}\right], \qquad (13)$$

where $\Psi_{b,k,s,\bar{\mathbf{r}}_{b}}=(-1)^{\bar{r}_{b,k}-1}\sum_{\mathbf{i}\in\Omega_{b,k,s}}\prod_{n\neq k}\binom{i_{n}+\bar{r}_{b,n}-1}{i_{n}}\left(\frac{1}{\bar{\delta}_{b,n}}-\frac{1}{\bar{\delta}_{b,k}}\right)^{-(i_{n}+\bar{r}_{b,n})}$, $\mathbf{i}=[i_{1},\dots,i_{\bar{T}_{b}}]^{\rm T}$, and $\Omega_{b,k,s}=\left\{[i_{1},\dots,i_{\bar{T}_{b}}]\in\mathbb{Z}^{\bar{T}_{b}};\ \sum_{j=1}^{\bar{T}_{b}}i_{j}=s-1,\ i_{k}=0,\ i_{j}\geq 0\text{ for all }j\right\}$.

Proof.

The result can be directly obtained from Theorem 1. ∎

Corollary 2.

For channels with the exponential covariance matrix in Eq. (2) and $0\leq r<1$, a large-$M$ approximation of the average training length for the modified beam-domain interleaved training scheme can be written as

$$L_{t}=1+\sum_{m=1}^{M-1}\sum_{j=1}^{m}l_{j}(0)\left(1-e^{-\frac{\alpha_{\rm th}}{\delta_{j}}}\right), \qquad (14)$$

where $l_{j}(0)=\prod_{k=1,k\neq j}^{m}\frac{\delta_{j}}{\delta_{j}-\delta_{k}}$ and

$$\delta_{j}\approx\frac{1-r^{2}}{1+r^{2}+2r\cos\left(\frac{(M+r)(M+1-j)\pi}{M(M+1)}\right)} \qquad (15)$$

for $j=1,\dots,M$.

Proof.

For the exponential covariance matrix $\mathbf{R}_{\mathbf{h}}$, its eigenvalues $\delta_{j}$ for $M\gg 1$ can be approximated as in Eq. (15) by following [33, Eq. (51)]. According to the monotonicity of $\cos(x)$ for $0<x<\pi$, we have $\delta_{1}>\delta_{2}>\cdots>\delta_{M}>0$, and thus $B=r_{M}=M$, $\bar{T}_{b}=b$, $\bar{\delta}_{b,t}=\delta_{t}$, and $\bar{r}_{b,t}=1$ for $t=1,\dots,\bar{T}_{b}$. Substituting these into Eq. (13) in Corollary 1 yields Eq. (14). ∎

The results in Eq. (9), Eq. (13) and Eq. (14) are in closed-form and can be used to evaluate the average training length for different system parameter values.
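As an example of such an evaluation, the following sketch implements Eq. (14) with the eigenvalue approximation of Eq. (15) (real $r$ assumed, so all eigenvalues are distinct and the inner sum is the CDF of a weighted sum of exponentials); the parameter values are illustrative assumptions.

    def modified_beam_length(M, r, alpha_th):
        j = np.arange(1, M + 1)
        delta = (1 - r**2)/(1 + r**2 + 2*r*np.cos((M + r)*(M + 1 - j)*np.pi
                                                  /(M*(M + 1))))      # Eq. (15)
        Lt = 1.0
        for m in range(1, M):                  # b = 1, ..., M-1 trained beams
            d = delta[:m]
            for k in range(m):
                lj0 = np.prod(d[k]/(d[k] - np.delete(d, k)))          # l_j(0)
                Lt += lj0*(1 - np.exp(-alpha_th/d[k]))
        return Lt

    print(modified_beam_length(M=32, r=0.6, alpha_th=4.0))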

In the following, we discuss the impact of the antenna number MM on the average training length of the modified beam-domain interleaved training.

Corollary 3.

For the modified beam-domain interleaved training scheme, when the use of all beams can avoid an outage, the average training length $L_{t}$ is a non-increasing function of the antenna number $M$, in both one-ring correlated channels with non-zero angular spread (AS) and exponentially correlated channels.

Proof.

Denote the eigenvalues of the channel covariance matrix $\mathbf{R}_{\mathbf{h}}$ in descending order as $\lambda_{1},\lambda_{2},\dots,\lambda_{r_{M}}$ and $\Lambda_{1},\Lambda_{2},\dots,\Lambda_{r_{M+1}}$ for the number of antennas being $M$ and $M+1$, respectively. Since the channel covariance matrix for $M$ BS antennas is a submatrix of that for $M+1$ BS antennas, we have either $r_{M+1}=r_{M}+1$, e.g., for channels with the full-rank exponential covariance matrix in Eq. (2), or $r_{M+1}=r_{M}$. From Eq. (8) and Eq. (10), the average training length can be expressed as $L_{t}=1+\sum_{b=1}^{r_{M}-1}\Pr\left(\sum_{j=1}^{b}|h_{{\rm iid},j}|^{2}\delta_{j}<\alpha_{\rm th}\right)$. The difference between the average training lengths for systems with $M$ and $M+1$ BS antennas in the case of $r_{M+1}=r_{M}+1$ is then

$$L_{t}(M+1)-L_{t}(M)=\Delta+\Pr\left(\sum_{j=1}^{r_{M}}|h_{{\rm iid},j}|^{2}\Lambda_{j}<\alpha_{\rm th}\right), \qquad (16)$$

where $\Delta=\sum_{b=1}^{r_{M}-1}\left[\Pr\left(\sum_{j=1}^{b}|h_{{\rm iid},j}|^{2}\Lambda_{j}<\alpha_{\rm th}\right)-\Pr\left(\sum_{j=1}^{b}|h_{{\rm iid},j}|^{2}\lambda_{j}<\alpha_{\rm th}\right)\right]$. From the Eigenvalue Interlacing Theorem [34], we have $\Lambda_{r_{M+1}}\leq\lambda_{r_{M}}\leq\Lambda_{r_{M}}\leq\lambda_{r_{M}-1}\leq\Lambda_{r_{M}-1}\leq\dots\leq\lambda_{2}\leq\Lambda_{2}\leq\lambda_{1}\leq\Lambda_{1}$, and thus $\Delta\leq 0$. The condition that the use of all beams can meet the transmission requirement gives $\Pr\left(\sum_{j=1}^{r_{M}}|h_{{\rm iid},j}|^{2}\Lambda_{j}<\alpha_{\rm th}\right)=0$. Therefore, $L_{t}$ is non-increasing with increasing $M$ under this condition. For the case of $r_{M+1}=r_{M}$, the conclusion still stands since $L_{t}(M+1)-L_{t}(M)=\Delta$. ∎

Remark 1.

Numerical evaluation of Eq. (13) shows that the average training length $L_{t}$ increases with $M$ for small $M$, while for large $M$, $L_{t}$ decreases with $M$ and converges to a constant value. This is because when $M$ is small, $\Pr\left(\sum_{j=1}^{r_{M}}|h_{{\rm iid},j}|^{2}\Lambda_{j}<\alpha_{\rm th}\right)$ is the dominant term in $L_{t}(M+1)-L_{t}(M)$ in Eq. (16), and it has a positive value. When $M$ is large, $\Pr\left(\sum_{j=1}^{r_{M}}|h_{{\rm iid},j}|^{2}\Lambda_{j}<\alpha_{\rm th}\right)\rightarrow 0$; therefore, as shown in Corollary 3, $L_{t}$ decreases with $M$.

Next, we discuss the effect of the channel correlation on the average training length of the modified beam-domain interleaved training scheme for channels with the exponential covariance matrix. Numerical evaluation of Eq. (14) shows that for relatively small $\alpha_{\rm th}$, higher channel correlation helps reduce the average training length of the modified beam-domain interleaved training. However, as $\alpha_{\rm th}$ continues to increase, an increase in the channel correlation may have the opposite effect. From the derivative of the eigenvalues in Eq. (15), i.e., $\delta_{j}=\frac{1-r^{2}}{1+r^{2}+2r\cos\left(\frac{(M+r)(M+1-j)\pi}{M(M+1)}\right)}$, $j=1,\dots,M$, with respect to $r$, the larger eigenvalues increase while the smaller eigenvalues decrease as $r$ increases. For very large $\alpha_{\rm th}$, $\Pr\left(\sum_{j=1}^{b}|h_{{\rm iid},j}|^{2}\delta_{j}<\alpha_{\rm th}\right)\approx 1$ for $b<r_{M}-1$, and $\Pr\left(\sum_{j=1}^{r_{M}-1}|h_{{\rm iid},j}|^{2}\delta_{j}<\alpha_{\rm th}\right)$ has the greatest impact on $L_{t}$. In this case, a smaller $r$ results in a flatter eigenvalue distribution, which yields a lower $\Pr\left(\sum_{j=1}^{r_{M}-1}|h_{{\rm iid},j}|^{2}\delta_{j}<\alpha_{\rm th}\right)$ and a shorter $L_{t}$. For small enough $\alpha_{\rm th}$, $\Pr\left(\sum_{j=1}^{b}|h_{{\rm iid},j}|^{2}\delta_{j}<\alpha_{\rm th}\right)\approx 0$ for $2<b\leq r_{M}-1$, and $\Pr\left(|h_{{\rm iid},1}|^{2}\delta_{1}<\alpha_{\rm th}\right)$ has the greatest impact on $L_{t}$. In this case, a larger $r$ results in a higher $\delta_{1}$ and a shorter $L_{t}$.

IV Modified Antenna-Domain Interleaved Training and Performance Analysis

In this section, we first discuss the impact of channel correlation on the average training length of the basic antenna-domain interleaved training. We then derive the conditional distribution of the channels of untrained BS antennas given the channel values of the already trained BS antennas during the interleaved training process, based on which we propose the modified antenna-domain interleaved training design.

IV-A Average Training Length Analysis and Impact of Channel Correlation

In the following, we give closed-form expressions of the average training length of the basic antenna-domain interleaved training scheme under general correlated channels and exponentially correlated channels respectively.

Compared to Theorem 1, the only difference in the derivation of the average training length of the basic antenna-domain interleaved training is that the covariance matrix of the trained channels after $m$ training steps is $\tilde{\mathbf{R}}_{m}=\left[\mathbf{R}_{\mathbf{h}}^{\frac{1}{2}}\right]_{[1:m],[1:M]}$ and the vector of the trained channels can be represented as $\mathbf{h}_{\mathbb{A}_{m}}\cong\tilde{\mathbf{R}}_{m}\mathbf{h}_{\rm iid}$. Consider the compact eigenvalue decomposition $\tilde{\mathbf{R}}_{m}^{\rm H}\tilde{\mathbf{R}}_{m}=\mathbf{U}_{m}\boldsymbol{\Sigma}_{m}\mathbf{U}_{m}^{\rm H}$, where $\boldsymbol{\Sigma}_{m}={\rm diag}\{\delta_{m,1},\dots,\delta_{m,r_{m}}\}$ and $r_{m}$ is the rank of $\tilde{\mathbf{R}}_{m}^{\rm H}\tilde{\mathbf{R}}_{m}$. Then we have $\|\mathbf{h}_{\mathbb{A}_{m}}\|^{2}\cong\tilde{\mathbf{h}}_{\rm iid}^{m,{\rm H}}\boldsymbol{\Sigma}_{m}\tilde{\mathbf{h}}_{\rm iid}^{m}$ with $\tilde{\mathbf{h}}_{\rm iid}^{m}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}_{r_{m}})$. Suppose that there are $T_{m}$ different eigenvalues with values $\bar{\delta}_{m,t}$ and multiplicities $r_{m,t}$ for $t=1,\dots,T_{m}$. Define $\mathbf{r}_{m}=[r_{m,1},\dots,r_{m,T_{m}}]^{\rm T}$.

Theorem 2.

The average training length of the basic antenna-domain interleaved training scheme under general correlated channels can be expressed as

$$L_{t}=1+\sum_{m=1}^{M-1}\prod_{t=1}^{T_{m}}\left(\frac{1}{\bar{\delta}_{m,t}}\right)^{r_{m,t}}\sum_{k=1}^{T_{m}}\sum_{s=1}^{r_{m,k}}(-1)^{r_{m,k}-s}\bar{\delta}_{m,k}^{\,r_{m,k}-s+1}\Psi_{m,k,s,\mathbf{r}_{m}}\left[1-e^{-\frac{\alpha_{\rm th}}{\bar{\delta}_{m,k}}}\sum_{u=0}^{r_{m,k}-s}\frac{\left(\frac{\alpha_{\rm th}}{\bar{\delta}_{m,k}}\right)^{u}}{u!}\right], \qquad (17)$$

where $\Psi_{m,k,s,\mathbf{r}_{m}}=(-1)^{r_{m,k}-1}\sum_{\mathbf{i}\in\Omega_{m,k,s}}\prod_{n\neq k}\binom{i_{n}+r_{m,n}-1}{i_{n}}\left(\frac{1}{\bar{\delta}_{m,n}}-\frac{1}{\bar{\delta}_{m,k}}\right)^{-(i_{n}+r_{m,n})}$, $\mathbf{i}=[i_{1},\dots,i_{T_{m}}]^{\rm T}$, and $\Omega_{m,k,s}=\left\{[i_{1},\dots,i_{T_{m}}]\in\mathbb{Z}^{T_{m}};\ \sum_{j=1}^{T_{m}}i_{j}=s-1,\ i_{k}=0,\ i_{j}\geq 0\text{ for all }j\right\}$.

Proof.

Please refer to the proof of Theorem 1. ∎

To analyze the impact of channel correlation on the average training length, we consider two extreme cases: the i.i.d. channels with $\delta_{m,i}=1$, $\forall m=1,\dots,M$, $i=1,\dots,m$, and the fully correlated channels with $\delta_{m,1}=m$ and $\delta_{m,i}=0$, $\forall i=2,\dots,m$, for $m=1,\dots,M$.

For the i.i.d. channels, we have $r_{m}=m$, $T_{m}=1$, $\bar{\delta}_{m,1}=1$, $r_{m,1}=m$, and $\Psi_{m,k,s,\mathbf{r}_{m}}=(-1)^{m-1}$ for $s=1$ and $\Psi_{m,k,s,\mathbf{r}_{m}}=0$ for $s=2,\dots,m$. Substituting these into Eq. (17) in Theorem 2, we obtain

$$L_{t}^{\rm(i.i.d.)}=1+\sum_{m=1}^{M-1}\left(1-e^{-\alpha_{\rm th}}\sum_{i=0}^{m-1}\frac{\alpha_{\rm th}^{i}}{i!}\right). \qquad (18)$$

According to the result in [20, Theorem 2], we have $L_{t}^{\rm(i.i.d.)}\leq 1+\alpha_{\rm th}$ for $M\rightarrow\infty$.

For the fully correlated channels, we have $r_{m}=1$, $T_{m}=1$, $\bar{\delta}_{m,1}=m$, $r_{m,1}=1$, and $\Psi_{m,k,s,\mathbf{r}_{m}}=1$ for $s=1$ and $\Psi_{m,k,s,\mathbf{r}_{m}}=0$ for $s=2,\dots,m$. Substituting these into Eq. (17) in Theorem 2, we obtain

$$L_{t}^{\rm(FC)}=1+\sum_{m=1}^{M-1}\left(1-e^{-\frac{\alpha_{\rm th}}{m}}\right). \qquad (19)$$

The behavior of $L_{t}^{\rm(FC)}$ for $M\rightarrow\infty$ is given in the following corollary.

Corollary 4.

For $M\rightarrow\infty$, we have $1+\alpha_{\rm th}\gamma-\frac{\pi^{2}}{12}\alpha_{\rm th}^{2}\leq L_{t}^{\rm(FC)}-\alpha_{\rm th}\ln M\leq 1+\alpha_{\rm th}\gamma$, where $\gamma\approx 0.5772$ is Euler's constant.

Proof.

Define $G(x)=x+e^{-x}$. For $m>0$, we have $1-e^{-\frac{\alpha_{\rm th}}{m}}-\frac{\alpha_{\rm th}}{m}=-\left(G\left(\frac{\alpha_{\rm th}}{m}\right)-G(0)\right)=-\int_{0}^{\frac{\alpha_{\rm th}}{m}}G^{\prime}(u)du=-\int_{0}^{\frac{\alpha_{\rm th}}{m}}\left(1-e^{-u}\right)du\geq-\int_{0}^{\frac{\alpha_{\rm th}}{m}}u\,du=-\frac{\alpha_{\rm th}^{2}}{2m^{2}}$. Meanwhile, we have $1-e^{-\frac{\alpha_{\rm th}}{m}}-\frac{\alpha_{\rm th}}{m}\leq 0$. Therefore, $-\sum_{m=1}^{M-1}\frac{\alpha_{\rm th}^{2}}{2m^{2}}\leq\sum_{m=1}^{M-1}\left(1-e^{-\frac{\alpha_{\rm th}}{m}}\right)-\sum_{m=1}^{M-1}\frac{\alpha_{\rm th}}{m}\leq 0$. Since $\sum_{m=1}^{\infty}\frac{1}{m^{2}}=\frac{\pi^{2}}{6}$ and $\gamma=\lim_{M\to\infty}\left(\sum_{m=1}^{M}\frac{1}{m}-\ln M\right)\approx 0.5772$, we have $1+\alpha_{\rm th}\gamma-\frac{\pi^{2}}{12}\alpha_{\rm th}^{2}\leq L_{t}^{\rm(FC)}-\alpha_{\rm th}\ln M\leq 1+\alpha_{\rm th}\gamma$ for $M\rightarrow\infty$. ∎
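The two closed forms and the Corollary 4 bounds are easy to check numerically; since the inner sum in Eq. (18) is an Erlang CDF, a Poisson-CDF form keeps the evaluation stable for large $M$ (a sketch with assumed values):

    from scipy.stats import poisson

    def Lt_iid(M, a):      # Eq. (18): Erlang(m) CDF = 1 - PoissonCDF(m-1; a)
        m = np.arange(1, M)
        return 1 + np.sum(1 - poisson.cdf(m - 1, a))

    def Lt_fc(M, a):       # Eq. (19)
        m = np.arange(1, M)
        return 1 + np.sum(1 - np.exp(-a/m))

    M_, a = 4096, 4.0      # assumed values
    print(Lt_iid(M_, a), 1 + a)               # approaches the 1 + alpha_th bound
    print(Lt_fc(M_, a) - a*np.log(M_))        # lies within the Corollary 4 bounds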

Remark 2.

When $M$ increases asymptotically, under independent channels, the average training length $L_{t}^{\rm(i.i.d.)}$ of the basic scheme has the upper bound $1+\alpha_{\rm th}$, while under fully correlated channels, the average training length $L_{t}^{\rm(FC)}$ of the basic scheme grows proportionally to $\ln M$, which shows the negative effect of channel correlation on the average training length of the basic training scheme. Further, numerical calculations based on Eqs. (18) and (19) show that when $\alpha_{\rm th}$ is small compared to $M$, the basic scheme has a shorter average training length in the i.i.d. channels, which mainly benefits from the higher antenna diversity gain. Once $\alpha_{\rm th}$ becomes comparable to $M$, the basic scheme has a shorter average training length in the fully correlated channels, where the consistency of antenna energy is more important than the diversity gain.

Corollary 5.

For channels with the exponential covariance matrix in Eq. (2) and almost all values of $0<r<1$, the average training length of the basic antenna-domain interleaved training can be expressed as

$$L_{t}=2-e^{-\alpha_{\rm th}}+\sum_{m=2}^{M-1}\sum_{j=1}^{m}l_{m,j}(0)\left(1-e^{-\frac{\alpha_{\rm th}}{\delta_{m,j}}}\right), \qquad (20)$$

where $l_{m,j}(0)=\prod_{k=1,k\neq j}^{m}\frac{\delta_{m,j}}{\delta_{m,j}-\delta_{m,k}}$ and

$$\delta_{m,j}\approx\begin{cases}1-2r\cos\left(\frac{j\pi}{m+1}\right),&\text{if }0<r\ll 1\\ \frac{1-r}{2}\sec^{2}\left(\frac{j\pi}{2m}\right),&\text{if }0<1-r\ll 1\ \&\ r\neq 1-\frac{6m}{3\sec^{2}\left(\frac{j\pi}{2m}\right)+2(m^{2}-1)}\\ \Phi(r,m,j),&\text{else }\&\ m-\sum_{i=1}^{m-1}\Phi(r,m,i)\neq\Phi(r,m,j)\end{cases} \qquad (21)$$

for $j=1,\dots,m-1$, and

$$\delta_{m,m}\approx\begin{cases}1-2r\cos\left(\frac{m\pi}{m+1}\right),&\text{if }0<r\ll 1\\ m-\frac{(m^{2}-1)(1-r)}{3},&\text{if }0<1-r\ll 1\\ m-\sum_{i=1}^{m-1}\Phi(r,m,i),&\text{else}\end{cases} \qquad (22)$$

where $\Phi(r,m,i)\triangleq\frac{1-r^{2}}{1+r^{2}+2r^{2}\cos\left(\frac{i\pi}{m}\right)+2r(1-r)\cos\left(\frac{i\pi}{m+1}\right)}$.

Proof.

For the exponential covariance matrix, the approximations of $\delta_{m,j}$, $j=1,\dots,m$, can be written as Eq. (21) and Eq. (22) according to [33, Eq. (35), Eq. (43a-b), Eq. (49a-b)]. From the monotonicity of $\cos(x)$ in $0<x<\pi$ and that of $\sec^{2}(x)$ in $0<x<\frac{\pi}{2}$, we have that for $0<r\ll 1$, the $\delta_{m,j}$, $j=1,\dots,m$, are distinct, while for $0<1-r\ll 1$, the $\delta_{m,j}$, $j=1,\dots,m-1$, are distinct, and $\delta_{m,m}$ is different from $\delta_{m,j}$, $j=1,\dots,m-1$, for $r\neq 1-\frac{6m}{3\sec^{2}\left(\frac{j\pi}{2m}\right)+2(m^{2}-1)}$. For intermediate $r$ values, the $\delta_{m,j}$, $j=1,\dots,m-1$, are likewise distinct due to the monotonicity of $\cos(x)$ in $0<x<\pi$, and $\delta_{m,m}$ is different from $\delta_{m,j}$, $j=1,\dots,m-1$, for $m-\sum_{i=1}^{m-1}\Phi(r,m,i)\neq\Phi(r,m,j)$. Therefore, we have $r_{m}=m$, $T_{m}=m$, $\bar{\delta}_{m,t}=\delta_{m,t}$, and $r_{m,t}=1$ for $t=1,\dots,T_{m}$. Substituting these into Eq. (17) in Theorem 2 yields Eq. (20). ∎

For almost all values of $0<r<1$, Eq. (20) thus allows a faster evaluation and analysis of the average training length of the basic antenna-domain interleaved training scheme than Eq. (17). For the large $r$ values satisfying $r=1-\frac{6m}{3\sec^{2}\left(\frac{j\pi}{2m}\right)+2(m^{2}-1)}$ for some $j=1,\dots,m-1$, or the intermediate $r$ values satisfying $m-\sum_{i=1}^{m-1}\Phi(r,m,i)=\Phi(r,m,j)$ for some $j=1,\dots,m-1$, the training length can still be calculated according to Eq. (17).
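A quick sanity check (a sketch assuming a real $r$ and the intermediate-$r$ branch of Eqs. (21)-(22)) against a direct eigendecomposition of the exponential covariance matrix:

    def phi(r, m, i):      # the Phi(r, m, i) function from Corollary 5
        return (1 - r**2)/(1 + r**2 + 2*r**2*np.cos(i*np.pi/m)
                           + 2*r*(1 - r)*np.cos(i*np.pi/(m + 1)))

    r, m = 0.5, 16         # assumed values
    i = np.arange(1, m)
    approx = np.sort(np.append(phi(r, m, i), m - phi(r, m, i).sum()))
    Rm = r**np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    exact = np.sort(np.linalg.eigvalsh(Rm))
    print(np.max(np.abs(approx - exact)))    # approximation error (expected small)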

IV-B Derivations on the Conditional PDF of the Untrained Channels

On the one hand, evaluation of Eq. (17) shows that channel correlation increases the average training length of the basic antenna-domain interleaved training under general rate requirements. On the other hand, if channel correlation exists and the system knows it a priori, this correlation can be used to improve the training efficiency. Specifically, the conditional PDF of the untrained channels can be derived given the values of the already trained channels, and based on this conditional PDF, the choice of the BS antenna for the next training step can be optimized. In this subsection, we derive this conditional PDF.

Lemma 1.

Given the channel values of the already trained BS antennas, $h_{m}$, $m\in\mathbb{A}=\{a_{1},a_{2},\dots,a_{|\mathbb{A}|}\}$, the conditional PDF of an untrained channel $h_{n}|\mathbf{h}_{\mathbb{A}}$, $n\in\mathbb{M}-\mathbb{A}$, follows $\mathcal{CN}(\bar{\mu}_{n},\bar{\sigma}_{n}^{2})$, where

$$\bar{\mu}_{n}=\left[[\mathbf{R}_{\mathbf{h}}]_{n,a_{1}},[\mathbf{R}_{\mathbf{h}}]_{n,a_{2}},\dots,[\mathbf{R}_{\mathbf{h}}]_{n,a_{|\mathbb{A}|}}\right]\mathbf{R}_{\mathbf{h}_{\mathbb{A}}}^{-1}\mathbf{h}_{\mathbb{A}}, \qquad (23)$$

$$\bar{\sigma}_{n}^{2}=1-\left[[\mathbf{R}_{\mathbf{h}}]_{n,a_{1}},[\mathbf{R}_{\mathbf{h}}]_{n,a_{2}},\dots,[\mathbf{R}_{\mathbf{h}}]_{n,a_{|\mathbb{A}|}}\right]\mathbf{R}_{\mathbf{h}_{\mathbb{A}}}^{-1}\left[[\mathbf{R}_{\mathbf{h}}]_{a_{1},n},[\mathbf{R}_{\mathbf{h}}]_{a_{2},n},\dots,[\mathbf{R}_{\mathbf{h}}]_{a_{|\mathbb{A}|},n}\right]^{\rm T}, \qquad (24)$$

and $\left[\mathbf{R}_{\mathbf{h}_{\mathbb{A}}}\right]_{i,j}=[\mathbf{R}_{\mathbf{h}}]_{a_{i},a_{j}}$, $i,j=1,\dots,|\mathbb{A}|$. The conditional cumulative distribution function (CDF) of the power of the untrained channel $h_{n}$, $n\in\mathbb{M}-\mathbb{A}$, is

$$\Pr\left(|h_{n}|^{2}\leq x\,|\,\mathbf{h}_{\mathbb{A}}\right)=1-Q_{1}\left(\sqrt{2}\frac{|\bar{\mu}_{n}|}{\bar{\sigma}_{n}},\sqrt{2}\frac{\sqrt{x}}{\bar{\sigma}_{n}}\right). \qquad (25)$$
Proof.

$\mathbf{R}_{\mathbf{h}_{\mathbb{A}}}$ is the covariance matrix of the vector of the trained channels $\mathbf{h}_{\mathbb{A}}$, which is a submatrix of the overall channel covariance matrix $\mathbf{R}_{\mathbf{h}}$. Recall that $\mathbf{h}$ is a circularly symmetric complex Gaussian vector. Then, from [35, Eq. (32)], the conditional mean in Eq. (23) and the conditional variance in Eq. (24) can be obtained. The CDF in Eq. (25) follows from properties of the noncentral chi-squared distribution: since $h_{n}|\mathbf{h}_{\mathbb{A}}\sim\mathcal{CN}(\bar{\mu}_{n},\bar{\sigma}_{n}^{2})$, we have $\frac{\sqrt{2}}{\bar{\sigma}_{n}}h_{n}|\mathbf{h}_{\mathbb{A}}\sim\mathcal{CN}\left(\sqrt{2}\frac{\bar{\mu}_{n}}{\bar{\sigma}_{n}},2\right)$. Then, $\left|\frac{\sqrt{2}}{\bar{\sigma}_{n}}h_{n}\right|^{2}|\mathbf{h}_{\mathbb{A}}\sim\chi^{2}\left(2,2\frac{|\bar{\mu}_{n}|^{2}}{\bar{\sigma}_{n}^{2}}\right)$ and the conditional CDF is

$$F_{\left|\frac{\sqrt{2}}{\bar{\sigma}_{n}}h_{n}\right|^{2}|\mathbf{h}_{\mathbb{A}}}(x)=1-Q_{1}\left(\sqrt{2}\frac{|\bar{\mu}_{n}|}{\bar{\sigma}_{n}},\sqrt{x}\right). \qquad (26)$$

Therefore, Eq. (25) can be obtained. ∎
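A sketch of Lemma 1 in code (the conditional statistics of Eqs. (23)-(24), and Eq. (25) with the Marcum Q-function expressed through SciPy's noncentral chi-squared CDF):

    from scipy.stats import ncx2

    def conditional_stats(R, trained, h_trained, n):
        # Eqs. (23)-(24): condition h_n on the trained entries h_A.
        A = list(trained)
        RA = R[np.ix_(A, A)]                  # covariance of trained channels
        cross = R[n, A]                       # [R]_{n,a_1}, ..., [R]_{n,a_|A|}
        mu = cross @ np.linalg.solve(RA, h_trained)
        var = 1 - np.real(cross @ np.linalg.solve(RA, R[A, n]))
        return mu, var

    def cond_power_cdf(x, mu, var):
        # Eq. (25): Pr(|h_n|^2 <= x | h_A) = 1 - Q_1(...) = noncentral chi^2 CDF
        return ncx2.cdf(2*x/var, df=2, nc=2*abs(mu)**2/var)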

Recall that $\mathbb{A}=\{a_{1},\dots,a_{|\mathbb{A}|}\}$ denotes the set of indices of the already trained BS antennas; for simplicity of presentation, we assume that $a_{1}<a_{2}<\cdots<a_{|\mathbb{A}|}$. If the index of an untrained BS antenna $n$ satisfies $a_{1}<n<a_{|\mathbb{A}|}$, we denote the index of its nearest trained BS antenna with a smaller index as $a_{x^{\star}}$, that is, $x^{\star}=\arg\min_{i:\,a_{i}\in\mathbb{A},a_{i}<n}\ n-a_{i}$; then $a_{x^{\star}+1}$ is the index of the trained BS antenna nearest to antenna $n$ with a larger index. Define $x_{1}=n-a_{x^{\star}}$ and $x_{2}=a_{x^{\star}+1}-n$.

Corollary 6.

Under the exponential correlation model, we have

$$\bar{\mu}_{n}=\begin{cases}\left(\rho^{*}\right)^{a_{1}-n}h_{a_{1}},&\text{if }n<a_{1}\\ \frac{\rho^{x_{1}}\left(1-r^{2x_{2}}\right)h_{a_{x^{\star}}}+\left(\rho^{*}\right)^{x_{2}}\left(1-r^{2x_{1}}\right)h_{a_{x^{\star}+1}}}{1-r^{2(x_{1}+x_{2})}},&\text{if }a_{1}<n<a_{|\mathbb{A}|}\\ \rho^{n-a_{|\mathbb{A}|}}h_{a_{|\mathbb{A}|}},&\text{if }n>a_{|\mathbb{A}|}\end{cases}, \qquad (27)$$

and

$$\bar{\sigma}_{n}^{2}=\begin{cases}1-r^{2(a_{1}-n)},&\text{if }n<a_{1}\\ \frac{\left(1-r^{2x_{1}}\right)\left(1-r^{2x_{2}}\right)}{1-r^{2(x_{1}+x_{2})}},&\text{if }a_{1}<n<a_{|\mathbb{A}|}\\ 1-r^{2(n-a_{|\mathbb{A}|})},&\text{if }n>a_{|\mathbb{A}|}\end{cases}. \qquad (28)$$

Hence, the conditional distribution of the channel of BS antenna $n\in\mathbb{M}-\mathbb{A}$ depends only on the channel values of the nearest trained BS antennas on each side, $a_{x^{\star}}$ and $a_{x^{\star}+1}$.

Proof.

See Appendix A. ∎

The results in Corollary 6 can help significantly reduce the computational complexity for the conditional CDF of the untrained channel power for scenarios with an exponential correlation model.
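For instance, for a real $\rho=r$ one can verify numerically that the two-neighbor expressions of Eqs. (27)-(28) coincide with the full conditioning of Lemma 1 (conditional_stats is from the sketch above; all parameter values are assumptions):

    r_, M_ = 0.8, 16
    R = r_**np.abs(np.subtract.outer(np.arange(M_), np.arange(M_)))
    A = [3, 8, 12]                        # trained antenna indices (0-based), sorted
    rng = np.random.default_rng(2)
    CA = np.linalg.cholesky(R[np.ix_(A, A)])
    hA = CA @ (rng.standard_normal(len(A))
               + 1j*rng.standard_normal(len(A)))/np.sqrt(2)
    n, x1, x2 = 10, 10 - 8, 12 - 10       # a_{x*} = 8, a_{x*+1} = 12
    mu = (r_**x1*(1 - r_**(2*x2))*hA[1]
          + r_**x2*(1 - r_**(2*x1))*hA[2])/(1 - r_**(2*(x1 + x2)))   # Eq. (27)
    var = (1 - r_**(2*x1))*(1 - r_**(2*x2))/(1 - r_**(2*(x1 + x2)))  # Eq. (28)
    mu_full, var_full = conditional_stats(R, A, hA, n)
    print(np.allclose([mu, var], [mu_full, var_full]))   # True: neighbors suffice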

IV-C Modified Antenna-Domain Interleaved Training Scheme

Based on the conditional PDF of the untrained channels in Lemma 1, we propose a modified antenna-domain interleaved training scheme where at the beginning of each training step, the BS antenna whose channel is to be trained is optimally selected. The basic idea is to use the channel values of the already trained antennas to calculate the probability of meeting the transmission requirement if each untrained BS antenna is selected. Then the antenna with the highest probability is chosen.

For the selection of the first antenna $n_0$ to be trained, the same approach cannot be used since no channel values have been obtained yet. Instead, we use the conditional variance in Eq. (24). Under the assumption that all antennas have the same average power, the first antenna to be trained can be the one that minimizes the overall conditional variance of the other antennas, i.e., $n_0=\arg\min_{m}\sum_{n=1,n\neq m}^{M}\bar{\sigma}^2_n$. It can be seen from Eq. (28) that $n_0=\left\lfloor\frac{M+1}{2}\right\rfloor$ under the exponential correlation model.
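A few lines suffice to confirm this choice of $n_0$ numerically; the sketch below assumes the exponential model, so that by Eq. (28) the conditional variance of antenna $n$ given only antenna $m$ is $1-r^{2|n-m|}$ (the values of $M$ and $r$ are ours).

```python
# Sketch: the middle antenna minimizes the total conditional variance of
# the remaining antennas under the exponential correlation model.
import numpy as np

M, r = 32, 0.8
n_idx = np.arange(1, M + 1)
total_var = [np.sum(1 - r ** (2 * np.abs(np.delete(n_idx, m - 1) - m)))
             for m in n_idx]
n0 = n_idx[int(np.argmin(total_var))]
print(n0, (M + 1) // 2)               # both give the middle antenna, 16
```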

Recall that $\mathbb{A}_m$ and $\mathbf{h}_{\mathbb{A}_m}$ denote, respectively, the set of indices of the $m$ BS antennas whose channels have been trained and the channel vector obtained for these antennas after $m$ training steps. At the beginning of the $(m+1)$-th training step, based on the already obtained channel vector $\mathbf{h}_{\mathbb{A}_m}$, the BS selects the antenna to be trained in the $(m+1)$-th step as follows. For each untrained BS antenna $n\in\mathbb{M}-\mathbb{A}_m$, the BS calculates the probability that acquiring the channel of antenna $n$ in the $(m+1)$-th training step meets the transmission requirement:

$$\Pr\left(\left|h_n\right|^2\geq\alpha_{\rm th}-\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2\right)=Q_1\left(\sqrt{2}\frac{\left|\bar{\mu}_n\right|}{\bar{\sigma}_n},\sqrt{2}\frac{\sqrt{\alpha_{\rm th}-\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2}}{\bar{\sigma}_n}\right). \qquad (29)$$

Then, the BS selects the antenna with the highest probability among all untrained antennas, i.e., the index of the BS antenna for the $(m+1)$-th training step is

$$n^\star=\arg\max_{n\in\mathbb{M}-\mathbb{A}_m}\Pr\left(\left|h_n\right|^2\geq\alpha_{\rm th}-\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2\right). \qquad (30)$$

The proposed modified antenna-domain interleaved training scheme is summarized in Algorithm 2; the major difference from the basic scheme is the antenna selection in Step 4.

Algorithm 2 Modified Antenna-Domain Interleaved Training Scheme
1: Initialization: $n^\star=n_0$; $\mathbb{A}_1=\{n^\star\}$; $m=1$; the BS sends a pilot for the UE to acquire $h_{n^\star}$;
2: while $\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2<\alpha_{\rm th}$ and $m<M$ do
3:     The UE feeds back one bit 0 and $h_{n^\star}$ to the BS;
4:     The BS calculates the probability value for each $n\in\mathbb{M}-\mathbb{A}_m$ according to Eq. (29) and then decides the index of the next training antenna $n^\star$ according to Eq. (30);
5:     The BS sends a pilot for the UE to acquire $h_{n^\star}$;
6:     $m=m+1$; $\mathbb{A}_m=\mathbb{A}_{m-1}\cup\{n^\star\}$;
7: end while
8: if $\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2\geq\alpha_{\rm th}$ then
9:     The UE feeds back one bit 1 and $h_{n^\star}$ to the BS;
10:    The BS conducts downlink precoding according to Eq. (7);
11: else
12:    The UE feeds back one bit 0 to the BS;
13: end if
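To make the control flow concrete, below is a minimal single-realization sketch of Algorithm 2 in Python, assuming the exponential correlation model and noiseless feedback of $h_{n^\star}$; the helper names cond_stats and marcum_q1 are ours, and $Q_1$ is evaluated through the noncentral chi-squared survival function. Averaging the printed training length over many channel realizations gives a Monte Carlo estimate of $L_t$ in Eq. (31).

```python
# Sketch of Algorithm 2 for one channel realization (assumed parameters).
import numpy as np
from scipy.stats import ncx2

def cond_stats(R, A, hA, n):
    """Conditional mean/variance of h_n given h_A (form of Eqs. (23)-(24))."""
    c = R[np.ix_([n], A)]
    w = c @ np.linalg.inv(R[np.ix_(A, A)])
    return (w @ hA).item(), (1 - (w @ c.conj().T)).item().real

def marcum_q1(a, b):
    # Q_1(a, b) = survival function of chi^2(2, a^2) at b^2
    return ncx2.sf(b**2, df=2, nc=a**2)

M, rho, alpha_th = 32, 0.8, 13.97
p, q = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
R = np.where(p >= q, (rho + 0j) ** (p - q), np.conj(rho + 0j) ** (q - p))
L = np.linalg.cholesky(R + 1e-12 * np.eye(M))
rng = np.random.default_rng(2)
h = L @ (np.sqrt(0.5) * (rng.standard_normal(M) + 1j * rng.standard_normal(M)))

A, n_star = [], (M + 1) // 2 - 1       # 0-based middle antenna as n_0
while True:
    A.append(n_star)                   # BS trains antenna n_star
    hA = h[A]
    if np.sum(np.abs(hA) ** 2) >= alpha_th or len(A) == M:
        break
    gap = alpha_th - np.sum(np.abs(hA) ** 2)
    probs = {}
    for n in set(range(M)) - set(A):   # Eq. (29) for each untrained antenna
        mu, var = cond_stats(R, A, hA, n)
        probs[n] = marcum_q1(np.sqrt(2) * abs(mu) / np.sqrt(var),
                             np.sqrt(2 * gap / var))
    n_star = max(probs, key=probs.get) # Eq. (30)
print("training length:", len(A))
```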

IV-C1 Complexity Analysis

The complexity of Algorithm 2 is dominated by Step 4 in the loop from Step 2 to Step 7. Denote the training length for a random channel realization as $N$ with $1\leq N\leq M$. For the $(m+1)$-th training step, where $m<N$, the number of operations needed for calculating $\mathbf{R}^{-1}_{\mathbf{h}_{\mathbb{A}}}$ in Eq. (23) and Eq. (24) scales as $m^3$, and the number of operations needed for the remaining matrix multiplications in Eq. (23) and Eq. (24) scales as $(M-m)m^2$. Therefore, the complexity of calculating the conditional means and variances for a training process with $N$ steps is $O(N^4+MN^3)$, and an upper bound on the complexity of Algorithm 2 is $O(M^4)$ since $N\leq M$. For channels with the exponential covariance matrix, the conditional mean and variance can be calculated according to Eq. (27) and Eq. (28) without matrix inversion or matrix multiplication, so an upper bound on the algorithm complexity is $O(M^2)$.

IV-C2 Average Training Length

Similar to the derivation of Eq. (8), the average training length of the modified antenna-domain interleaved training scheme can be expressed as

$$L_t=1+\sum_{m=1}^{M-1}\Pr\left(\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2<\alpha_{\rm th}\right). \qquad (31)$$

To derive an analytical or even closed-form expression of $L_t$ in Eq. (31), the key is to calculate $\Pr\left(\left\|\mathbf{h}_{\mathbb{A}_m}\right\|^2<\alpha_{\rm th}\right)$. From Step 4 of Algorithm 2 and Eq. (29), one must first calculate the conditional mean $\bar{\mu}_n$ and conditional variance $\bar{\sigma}^2_n$ for $n\in\mathbb{M}-\mathbb{A}_m$ based on both $\mathbb{A}_m$ and $\mathbf{h}_{\mathbb{A}_m}$ to decide the antenna index $n^\star$ for the $(m+1)$-th training step. This makes the derivation of the PDF of $\left\|\mathbf{h}_{\mathbb{A}_{m+1}}\right\|^2$ challenging because $\mathbb{A}_m$ changes with each channel realization. In addition, the $Q_1(a,b)$ function involves a two-fold infinite series summation, resulting in an implicit relationship between $n^\star$ and $\mathbb{A}_m$, $\mathbf{h}_{\mathbb{A}_m}$. These facts make the derivation of an analytical expression of $L_t$ in Eq. (31) intractable.

To circumvent the above difficulties, we introduce a deep neural network (DNN) $L_t=f\left(M,\mathbf{R}_{\mathbf{h}},\alpha_{\rm th};\boldsymbol{\Theta}\right)$, with $\boldsymbol{\Theta}$ being the network parameter matrix, to model $L_t$ as a function of the system parameters, e.g., the BS antenna number $M$, the channel covariance matrix $\mathbf{R}_{\mathbf{h}}$, and the normalized SNR threshold $\alpha_{\rm th}=\left(2^{R_{\rm th}}-1\right)/P$. For channels with the exponential covariance matrix, the correlation coefficient $\rho$ can replace the input parameter $\mathbf{R}_{\mathbf{h}}$. The simulation results in Section V show that $f$ can be well-fitted by a DNN model with fully connected hidden layers. This deep learning-based approximation of the average training length provides a much faster performance evaluation of the modified antenna-domain interleaved training scheme than Monte Carlo simulation.

V Simulation and Discussion

In this section, simulation results are shown for the proposed modified beam-domain and antenna-domain interleaved training schemes and their comparison with existing baseline schemes. The exponential correlation model in Eq. (2) is considered in Sections V-A to V-D. The one-ring correlation model in Eq. (1) is considered in Section V-E.

V-A Beam-Domain Interleaved Training Under the Exponential Correlation Model

Fig. 1 shows the average training lengths of the basic and modified beam-domain interleaved training schemes under the exponential correlation model, including the simulation values, the theoretical values in Eq. (9) of Theorem 1 and Eq. (13) of Corollary 1, and the approximate values in Eq. (14) of Corollary 2. We can see from Fig. 1a that the simulated and theoretical curves match well for different scenarios. The approximate curves for $M=32,64$ and $\rho=0.8$ in Fig. 1b show some gap from the simulation curves, while the gap for $M=64$ is relatively small. This is because Eq. (15) is a large-$M$ eigenvalue approximation. These observations verify our derivations in Section III-B.

Figure 1: Average training length of beam-domain interleaved training scheme. (a) Basic interleaved training scheme; (b) modified interleaved training scheme.

V-B Comparison of Basic and Modified Beam-Domain Interleaved Training Under the Exponential Correlation Model

Figure 2: Average training length of beam-domain interleaved training with different antenna numbers $M$. (a) Correlation coefficient $\rho=0.8$; (b) correlation coefficient $\rho=0.4$.

Fig. 2 shows the average training lengths of the basic and modified beam-domain interleaved training schemes with different antenna numbers $M$ for $\rho=0.8$ and $0.4$. The modified scheme outperforms the basic scheme in average training length for all three combinations of $R_{\rm th}$ and $P$: 1) $R_{\rm th}=5$ bit/s/Hz, $P=0$ dB and $\alpha_{\rm th}=31$; 2) $R_{\rm th}=4$ bit/s/Hz, $P=-2$ dB and $\alpha_{\rm th}=23.77$; 3) $R_{\rm th}=3$ bit/s/Hz, $P=-3$ dB and $\alpha_{\rm th}=13.97$. The advantage becomes larger as $\alpha_{\rm th}$ increases from $13.97$ to $31$, showing that the modified beam-domain scheme exhibits a greater performance advantage under more stringent transmission requirements. In addition, the average training lengths of both schemes first increase and then decrease as $M$ increases, and for large enough $M$ the training length levels off. These results are consistent with the description in Corollary 3 and Remark 1.
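For reference, these three operating points follow directly from $\alpha_{\rm th}=\left(2^{R_{\rm th}}-1\right)/P$ with $P$ converted from dB to linear scale, which a few lines reproduce:

```python
# Sanity check of the thresholds alpha_th = (2^Rth - 1)/P used in Fig. 2.
for Rth, P_dB in [(5, 0), (4, -2), (3, -3)]:
    P = 10 ** (P_dB / 10)                            # dB -> linear scale
    print(Rth, P_dB, round((2 ** Rth - 1) / P, 2))   # 31, 23.77, 13.97
```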

Figure 3: Average training length of beam-domain interleaved training with different channel correlation levels.

Fig. 3 shows the average training lengths of the basic and modified beam-domain interleaved training schemes with different channel correlation levels for $M=32$, $R_{\rm th}=3$ bit/s/Hz and $P=-6,-4,0$ dB, for which $\alpha_{\rm th}=27.87,17.58,7$, respectively. The basic scheme uses the DFT codebook. The shorter average training length of the modified scheme can also be observed in the figure, especially for relatively low transmit power, e.g., $P=-4,-6$ dB. With increasing $\rho$, the performance advantage of the modified scheme over the basic scheme enlarges for $P=0$ dB, while for $P=-4,-6$ dB it first increases for $\rho\leq 0.7$ and then decreases for $\rho>0.7$.

V-C Antenna-Domain Interleaved Training Under the Exponential Correlation Model

Figure 4: Average training length of antenna-domain interleaved training scheme. (a) Basic interleaved training scheme; (b) modified interleaved training scheme.

Fig. 4 shows the average training lengths of the basic and modified antenna-domain interleaved training schemes under the exponential correlation model for $M=32$, $R_{\rm th}=2,3$ bit/s/Hz and $\rho=0$ (i.e., i.i.d. channels), $0.4$, and $0.8$. Fig. 4a shows the simulation values, the theoretical values provided by Corollary 2, i.e., Eq. (9), and the approximate values of the basic scheme in Eq. (20) of Corollary 5. All three curves match well for different scenarios, which verifies the results in Section IV-A. For $R_{\rm th}=2$ bit/s/Hz, the average training length of the basic scheme increases when $\rho$ grows from $0.4$ to $0.8$ for $P>-9$ dB, i.e., $\alpha_{\rm th}<23.83$. For smaller $P$ and larger $\alpha_{\rm th}$, the increase of $\rho$, on the contrary, leads to a decrease of the average training length for the basic scheme.

In Fig. 4b, we use a fully connected DNN containing four hidden layers (with 4, 8, 16 and 32 ReLU neurons, respectively) to provide an approximate average training length of the modified antenna-domain interleaved training scheme. The DNN model is trained with the trainlm function based on the Levenberg-Marquardt algorithm, which has the fastest convergence speed for medium-sized DNNs; the loss is the mean-square error (MSE). The dataset has 3173 samples of $\langle\rho,\alpha_{\rm th},L_t\rangle$, split into training, validation, and test sets with ratio 0.7:0.15:0.15. As shown in the figure, the designed DNN fits the training-overhead function well, and its prediction of $L_t$ for unseen combinations of $\rho$, $\alpha_{\rm th}$ and $P$ matches the simulation results.
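A hedged sketch of such a surrogate is given below. The paper trains the 4/8/16/32 ReLU network with MATLAB's trainlm; scikit-learn offers no Levenberg-Marquardt solver, so 'lbfgs' is used here as a second-order stand-in, and the targets are synthetic placeholders rather than the 3173 simulated samples.

```python
# Sketch: fully connected surrogate L_t = f(rho, alpha_th; Theta).
# Placeholder data only; replace with <rho, alpha_th, L_t> samples from
# Monte Carlo runs of Algorithm 2 in practice.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
rho = rng.uniform(0.0, 0.95, 2000)
alpha_th = rng.uniform(1.0, 31.0, 2000)
L_t = 1 + alpha_th * (1 + rho)        # synthetic placeholder target

X = np.column_stack([rho, alpha_th])
X_tr, X_te, y_tr, y_te = train_test_split(X, L_t, test_size=0.3,
                                          random_state=0)
model = MLPRegressor(hidden_layer_sizes=(4, 8, 16, 32), activation="relu",
                     solver="lbfgs", max_iter=5000)
model.fit(X_tr, y_tr)
print("test MSE:", np.mean((model.predict(X_te) - y_te) ** 2))
```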

V-D Comparison of Basic and Modified Antenna-Domain Interleaved Training Under the Exponential Correlation Model

Figure 5: Average training length of antenna-domain interleaved training with different BS antenna numbers $M$. (a) Correlation coefficient $\rho=0.8$; (b) correlation coefficient $\rho=0.4$.

Fig. 5 shows the average training lengths of the basic and modified antenna-domain interleaved training schemes with different BS antenna numbers $M$ for $\rho=0.8$ and $0.4$. The modified scheme outperforms the basic scheme in average training length for all three combinations of $R_{\rm th}$ and $P$: 1) $R_{\rm th}=5$ bit/s/Hz, $P=0$ dB and $\alpha_{\rm th}=31$; 2) $R_{\rm th}=4$ bit/s/Hz, $P=-2$ dB and $\alpha_{\rm th}=23.77$; 3) $R_{\rm th}=3$ bit/s/Hz, $P=-3$ dB and $\alpha_{\rm th}=13.97$; and the advantage becomes larger as $\alpha_{\rm th}$ increases from $13.97$ to $31$. In addition, with increasing $M$, the average training length of the basic scheme increases and converges, with faster convergence and a smaller limit for $\rho=0.4$ than for $\rho=0.8$. However, the average training length of the modified scheme first increases, then decreases, and finally levels off with increasing $M$. This is because as $M$ increases, more untrained antennas are available after each interleaved training step, which increases the diversity of the untrained antennas' conditional distributions. Furthermore, the performance advantage of the modified scheme over the basic scheme first increases and then levels off as $M$ increases.

Figure 6: Average training length of antenna-domain interleaved training with different channel correlation levels.

Figure 7: Average training length of beam-domain interleaved training in one-ring correlated channels. (a) Channel AS $\sigma_A=5^{\circ}$; (b) channel AS $\sigma_A=10^{\circ}$; (c) channel AS $\sigma_A=20^{\circ}$.

Fig. 6 shows the average training lengths of the basic and modified antenna-domain interleaved training schemes with different channel correlation levels for $M=32$, $R_{\rm th}=3$ bit/s/Hz and $P=-6,-4,0$ dB. The modified scheme has a shorter average training length than the basic scheme. As the correlation coefficient $\rho$ increases, the average training length of the basic scheme increases for $P=0$ dB and $-4$ dB, but decreases for $P=-6$ dB. In contrast, with increasing $\rho$, the average training length of the modified scheme for $P=0,-4,-6$ dB first decreases for $\rho\leq 0.8$ and then increases for $\rho>0.8$.

Figure 8: Average training length of antenna-domain interleaved training scheme in one-ring correlated channels. (a) Channel AS $\sigma_A=5^{\circ}$; (b) channel AS $\sigma_A=10^{\circ}$; (c) channel AS $\sigma_A=20^{\circ}$.

V-E Antenna and Beam-Domain Interleaved Training Under One-Ring Correlated Channels

In this section, simulations demonstrate the applicability of part of the analytical results to the one-ring correlation model in Eq. (1). The uniform PAS model [36], i.e., $f\left(\theta\right)=\frac{1}{2\Delta\theta}$ for $\bar{\theta}-\Delta\theta\leq\theta\leq\bar{\theta}+\Delta\theta$, is considered, where $\bar{\theta}$ denotes the mean angle of departure (AoD) and the angular spread (AS) is $\sigma_A=\frac{\Delta\theta}{\sqrt{3}}$.
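For reference, the one-ring covariance entries under this uniform PAS can be generated by direct numerical integration. The sketch below assumes the common ULA form $[\mathbf{R}_{\mathbf{h}}]_{p,q}=\frac{1}{2\Delta\theta}\int_{\bar{\theta}-\Delta\theta}^{\bar{\theta}+\Delta\theta}e^{-j2\pi D(p-q)\sin\theta}\,{\rm d}\theta$; the exact form and sign convention of Eq. (1) may differ.

```python
# Sketch: one-ring covariance matrix under a uniform PAS (assumed form).
import numpy as np
from scipy.integrate import quad

M, D = 32, 0.5                         # antennas, spacing in wavelengths
theta_bar = np.deg2rad(45.0)           # mean AoD
sigma_A = np.deg2rad(5.0)              # angular spread
dtheta = np.sqrt(3) * sigma_A          # since sigma_A = dtheta / sqrt(3)

def R_entry(p, q):
    f = lambda t: np.exp(-2j * np.pi * D * (p - q) * np.sin(t)) / (2 * dtheta)
    re, _ = quad(lambda t: f(t).real, theta_bar - dtheta, theta_bar + dtheta)
    im, _ = quad(lambda t: f(t).imag, theta_bar - dtheta, theta_bar + dtheta)
    return re + 1j * im

R = np.array([[R_entry(p, q) for q in range(M)] for p in range(M)])
print(np.allclose(np.diag(R), 1.0))    # unit-power diagonal
```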

Fig. 7 shows the average training lengths of the basic and modified beam-domain interleaved training schemes with different transmit powers $P\in[-5,5]$ dB under the one-ring correlated channel model for $M=32$, $R_{\rm th}=3$ bit/s/Hz, $D=0.5$, $\bar{\theta}=45^{\circ}$ and $\sigma_A=5^{\circ},10^{\circ},20^{\circ}$. The theoretical values of the average training length in Eq. (9) of Theorem 1 and Eq. (13) of Corollary 1 match the simulation values well, and the modified scheme has an obvious performance advantage over the basic scheme for all three ASs. With decreasing AS, i.e., increasing channel correlation, this performance advantage enlarges for relatively high transmit power, e.g., $P=5$ dB, while it shrinks for low transmit power, e.g., $P=-5$ dB.

Fig. 8 shows the average training lengths of the basic and modified antenna-domain interleaved training schemes with different transmit powers $P\in[-5,5]$ dB under the one-ring correlated channel model for the same $M$, $R_{\rm th}$, $D$, $\bar{\theta}$ and $\sigma_A$ settings. The modified antenna-domain scheme also outperforms the basic scheme for all three ASs, and the theoretical average training length in Eq. (9) matches the simulation values well.

VI Conclusion

In this paper, channel spatial correlation was exploited to improve interleaved training for the single-user massive MIMO downlink. By optimizing the beam training codebook and the antenna training order based on the channel correlation, we proposed modified beam-domain and antenna-domain interleaved training schemes, respectively. For exponentially correlated channels, the conditional distribution of an untrained BS antenna's channel given the channel values of the already trained BS antennas was shown to depend only on the channels of its nearest trained antennas on either side, which significantly reduces the complexity of the modified antenna-domain scheme. Exact and approximate closed-form expressions of the average training length were derived for the basic and modified beam/antenna-domain schemes in correlated channels, and the impact of system parameters, e.g., the channel correlation, the antenna number, and the SNR requirement, was explicitly revealed. Simulations verified our derivations and demonstrated the performance advantage of the proposed modified schemes.

In addition to spatial correlation, channel temporal correlation can be exploited to improve the channel acquisition efficiency in massive MIMO systems. Unlike spatial correlation, temporal correlation is subject to causality constraints in the time dimension: only historical training results can be used for extrapolation, and these historical results are incomplete due to the nature of the interleaved scheme. How to exploit temporal correlation in interleaved training is worth further study.

Appendix A

A-A Proof of Corollary 6

Define $\mathbf{m}=\left[\rho^{n-a_1},\ldots,\rho^{n-a_{x^\star}},\left(\rho^*\right)^{a_{x^\star+1}-n},\ldots,\left(\rho^*\right)^{a_{|\mathbb{A}|}-n}\right]\mathbf{R}^{-1}_{\mathbf{h}_{\mathbb{A}}}=\left[m_1,\ldots,m_{|\mathbb{A}|}\right]$, and denote $\mathbf{R}_{\mathbf{h}_{\mathbb{A}}}$ as $\mathbf{R}$ for simplicity of presentation. Then we have $m_i=\frac{\det\left(\mathbf{R}_i\right)}{\det\left(\mathbf{R}\right)}$, $i\in\mathbb{I}=\{1,\ldots,|\mathbb{A}|\}$, where $\mathbf{R}_i$ is $\mathbf{R}$ with its $i$-th row replaced by $\left[\rho^{n-a_1},\ldots,\rho^{n-a_{x^\star}},\left(\rho^*\right)^{a_{x^\star+1}-n},\ldots,\left(\rho^*\right)^{a_{|\mathbb{A}|}-n}\right]$. Recall that $\mathbf{R}^{-1}=\frac{{\rm adj}\left(\mathbf{R}\right)}{\det\left(\mathbf{R}\right)}$, where ${\rm adj}\left(\mathbf{R}\right)$ is the adjugate matrix of $\mathbf{R}$, i.e., $\left[{\rm adj}\left(\mathbf{R}\right)\right]_{u,v}=R_{v,u}$, $\forall u,v=1,\ldots,|\mathbb{A}|$, with $R_{u,v}$ being the algebraic cofactor of $[\mathbf{R}]_{u,v}$. Therefore, we have $m_i=\frac{\sum_{j=1}^{x^\star}\rho^{n-a_j}R_{i,j}+\sum_{j=x^\star+1}^{|\mathbb{A}|}\left(\rho^*\right)^{a_j-n}R_{i,j}}{\det\left(\mathbf{R}\right)}=\frac{\det\left(\mathbf{R}_i\right)}{\det\left(\mathbf{R}\right)}$.

Here we prove that $m_{x^\star}$ and $m_{x^\star+1}$ are the only two non-zero elements of $\mathbf{m}$, or equivalently, that $\det\left(\mathbf{R}_i\right)=0$ for $i\in\mathbb{I}-\{x^\star,x^\star+1\}$ and $\det\left(\mathbf{R}_i\right)\neq 0$ for $i\in\{x^\star,x^\star+1\}$. For the first part, it suffices to show that the row vectors of $\mathbf{R}_i$ are linearly dependent when $i\in\mathbb{I}-\{x^\star,x^\star+1\}$, and we show this by construction. Let $c_{x^\star}=\rho^{n-a_{x^\star}}\frac{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-n}}{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-a_{x^\star}}}$, $c_{x^\star+1}=\left(\rho^*\right)^{a_{x^\star+1}-n}\frac{1-\left(\rho^*\rho\right)^{n-a_{x^\star}}}{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-a_{x^\star}}}$, $c_i=-1$, and $c_j=0$ for $j\notin\{x^\star,x^\star+1,i\}$. Then straightforward calculations give $\sum_{j=1}^{|\mathbb{A}|}c_j[\mathbf{R}_i]_{j,[1:|\mathbb{A}|]}=c_{x^\star}[\mathbf{R}_i]_{x^\star,[1:|\mathbb{A}|]}+c_{x^\star+1}[\mathbf{R}_i]_{x^\star+1,[1:|\mathbb{A}|]}-[\mathbf{R}_i]_{i,[1:|\mathbb{A}|]}=\mathbf{0}$.

Next we prove that $m_i\neq 0$ for $i\in\{x^\star,x^\star+1\}$. Define $\mathbf{A}_j=\left[\mathbf{R}\right]_{[j:|\mathbb{A}|],\{1\}\cup[(1+j):|\mathbb{A}|]}$ for $j\in\{1,\ldots,|\mathbb{A}|-1\}$. Via splitting the $(1,2)$-th element of $\mathbf{A}_j$, i.e., the $(j,j+1)$-th element of $\mathbf{R}$, $\left(\rho^*\right)^{a_{j+1}-a_j}$, into $\left(\left(\rho^*\right)^{a_{j+1}-a_j}-\rho^{a_j-a_{j+1}}\right)+\rho^{a_j-a_{j+1}}$, we can split $\det\left(\mathbf{A}_j\right)$ into the sum of two determinants and obtain the recurrence formula via expanding the first determinant in Eq. (32) along the second column, i.e.,

$$\det\left(\mathbf{A}_j\right)=\left|\begin{matrix}\rho^{a_j-a_1}&\left(\rho^*\right)^{a_{j+1}-a_j}-\rho^{a_j-a_{j+1}}&\cdots&\left(\rho^*\right)^{a_{|\mathbb{A}|}-a_j}\\ \rho^{a_{j+1}-a_1}&0&\cdots&\left(\rho^*\right)^{a_{|\mathbb{A}|}-a_{j+1}}\\ \vdots&\vdots&\ddots&\vdots\\ \rho^{a_{|\mathbb{A}|}-a_1}&0&\cdots&1\end{matrix}\right|+\left|\begin{matrix}\rho^{a_j-a_1}&\rho^{a_j-a_{j+1}}&\cdots&\left(\rho^*\right)^{a_{|\mathbb{A}|}-a_j}\\ \rho^{a_{j+1}-a_1}&1&\cdots&\left(\rho^*\right)^{a_{|\mathbb{A}|}-a_{j+1}}\\ \vdots&\vdots&\ddots&\vdots\\ \rho^{a_{|\mathbb{A}|}-a_1}&\rho^{a_{|\mathbb{A}|}-a_{j+1}}&\cdots&1\end{matrix}\right|=-\left(\left(\rho^*\right)^{a_{j+1}-a_j}-\rho^{a_j-a_{j+1}}\right)\det\left(\mathbf{A}_{j+1}\right). \qquad (32)$$

Note that the second determinant in Eq. (32) is zero since the first column of its matrix is $\rho^{a_{j+1}-a_1}$ times the second column. Then we calculate $\det\left(\mathbf{R}\right)=\det\left(\mathbf{A}_1\right)$ as follows:

$$\det\left(\mathbf{R}\right)=\left(-1\right)^{|\mathbb{A}|-2}\det\left(\mathbf{A}_{|\mathbb{A}|-1}\right)\prod_{j=1}^{|\mathbb{A}|-2}\left[\left(\rho^*\right)^{a_{j+1}-a_j}-\rho^{a_j-a_{j+1}}\right]=\prod_{j=1}^{|\mathbb{A}|-1}\left[1-\left(\rho^*\rho\right)^{a_{j+1}-a_j}\right]. \qquad (33)$$

A similar procedure can be used to calculate $\det\left(\mathbf{R}_{x^\star}\right)$. The difference is that the $(x^\star,x^\star+1)$-th element $\left(\rho^*\right)^{a_{x^\star+1}-n}$ in $\mathbf{R}_{x^\star}$ is split into $\left(\left(\rho^*\right)^{a_{x^\star+1}-n}-\rho^{n-a_{x^\star+1}}\right)+\rho^{n-a_{x^\star+1}}$. Then we can obtain

$$m_{x^\star}=\frac{\det\left(\mathbf{R}_{x^\star}\right)}{\det\left(\mathbf{R}\right)}=\frac{\left(\rho^*\right)^{a_{x^\star+1}-n}-\rho^{n-a_{x^\star+1}}}{\left(\rho^*\right)^{a_{x^\star+1}-a_{x^\star}}-\rho^{a_{x^\star}-a_{x^\star+1}}}=\rho^{n-a_{x^\star}}\frac{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-n}}{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-a_{x^\star}}}\neq 0, \qquad (34)$$

for $0<\left|\rho\right|<1$. $\det\left(\mathbf{R}\right)$ can also be calculated by splitting the $(j+1,j)$-th element $\rho^{a_{j+1}-a_j}$ into $\left(\rho^{a_{j+1}-a_j}-\left(\rho^*\right)^{a_j-a_{j+1}}\right)+\left(\rho^*\right)^{a_j-a_{j+1}}$, and then splitting the determinant along the $(j+1)$-th row into the sum of two determinants while leaving the other rows unchanged. We calculate $\det\left(\mathbf{R}_{x^\star+1}\right)$ in the same way, the difference being that the $(x^\star+1,x^\star)$-th element $\rho^{n-a_{x^\star}}$ in $\mathbf{R}_{x^\star+1}$ is split into $\left(\rho^{n-a_{x^\star}}-\left(\rho^*\right)^{a_{x^\star}-n}\right)+\left(\rho^*\right)^{a_{x^\star}-n}$. Then we obtain

$$m_{x^\star+1}=\frac{\det\left(\mathbf{R}_{x^\star+1}\right)}{\det\left(\mathbf{R}\right)}=\frac{\rho^{n-a_{x^\star}}-\left(\rho^*\right)^{a_{x^\star}-n}}{\rho^{a_{x^\star+1}-a_{x^\star}}-\left(\rho^*\right)^{a_{x^\star}-a_{x^\star+1}}}=\left(\rho^*\right)^{a_{x^\star+1}-n}\frac{1-\left(\rho^*\rho\right)^{n-a_{x^\star}}}{1-\left(\rho^*\rho\right)^{a_{x^\star+1}-a_{x^\star}}}\neq 0. \qquad (35)$$
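As a numerical sanity check of this proof, the product form of $\det\left(\mathbf{R}\right)$ in Eq. (33) and the two non-zero weights in Eqs. (34)-(35) can be verified directly for an assumed complex $\rho$ and trained set $\mathbb{A}$:

```python
# Sketch: verify Eq. (33) and the weights of Eqs. (34)-(35) numerically.
import numpy as np

rho = 0.6 * np.exp(1j * 0.3)           # assumed complex correlation
A = np.array([1, 4, 9, 13])            # a_1 < ... < a_|A|
n = 6                                  # a_{x*} = 4, a_{x*+1} = 9
p, q = np.meshgrid(A, A, indexing="ij")
R = np.where(p >= q, rho ** (p - q), np.conj(rho) ** (q - p))

det_closed = np.prod(1 - np.abs(rho) ** (2 * np.diff(A)))
print(np.isclose(np.linalg.det(R), det_closed))     # Eq. (33)

v = np.where(A < n, rho ** (n - A), np.conj(rho) ** (A - n))
m = v @ np.linalg.inv(R)
x1, x2 = n - 4, 9 - n
rr = np.abs(rho) ** 2
m_x = rho**x1 * (1 - rr**x2) / (1 - rr**(x1 + x2))            # Eq. (34)
m_x1 = np.conj(rho)**x2 * (1 - rr**x1) / (1 - rr**(x1 + x2))  # Eq. (35)
print(np.allclose(m, [0, m_x, m_x1, 0]))            # only two non-zeros
```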

References

  • [1] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, 2010.
  • [2] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, 2013.
  • [3] C. Zhang, Y. Huang, Y. Jing, S. Jin, and L. Yang, “Sum-rate analysis for massive MIMO downlink with joint statistical beamforming and user scheduling,” IEEE Trans. Wireless Commun., vol. 16, no. 4, pp. 2181–2194, 2017.
  • [4] M. Biguesh and A. B. Gershman, “Training-based MIMO channel estimation: A study of estimator tradeoffs and optimal training signals,” IEEE Trans. Signal Process., vol. 54, no. 3, pp. 884–893, 2006.
  • [5] A. Adhikary, J. Nam, J.-Y. Ahn, and G. Caire, “Joint spatial division and multiplexing—The large-scale array regime,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6441–6463, 2013.
  • [6] J. Choi, D. J. Love, and P. Bidigare, “Downlink training techniques for FDD massive MIMO systems: Open-loop and closed-loop training with memory,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 802–814, 2014.
  • [7] W. Shen, L. Dai, B. Shim, S. Mumtaz, and Z. Wang, “Joint CSIT acquisition based on low-rank matrix completion for FDD massive MIMO systems,” IEEE Commun. Lett., vol. 19, no. 12, pp. 2178–2181, 2015.
  • [8] Y.-X. Zhang, A.-A. Lu, and X. Gao, “Sum-rate-optimal statistical precoding for FDD massive MIMO downlink with deterministic equivalents,” IEEE Trans. Veh. Technol., vol. 71, no. 7, pp. 7359–7370, 2022.
  • [9] X. Rao and V. K. Lau, “Distributed compressive CSIT estimation and feedback for FDD multi-user massive MIMO systems,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3261–3271, 2014.
  • [10] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially common sparsity based adaptive channel estimation and feedback for FDD massive MIMO,” IEEE Trans. Signal Process., vol. 63, no. 23, pp. 6169–6183, 2015.
  • [11] W. Shen, L. Dai, Y. Shi, B. Shim, and Z. Wang, “Joint channel training and feedback for FDD massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 65, no. 10, pp. 8762–8767, 2015.
  • [12] K. Venugopal, A. Alkhateeb, N. G. Prelcic, and R. W. Heath Jr., “Channel estimation for hybrid architecture-based wideband millimeter wave systems,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 1996–2009, 2017.
  • [13] P. N. Alevizos, X. Fu, N. D. Sidiropoulos, Y. Yang, and A. Bletsas, “Limited feedback channel estimation in massive MIMO with non-uniform directional dictionaries,” IEEE Trans. Signal Process., vol. 66, no. 19, pp. 5127–5141, 2018.
  • [14] M. Ke, Z. Gao, Y. Wu, X. Gao, and R. Schober, “Compressive sensing-based adaptive active user detection and channel estimation: Massive access meets massive MIMO,” IEEE Trans. Signal Process., vol. 68, pp. 764–779, 2020.
  • [15] Y. Han, T.-H. Hsu, C.-K. Wen, K.-K. Wong, and S. Jin, “Efficient downlink channel reconstruction for FDD multi-antenna systems,” IEEE Trans. Wireless Commun., vol. 18, no. 6, pp. 3161–3176, 2019.
  • [16] W. Peng, W. Li, W. Wang, X. Wei, and T. Jiang, “Downlink channel prediction for time-varying FDD massive MIMO systems,” IEEE J. Sel. Topics Signal Process., vol. 13, no. 5, pp. 1090–1102, 2019.
  • [17] S. Kim, J. W. Choi, and B. Shim, “Downlink pilot precoding and compressed channel feedback for FDD-based cell-free systems,” IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 3658–3672, 2020.
  • [18] Y. Yang, F. Gao, G. Y. Li, and M. Jian, “Deep learning-based downlink channel prediction for FDD massive MIMO system,” IEEE Commun. Lett., vol. 23, no. 11, pp. 1994–1998, 2019.
  • [19] E. Koyuncu and H. Jafarkhani, “Interleaving training and limited feedback for point-to-point massive multiple-antenna systems,” in Proc. IEEE Int. Symp. Inf. Theor., Hong Kong, China, Jun. 2015, pp. 1242–1246.
  • [20] E. Koyuncu, X. Zou, and H. Jafarkhani, “Interleaving channel estimation and limited feedback for point-to-point systems with a large number of transmit antennas,” IEEE Trans. Wireless Commun., vol. 17, no. 10, pp. 6762–6774, 2018.
  • [21] C. Zhang, Y. Jing, Y. Huang, and L. Yang, “Interleaved training and training-based transmission design for hybrid massive antenna downlink,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 541–556, 2018.
  • [22] W. He, C. Zhang, and Y. Huang, “Interleaved training codebook design for millimeter-wave communication system,” in Proc. IEEE/CIC Int. Conf. Commun., Beijing, China, Aug. 2018, pp. 6–10.
  • [23] C. Zhang, Y. Jing, Y. Huang, and X. You, “Interleaved training for intelligent surface-assisted wireless communications,” IEEE Signal Process Lett., vol. 27, pp. 1774–1778, 2020.
  • [24] Y. Jing, S. ShahbazPanahi, and X. Yu, “SINR-based interleaved training design for multi-user massive MIMO downlink with MRT,” in Proc. IEEE Int. Conf. Commun., Seoul, South Korea, May 2022, pp. 237–242.
  • [25] Y. Jing, X. Yu, and S. ShahbazPanahi, “Interleaved training scheme for multi-user massive MIMO downlink with user SINR constraint,” IEEE Trans. Commun., 2023.
  • [26] L. You, X. Gao, X.-G. Xia, N. Ma, and Y. Peng, “Pilot reuse for massive MIMO transmission over spatially correlated Rayleigh fading channels,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3352–3366, 2015.
  • [27] S. L. Loyka, “Channel capacity of MIMO architecture using the exponential correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, 2001.
  • [28] A. Decurninge, M. Guillaud, and D. T. Slock, “Channel covariance estimation in massive MIMO frequency division duplex systems,” in Proc. IEEE Globecom Workshops, San Diego, CA, USA, Dec. 2015, pp. 1–6.
  • [29] D. Neumann, M. Joham, and W. Utschick, “Covariance matrix estimation in massive MIMO,” IEEE Signal Process Lett., vol. 25, no. 6, pp. 863–867, 2018.
  • [30] K. Li, Y. Li, L. Cheng, Q. Shi, and Z.-Q. Luo, “Downlink channel covariance matrix reconstruction for FDD massive MIMO systems with limited feedback,” arXiv preprint arXiv:2204.00779, 2022.
  • [31] U. Grenander and G. Szegö, Toeplitz forms and their applications.   California, U.S.A.: California Univ. Press, 1958.
  • [32] E. Björnson, D. Hammarwall, and B. Ottersten, “Exploiting quantized channel norm feedback through conditional statistics in arbitrarily correlated MIMO systems,” IEEE Trans. Signal Process., vol. 57, no. 10, pp. 4027–4041, 2009.
  • [33] R. K. Mallik, “The exponential correlation matrix: Eigen-analysis and applications,” IEEE Trans. Wireless Commun., vol. 17, no. 7, pp. 4690–4705, 2018.
  • [34] S.-G. Hwang, “Cauchy’s interlace theorem for eigenvalues of Hermitian matrices,” Amer. Math. Monthly, vol. 111, no. 2, pp. 157–159, 2004.
  • [35] B. Picinbono, “Second-order complex random vectors and normal distributions,” IEEE Trans. Signal Process., vol. 44, no. 10, pp. 2637–2640, 1996.
  • [36] L. Schumacher, K. I. Pedersen, and P. E. Mogensen, “From antenna spacings to theoretical capacities-guidelines for simulating MIMO systems,” in Proc. IEEE Int. Symp. Person. Indoor Mobile Radio Commun., vol. 2, Lisboa, Portugal, Sep. 2002, pp. 587–592.
Cheng Zhang (Member, IEEE) received the B.Eng. degree from Sichuan University, Chengdu, China, in June 2009, the M.Sc. degree from the Xi’an Electronic Engineering Research Institute (EERI), Xi’an, China, in May 2012, and the Ph.D. degree from Southeast University (SEU), Nanjing, China, in Dec. 2018. From Nov. 2016 to Nov. 2017, he was a Visiting Student with the University of Alberta, Edmonton, AB, Canada. From June 2012 to Aug. 2013, he was a Radar Signal Processing Engineer with Xi’an EERI. Since Dec. 2018, he has been with SEU, where he is currently an Associate Professor, supported by the Zhishan Young Scholar Program of SEU. His current research interests include space-time signal processing and machine learning-aided optimization for B5G/6G wireless communications. He has authored or co-authored more than 50 IEEE journal and conference papers. He received the Excellent Doctoral Dissertation award of the China Education Society of Electronics in Dec. 2019 and that of Jiangsu Province in Dec. 2020, as well as the Best Paper Awards of 2023 IEEE WCNC and 2023 IEEE WCSP.
Chang Liu (Student Member, IEEE) received the B.Eng. degree in information engineering from the School of Information Science and Engineering, Southeast University, Nanjing, China, in 2021, where he is currently pursuing the M.Sc. degree. His research interests mainly focus on low-overhead massive MIMO channel acquisition.
Yindi Jing (Senior Member, IEEE) received the B.Eng. and M.Eng. degrees in automatic control from the University of Science and Technology of China, Hefei, China, in 1996 and 1999, respectively. She received the M.Sc. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, CA, in 2000 and 2004, respectively. From October 2004 to August 2005, she was a postdoctoral scholar at the Department of Electrical Engineering of the California Institute of Technology. From February 2006 to June 2008, she was a postdoctoral scholar at the Department of Electrical Engineering and Computer Science of the University of California, Irvine. In 2008, she joined the Electrical and Computer Engineering Department of the University of Alberta, where she is currently a professor. She was an Associate Editor for the IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS and a Senior Area Editor for the IEEE SIGNAL PROCESSING LETTERS. She was a member of the IEEE Signal Processing Society Signal Processing for Communications and Networking (SPCOM) Technical Committee and a member of the NSERC Discovery Grant Evaluation Group for Electrical and Computer Engineering. Her research interests are in wireless communications and signal processing.
Minjie Ding (Student Member, IEEE) received the B.Eng. degree in information engineering from the School of Electronic Information and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2020. She is currently pursuing the M.Sc. degree in information and communication engineering with the School of Information Science and Engineering, Southeast University. Her research interests mainly focus on low-overhead massive MIMO channel acquisition.
Yongming Huang (M’10-SM’16) received the B.S. and M.S. degrees from Nanjing University, Nanjing, China, in 2000 and 2003, respectively, and the Ph.D. degree in electrical engineering from Southeast University, Nanjing, in 2007. Since March 2007, he has been a faculty member in the School of Information Science and Engineering, Southeast University, China, where he is currently a full professor. He has also been the Director of the Pervasive Communication Research Center, Purple Mountain Laboratories, since 2019. From 2008 to 2009, he was a visiting researcher at the Signal Processing Lab, Royal Institute of Technology (KTH), Stockholm, Sweden. He has published over 200 peer-reviewed papers and holds over 80 invention patents. His current research interests include intelligent 5G/6G mobile communications and millimeter wave wireless communications. He submitted around 20 technical contributions to IEEE standards and was awarded a certificate of appreciation for outstanding contribution to the development of IEEE standard 802.11aj. He served as an Associate Editor for the IEEE Transactions on Signal Processing and a Guest Editor for the IEEE Journal on Selected Areas in Communications. He is currently an Editor-at-Large for the IEEE Open Journal of the Communications Society.