
A Sequential Subspace Method for Millimeter Wave MIMO Channel Estimation

Wei Zhang, Taejoon Kim, and Shu-Hung Leung. W. Zhang is with the Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China (e-mail: wzhang237-c@my.cityu.edu.hk). T. Kim is with the Department of Electrical Engineering and Computer Science, The University of Kansas, KS 66045, USA (e-mail: taejoonkim@ku.edu). S.-H. Leung is with the State Key Laboratory of Terahertz and Millimeter Waves and the Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China (e-mail: eeeugshl@cityu.edu.hk).
Abstract

Data transmission over the millimeter wave (mmWave) band in fifth-generation wireless networks aims to support very high speed wireless communications. A substantial increase in spectrum efficiency for mmWave transmission can be achieved by advanced hybrid analog-digital precoding, for which accurate channel state information (CSI) is the key. Rather than estimating the entire channel matrix, it is now well understood that directly estimating subspace information, which involves fewer parameters, provides enough information to design transceivers. However, the large channel use overhead and the associated computational complexity of existing channel subspace estimation techniques are major obstacles to deploying the subspace approach for channel estimation. In this paper, we propose a sequential two-stage subspace estimation method that resolves the overhead issues and provides accurate subspace information. The sequential approach avoids manipulating the entire high-dimensional training signal, which greatly reduces the computational complexity. Specifically, in the first stage, the proposed method samples the columns of the channel matrix to estimate its column subspace. Then, based on the obtained column subspace, it optimizes the training signals to estimate the row subspace. For a channel with $N_r$ receive antennas and $N_t$ transmit antennas, our analysis shows that the proposed technique requires only $\mathcal{O}(N_t)$ channel uses, while providing a guarantee of subspace estimation accuracy. Theoretical analysis shows that the similarity between the estimated subspace and the true subspace is linearly related to the signal-to-noise ratio (SNR), i.e., $\mathcal{O}(\text{SNR})$, at high SNR, and quadratically related to the SNR, i.e., $\mathcal{O}(\text{SNR}^2)$, at low SNR.
Simulation results show that the proposed sequential subspace method can provide improved subspace accuracy, normalized mean squared error, and spectrum efficiency over existing methods.

Index Terms:
Channel estimation, compressed sensing, millimeter wave communication, multi-input multi-output, subspace estimation.

I Introduction

Wireless communications using the millimeter wave (mmWave) band, which occupies the 30–300 GHz frequencies, address the current scarcity of wireless broadband spectrum and enable high speed transmission in fifth-generation (5G) wireless networks [1]. Owing to the short wavelength, it is possible to employ large-scale antenna arrays with a small form factor [2, 3, 4]. To reduce power consumption and hardware complexity, mmWave systems exploit a hybrid analog-digital multiple-input multiple-output (MIMO) architecture operating with a limited number of radio frequency (RF) chains [2]. Under perfect channel state information (CSI), it has been shown that hybrid precoding can achieve nearly the same performance as fully-digital precoding [2, 3, 5]. In practice, accurate CSI must be estimated via channel training in order to enable effective precoding for robust mmWave MIMO transmission. However, extracting accurate CSI in mmWave MIMO poses new challenges because the limited number of RF chains restricts the observability of the channel and greatly increases the channel use overhead.

To reduce the channel use overhead, initial works focused on beam alignment techniques [6, 7] utilizing beam search codebooks. Exploiting the fact that mmWave propagation exhibits low-rank characteristics, recent studies formulated the channel estimation task as a sparse signal reconstruction problem [8, 9] or a low-rank matrix reconstruction problem [10, 11, 12, 13, 14, 15]. Drawing on sparse signal reconstruction, orthogonal matching pursuit (OMP) [8] and sparse Bayesian learning (SBL) [9] were applied to estimate the sparse mmWave channel in the angular domain. Alternatively, if the channel is rank-sparse, it is possible to directly extract sufficient channel subspace information for the precoder design [16, 10, 11]. These subspace-based methods employ the Arnoldi iteration [16] to estimate the channel subspaces, and matrix completion [10, 11] to estimate the low-rank mmWave channel information.

Though the sparse signal reconstruction [8, 9] and matrix completion [10, 11] techniques reduce the channel use overhead compared to traditional beam alignment techniques, their training sounders [8, 9, 10, 11] are pre-designed and high-dimensional, so these approaches suffer from prohibitive computational complexity as the array size grows. To reduce the computational complexity, adaptive training techniques have been investigated in [4, 16, 17], where the training sounders are adaptively designed based on feedback or two-way training. However, these adaptive training techniques cannot guarantee performance in terms of mean squared error (MSE) and/or subspace estimation accuracy. Moreover, the techniques in [4, 16, 17] introduce additional channel use overhead due to the required feedback and two-way training.

To resolve the feedback overhead while maintaining the benefit of adaptive training, in this paper we present a two-stage subspace estimation approach that sequentially estimates the column and row subspaces of the mmWave MIMO channel. Compared to the existing channel estimation techniques in [8, 9, 10, 11], the training sounders of the proposed approach are adaptively designed to reduce the channel use overhead and computational complexity. Moreover, the proposed approach is open-loop; unlike prior adaptive training techniques [4, 16, 17], it requires neither feedback nor two-way channel sounding. The main contributions of this paper are as follows:

  • We propose a two-stage subspace estimation technique called the sequential and adaptive subspace estimation (SASE) method. In the proposed SASE, the column and row subspaces are estimated sequentially. Specifically, in the first stage, we sample a small fraction of the columns of the channel matrix to obtain an estimate of its column subspace. In the second stage, the row subspace of the channel is estimated based on the obtained column subspace. In particular, using the column subspace estimated in the first stage, the receive training sounders of the second stage are optimized to reduce the number of channel uses. Compared to existing works with fixed training sounders, where the entire high-dimensional training signal is processed to obtain the CSI, the proposed adaptation ensures that the dimension of the signals processed in each stage is much smaller than that of the entire training signal, greatly reducing the computational complexity. Thus, the proposed SASE has much lower computational complexity than the existing methods.

  • We analyze the subspace estimation accuracy, which guarantees the performance of the proposed SASE technique. Through extensive analysis, it is shown that the subspace estimation accuracy of the SASE is linearly related to the signal-to-noise ratio (SNR), i.e., $\mathcal{O}(\text{SNR})$, at high SNR, and quadratically related to the SNR, i.e., $\mathcal{O}(\text{SNR}^2)$, at low SNR. Moreover, simulation results show that the proposed SASE improves estimation accuracy over the prior art.

  • After obtaining the estimated column and row subspaces, an efficient method is developed for estimating the high-dimensional but low-rank channel matrix. Specifically, given the subspaces estimated by the proposed SASE, the mmWave channel estimation task reduces to solving a low-dimensional least squares problem, whose computational cost is much lower. Simulation results show that the proposed channel estimation method achieves lower normalized mean squared error and higher spectrum efficiency than the existing methods.

This paper is organized as follows. In Section II, we introduce the mmWave MIMO system model. In Section III, the proposed SASE is developed and analyzed. The channel use overhead, computational complexity, and an extension of the proposed SASE are discussed in Section IV. Finally, the simulation results and concluding remarks are provided in Sections V and VI, respectively.

Notation: Bold lowercase and uppercase letters denote vectors and matrices, respectively. ${\mathbf{A}}^{T}$, ${\mathbf{A}}^{H}$, ${\mathbf{A}}^{-1}$, $|{\mathbf{A}}|$, $\|{\mathbf{A}}\|_{F}$, $\mathrm{tr}({\mathbf{A}})$, and $\|{\mathbf{a}}\|_{2}$ are, respectively, the transpose, conjugate transpose, inverse, determinant, Frobenius norm, and trace of ${\mathbf{A}}$, and the $l_{2}$-norm of ${\mathbf{a}}$. $[{\mathbf{A}}]_{:,i}$, $[{\mathbf{A}}]_{i,:}$, and $[{\mathbf{A}}]_{i,j}$ are, respectively, the $i$th column, the $i$th row, and the $(i,j)$th entry of ${\mathbf{A}}$. $\mathrm{vec}({\mathbf{A}})$ stacks the columns of ${\mathbf{A}}$ into a column vector. $\mathrm{diag}({\mathbf{a}})$ denotes a square diagonal matrix with vector ${\mathbf{a}}$ on the main diagonal. $\sigma_{L}({\mathbf{A}})$ denotes the $L$th largest singular value of matrix ${\mathbf{A}}$. ${\mathbf{I}}_{M}\in{\mathbb{R}}^{M\times M}$ is the identity matrix. $\mathbf{1}_{M,N}\in{\mathbb{R}}^{M\times N}$, $\mathbf{0}_{M}\in{\mathbb{R}}^{M\times 1}$, and $\mathbf{0}_{M,N}\in{\mathbb{R}}^{M\times N}$ are the all-one matrix, the zero vector, and the zero matrix, respectively. $\mathrm{col}({\mathbf{A}})$ denotes the subspace spanned by the columns of ${\mathbf{A}}$. The operator $(\cdot)_{+}$ denotes $\max\{0,\cdot\}$, and $\otimes$ denotes the Kronecker product.

II MmWave MIMO System Model

Figure 1: The mmWave MIMO channel sounding model

II-A Channel Sounding Model

The mmWave MIMO channel sounding model is shown in Fig. 1, where the transmitter and receiver are equipped with $N_t$ and $N_r$ antennas, respectively. There are $N_{RF}\geq 2$ and $M_{RF}\geq 2$ RF chains at the transmitter and receiver, respectively. Without loss of generality, we assume that $N_t$ is an integer multiple of $N_{RF}$ and $N_r$ is an integer multiple of $M_{RF}$. In the considered mmWave channel sounding framework, one sounding symbol is transmitted over a unit time interval from the transmitter, which is defined as one channel use. The system employs $K$ channel uses for channel sounding. The received signal ${\mathbf{y}}_{(k)}\in{\mathbb{C}}^{M_{RF}\times 1}$ at the $k$th channel use is given by

$${\mathbf{y}}_{(k)}={\mathbf{W}}_{(k)}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+{\mathbf{W}}_{(k)}^{H}{\mathbf{n}}_{(k)},\quad k=1,\ldots,K, \qquad (1)$$

where ${\mathbf{W}}_{(k)}={\mathbf{W}}_{A,k}{\mathbf{W}}_{D,k}\in{\mathbb{C}}^{N_r\times M_{RF}}$ is the receive sounder, composed of the receive analog sounder ${\mathbf{W}}_{A,k}\in{\mathbb{C}}^{N_r\times M_{RF}}$ and the receive digital sounder ${\mathbf{W}}_{D,k}\in{\mathbb{C}}^{M_{RF}\times M_{RF}}$ in series; ${\mathbf{f}}_{(k)}={\mathbf{F}}_{A,k}{\mathbf{F}}_{D,k}{\mathbf{s}}_{k}\in{\mathbb{C}}^{N_t\times 1}$ is the transmit sounder, composed of the transmit analog sounder ${\mathbf{F}}_{A,k}\in{\mathbb{C}}^{N_t\times N_{RF}}$ and the transmit digital sounder ${\mathbf{F}}_{D,k}\in{\mathbb{C}}^{N_{RF}\times N_{RF}}$ in series with the transmitted sounding signal ${\mathbf{s}}_{k}$; and ${\mathbf{n}}_{(k)}\in{\mathbb{C}}^{N_r\times 1}$ is the noise.

Since the transmitted sounding signal ${\mathbf{s}}_{k}$ is included in ${\mathbf{f}}_{(k)}$, for convenience we let ${\mathbf{s}}_{k}=\frac{1}{\sqrt{N_{RF}}}[1,\ldots,1]^{T}\in{\mathbb{R}}^{N_{RF}\times 1}$, which enables us to focus on the design of ${\mathbf{F}}_{A,k}$ and ${\mathbf{F}}_{D,k}$. It is worth noting that the analog sounders are constrained to be constant modulus, that is, $|[{\mathbf{W}}_{A,k}]_{i,j}|=1/\sqrt{N_r}$ and $|[{\mathbf{F}}_{A,k}]_{i,j}|=1/\sqrt{N_t}$, $\forall i,j$. Without loss of generality, we assume the power of the transmit sounder is one, that is, $\|{\mathbf{f}}_{(k)}\|_{2}^{2}=1$. The noise ${\mathbf{n}}_{(k)}$ is an independent zero-mean complex Gaussian vector with covariance matrix $\sigma^{2}{\mathbf{I}}_{N_r}$. Due to the unit power of the transmit sounder, we define the signal-to-noise ratio (SNR) as $1/\sigma^{2}$.¹ The details of designing the receive and transmit sounders to facilitate the channel estimation are discussed in Section III.

¹Here, the SNR is the ratio of the transmitted sounder's power to the noise power, which is a common practice in the channel estimation literature [4, 8, 9, 10, 16, 17].

To model the point-to-point sparse mmWave MIMO channel, we assume there are $L$ clusters with $L\ll\min\{N_r,N_t\}$, each constituting a propagation path. The channel model can be expressed as [18, 19]

$${\mathbf{H}}=\sqrt{\frac{N_r N_t}{L}}\sum_{l=1}^{L}h_{l}\,{\mathbf{a}}_{r}(\theta_{r,l}){\mathbf{a}}_{t}^{H}(\theta_{t,l}), \qquad (2)$$

where ${\mathbf{a}}_{r}(\theta_{r,l})\in{\mathbb{C}}^{N_r\times 1}$ and ${\mathbf{a}}_{t}(\theta_{t,l})\in{\mathbb{C}}^{N_t\times 1}$ are the array response vectors of the uniform linear arrays (ULAs) at the receiver and transmitter, respectively. We extend this to a channel model with 2D uniform planar arrays (UPAs) in Section IV-C. In particular, ${\mathbf{a}}_{r}(\theta_{r,l})$ and ${\mathbf{a}}_{t}(\theta_{t,l})$ are expressed as

$${\mathbf{a}}_{r}(\theta_{r,l})=\frac{1}{\sqrt{N_r}}[1,e^{-j\frac{2\pi}{\lambda}d\sin\theta_{r,l}},\cdots,e^{-j\frac{2\pi}{\lambda}d(N_r-1)\sin\theta_{r,l}}]^{T},$$
$${\mathbf{a}}_{t}(\theta_{t,l})=\frac{1}{\sqrt{N_t}}[1,e^{-j\frac{2\pi}{\lambda}d\sin\theta_{t,l}},\cdots,e^{-j\frac{2\pi}{\lambda}d(N_t-1)\sin\theta_{t,l}}]^{T},$$

where $\lambda$ is the wavelength, $d=0.5\lambda$ is the antenna spacing, $\theta_{r,l}$ and $\theta_{t,l}$ are the angle of arrival (AoA) and angle of departure (AoD) of the $l$th path, uniformly distributed in $[-\pi/2,\pi/2)$, and $h_{l}\sim{\mathcal{C}}{\mathcal{N}}(0,\sigma_{h,l}^{2})$ is the complex gain of the $l$th path.

The channel model in (2) can be rewritten as

$${\mathbf{H}}={\mathbf{A}}_{r}\mathrm{diag}({\mathbf{h}}){\mathbf{A}}_{t}^{H}, \qquad (3)$$

where ${\mathbf{A}}_{r}=[{\mathbf{a}}_{r}(\theta_{r,1}),\ldots,{\mathbf{a}}_{r}(\theta_{r,L})]\in{\mathbb{C}}^{N_r\times L}$, ${\mathbf{A}}_{t}=[{\mathbf{a}}_{t}(\theta_{t,1}),\ldots,{\mathbf{a}}_{t}(\theta_{t,L})]\in{\mathbb{C}}^{N_t\times L}$, and ${\mathbf{h}}=[h_{1},\cdots,h_{L}]^{T}\in{\mathbb{C}}^{L\times 1}$. The channel estimation task is to obtain an estimate of ${\mathbf{H}}$, i.e., $\widehat{{\mathbf{H}}}$, from ${\mathbf{y}}_{(k)}$, ${\mathbf{W}}_{(k)}$, and ${\mathbf{f}}_{(k)}$, $k=1,\ldots,K$ in (1).
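To make the geometric channel model concrete, the following sketch builds ${\mathbf{H}}={\mathbf{A}}_{r}\mathrm{diag}({\mathbf{h}}){\mathbf{A}}_{t}^{H}$ as in (3), with the scaling of (2), assuming half-wavelength spacing $d=0.5\lambda$ and unit-variance path gains. It is an illustration under our own naming (`ula_response`, `mmwave_channel`), not code from the paper.

```python
import numpy as np

def ula_response(n, theta):
    # Array response vector of an n-element half-wavelength ULA;
    # with d = 0.5*lambda the phase term (2*pi/lambda)*d*sin(theta) = pi*sin(theta).
    return np.exp(-1j * np.pi * np.arange(n) * np.sin(theta)) / np.sqrt(n)

def mmwave_channel(nr, nt, L, rng):
    # Draw L paths with uniform AoA/AoD in [-pi/2, pi/2) and unit-variance
    # complex Gaussian gains, then assemble H = sqrt(nr*nt/L) * A_r diag(h) A_t^H.
    theta_r = rng.uniform(-np.pi / 2, np.pi / 2, L)
    theta_t = rng.uniform(-np.pi / 2, np.pi / 2, L)
    h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    Ar = np.stack([ula_response(nr, t) for t in theta_r], axis=1)
    At = np.stack([ula_response(nt, t) for t in theta_t], axis=1)
    return np.sqrt(nr * nt / L) * Ar @ np.diag(h) @ At.conj().T
```

A draw with $L=3$ paths yields a rank-3 matrix regardless of the $N_r\times N_t$ dimensions, which is precisely the low-rank structure the subspace approach exploits.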

II-B Performance Evaluation of Channel Estimation

To evaluate the channel estimation performance, the spectrum efficiency achieved by utilizing the channel estimate $\widehat{{\mathbf{H}}}$ is discussed in the following. Conventionally, the precoder $\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{N_t\times N_d}$ and combiner $\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_r\times N_d}$ are designed based on the estimate $\widehat{{\mathbf{H}}}$, where $N_d$ is the number of transmitted data streams with $N_d\leq\min\{N_{RF},M_{RF}\}$. Here, when evaluating the channel estimation performance, it is assumed that the number of transmitted data streams equals the number of dominant paths, i.e., $N_d=L$. After the design of the precoder and combiner, the received signal for data transmission is given by

$${\mathbf{y}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}{\mathbf{s}}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}, \qquad (4)$$

where the signal follows ${\mathbf{s}}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{L},\frac{1}{L}{\mathbf{I}}_{L})$ and ${\mathbf{n}}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. It is worth noting that (4) is for data transmission, while (1) is for channel sounding. The spectrum efficiency achieved by $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ in (4) is defined in [20] as

$$R=\log_{2}\left|{\mathbf{I}}_{L}+\frac{1}{\sigma^{2}L}{\mathbf{R}}_{n}^{-1}{\mathbf{H}}_{e}{\mathbf{H}}_{e}^{H}\right|, \qquad (5)$$

where ${\mathbf{H}}_{e}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{L\times L}$ and ${\mathbf{R}}_{n}=\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{L\times L}$. In this work, we assume that the precoder and combiner are unitary, such that $\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}={\mathbf{I}}_{L}$ and $\widehat{{\mathbf{F}}}^{H}\widehat{{\mathbf{F}}}={\mathbf{I}}_{L}$. Under this assumption, we have ${\mathbf{R}}_{n}={\mathbf{I}}_{L}$ in (5).

It is worth noting that the spectrum efficiency in (5) is invariant to right rotations of the precoder and combiner: substituting $\widetilde{{\mathbf{F}}}=\widehat{{\mathbf{F}}}{\mathbf{R}}_{\mathbf{F}}$ and $\widetilde{{\mathbf{W}}}=\widehat{{\mathbf{W}}}{\mathbf{R}}_{\mathbf{W}}$ into (5), where ${\mathbf{R}}_{\mathbf{F}}\in{\mathbb{C}}^{L\times L}$ and ${\mathbf{R}}_{\mathbf{W}}\in{\mathbb{C}}^{L\times L}$ are unitary matrices, does not change the spectrum efficiency. Thus, $R$ in (5) is a function of the subspaces spanned by the precoder and combiner, i.e., $\mathrm{col}(\widehat{{\mathbf{F}}})$ and $\mathrm{col}(\widehat{{\mathbf{W}}})$. Moreover, the highest spectrum efficiency is achieved when $\mathrm{col}(\widehat{{\mathbf{F}}})$ and $\mathrm{col}(\widehat{{\mathbf{W}}})$ equal the row and column subspaces of ${\mathbf{H}}$, respectively.

Apart from the spectrum efficiency achieved by the signal model in (4), we consider the effective SNR at the receiver,

$$\gamma=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\sigma^{2}\|\widehat{{\mathbf{W}}}\|_{F}^{2}}=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\sigma^{2}L}. \qquad (6)$$

The received SNR $\gamma$ in (6) has the same rotation invariance property as the spectrum efficiency. In other words, $\gamma$ in (6) is a function of the estimated column and row subspaces. The maximum of $\gamma$ is also achieved when $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ span the column and row subspaces of ${\mathbf{H}}$, respectively.
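The rotation invariance of (5) and (6) can be checked numerically. The sketch below (our own helper names, not the paper's code) evaluates $R$ and $\gamma$ for given $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$; rotating both by the same unitary matrix leaves $R$ unchanged.

```python
import numpy as np

def spectrum_efficiency(H, W, F, sigma2):
    # R = log2 |I_L + (1/(sigma^2 L)) R_n^{-1} H_e H_e^H| as in (5),
    # with H_e = W^H H F and R_n = W^H W.
    L = F.shape[1]
    He = W.conj().T @ H @ F
    Rn = W.conj().T @ W
    M = np.eye(L) + np.linalg.solve(Rn, He @ He.conj().T) / (sigma2 * L)
    return np.real(np.log2(np.linalg.det(M)))

def effective_snr(H, W, F, sigma2):
    # gamma in (6); for a unitary W the denominator reduces to sigma^2 * L.
    num = np.linalg.norm(W.conj().T @ H @ F, 'fro') ** 2
    return num / (sigma2 * np.linalg.norm(W, 'fro') ** 2)
```

Because $R$ depends on $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ only through the subspaces they span, any right rotation by a unitary matrix produces the same value.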

Figure 2: Illustration of SASE Algorithm

Inspired by the definition in (6), in this paper the accuracy of subspace estimation is defined as the ratio of the power captured by the transceiver matrices [21] $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ to the power of the channel,

$$\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}. \qquad (7)$$

Similarly, the measures for the accuracy of the column and row subspace estimates, i.e., $\eta_{c}(\widehat{{\mathbf{W}}})$ and $\eta_{r}(\widehat{{\mathbf{F}}})$, are respectively defined as the ratios of the power captured by $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ to the power of the channel:

$$\eta_{c}(\widehat{{\mathbf{W}}})=\frac{\mathrm{tr}(\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{H}}^{H}\widehat{{\mathbf{W}}})}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}, \qquad (8)$$
$$\eta_{r}(\widehat{{\mathbf{F}}})=\frac{\mathrm{tr}(\widehat{{\mathbf{F}}}^{H}{\mathbf{H}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}})}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}. \qquad (9)$$

Moreover, $\eta_{c}$ and $\eta_{r}$ are also rotation invariant. When the value of $\eta_{c}$ or $\eta_{r}$ is close to one, the corresponding $\widehat{{\mathbf{W}}}$ or $\widehat{{\mathbf{F}}}$ can be treated as an accurate subspace estimate.
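The accuracy measures (7)–(9) translate directly into code. The sketch below uses our own helper names; when $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ span the true column and row subspaces of a rank-$L$ channel, all three measures equal one.

```python
import numpy as np

def subspace_accuracy(H, W, F):
    # eta in (7): fraction of channel power captured jointly by W and F.
    num = np.linalg.norm(W.conj().T @ H @ F, 'fro') ** 2
    return num / np.real(np.trace(H.conj().T @ H))

def column_accuracy(H, W):
    # eta_c in (8): fraction of channel power captured by W alone.
    num = np.real(np.trace(W.conj().T @ H @ H.conj().T @ W))
    return num / np.real(np.trace(H.conj().T @ H))

def row_accuracy(H, F):
    # eta_r in (9): fraction of channel power captured by F alone.
    num = np.real(np.trace(F.conj().T @ H.conj().T @ H @ F))
    return num / np.real(np.trace(H.conj().T @ H))
```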

The proposed SASE algorithm is illustrated in Fig. 2. It consists of two stages: column subspace estimation and row subspace estimation. In particular, the training sounders of the second stage are optimized by fully adapting them to the estimated column subspace, which reduces the number of channel uses and improves the estimation accuracy.

III Sequential and Adaptive Subspace Estimation

III-A Estimate the Column Subspace

In this subsection, we present the design of the transmit and receive sounders along with the method for obtaining the column subspace of the mmWave channel. To begin with, the following lemma shows that, under the mmWave channel model in (3), the column subspaces of ${\mathbf{H}}$ and the sub-matrix ${\mathbf{H}}_{S}$ are equivalent.

Lemma 1

Let ${\mathbf{H}}_{S}={\mathbf{H}}{\mathbf{S}}\in{\mathbb{C}}^{N_r\times m}$ be the sub-matrix consisting of the first $m$ columns of ${\mathbf{H}}$ with $m\geq L$, where ${\mathbf{S}}$ is expressed as

$${\mathbf{S}}=\begin{bmatrix}{\mathbf{I}}_{m}\\ \mathbf{0}_{N_t-m,m}\end{bmatrix}\in{\mathbb{R}}^{N_t\times m}.$$

For the mmWave channel model in (3), if all the angles $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are distinct, the column subspaces of ${\mathbf{H}}$ and ${\mathbf{H}}_{S}$ are equivalent, i.e., $\mathrm{col}({\mathbf{H}}_{S})=\mathrm{col}({\mathbf{H}})$.

Proof:

See Appendix A. ∎

Remark 1

Because $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are continuous random variables in $[-\pi/2,\pi/2)$, they are distinct almost surely (i.e., with probability 1).

Lemma 1 reveals that, since $\mathrm{col}({\mathbf{H}}_{S})=\mathrm{col}({\mathbf{H}})$, it suffices to sample the first $m$ columns of ${\mathbf{H}}$, i.e., ${\mathbf{H}}_{S}$, to obtain the column subspace of ${\mathbf{H}}$, which reduces the number of channel uses. However, the mmWave hybrid MIMO architecture cannot directly access the entries of ${\mathbf{H}}$ due to the analog array constraints. This can be overcome by adopting the technique proposed in [16]. Specifically, to sample the $i$th column of ${\mathbf{H}}$, i.e., $[{\mathbf{H}}]_{:,i}$, the transmitter constructs the transmit sounder ${\mathbf{f}}_{(i)}={\mathbf{e}}_{i}\in{\mathbb{C}}^{N_t\times 1}$, where ${\mathbf{e}}_{i}$ is the $i$th column of ${\mathbf{I}}_{N_t}$. This is possible because any precoder vector can be generated with $N_{RF}\geq 2$ RF chains [22]. To be more specific, there exist ${\mathbf{F}}_{A,i}$, ${\mathbf{F}}_{D,i}$, and ${\mathbf{s}}_{i}$ such that ${\mathbf{e}}_{i}={\mathbf{F}}_{A,i}{\mathbf{F}}_{D,i}{\mathbf{s}}_{i}$:

$${\mathbf{e}}_{i}=\underbrace{\frac{1}{\sqrt{N_t}}\begin{bmatrix}1&1&\cdots\\ \vdots&\vdots&\cdots\\ 1&1&\cdots\\ 1&-1&\cdots\\ 1&1&\cdots\\ \vdots&\vdots&\cdots\\ 1&1&\cdots\end{bmatrix}}_{\triangleq{\mathbf{F}}_{A,i}}\underbrace{\begin{bmatrix}\frac{\sqrt{N_{RF}N_t}}{2}&0&\cdots&0\\ -\frac{\sqrt{N_{RF}N_t}}{2}&0&\cdots&0\\ 0&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&0\end{bmatrix}}_{\triangleq{\mathbf{F}}_{D,i}}\underbrace{\frac{1}{\sqrt{N_{RF}}}\begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix}}_{\triangleq{\mathbf{s}}_{i}},$$

where ${\mathbf{F}}_{A,i}=\frac{1}{\sqrt{N_t}}\mathbf{1}_{N_t,N_{RF}}$ except for $[{\mathbf{F}}_{A,i}]_{i,2}=-\frac{1}{\sqrt{N_t}}$; ${\mathbf{F}}_{D,i}=\mathbf{0}_{N_{RF},N_{RF}}$ except for $[{\mathbf{F}}_{D,i}]_{1,1}=\frac{\sqrt{N_{RF}N_t}}{2}$ and $[{\mathbf{F}}_{D,i}]_{2,1}=-\frac{\sqrt{N_{RF}N_t}}{2}$; and ${\mathbf{s}}_{i}=\frac{1}{\sqrt{N_{RF}}}[1,\ldots,1]^{T}\in{\mathbb{R}}^{N_{RF}\times 1}$.
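This factorization can be checked numerically. The sketch below (the helper name `sounder_for_ei` is ours) constructs ${\mathbf{F}}_{A,i}$, ${\mathbf{F}}_{D,i}$, and ${\mathbf{s}}_{i}$ exactly as specified and verifies that their product recovers ${\mathbf{e}}_{i}$ while ${\mathbf{F}}_{A,i}$ keeps constant-modulus entries.

```python
import numpy as np

def sounder_for_ei(i, nt, nrf):
    # F_A is all ones/sqrt(nt) except a sign flip at row i of the second column;
    # F_D routes the difference of the first two RF chains into the first stream;
    # s feeds all RF chains equally. Their product is the i-th standard basis vector.
    FA = np.ones((nt, nrf)) / np.sqrt(nt)
    FA[i, 1] = -1.0 / np.sqrt(nt)
    FD = np.zeros((nrf, nrf))
    FD[0, 0] = np.sqrt(nrf * nt) / 2
    FD[1, 0] = -np.sqrt(nrf * nt) / 2
    s = np.ones(nrf) / np.sqrt(nrf)
    return FA, FD, s
```

The cancellation works because the two analog columns are identical except at row $i$, so their difference is zero everywhere except at the $i$th entry; the digital sounder scales that difference to exactly one.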

At the receiver side, we collect the receive sounders of $N_r/M_{RF}$ channel uses to form the full-rank matrix

$${\mathbf{M}}=[{\mathbf{W}}_{(i,1)},{\mathbf{W}}_{(i,2)},\cdots,{\mathbf{W}}_{(i,N_r/M_{RF})}]\in{\mathbb{C}}^{N_r\times N_r}, \qquad (10)$$

where ${\mathbf{W}}_{(i,j)}\in{\mathbb{C}}^{N_r\times M_{RF}}$, $j=1,\ldots,N_r/M_{RF}$, denotes the $j$th receive sounder corresponding to the transmit sounder ${\mathbf{e}}_{i}$. To satisfy the analog constraint that the entries of the analog sounders have constant modulus, we let the matrix ${\mathbf{M}}$ in (10) be the discrete Fourier transform (DFT) matrix. Specifically, the analog and digital receive sounders associated with ${\mathbf{W}}_{(i,j)}$ in (10) are expressed as

$${\mathbf{W}}_{(i,j)}=\underbrace{[{\mathbf{M}}]_{:,(j-1)M_{RF}+1:jM_{RF}}}_{\text{analog sounder}}\underbrace{{\mathbf{I}}_{M_{RF}}}_{\text{digital sounder}}.$$

Thus, the received signal ${\mathbf{y}}_{(i,j)}\in{\mathbb{C}}^{M_{RF}\times 1}$ under the transmit sounder ${\mathbf{e}}_{i}$ and receive sounder ${\mathbf{W}}_{(i,j)}$ is expressed as

$${\mathbf{y}}_{(i,j)}={\mathbf{W}}_{(i,j)}^{H}{\mathbf{H}}{\mathbf{e}}_{i}+{\mathbf{W}}_{(i,j)}^{H}{\mathbf{n}}_{(i,j)},$$

where ${\mathbf{n}}_{(i,j)}\in{\mathbb{C}}^{N_r\times 1}$ is the noise vector with ${\mathbf{n}}_{(i,j)}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. Then we stack the observations of $N_r/M_{RF}$ channel uses as ${\mathbf{y}}_{i}=[{\mathbf{y}}_{(i,1)}^{T},\cdots,{\mathbf{y}}_{(i,N_r/M_{RF})}^{T}]^{T}\in{\mathbb{C}}^{N_r\times 1}$:

$$\underbrace{\begin{bmatrix}{\mathbf{y}}_{(i,1)}\\ {\mathbf{y}}_{(i,2)}\\ \vdots\\ {\mathbf{y}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}}_{\triangleq{\mathbf{y}}_{i}}=\underbrace{\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}\\ {\mathbf{W}}_{(i,2)}^{H}\\ \vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}\end{bmatrix}}_{\triangleq{\mathbf{M}}^{H}}\underbrace{{\mathbf{H}}{\mathbf{e}}_{i}}_{\triangleq[{\mathbf{H}}]_{:,i}}+\underbrace{\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}{\mathbf{n}}_{(i,1)}\\ {\mathbf{W}}_{(i,2)}^{H}{\mathbf{n}}_{(i,2)}\\ \vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{n}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}}_{\triangleq\tilde{{\mathbf{n}}}_{i}}={\mathbf{M}}^{H}[{\mathbf{H}}]_{:,i}+\tilde{{\mathbf{n}}}_{i}, \qquad (11)$$

where $\tilde{{\mathbf{n}}}_{i}\in{\mathbb{C}}^{N_r\times 1}$ is the effective noise vector after stacking, whose covariance matrix is

$${\mathbb{E}}[\tilde{{\mathbf{n}}}_{i}\tilde{{\mathbf{n}}}_{i}^{H}]=\sigma^{2}\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}{\mathbf{W}}_{(i,1)}&\cdots&{\mathbf{W}}_{(i,1)}^{H}{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}\\ \vdots&\ddots&\vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{W}}_{(i,1)}&\cdots&{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}. \qquad (12)$$

Because the DFT matrix ${\mathbf{M}}$ in (10) satisfies ${\mathbf{M}}^{H}{\mathbf{M}}={\mathbf{M}}{\mathbf{M}}^{H}={\mathbf{I}}_{N_r}$, the following holds:

$${\mathbf{W}}_{(i,j)}^{H}{\mathbf{W}}_{(i,k)}=\begin{cases}{\mathbf{I}}_{M_{RF}},&j=k,\\ \mathbf{0}_{M_{RF}},&j\neq k.\end{cases} \qquad (15)$$

Substituting (15) into (12), we can verify that ${\mathbb{E}}[\tilde{{\mathbf{n}}}_{i}\tilde{{\mathbf{n}}}_{i}^{H}]=\sigma^{2}{\mathbf{I}}_{N_r}$, and more precisely, $\tilde{{\mathbf{n}}}_{i}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. Moreover, denoting $\widetilde{{\mathbf{N}}}=[\tilde{{\mathbf{n}}}_{1},\cdots,\tilde{{\mathbf{n}}}_{m}]\in{\mathbb{C}}^{N_r\times m}$, it is straightforward that the entries of $\widetilde{{\mathbf{N}}}$ are independent and identically distributed (i.i.d.) as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$. For convenience, we denote $\widetilde{{\mathbf{Y}}}_{S}=[{\mathbf{y}}_{1},\cdots,{\mathbf{y}}_{m}]\in{\mathbb{C}}^{N_r\times m}$, where ${\mathbf{y}}_{i}$ is defined in (11). Then, we apply the DFT to the collected observations $\widetilde{{\mathbf{Y}}}_{S}$ and obtain ${\mathbf{Y}}_{S}={\mathbf{M}}\widetilde{{\mathbf{Y}}}_{S}\in{\mathbb{C}}^{N_r\times m}$ as

$${\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S}, \qquad (16)$$

where ${\mathbf{N}}_{S}={\mathbf{M}}\widetilde{{\mathbf{N}}}\in{\mathbb{C}}^{N_r\times m}$ and ${\mathbf{H}}_{S}=[{\mathbf{H}}]_{:,1:m}\in{\mathbb{C}}^{N_r\times m}$. Before discussing the noise part ${\mathbf{N}}_{S}$ in (16), the following preliminary lemma gives the distribution of the entries of a product of matrices.

Lemma 2

Given a semi-unitary matrix ${\mathbf{A}}\in{\mathbb{C}}^{d\times N}$ with ${\mathbf{A}}{\mathbf{A}}^{H}={\mathbf{I}}_{d}$, and a random matrix ${\mathbf{X}}\in{\mathbb{C}}^{N\times m}$ with i.i.d. entries distributed as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$, the product ${\mathbf{Y}}={\mathbf{A}}{\mathbf{X}}\in{\mathbb{C}}^{d\times m}$ also has i.i.d. entries distributed as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$.

Proof:

See Appendix B. ∎

Therefore, considering the noise part in (16), i.e., 𝐍S=𝐌𝐍~{\mathbf{N}}_{S}={\mathbf{M}}\widetilde{{\mathbf{N}}}, where 𝐌{\mathbf{M}} is unitary and 𝐍~\widetilde{{\mathbf{N}}} has i.i.d. 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}) entries, the conclusion of Lemma 2 can be applied, which verifies that the entries of 𝐍S{\mathbf{N}}_{S} in (16) are i.i.d. as 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}).
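Lemma 2 can be checked numerically. The following Python sketch (illustrative only; the matrix sizes and noise variance are arbitrary choices) applies a semi-unitary DFT-based matrix to an i.i.d. complex Gaussian matrix and verifies that the empirical entry variance and the cross-row correlation behave as the lemma predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, m, sigma2 = 4, 16, 4000, 2.0

# Semi-unitary A (d x N): the first d rows of the unitary DFT matrix.
A = (np.fft.fft(np.eye(N)) / np.sqrt(N))[:d, :]
assert np.allclose(A @ A.conj().T, np.eye(d), atol=1e-10)

# X with i.i.d. CN(0, sigma2) entries: real/imag parts each N(0, sigma2/2).
X = rng.normal(scale=np.sqrt(sigma2 / 2), size=(N, m)) \
    + 1j * rng.normal(scale=np.sqrt(sigma2 / 2), size=(N, m))

Y = A @ X
emp_var = np.mean(np.abs(Y) ** 2)            # should be close to sigma2
cross = np.mean(Y[0, :] * np.conj(Y[1, :]))  # cross-row correlation, near zero
```

Since the rows of the semi-unitary matrix are orthonormal, the transformed rows remain uncorrelated with unchanged variance, which is exactly the statement of Lemma 2.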

Given the expression in (16), the column subspace estimation problem is formulated as,

𝐔^=argmax𝐔Nr×L𝐔H𝐘SF2subject to𝐔H𝐔=𝐈L,\displaystyle\widehat{{\mathbf{U}}}=\mathop{\mathrm{argmax}}\limits_{{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L}}\left\|{\mathbf{U}}^{H}{\mathbf{Y}}_{S}\right\|_{F}^{2}~{}\text{subject to}~{}{\mathbf{U}}^{H}{\mathbf{U}}={\mathbf{I}}_{L}, (17)

where one of the optimal solutions of (17) can be obtained by taking the dominant LL left singular vectors of 𝐘S{\mathbf{Y}}_{S}. Here, the number of paths, LL, is assumed to be known a priori. In practice, it is possible to estimate LL by comparing the singular values of 𝐘S{\mathbf{Y}}_{S} [23]. Because 𝐘S=𝐇S+𝐍S{\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S} and rank(𝐇S)=L\mathop{\mathrm{rank}}({\mathbf{H}}_{S})=L, there will be LL singular values of 𝐘S{\mathbf{Y}}_{S} whose magnitudes clearly dominate the other singular values. Alternatively, we can set it to LsupL_{\text{sup}}, which is an upper bound on the number of dominant paths such that LLsupL\leq L_{\text{sup}}.²Due to the limited number of RF chains, the dimension of the channel subspaces for data transmission is less than min{MRF,NRF}\min\{M_{RF},N_{RF}\}. Thus, if the path number estimate is larger than min{MRF,NRF}\min\{M_{RF},N_{RF}\}, we set it to min{MRF,NRF}\min\{M_{RF},N_{RF}\}.
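The singular-value-gap argument above can be illustrated with a short Python sketch; the dimensions, the rank-LL construction of 𝐇S{\mathbf{H}}_{S}, and the noise level are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
Nr, m, L, sigma = 32, 8, 3, 1e-3

# Rank-L sampled block H_S plus i.i.d. noise, mimicking (16).
H_S = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
      (rng.standard_normal((L, m)) + 1j * rng.standard_normal((L, m)))
Y_S = H_S + sigma * (rng.standard_normal((Nr, m)) + 1j * rng.standard_normal((Nr, m)))

s = np.linalg.svd(Y_S, compute_uv=False)
# The L dominant singular values stand out; detect L at the largest relative gap.
L_hat = int(np.argmax(s[:-1] / s[1:])) + 1
U_hat = np.linalg.svd(Y_S)[0][:, :L_hat]       # estimated column subspace
```

At this low noise level the gap between the LLth and (L+1)(L+1)th singular values is large, so the gap detector recovers the true rank.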

Now, we design the receive combiner 𝐖^\widehat{{\mathbf{W}}} in (4) for data transmission to approximate the estimated 𝐔^Nr×L\widehat{{\mathbf{U}}}\in{\mathbb{C}}^{N_{r}\times L} in (17). Specifically, we design the analog combiner 𝐖^ANr×MRF\widehat{{\mathbf{W}}}_{A}\in{\mathbb{C}}^{N_{r}\times M_{RF}} and digital combiner 𝐖^DMRF×L\widehat{{\mathbf{W}}}_{D}\in{\mathbb{C}}^{M_{RF}\times L} at the receiver by solving the following problem

(𝐖^A,𝐖^D)=argmin𝐖A,𝐖D𝐔^𝐖A𝐖DF,\displaystyle\left(\widehat{{\mathbf{W}}}_{A},\widehat{{\mathbf{W}}}_{D}\right)=\mathop{\mathrm{argmin}}_{{\mathbf{W}}_{A},{\mathbf{W}}_{D}}\|\widehat{{\mathbf{U}}}-{\mathbf{W}}_{A}{\mathbf{W}}_{D}\|_{F},
subject to |[𝐖A]i,j|=1Nr.\displaystyle\text{subject to~{}~{}}\left|[{\mathbf{W}}_{A}]_{i,j}\right|=\frac{1}{\sqrt{N_{r}}}. (18)

The problem above can be solved by using the OMP algorithm [5] or the alternating minimization method [24]. The designed receive combiner is given by 𝐖^=𝐖^A𝐖^DNr×L\widehat{{\mathbf{W}}}=\widehat{{\mathbf{W}}}_{A}\widehat{{\mathbf{W}}}_{D}\in{\mathbb{C}}^{N_{r}\times L} with 𝐖^H𝐖^=𝐈L\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}={\mathbf{I}}_{L}. The methods in [5, 24] have been shown to guarantee near-optimal performance, i.e., col(𝐖^)col(𝐔^)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}})\approx\mathop{\mathrm{col}}(\widehat{{\mathbf{U}}}). The details of our column subspace estimation algorithm are summarized in Algorithm 1.
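As an illustration of solving (18), the following Python sketch implements a simple OMP-style greedy selection over a DFT dictionary, whose columns have the required constant modulus 1/Nr1/\sqrt{N_{r}}. It is a simplified stand-in for the algorithms of [5, 24], not a reproduction of them.

```python
import numpy as np

def hybrid_approx_omp(U_hat, M_RF):
    """OMP-style greedy sketch of (18): select M_RF unit-modulus DFT columns
    as the analog combiner, then solve least squares for the digital part."""
    Nr, L = U_hat.shape
    D = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)   # dictionary, |entries| = 1/sqrt(Nr)
    idx, R = [], U_hat.copy()
    for _ in range(M_RF):
        corr = np.sum(np.abs(D.conj().T @ R) ** 2, axis=1)
        corr[idx] = -1.0                       # never reselect a column
        idx.append(int(np.argmax(corr)))
        W_A = D[:, idx]
        W_D = np.linalg.lstsq(W_A, U_hat, rcond=None)[0]
        R = U_hat - W_A @ W_D                  # residual after this selection
    Q, _ = np.linalg.qr(W_A @ W_D)             # re-orthonormalize the product
    return W_A, W_D, Q
```

When 𝐔^\widehat{{\mathbf{U}}} happens to lie in the span of a few dictionary columns, the greedy selection recovers it exactly; in general it only approximates the subspace, which is why the residual δ1\delta_{1} appears in Proposition 1.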

In general, col(𝐖^)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}}) is not equal to the column subspace of 𝐇{\mathbf{H}}, i.e., col(𝐔)\mathop{\mathrm{col}}({\mathbf{U}}) with 𝐔Nr×L{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L}, due to the noise 𝐍S{\mathbf{N}}_{S} in (16). To analyze the column subspace accuracy ηc(𝐖^)\eta_{c}(\widehat{{\mathbf{W}}}) defined in (8), we introduce the theorem [25] below.

Theorem 1 ([25])

Suppose 𝐗M×N(MN){\mathbf{X}}\in{\mathbb{C}}^{M\times N}(M\geq N) is of rank-rr, and 𝐗^=𝐗+𝐍\widehat{{\mathbf{X}}}={\mathbf{X}}+{\mathbf{N}}, where [𝐍]i,j[{\mathbf{N}}]_{i,j} is i.i.d. with zero mean and unit variance (not necessarily Gaussian). Let the compact SVD of 𝐗{\mathbf{X}} be

𝐗=𝐔𝚺𝐕H,\displaystyle{\mathbf{X}}={\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H},

where 𝐔M×r{\mathbf{U}}\in{\mathbb{C}}^{M\times r}, 𝐕N×r{\mathbf{V}}\in{\mathbb{C}}^{N\times r}, and 𝚺r×r\mathbf{\Sigma}\in{\mathbb{C}}^{r\times r}. We assume the singular values in 𝚺\mathbf{\Sigma} are in descending order, i.e., σ1(𝐗)σr(𝐗)\sigma_{1}({\mathbf{X}})\geq\cdots\geq\sigma_{r}({\mathbf{X}}). Similarly, we partition the SVD of 𝐗^\widehat{{\mathbf{X}}} as

𝐗^=[𝐔^𝐔^][𝚺^1𝟎𝟎𝚺^2][𝐕^H𝐕^H],\displaystyle\widehat{{\mathbf{X}}}=\begin{bmatrix}\widehat{{\mathbf{U}}}&\widehat{{\mathbf{U}}}_{\perp}\end{bmatrix}\begin{bmatrix}\widehat{\mathbf{\Sigma}}_{1}&\mathbf{0}\\ \mathbf{0}&\widehat{\mathbf{\Sigma}}_{2}\end{bmatrix}\begin{bmatrix}\widehat{{\mathbf{V}}}^{H}\\ \widehat{{\mathbf{V}}}_{\perp}^{H}\end{bmatrix},

where 𝐔^M×r\widehat{{\mathbf{U}}}\in{\mathbb{C}}^{M\times r}, 𝐔^M×(Mr)\widehat{{\mathbf{U}}}_{\perp}\in{\mathbb{C}}^{M\times(M-r)}, 𝐕^N×r\widehat{{\mathbf{V}}}\in{\mathbb{C}}^{N\times r}, 𝐕^N×(Nr)\widehat{{\mathbf{V}}}_{\perp}\in{\mathbb{C}}^{N\times(N-r)}, 𝚺^1r×r\widehat{\mathbf{\Sigma}}_{1}\in{\mathbb{C}}^{r\times r}, and 𝚺^2(Mr)×(Nr)\widehat{\mathbf{\Sigma}}_{2}\in{\mathbb{C}}^{(M-r)\times(N-r)}. Then, there exists a constant C>0C>0 such that

𝔼[σr2(𝐔H𝐔^)](1CM(σr2(𝐗)+N)σr4(𝐗))+,\displaystyle{\mathbb{E}}\left[\sigma_{r}^{2}({\mathbf{U}}^{H}\widehat{{\mathbf{U}}})\right]\geq\left(1-\frac{CM(\sigma_{r}^{2}({\mathbf{X}})+N)}{\sigma_{r}^{4}({\mathbf{X}})}\right)_{+},
𝔼[σr2(𝐕H𝐕^)](1CN(σr2(𝐗)+M)σr4(𝐗))+,\displaystyle{\mathbb{E}}\left[\sigma_{r}^{2}({\mathbf{V}}^{H}\widehat{{\mathbf{V}}})\right]\geq\left(1-\frac{CN(\sigma_{r}^{2}({\mathbf{X}})+M)}{\sigma_{r}^{4}({\mathbf{X}})}\right)_{+},

where the expectation is taken over the random noise 𝐍{\mathbf{N}}. In particular, when the noise is i.i.d. 𝒞𝒩(0,1){\mathcal{C}}{\mathcal{N}}(0,1), it has C=2C=2.

Algorithm 1 Column subspace estimation
1:  Input: channel dimension: NrN_{r}, NtN_{t}; number of RF chains at receiver: MRFM_{RF}; channel paths: LL; parameter: mm.
2:  Initialization: channel use index k=1k=1.
3:  for i=1i=1 to mm do
4:     Set transmit sounder as 𝐟(i)=𝐞i{\mathbf{f}}_{(i)}={\mathbf{e}}_{i}.
5:     for j=1j=1 to Nr/MRFN_{r}/M_{RF} do
6:        Design receive training sounder as 𝐖(i,j)=[𝐌]:,(j1)MRF+1:jMRF𝐈MRF{\mathbf{W}}_{(i,j)}=[{\mathbf{M}}]_{:,(j-1)M_{RF}+1:jM_{RF}}{\mathbf{I}}_{M_{RF}}.
7:        Obtain the received signal 𝐲(i,j)=𝐖(i,j)H𝐇𝐟(i)+𝐖(i,j)H𝐧(i,j){\mathbf{y}}_{(i,j)}\!\!\!\!=\!\!\!\!{\mathbf{W}}_{(i,j)}^{H}{\mathbf{H}}{\mathbf{f}}_{(i)}\!\!+\!\!{\mathbf{W}}_{(i,j)}^{H}{\mathbf{n}}_{(i,j)}.
8:        Update k=k+1k=k+1.
9:     end for
10:     𝐲i=[𝐲(i,1)T,,𝐲(i,Nr/MRF)T]T{\mathbf{y}}_{i}=\left[{\mathbf{y}}_{(i,1)}^{T},\cdots,{\mathbf{y}}^{T}_{(i,N_{r}/M_{RF})}\right]^{T}.
11:  end for
12:  𝐘S=𝐌[𝐲1,,𝐲m]{\mathbf{Y}}_{S}={\mathbf{M}}\left[{\mathbf{y}}_{1},\cdots,{\mathbf{y}}_{m}\right].
13:  Column subspace 𝐔^\widehat{{\mathbf{U}}} is obtained by the dominant LL left singular vectors of 𝐘S{\mathbf{Y}}_{S}.
14:  Design 𝐖^\widehat{{\mathbf{W}}} based on 𝐔^\widehat{{\mathbf{U}}} by solving (18).
15:  Output: Column subspace estimation 𝐖^\widehat{{\mathbf{W}}}.
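A minimal end-to-end simulation of Algorithm 1 is sketched below in Python. The rank-LL channel is generated as a product of Gaussian factors purely for illustration, in place of the mmWave model, and all parameter values are example choices.

```python
import numpy as np

rng = np.random.default_rng(2)
Nr, Nt, M_RF, L, m, sigma = 16, 24, 4, 2, 4, 0.01

# Illustrative rank-L channel (stand-in for the mmWave model).
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))

M = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)       # unitary DFT matrix

# Stage 1 (Algorithm 1): sound the first m columns of H.
ys = []
for i in range(m):
    f = np.zeros(Nt)
    f[i] = 1.0                                 # transmit sounder e_i
    blocks = []
    for j in range(Nr // M_RF):
        W_ij = M[:, j * M_RF:(j + 1) * M_RF]   # receive sounder block
        n = sigma * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))
        blocks.append(W_ij.conj().T @ (H @ f + n))
    ys.append(np.concatenate(blocks))
Y_S = M @ np.column_stack(ys)                  # undo the DFT combining, as in (16)

U_hat = np.linalg.svd(Y_S)[0][:, :L]           # dominant L left singular vectors
U = np.linalg.svd(H)[0][:, :L]
eta_c = np.linalg.norm(U_hat.conj().T @ U) ** 2 / L
```

Stacking the per-block observations and multiplying by 𝐌{\mathbf{M}} reproduces 𝐘S=𝐇S+𝐍S{\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S}, and at this noise level the captured-power ratio is close to one.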

We have the following proposition for the accuracy of the column subspace estimation in Algorithm 1.

Proposition 1

If the Euclidean distance 𝐖^𝐔^Fδ1\|\widehat{{\mathbf{W}}}-\widehat{{\mathbf{U}}}\|_{F}\leq\delta_{1} in (18), then the accuracy of the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} obtained from Algorithm 1 is lower bounded as

ηc(𝐖^)σL(𝐔^H𝐔)δ1,\displaystyle\sqrt{\eta_{c}(\widehat{{\mathbf{W}}})}\geq\sigma_{L}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})-\delta_{1}, (19)

where 𝐔Nr×L{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L} is the matrix composed of LL dominant left singular vectors of 𝐇{\mathbf{H}}. In particular, if δ10\delta_{1}\rightarrow 0, we have

𝔼[ηc(𝐖^)]\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\!\!\!\! \displaystyle\geq σL2(𝐔^H𝐔)\displaystyle\!\!\!\!\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) (20)
\displaystyle\geq (12Nr(σ2σL2(𝐇S)+mσ4)σL4(𝐇S))+,\displaystyle\!\!\!\!\left(1-\frac{2N_{r}(\sigma^{2}\sigma_{L}^{2}({\mathbf{H}}_{S})+m\sigma^{4})}{\sigma_{L}^{4}({\mathbf{H}}_{S})}\right)_{+},

where the σL(𝐇S)\sigma_{L}({\mathbf{H}}_{S}) is the LLth largest singular value of 𝐇S{\mathbf{H}}_{S}.

Proof:

See Appendix C. ∎

From (20), the larger the value of mm, the more accurate the column subspace estimation. Thus, when more columns are used for the column subspace estimation, the estimated column subspace becomes more reliable. In particular, when the noise level is low such that σL2(𝐇S)mσ2\sigma_{L}^{2}({\mathbf{H}}_{S})\!\!\gg\!m\sigma^{2} in (20), we have

𝔼[ηc(𝐖^)](12Nrσ2σL2(𝐇S))+.\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\geq\left(1-\frac{2N_{r}\sigma^{2}}{\sigma_{L}^{2}({\mathbf{H}}_{S})}\right)_{+}.

It means that the column subspace estimation accuracy is linearly related to the value of σ2/σL2(𝐇S)\sigma^{2}\!\!/\!\sigma_{L}^{2}({\mathbf{H}}_{S}), i.e., 𝒪(SNR){\mathcal{O}}(\text{SNR}). On the other hand, when the noise level is high such that σL2(𝐇S)mσ2\sigma_{L}^{2}({\mathbf{H}}_{S})\!\ll\!m\sigma^{2}, the bound in (20) can be written as

𝔼[ηc(𝐖^)](12Nrmσ4σL4(𝐇S))+.\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\geq\left(1-\frac{2N_{r}m\sigma^{4}}{\sigma_{L}^{4}({\mathbf{H}}_{S})}\right)_{+}.

At low SNR, the column subspace estimation accuracy is quadratically related to σ4/σL4(𝐇S)\sigma^{4}/\sigma_{L}^{4}({\mathbf{H}}_{S}), i.e., 𝒪(SNR2){\mathcal{O}}(\text{SNR}^{2}).

Remark 2

When the number of paths, LL, increases, the value of σL(𝐇S)\sigma_{L}({\mathbf{H}}_{S}) in (20) decreases, which can be interpreted as follows. When m,Nrm,N_{r}\!\rightarrow\!\infty, the entries in 𝐇SNr×m{\mathbf{H}}_{S}\!\in\!{\mathbb{C}}^{N_{r}\times m} can be approximated as standard Gaussian r.v.s [26]. Moreover, it has been shown in [27, 28] that the LLth largest singular value satisfies σL(𝐇S)Nr+1LNr\sigma_{L}({\mathbf{H}}_{S})\!\propto\!\frac{N_{r}+1-L}{\sqrt{N_{r}}} with high probability. As a result, the accuracy of the column subspace estimation decreases as LL increases, according to (20) of Proposition 1.

III-B Estimate the Row Subspace

In this subsection, we present how to learn the row subspace by leveraging the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}}. Because we have already sampled the first mm columns of 𝐇{\mathbf{H}} in the first stage, we only need to sample the remaining NtmN_{t}-m columns to estimate the row subspace as shown in Fig. 2.

At the kkth channel use of the second stage, we observe the (m+k)(m+k)th column of 𝐇{\mathbf{H}}, k=1,,(Ntm)k=1,\ldots,(N_{t}-m). To achieve this, we employ the transmit sounder as

𝐟(k)=𝐞m+k.\displaystyle{\mathbf{f}}_{(k)}={\mathbf{e}}_{m+k}. (21)

For the receive sounder, given the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} in the first stage, we simply let the receive sounder of the second stage be 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L}.³Because the estimated column subspace of the first stage is 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L}, the dimension of the receive sounder in the second stage is Nr×L{N_{r}\times L} rather than Nr×MRF{N_{r}\times M_{RF}} as in (1). It is worth noting that 𝐖^\widehat{{\mathbf{W}}} is readily applicable to the hybrid precoding architecture since 𝐖^\widehat{{\mathbf{W}}} is obtained from (18). Therefore, under the transmit sounder 𝐟(k){\mathbf{f}}_{(k)} in (21) and the receive sounder 𝐖^\widehat{{\mathbf{W}}} in (18), the observation 𝐲(k)L×1{\mathbf{y}}_{(k)}\in{\mathbb{C}}^{L\times 1} at the receiver is given by

𝐲(k)\displaystyle{\mathbf{y}}_{(k)} =\displaystyle= 𝐖^H𝐇𝐟(k)+𝐖^H𝐧(k)\displaystyle\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)} (22)
=\displaystyle= 𝐖^H[𝐇]:,m+k+𝐖^H𝐧(k),\displaystyle\widehat{{\mathbf{W}}}^{H}[{\mathbf{H}}]_{:,m+k}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)},

where 𝐧(k)Nr×1{\mathbf{n}}_{(k)}\in{\mathbb{C}}^{N_{r}\times 1} is the noise vector with 𝐧(k)𝒞𝒩(𝟎Nr,σ2𝐈Nr){\mathbf{n}}_{(k)}\sim{\mathcal{C}}{\mathcal{N}}(\bm{0}_{N_{r}},\sigma^{2}{\mathbf{I}}_{N_{r}}). Then, the observations k=1,,(Ntm)k=1,\ldots,(N_{t}-m) in (22) are packed into a matrix 𝐐^CL×(Ntm)\widehat{{\mathbf{Q}}}_{C}\in{\mathbb{C}}^{L\times(N_{t}-m)} as

𝐐^C\displaystyle\widehat{{\mathbf{Q}}}_{C} =\displaystyle= [𝐲(1),𝐲(2),,𝐲(Ntm)]\displaystyle[{\mathbf{y}}_{(1)},{\mathbf{y}}_{(2)},\cdots,{\mathbf{y}}_{(N_{t}-m)}] (23)
=\displaystyle= 𝐖^H(𝐇C+𝐍C),\displaystyle\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{C}+{\mathbf{N}}_{C}),

where 𝐇C=[[𝐇]:,m+1,,[𝐇]:,Nt]Nr×(Ntm){\mathbf{H}}_{C}=\left[[{\mathbf{H}}]_{:,m+1},\ldots,[{\mathbf{H}}]_{:,N_{t}}\right]\in{\mathbb{C}}^{N_{r}\times(N_{t}-m)}, and 𝐍C=[𝐧(1),,𝐧(Ntm)]Nr×(Ntm){\mathbf{N}}_{C}=[{\mathbf{n}}_{(1)},\ldots,{\mathbf{n}}_{(N_{t}-m)}]\in{\mathbb{C}}^{N_{r}\times(N_{t}-m)}.

Algorithm 2 Row subspace estimation
1:  Input: channel dimension: NrN_{r}, NtN_{t}; channel paths: LL; estimated column subspace: 𝐖^\widehat{{\mathbf{W}}}; observations of first stage: 𝐘S{\mathbf{Y}}_{S}; parameter: mm.
2:  Set the receive training sounder as 𝐖^\widehat{{\mathbf{W}}}.
3:  for k=k= 1 to (Ntm)(N_{t}-m) do
4:     Set the transmit training sounder as 𝐟(k)=𝐞m+k{\mathbf{f}}_{(k)}={\mathbf{e}}_{m+k}.
5:     Obtain the received signal:
6:     𝐲(k)=𝐖^H𝐇𝐟(k)+𝐖^H𝐧(k){\mathbf{y}}_{(k)}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)}.
7:  end for
8:  Stack all the observations as in (23):
9:  𝐐^C=[𝐲(1),𝐲(2),,𝐲(Ntm)]\widehat{{\mathbf{Q}}}_{C}=[{\mathbf{y}}_{(1)},{\mathbf{y}}_{(2)},\cdots,{\mathbf{y}}_{(N_{t}-m)}].
10:  Calculate 𝐐^\widehat{{\mathbf{Q}}}: 𝐐^=[𝐖^H𝐘S,𝐐^C]\widehat{{\mathbf{Q}}}=\left[\widehat{{\mathbf{W}}}^{H}{\mathbf{Y}}_{S},\widehat{{\mathbf{Q}}}_{C}\right].
11:  Row subspace matrix 𝐕^\widehat{{\mathbf{V}}} is obtained by the dominant LL right singular vectors of 𝐐^\widehat{{\mathbf{Q}}}.
12:  Design 𝐅^\widehat{{\mathbf{F}}} based on 𝐕^\widehat{{\mathbf{V}}} by solving (26).
13:  Output: row subspace estimation 𝐅^\widehat{{\mathbf{F}}}.
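The second stage can be sketched similarly. In the following illustrative Python snippet, the stage-1 combiner 𝐖^\widehat{{\mathbf{W}}} is replaced by the true dominant left singular vectors for simplicity (an idealizing assumption), so that only the row subspace estimation of Algorithm 2 is exercised; all sizes are example values.

```python
import numpy as np

rng = np.random.default_rng(3)
Nr, Nt, L, m, sigma = 16, 24, 2, 4, 0.01

# Illustrative rank-L channel.
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))
U, S, Vh = np.linalg.svd(H)
W_hat = U[:, :L]          # stand-in for the stage-1 combiner (assumed ideal here)

# Stage-1 observations of the first m columns, as in (16).
Y_S = H[:, :m] + sigma * (rng.standard_normal((Nr, m)) + 1j * rng.standard_normal((Nr, m)))

# Stage 2 (Algorithm 2): sound the remaining Nt - m columns through W_hat.
N_C = sigma * (rng.standard_normal((Nr, Nt - m)) + 1j * rng.standard_normal((Nr, Nt - m)))
Q_C = W_hat.conj().T @ (H[:, m:] + N_C)

Q_hat = np.hstack([W_hat.conj().T @ Y_S, Q_C])    # L x Nt, as in (25)
V_hat = np.linalg.svd(Q_hat)[2].conj().T[:, :L]   # dominant L right singular vectors

V = Vh.conj().T[:, :L]
eta_r = np.linalg.norm(V_hat.conj().T @ V) ** 2 / L
```

With an accurate stage-1 combiner and low noise, the row subspace accuracy is close to one, consistent with the bound in (28).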

In addition, given the receive sounder 𝐖^\widehat{{\mathbf{W}}} and observations 𝐘S{\mathbf{Y}}_{S} of the first stage in (16), we define 𝐐^SL×m\widehat{{\mathbf{Q}}}_{S}\in{\mathbb{C}}^{L\times m} as,

𝐐^S=𝐖^H𝐘S=𝐖^H(𝐇S+𝐍S).\displaystyle\widehat{{\mathbf{Q}}}_{S}=\widehat{{\mathbf{W}}}^{H}{\mathbf{Y}}_{S}=\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{S}+{\mathbf{N}}_{S}). (24)

Combining (24) and (23) yields 𝐐^L×Nt\widehat{{\mathbf{Q}}}\in{\mathbb{C}}^{L\times N_{t}} expressed as,

𝐐^\displaystyle\widehat{{\mathbf{Q}}} =\displaystyle= [𝐐^S,𝐐^C]\displaystyle\left[\widehat{{\mathbf{Q}}}_{S},\widehat{{\mathbf{Q}}}_{C}\right] (25)
=\displaystyle= [𝐖^H(𝐇S+𝐍S),𝐖^H(𝐇C+𝐍C)]\displaystyle\left[\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{S}+{\mathbf{N}}_{S}),\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{C}+{\mathbf{N}}_{C})\right]
=\displaystyle= 𝐖^H𝐇𝐐¯+𝐖^H𝐍𝐍¯,\displaystyle\underbrace{\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}}_{\triangleq\bar{{\mathbf{Q}}}}+\underbrace{\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}}_{\triangleq\bar{{\mathbf{N}}}},

where 𝐍=[𝐍S,𝐍C]Nr×Nt{\mathbf{N}}=[{\mathbf{N}}_{S},{\mathbf{N}}_{C}]\!\!\in{\mathbb{C}}^{N_{r}\times N_{t}}, 𝐇=[𝐇S,𝐇C]Nr×Nt{\mathbf{H}}=[{\mathbf{H}}_{S},{\mathbf{H}}_{C}]\in{\mathbb{C}}^{N_{r}\times N_{t}}, 𝐐¯=𝐖^H𝐇L×Nt\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\in{\mathbb{C}}^{L\times N_{t}}, and 𝐍¯=𝐖^H𝐍L×Nt\bar{{\mathbf{N}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}\in{\mathbb{C}}^{L\times N_{t}}. Meanwhile, since 𝐖^\widehat{{\mathbf{W}}} is semi-unitary and the entries in 𝐍{{\mathbf{N}}} are i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}), according to Lemma 2, the entries in 𝐍¯\bar{{\mathbf{N}}} are also i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}).

Now, given the expression 𝐐^\widehat{{\mathbf{Q}}} in (25), the row subspace estimation problem is formulated as,

𝐕^=argmax𝐕Nt×L𝐐^𝐕F2subject to𝐕H𝐕=𝐈L,\displaystyle\widehat{{\mathbf{V}}}=\mathop{\mathrm{argmax}}\limits_{{\mathbf{V}}\in{\mathbb{C}}^{N_{t}\times L}}\|\widehat{{\mathbf{Q}}}{\mathbf{V}}\|_{F}^{2}~{}\text{subject to}~{}{\mathbf{V}}^{H}{\mathbf{V}}={\mathbf{I}}_{L},

where the estimated row subspace matrix 𝐕^Nt×L\widehat{{\mathbf{V}}}\in{\mathbb{C}}^{N_{t}\times L} is obtained as the dominant LL right singular vectors of 𝐐^\widehat{{\mathbf{Q}}}. Similarly, in order to design the precoder 𝐅^\widehat{{\mathbf{F}}} in (4) for data transmission, we need to approximate the estimated row subspace matrix 𝐕^\widehat{{\mathbf{V}}} under the hybrid precoding architecture. Specifically, we design the analog precoder 𝐅^ANt×NRF\widehat{{\mathbf{F}}}_{A}\in{\mathbb{C}}^{N_{t}\times N_{RF}}and digital precoder 𝐅^DNRF×L\widehat{{\mathbf{F}}}_{D}\in{\mathbb{C}}^{N_{RF}\times L} by solving the following problem

(𝐅^A,𝐅^D)=argmin𝐅A,𝐅D𝐕^𝐅A𝐅DF,\displaystyle\left(\widehat{{\mathbf{F}}}_{A},\widehat{{\mathbf{F}}}_{D}\right)=\mathop{\mathrm{argmin}}_{{\mathbf{F}}_{A},{\mathbf{F}}_{D}}\|\widehat{{\mathbf{V}}}-{\mathbf{F}}_{A}{\mathbf{F}}_{D}\|_{F},
subject to |[𝐅A]i,j|=1Nt.\displaystyle\text{subject to~{}~{}}\left|[{\mathbf{F}}_{A}]_{i,j}\right|=\frac{1}{\sqrt{N_{t}}}. (26)

Therefore, the transmit precoder is given by 𝐅^=𝐅^A𝐅^DNt×L\widehat{{\mathbf{F}}}=\widehat{{\mathbf{F}}}_{A}\widehat{{\mathbf{F}}}_{D}\in{\mathbb{C}}^{N_{t}\times L} with 𝐅^H𝐅^=𝐈L\widehat{{\mathbf{F}}}^{H}\widehat{{\mathbf{F}}}={\mathbf{I}}_{L}. Similarly, the method for solving (26) in [5] can guarantee col(𝐅^)col(𝐕^)\text{col}(\widehat{{\mathbf{F}}})\approx\text{col}(\widehat{{\mathbf{V}}}). The details of our row subspace estimation algorithm are shown in Algorithm 2. We have the following proposition on the accuracy of the row subspace estimated by Algorithm 2.

Proposition 2

If the Euclidean distance 𝐅^𝐕^Fδ2\|\widehat{{\mathbf{F}}}-\widehat{{\mathbf{V}}}\|_{F}\leq\delta_{2} in (26), then the accuracy of the estimated row subspace matrix 𝐅^\widehat{{\mathbf{F}}} obtained from Algorithm 2 is lower bounded as

ηr(𝐅^)σL(𝐕^H𝐕)δ2,\displaystyle\sqrt{\eta_{r}(\widehat{{\mathbf{F}}})}\geq\sigma_{L}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}})-\delta_{2}, (27)

where 𝐕Nt×L{\mathbf{V}}\in{\mathbb{C}}^{N_{t}\times L} is the matrix composed of the LL dominant right singular vectors of 𝐇{\mathbf{H}}. In particular, if δ20\delta_{2}\rightarrow 0, we have

𝔼[ηr(𝐅^)]\displaystyle\mathbb{E}\left[\eta_{r}(\widehat{{\mathbf{F}}})\right]\!\!\!\! \displaystyle\geq σL2(𝐕^H𝐕)\displaystyle\!\!\!\!\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}) (28)
\displaystyle\geq (12Nt(σ2σL2(𝐐¯)+Lσ4)σL4(𝐐¯))+,\displaystyle\!\!\!\!\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{{\mathbf{Q}}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{{\mathbf{Q}}})}\right)_{+},

where σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) is the LLth largest singular value of 𝐐¯\bar{{\mathbf{Q}}} in (25).

Proof:

See Appendix D. ∎

Similar to the column subspace estimation, the row subspace accuracy increases linearly with the SNR, i.e., 𝒪(SNR)\mathcal{O}(\text{SNR}), at high SNR, and quadratically with the SNR, i.e., 𝒪(SNR2){\mathcal{O}}(\text{SNR}^{2}), at low SNR. Also, the accuracy of the row subspace estimation decreases with the number of paths, LL. As the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) in (28) grows, the row subspace estimate becomes more accurate. Moreover, considering 𝐐¯=𝐖^H𝐇\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}, it is intuitive that the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} affects the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}), and hence the accuracy of the row subspace estimation. Specifically, when col(𝐖^)=col(𝐔)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}})=\mathop{\mathrm{col}}({{\mathbf{U}}}), we have σL(𝐐¯)=σL(𝐇)\sigma_{L}(\bar{\mathbf{Q}})=\sigma_{L}({\mathbf{H}}), which is the maximum possible value. In the following, we further discuss the relationship between σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) and σL(𝐇)\sigma_{L}({\mathbf{H}}).

With the SVD of 𝐇{\mathbf{H}}, i.e., 𝐇=𝐔𝚺𝐕H{\mathbf{H}}={\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}, we have 𝐐¯=𝐖^H𝐇=𝐖^H𝐔𝚺𝐕H\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}. Then, the following relationship is true due to the singular value product inequality,

σL(𝐐¯)\displaystyle\sigma_{L}(\bar{{\mathbf{Q}}}) \displaystyle\geq σL(𝐖^H𝐔)σL(𝚺𝐕H)\displaystyle\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}})\sigma_{L}(\mathbf{\Sigma}{\mathbf{V}}^{H}) (29)
=\displaystyle= σL(𝐖^H𝐔)σL(𝐇).\displaystyle\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}})\sigma_{L}({\mathbf{H}}).

Therefore, σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) is lower bounded by the product of the LLth largest singular values of 𝐖^H𝐔\widehat{{\mathbf{W}}}^{H}{\mathbf{U}} and 𝐇{\mathbf{H}}. As the estimate of the column subspace becomes accurate, σL(𝐖^H𝐔)\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}}) approaches one. As a result, the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) becomes approximately equal to σL(𝐇)\sigma_{L}({\mathbf{H}}), further enhancing the row subspace estimation. The inequality in (29) reveals that the column subspace estimation affects the accuracy of the row subspace estimation.
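The inequality (29) can be verified numerically. In the sketch below, a perturbed and re-orthonormalized version of 𝐔{\mathbf{U}} serves as a hypothetical stand-in for 𝐖^\widehat{{\mathbf{W}}}; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
Nr, Nt, L = 12, 10, 3

# Rank-L channel and its dominant left singular vectors.
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))
U = np.linalg.svd(H)[0][:, :L]

# A slightly perturbed, re-orthonormalized combiner standing in for W_hat.
P = U + 0.1 * (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L)))
W_hat = np.linalg.qr(P)[0]

Q_bar = W_hat.conj().T @ H
lhs = np.linalg.svd(Q_bar, compute_uv=False)[L - 1]          # sigma_L(Q_bar)
rhs = np.linalg.svd(W_hat.conj().T @ U, compute_uv=False)[L - 1] * \
      np.linalg.svd(H, compute_uv=False)[L - 1]              # product lower bound
```

Since 𝐖^H𝐔\widehat{{\mathbf{W}}}^{H}{\mathbf{U}} is a square L×LL\times L matrix here, the singular value product inequality guarantees the bound holds exactly.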

Given the estimated column subspace 𝐖^\widehat{{\mathbf{W}}} in Algorithm 1 and row subspace 𝐅^\widehat{{\mathbf{F}}} in Algorithm 2, the following lemma shows the subspace estimation accuracy of the proposed SASE, i.e., η(𝐖^,𝐅^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}) defined in (7).

Lemma 3

If we assume δ10\delta_{1}\rightarrow 0 and δ20\delta_{2}\rightarrow 0 in (18) and (26), the subspace estimation accuracy defined in (7) associated with 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}} is lower bounded as

η(𝐖^,𝐅^)σL2(𝐔^H𝐔)σL2(𝐕^H𝐕).\displaystyle\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})\geq\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}). (30)
Proof:

Using the definition of η(𝐖^,𝐕^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{V}}}) in (7), we have the following expressions,

η(𝐖^,𝐅^)\displaystyle\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}) =\displaystyle= 𝐖^H𝐇𝐅^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
=(a)\displaystyle\overset{(a)}{=} 𝐔^H𝐇𝐕^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{U}}}^{H}{\mathbf{H}}\widehat{{\mathbf{V}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
=\displaystyle= 𝐔^H𝐔𝚺𝐕H𝐕^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}\widehat{{\mathbf{V}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
(b)\displaystyle\overset{(b)}{\geq} σL2(𝐔^H𝐔)σL2(𝐕^H𝐕),\displaystyle\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}),

where the equality (a)(a) holds for δ10\delta_{1}\rightarrow 0 and δ20\delta_{2}\rightarrow 0, and the inequality (b)(b) follows from the singular value product inequality. ∎

Lemma 3 shows that the power captured by 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}} is lower bounded by the product of σL2(𝐔^H𝐔)\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) and σL2(𝐕^H𝐕)\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}). These two factors correspond to the two stages of the proposed SASE, namely the column subspace estimation and the row subspace estimation, respectively. Ideally, when col(𝐔^)=col(𝐔)\text{col}(\widehat{{\mathbf{U}}})=\text{col}({{\mathbf{U}}}) and col(𝐕^)=col(𝐕)\text{col}(\widehat{{\mathbf{V}}})=\text{col}({{\mathbf{V}}}), we have η(𝐖^,𝐅^)=1\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})=1. Even with estimation errors, the proposed SASE can still achieve a nearly optimal η(𝐖^,𝐅^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}). This is because σL2(𝐔^H𝐔)\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) and σL2(𝐕^H𝐕)\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}) are close to one according to the bounds provided in (20) and (28), respectively.

III-C Channel Estimation Based on the Estimated Subspaces

In this subsection, we introduce a channel estimation method based on the estimated column subspace 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L} and row subspace 𝐅^Nt×L\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{N_{t}\times L}. Let the channel estimate be expressed as

𝐇^=𝐖^𝐑^𝐅^H,\displaystyle\widehat{{\mathbf{H}}}=\widehat{{\mathbf{W}}}\widehat{{\mathbf{R}}}\widehat{{\mathbf{F}}}^{H}, (31)

where 𝐑^L×L\widehat{{\mathbf{R}}}\in{\mathbb{C}}^{L\times L}. Now, given 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}}, it only remains to obtain 𝐑^\widehat{{\mathbf{R}}} in an optimal manner.

Recalling the column subspace estimation in Section III-A and row subspace estimation in Section III-B, the corresponding received signals are expressed as

𝐘S\displaystyle{\mathbf{Y}}_{S} =\displaystyle= 𝐇S+𝐍S\displaystyle{\mathbf{H}}_{S}+{\mathbf{N}}_{S}
𝐐^C\displaystyle\widehat{{\mathbf{Q}}}_{C} =\displaystyle= 𝐖^H𝐇C+𝐖^H𝐍C.\displaystyle\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}_{C}+\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}_{C}.

It is worth noting that the entries in 𝐍S{\mathbf{N}}_{S} and 𝐖^H𝐍C\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}_{C} are both i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\!\sigma^{2}). Based on the expression of 𝐇^\widehat{{\mathbf{H}}} in (31), the maximum likelihood estimate of 𝐑^\widehat{{\mathbf{R}}} in (31) can be obtained through the following least squares problem,

min𝐑L×L𝐘S𝐇^SF2+𝐐^C𝐖^H𝐇^CF2\displaystyle\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\min\limits_{{\mathbf{R}}\in{\mathbb{C}}^{L\times L}}\|{\mathbf{Y}}_{S}-\widehat{{\mathbf{H}}}_{S}\|_{F}^{2}+\|\widehat{{\mathbf{Q}}}_{C}-\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{H}}}_{C}\|_{F}^{2}
subject to𝐇^S=[𝐖^𝐑𝐅^H]:,1:m,𝐇^C=[𝐖^𝐑𝐅^H]:,m+1:Nt.\displaystyle\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\text{subject to}~{}\widehat{{\mathbf{H}}}_{S}\!\!=\!\![\widehat{{\mathbf{W}}}{\mathbf{R}}\widehat{{\mathbf{F}}}^{H}]_{:,1:m},~{}\!\!\widehat{{\mathbf{H}}}_{C}\!\!=\!\![\widehat{{\mathbf{W}}}{\mathbf{R}}\widehat{{\mathbf{F}}}^{H}]_{:,m+1:N_{t}}\!. (32)

Before discussing how to solve the problem in (32), for convenience, we define

𝐫\displaystyle{\mathbf{r}} =\displaystyle= vec(𝐑)L2×1,\displaystyle\mathop{\mathrm{vec}}({\mathbf{R}})\in{\mathbb{C}}^{L^{2}\times 1},
𝐲S\displaystyle{\mathbf{y}}_{S} =\displaystyle= vec(𝐘S)mNr×1,\displaystyle\mathop{\mathrm{vec}}({\mathbf{Y}}_{S})\in{\mathbb{C}}^{mN_{r}\times 1},
𝐪^C\displaystyle\widehat{{\mathbf{q}}}_{C} =\displaystyle= vec(𝐐^C)(Ntm)L×1,\displaystyle\mathop{\mathrm{vec}}(\widehat{{\mathbf{Q}}}_{C})\in{\mathbb{C}}^{(N_{t}-m)L\times 1},
𝐀1\displaystyle{\mathbf{A}}_{1} =\displaystyle= ([𝐅^]:,1:mH)T𝐖^mNr×L2,\displaystyle([\widehat{{\mathbf{F}}}]_{:,1:m}^{H})^{T}\otimes\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{mN_{r}\times L^{2}},
𝐀2\displaystyle{\mathbf{A}}_{2} =\displaystyle= ([𝐅^]:,m+1:NtH)T𝐈L(Ntm)L×L2.\displaystyle([\widehat{{\mathbf{F}}}]_{:,m+1:N_{t}}^{H})^{T}\otimes{\mathbf{I}}_{L}\in{\mathbb{C}}^{(N_{t}-m)L\times L^{2}}.

Using the definitions above, the minimization problem in (32) can be rewritten as

min𝐫L2×1𝐲S𝐀1𝐫22+𝐪^C𝐀2𝐫22.\displaystyle\min\limits_{{\mathbf{r}}\in{\mathbb{C}}^{L^{2}\times 1}}\left\|{\mathbf{y}}_{S}\!-\!{\mathbf{A}}_{1}{\mathbf{r}}\right\|_{2}^{2}\!+\!\left\|\widehat{{\mathbf{q}}}_{C}\!-\!{\mathbf{A}}_{2}{\mathbf{r}}\right\|_{2}^{2}. (33)

The following lemma provides the solution of problem (33).

Lemma 4

Given the problem below

min𝐫L2×1𝐲S𝐀1𝐫22+𝐪^C𝐀2𝐫22,\displaystyle\min\limits_{{\mathbf{r}}\in{\mathbb{C}}^{L^{2}\times 1}}\left\|{\mathbf{y}}_{S}-{\mathbf{A}}_{1}{\mathbf{r}}\right\|_{2}^{2}\!+\!\left\|\widehat{{\mathbf{q}}}_{C}\!-\!{\mathbf{A}}_{2}{\mathbf{r}}\right\|_{2}^{2},

the optimal solution is given by

𝐫^=(𝐀1H𝐀1+𝐀2H𝐀2)1(𝐀1H𝐲S+𝐀2H𝐪^C).\displaystyle\widehat{{\mathbf{r}}}=({\mathbf{A}}_{1}^{H}{\mathbf{A}}_{1}+{\mathbf{A}}_{2}^{H}{\mathbf{A}}_{2})^{-1}({\mathbf{A}}_{1}^{H}{\mathbf{y}}_{S}+{\mathbf{A}}_{2}^{H}\widehat{{\mathbf{q}}}_{C}). (34)
Proof:

The problem is convex with respect to 𝐫{\mathbf{r}}. Thus, the optimal solution can be obtained by setting the first order derivative of the objective function to zero as

𝐀1H(𝐀1𝐫𝐲S)+𝐀2H(𝐀2𝐫𝐪^C)=𝟎.\displaystyle{\mathbf{A}}_{1}^{H}({\mathbf{A}}_{1}{\mathbf{r}}-{\mathbf{y}}_{S})+{\mathbf{A}}_{2}^{H}({\mathbf{A}}_{2}{\mathbf{r}}-\widehat{{\mathbf{q}}}_{C})=\mathbf{0}. (35)

The solution of (35) is exactly the result in (34), which concludes the proof. ∎

It is worth noting that after we have obtained the column and row subspace estimates, i.e., 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}}, the channel estimation reduces to computing 𝐫^=vec(𝐑^)\widehat{{\mathbf{r}}}=\mathop{\mathrm{vec}}(\widehat{{\mathbf{R}}}) in (34). Since the dimension of 𝐑^\widehat{{\mathbf{R}}} is much lower than that of 𝐇{\mathbf{H}}, the channel estimation complexity is substantially reduced, as shown in Lemma 4.
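The closed-form solution (34) can be validated with a small noiseless example. The sketch below builds 𝐀1{\mathbf{A}}_{1} and 𝐀2{\mathbf{A}}_{2} from their Kronecker definitions and recovers a randomly drawn 𝐑{\mathbf{R}} exactly; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
Nr, Nt, L, m = 8, 12, 2, 3

# Semi-unitary W_hat (Nr x L) and F_hat (Nt x L), and a ground-truth R.
W_hat = np.linalg.qr(rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L)))[0]
F_hat = np.linalg.qr(rng.standard_normal((Nt, L)) + 1j * rng.standard_normal((Nt, L)))[0]
R_true = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
H_hat = W_hat @ R_true @ F_hat.conj().T

# Noiseless observations matching (16) and (23).
y_S = H_hat[:, :m].flatten(order="F")                  # vec of the first m columns
q_C = (W_hat.conj().T @ H_hat[:, m:]).flatten(order="F")

# A1, A2 per their definitions; note ([F^H]_{:,1:m})^T = conj(F[:m, :]).
A1 = np.kron(np.conj(F_hat[:m, :]), W_hat)
A2 = np.kron(np.conj(F_hat[m:, :]), np.eye(L))

# Closed-form solution (34).
G = A1.conj().T @ A1 + A2.conj().T @ A2
r_hat = np.linalg.solve(G, A1.conj().T @ y_S + A2.conj().T @ q_C)
R_hat = r_hat.reshape((L, L), order="F")
```

Because the normal-equation matrix is positive definite here, the noiseless recovery is exact; with noise, (34) returns the least squares fit.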

IV Discussion of Algorithm

In this section, we analyze the complexity of the proposed SASE method in terms of the channel use overhead and computational complexity. Moreover, we discuss the application of the SASE in other channel scenarios.

IV-A Channel Use Overhead

TABLE I: Channel Uses of Algorithms
Algorithms Number of Channel Uses
SASE mNr/MRF+(Ntm){mN_{r}}/{M_{RF}}+(N_{t}-m)
MF [11] 𝒪(L(Nr+Nt)/MRF)\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}})
SD [10] 𝒪(L(Nr+Nt)/MRF)\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}})
Arnoldi [16] 2qNr/MRF+2qNt/NRF2qN_{r}/{M_{RF}}+2qN_{t}/{N_{RF}}
OMP [8] 𝒪(Lln(G2)/MRF)\mathcal{O}(L\ln(G^{2})/{M_{RF}})
SBL [9] 𝒪(Lln(G2)/MRF)\mathcal{O}(L\ln(G^{2})/{M_{RF}})
ACE [17] s2L3logs(Nm/L)/MRFs^{2}L^{3}\mathrm{log}_{s}(N_{m}/L)/{M_{RF}}

Considering the channel uses in each stage, the total number of channel uses for the SASE is given by

KSASE\displaystyle K_{\text{SASE}} =\displaystyle= mNr/MRF+(Ntm).\displaystyle{mN_{r}}/{M_{RF}}+(N_{t}-m). (36)

Therefore, the number of channel uses grows linearly with the channel dimension, i.e., 𝒪(Nt)\mathcal{O}(N_{t}). In particular, when we let m=Lm=L, the number of channel uses in (36) becomes LNr/MRF+(NtL){LN_{r}}/{M_{RF}}+(N_{t}-L). Considering that each channel use contributes MRFM_{RF} observations in the first stage and LL observations in the second stage, the total number of observations is LNr+L(NtL)LN_{r}+L(N_{t}-L), which equals the degrees of freedom of a rank\mathop{\mathrm{rank}}-LL matrix 𝐇Nr×Nt{\mathbf{H}}\in{\mathbb{C}}^{N_{r}\times N_{t}} [29].
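For concreteness, the counts in (36) can be computed as follows; the parameter values are examples only.

```python
def channel_uses_sase(Nr, Nt, M_RF, m):
    """K_SASE from (36): m*Nr/M_RF uses in stage 1, Nt - m in stage 2."""
    return m * Nr // M_RF + (Nt - m)

def observations_sase(Nr, Nt, L):
    """Total observations when m = L: M_RF per stage-1 use, L per stage-2 use,
    matching the L*Nr + L*(Nt - L) degrees of freedom of a rank-L matrix."""
    return L * Nr + L * (Nt - L)

# Example: Nr = Nt = 64, M_RF = 4, m = L = 3 (illustrative values).
print(channel_uses_sase(64, 64, 4, 3), observations_sase(64, 64, 3))
```

The first count grows as 𝒪(Nt)\mathcal{O}(N_{t}) in the channel dimension, consistent with the discussion above.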

The numbers of channel uses of the proposed SASE and other benchmarks [8, 9, 17, 10, 11, 16] are compared in Table I. For the angle estimation methods in [8, 9, 17], the number of required channel uses for the OMP [8] and SBL [9] is KOMP=KSBL=𝒪(Lln(G2)/MRF)K_{\text{OMP}}=K_{\text{SBL}}=\mathcal{O}(L\ln(G^{2})/{M_{RF}}), where GG is the number of grids with Gmax{Nr,Nt}G\geq\max\{N_{r},N_{t}\}. The number of channel uses for adaptive channel estimation (ACE) [17] is KACE=s2L3logs(Nm/L)/MRFK_{\text{ACE}}=s^{2}L^{3}\mathrm{log}_{s}(N_{m}/L)/{M_{RF}}, where 2π/Nm2\pi/N_{m} with Nmmax{Nr,Nt}N_{m}\geq\max\{N_{r},N_{t}\} is the desired angle resolution for the ACE, and ss is the number of beamforming vectors in each stage of the ACE. For the subspace estimation methods in [10, 11, 16], the numbers of required channel uses for subspace decomposition (SD) [10] and matrix factorization (MF) [11] are KSD=KMF=𝒪(L(Nr+Nt)/MRF)K_{\text{SD}}=K_{\text{MF}}=\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}}), while the Arnoldi approach [16] requires KArnoldi=2qNr/MRF+2qNt/NRFK_{\text{Arnoldi}}=2qN_{r}/{M_{RF}}+2qN_{t}/{N_{RF}} channel uses, where qLq\geq L. Because the number of estimated parameters of the angle estimation methods, such as OMP, SBL, and ACE, is smaller than that of the proposed SASE, they require slightly fewer channel uses than the SASE. Nevertheless, the proposed SASE consumes fewer channel uses than the existing subspace estimation methods [10, 11, 16], as shown in Table I.

IV-B Computational Complexity

For the proposed SASE, the computational complexity of the first stage comes from the SVD of $\mathbf{Y}_{S}$, which is $\mathcal{O}(m^{2}N_{r})$ [28]. The complexity of the second stage is dominated by the design of $\widehat{\mathbf{W}}$ in (18), which is $\mathcal{O}(LDN_{r})$, where $D\geq N_{r}$ denotes the cardinality of an over-complete dictionary. Hence, the overall complexity of the proposed SASE algorithm is $\mathcal{O}(m^{2}N_{r}+LDN_{r})=\mathcal{O}(LDN_{r})$. The computational complexities of the benchmarks, i.e., the angle estimation methods OMP [8], SBL [9], and ACE [17], along with the subspace estimation methods Arnoldi [16], SD [10], and MF [11], are compared in Table II, where $K$ denotes the number of channel uses. For a fair comparison, $K$ is assumed to be equal across the benchmarks. As can be seen from Table II, the proposed SASE has the lowest computational complexity.

TABLE II: Computational Complexity of Algorithms
Algorithms Computational Complexity
SASE $\mathcal{O}(LDN_{r})$
MF [11] $\mathcal{O}(KM_{RF}L^{2}(N_{r}^{2}+N_{t}^{2}))$
SD [10] $\mathcal{O}(KM_{RF}L^{2}(N_{r}^{2}+N_{t}^{2}))$
Arnoldi [16] $\mathcal{O}(K^{2}M_{RF}^{2}/(N_{r}+N_{t}))$
OMP [8] $\mathcal{O}(LKM_{RF}G^{2})$
SBL [9] $\mathcal{O}(G^{6})$
ACE [17] $\mathcal{O}(KM_{RF}^{2}DN_{r}/(sL)+KN_{RF}^{2}DN_{t}/(sL))$

IV-C Extension of SASE

In this subsection, we extend the proposed SASE to the 2D mmWave channel model with UPAs. There are $N_{cl}$ clusters, and each cluster is composed of $N_{ray}$ rays. For this model, the mmWave channel matrix is expressed as [30, 31, 5]

$$\mathbf{H}=\sqrt{\frac{N_{r}N_{t}}{N_{cl}N_{ray}}}\sum_{i=1}^{N_{cl}}\sum_{j=1}^{N_{ray}}h_{ij}\,\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})\,\mathbf{a}_{t}^{H}(\phi^{t}_{ij},\theta^{t}_{ij}), \qquad (37)$$

where $h_{ij}$ represents the complex gain associated with the $j$th path of the $i$th cluster. The vectors $\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})\in\mathbb{C}^{N_{r}\times 1}$ and $\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})\in\mathbb{C}^{N_{t}\times 1}$ are the receive and transmit array response vectors, where $\phi^{r}_{ij}$ ($\phi^{t}_{ij}$) and $\theta^{r}_{ij}$ ($\theta^{t}_{ij}$) denote the azimuth and elevation angles at the receiver (transmitter). Specifically, $\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})$ and $\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})$ are expressed as

$$\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})=\frac{1}{\sqrt{N_{r}}}\left[1,\cdots,e^{j\frac{2\pi}{\lambda}d(m_{r}\sin\phi^{r}_{ij}\sin\theta^{r}_{ij}+n_{r}\cos\theta^{r}_{ij})},\cdots,e^{j\frac{2\pi}{\lambda}d((\sqrt{N_{r}}-1)\sin\phi^{r}_{ij}\sin\theta^{r}_{ij}+(\sqrt{N_{r}}-1)\cos\theta^{r}_{ij})}\right]^{T},$$
$$\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})=\frac{1}{\sqrt{N_{t}}}\left[1,\cdots,e^{j\frac{2\pi}{\lambda}d(m_{t}\sin\phi^{t}_{ij}\sin\theta^{t}_{ij}+n_{t}\cos\theta^{t}_{ij})},\cdots,e^{j\frac{2\pi}{\lambda}d((\sqrt{N_{t}}-1)\sin\phi^{t}_{ij}\sin\theta^{t}_{ij}+(\sqrt{N_{t}}-1)\cos\theta^{t}_{ij})}\right]^{T},$$

where $d$ and $\lambda$ are the antenna spacing and the wavelength, respectively, and $0\leq m_{r},n_{r}<\sqrt{N_{r}}$ and $0\leq m_{t},n_{t}<\sqrt{N_{t}}$ are the antenna indices in the 2D plane.

For the channel model in (37), it is worth noting that the rank of $\mathbf{H}$ is at most $N_{cl}N_{ray}$. Using similar derivations to the proof of Lemma 1, we can verify that when $m\geq N_{cl}N_{ray}$, the sub-matrix $\mathbf{H}_{S}=[\mathbf{H}]_{:,1:m}\in\mathbb{C}^{N_{r}\times m}$ satisfies $\mathrm{rank}(\mathbf{H}_{S})=\mathrm{rank}(\mathbf{H})$. Therefore, it is possible to sample the first $m$ columns of $\mathbf{H}$ in (37) to obtain the column subspace information, and to sample the remaining columns to obtain the row subspace information. This means that the proposed SASE can be extended directly to the channel model given in (37).
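The claimed rank equivalence can be spot-checked numerically. The sketch below builds a UPA channel per (37), assuming half-wavelength spacing ($d=\lambda/2$), square arrays, angles drawn uniformly at random, and $N_{cl}=N_{ray}=2$; the seed and draw distributions are illustrative, not the paper's setup. It then verifies $\mathrm{rank}(\mathbf{H}_S)=\mathrm{rank}(\mathbf{H})$ for the tightest case $m=N_{cl}N_{ray}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def upa_response(N, phi, theta, d_over_lam=0.5):
    """UPA array response on a sqrt(N) x sqrt(N) grid (N assumed a perfect square)."""
    n_side = int(np.sqrt(N))
    m, n = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    phase = 2 * np.pi * d_over_lam * (m * np.sin(phi) * np.sin(theta) + n * np.cos(theta))
    return np.exp(1j * phase).reshape(-1) / np.sqrt(N)

Nr, Nt, Ncl, Nray = 36, 144, 2, 2
H = np.zeros((Nr, Nt), dtype=complex)
for _ in range(Ncl * Nray):
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    ar = upa_response(Nr, *rng.uniform(-np.pi / 2, np.pi / 2, 2))
    at = upa_response(Nt, *rng.uniform(-np.pi / 2, np.pi / 2, 2))
    H += h * np.outer(ar, at.conj())
H *= np.sqrt(Nr * Nt / (Ncl * Nray))

m = Ncl * Nray                  # sample the first m >= Ncl*Nray columns
H_S = H[:, :m]
# For generic (distinct) angles, both ranks equal Ncl*Nray.
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(H_S))
```

Larger $m$ only makes the sub-matrix better conditioned, so the rank equality continues to hold.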

In summary, the proposed SASE can be applied to other channel models as long as the channel matrix $\mathbf{H}$ experiences sparse propagation and $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$. Moreover, because the proposed SASE is an open-loop framework, it can be easily extended to multiuser MIMO downlink scenarios.

V Simulation Results

In this section, we evaluate the performance of the proposed SASE algorithm by simulation.

V-A Simulation Setup

In the simulation, the numbers of receive and transmit antennas are $N_{r}=36$ and $N_{t}=144$, respectively, and the numbers of RF chains at the receiver and transmitter are $M_{RF}=6$ and $N_{RF}=8$, respectively. Without loss of generality, it is assumed that the variance of the complex gain of the $l$th path is $\sigma_{h,l}^{2}=1,\forall l$. We consider three subspace-based channel estimation methods as benchmarks, i.e., SD [10], MF [11], and Arnoldi [16], where SD and MF aim to recover the low-rank mmWave channel matrix, and Arnoldi estimates the dominant singular subspaces of the mmWave channel. For a fair comparison, the considered benchmarks estimate the subspace rather than parameters such as the angles of the paths.

V-B Numerical Results

Figure 3: $\mathrm{rank}(\mathbf{H}_{S})$ versus $m$ ($N_{t}=144$; $N_{r}=36$; $L=4$)

In order to evaluate the subspace accuracy of different methods, we compute the subspace accuracy $\eta(\widehat{\mathbf{W}},\widehat{\mathbf{F}})$ in (7), the column subspace accuracy $\eta_{c}(\widehat{\mathbf{W}})$ in (8), and the row subspace accuracy $\eta_{r}(\widehat{\mathbf{F}})$ in (9) for comparison. We also evaluate the normalized mean squared error (NMSE) and the spectrum efficiency. The NMSE is defined as $\text{NMSE}=\mathbb{E}[\|\mathbf{H}-\widehat{\mathbf{H}}\|_{F}^{2}/\|\mathbf{H}\|_{F}^{2}]$, where $\widehat{\mathbf{H}}$ denotes the channel estimate. In particular, the channel estimate of the SASE is obtained by the method derived in Section III-C. The spectrum efficiency in (5) is calculated with the combiner $\widehat{\mathbf{W}}$ and precoder $\widehat{\mathbf{F}}$, which are designed according to the precoding techniques in [5] using the channel estimate $\widehat{\mathbf{H}}$.
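For concreteness, these metrics can be sketched as below. Following the derivations in Appendices C and D, $\eta_c$ and $\eta_r$ are taken as the fraction of the channel's energy captured by the estimated column and row subspaces; the rank-$L$ test channel and the seed are illustrative assumptions. The true singular subspaces give $\eta_c=\eta_r=1$ and zero NMSE.

```python
import numpy as np

rng = np.random.default_rng(1)

def eta_c(W_hat, H):
    # Column subspace accuracy (8): channel energy captured by span(W_hat).
    return np.linalg.norm(W_hat.conj().T @ H, "fro")**2 / np.linalg.norm(H, "fro")**2

def eta_r(F_hat, H):
    # Row subspace accuracy (9): channel energy captured by span(F_hat).
    return np.linalg.norm(H @ F_hat, "fro")**2 / np.linalg.norm(H, "fro")**2

def nmse(H, H_hat):
    return np.linalg.norm(H - H_hat, "fro")**2 / np.linalg.norm(H, "fro")**2

# Random rank-L test channel.
Nr, Nt, L = 36, 144, 4
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) \
    @ (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))

U, s, Vh = np.linalg.svd(H, full_matrices=False)
W, F = U[:, :L], Vh[:L, :].conj().T          # true L-dimensional subspaces
print(eta_c(W, H), eta_r(F, H), nmse(H, (W @ W.conj().T) @ H))
```

An imperfect estimate (e.g., a perturbed $\widehat{\mathbf{W}}$) yields $\eta_c<1$, which is what Figs. 4 and 5 quantify versus SNR.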

Figure 4: The subspace accuracy versus SNR (dB) when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $K=244$: (a) Column subspace accuracy $\eta_{c}$, (b) Row subspace accuracy $\eta_{r}$, (c) Subspace accuracy $\eta$.

V-B1 Equivalence of Subspace

It is worth noting that the column subspace estimation in Section III-A relies on the subspace equivalence between $\mathbf{H}_{S}$ and $\mathbf{H}$ in (16). Fig. 3 illustrates the rank of $\mathbf{H}_{S}$ for different $m$. In this simulation, we set $L=4$ and $m\in\{1L,2L,\ldots,10L\}$. It can be seen in Fig. 3 that the rank of $\mathbf{H}_{S}$ is equal to $L$, i.e., equal to the rank of $\mathbf{H}$, for all values of $m\geq L$. This validates that $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$.

V-B2 Performance versus Signal-to-Noise Ratio

In Fig. 4 and Fig. 5, we compare the performance of the proposed SASE algorithm with the SD, MF, and Arnoldi methods versus SNR. The number of paths is set to $L=4$. For a fair comparison, the numbers of channel uses for the benchmarks are kept approximately equal, i.e., $K=244$.

Figure 5: The channel estimation performance versus SNR (dB) when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $K=244$: (a) NMSE, (b) Spectrum efficiency.

In Fig. 4(a), the column subspace accuracy $\eta_{c}$ of the proposed SASE is compared with the benchmarks. As we can see, the SASE and SD methods achieve nearly identical column subspace accuracy, and both outperform the MF and Arnoldi. This indicates that sampling the sub-matrix $\mathbf{H}_{S}$ of the channel $\mathbf{H}$ provides a robust column subspace estimate. In Fig. 4(b), the row subspace accuracy $\eta_{r}$ versus SNR is plotted. The proposed SASE outperforms the others, which verifies that adapting the receiver sounders to the column subspace improves the accuracy of the row subspace estimation. In Fig. 4(c), the subspace accuracy $\eta$ defined in (7) is evaluated. The proposed SASE achieves the most accurate subspace estimation among all the methods. For the SASE, MF, and SD, nearly optimal subspace estimation, i.e., $\eta\approx 1$, is achieved in the high SNR region ($10\,\text{dB}\sim 20\,\text{dB}$). Since the performance of the Arnoldi highly depends on the number of available channel uses, its accuracy degrades and saturates at high SNR due to the limited channel uses ($K=244$); the ideal performance of the Arnoldi relies on a large number of channel uses or enough RF chains [16].

In Fig. 5(a), the NMSE of the proposed SASE decreases as the SNR increases. It has similar characteristics to that of the MF, but is much lower. The NMSE of the SD is almost constant in the low SNR region and decreases in the higher SNR region. Overall, the SASE outperforms the SD when $\text{SNR}\geq -15\,\text{dB}$. In Fig. 5(b), the spectrum efficiency of the SASE is plotted, together with the curve for perfect CSI with fully digital precoding for comparison. The proposed SASE achieves nearly optimal spectrum efficiency among all the methods. It is observed that the spectrum efficiency of the SASE follows a different trend from the NMSE in Fig. 5(a), while it has similar characteristics to the subspace accuracy in Fig. 4(c). The evaluation validates the effectiveness of the SASE in providing good spectrum efficiency through channel estimation.

Figure 6: The channel estimation performance versus the number of channel uses $K$ when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $\text{SNR}=5\,\text{dB},20\,\text{dB}$: (a) Subspace accuracy $\eta$, (b) Spectrum efficiency.

V-B3 Performance versus Number of Channel Uses

In Fig. 6, we show the channel estimation performance of the SASE for different numbers of channel uses. The simulation setting is $L=4$ and $\text{SNR}=5, 20\,\text{dB}$. The value of $m$ in (36) is taken from $\{4,8,\cdots,48\}$ and, accordingly, the number of channel uses is $K\in\{164,184,\cdots,384\}$.

Figure 7: The channel estimation performance versus the number of paths $L$ when $N_{t}=144$, $N_{r}=36$, $M_{RF}=6$, $N_{RF}=8$, $K=244$, $\text{SNR}=5\,\text{dB},20\,\text{dB}$: (a) Subspace accuracy $\eta$, (b) Spectrum efficiency.

Fig. 6(a) shows the subspace estimation performance versus the number of channel uses. As the number of channel uses increases, the subspace accuracy of all the methods increases monotonically. It is worth noting that when $K=164$ ($m=4$), the subspace accuracy of the SASE is slightly lower than that of the SD. This is because only $m=L=4$ columns are sampled for the column subspace estimation, which slightly affects the column subspace accuracy of the SASE. Nevertheless, when $m\geq 8$, i.e., $K\geq 184$, the SASE attains the most accurate subspace estimation, i.e., $\eta\approx 1$, among all the methods. In particular, at moderate SNR, i.e., $\text{SNR}=5\,\text{dB}$, the SASE clearly outperforms the other methods. This means that the SASE requires fewer channel uses to provide a robust subspace estimate.

Fig. 6(b) shows the spectrum efficiency versus the number of channel uses. The curve for perfect CSI with fully digital precoding is also plotted for comparison. Again, the SASE achieves nearly optimal spectrum efficiency compared to the other methods. The performance gap between the SASE and the other methods is more noticeable at $\text{SNR}=5\,\text{dB}$. In particular, as seen in the figure, when $K\geq 244$, the gap between the SASE and the perfect-CSI curve at $\text{SNR}=5\,\text{dB}$ is less than $1.5$ bits/s/Hz.

V-B4 Performance versus Number of Paths

In Fig. 7, we evaluate the estimation performance of the SASE for different numbers of paths, $L$. The number of channel uses is $K=244$ and $\text{SNR}=5, 20\,\text{dB}$. Due to the limited number of channel uses, the Arnoldi method cannot perform the channel estimation for $L\geq 5$. Thus, we only show the performance of the Arnoldi for $L\leq 4$.

In Fig. 7(a), the subspace accuracy $\eta$ of different methods versus the number of paths, $L$, is illustrated. As we can see, the SASE, SD, and MF achieve more accurate subspace estimation than the Arnoldi. The Arnoldi exhibits a sharp decrease in accuracy for $L>2$, meaning that it can provide a good channel estimate only for $L\leq 2$ with $K=244$ channel uses. When $\text{SNR}=5\,\text{dB}$, the SASE outperforms the other methods. When the SNR is high, i.e., $\text{SNR}=20\,\text{dB}$, the subspace accuracy of the proposed SASE decreases slightly with the number of paths $L$, which verifies our discussion about the effect of $L$ in Remark 2 of Section III.

Figure 8: The channel estimation performance with inaccurate path information when $N_{t}=144$; $N_{r}=36$; $L=4$; $M_{RF}=6$; $N_{RF}=8$; $K=244$: (a) Subspace accuracy $\eta$, (b) NMSE, (c) Spectrum efficiency.

In Fig. 7(b), the spectrum efficiency versus the number of paths, $L$, is shown. Apart from the Arnoldi, the spectrum efficiency achieved by the SASE, MF, and SD increases with the number of paths. When the SNR is high, i.e., $\text{SNR}=20\,\text{dB}$, the SASE, MF, and SD achieve nearly optimal performance. When the SNR is moderate, i.e., $\text{SNR}=5\,\text{dB}$, the proposed SASE achieves the highest spectrum efficiency among all the methods. Moreover, for the SASE, MF, and SD, the performance gap with the perfect-CSI curve widens as $L$ increases. Nevertheless, the spectrum efficiency of the SASE remains closer to the perfect-CSI curve than the other methods, which implies that the SASE leverages the limited number of paths in mmWave channels more effectively.

V-B5 Performance with Inaccurate Path Information

Thus far, we have assumed the number of paths, $L$, is known a priori. In Fig. 8, we evaluate the performance of the SASE when accurate path information is not available. As discussed in Section III-A, we utilize an upper bound on the number of paths for simplicity, letting $L_{\text{sup}}\in\{5,6\}$ while $L=4$. (If $L_{\text{sup}}\geq L$ is utilized for the SASE, the estimated subspaces become $\widehat{\mathbf{W}}\in\mathbb{C}^{N_{r}\times L_{\text{sup}}}$ and $\widehat{\mathbf{F}}\in\mathbb{C}^{N_{t}\times L_{\text{sup}}}$; for a fair comparison, we choose the dominant $L$ modes in $\widehat{\mathbf{W}}$ and $\widehat{\mathbf{F}}$ when evaluating the performance.) For a clear illustration, we also evaluate the performance of the proposed SASE using a lower bound on the number of paths, i.e., $L_{\text{inf}}=3$. As can be seen in Fig. 8, compared to the case of $L_{\text{inf}}=3$, using the upper bound $L_{\text{sup}}\in\{5,6\}$ achieves performance similar to that with the accurate path information $L=4$. In particular, it is noted in Fig. 8(a) and Fig. 8(b) that the estimation performance with $L_{\text{sup}}$ is slightly worse than that with accurate path information when the SNR is high, while it is marginally better when the SNR is low.
This is because using inaccurate path information $L_{\text{sup}}$ with $L_{\text{sup}}\geq L$ does not affect the column subspace estimation but, according to Proposition 2, yields a worse row subspace estimate at high SNR and a more accurate one at low SNR. (If $L_{\text{sup}}$ is utilized for the SASE, the row subspace accuracy in Proposition 2 is bounded as $\mathbb{E}[\eta_{r}(\widehat{\mathbf{F}})]\geq\left(1-2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L_{\text{sup}}\sigma^{4})/\sigma_{L}^{4}(\bar{\mathbf{Q}})\right)_{+}$; the statements can be verified by analyzing this bound.) Nevertheless, overall, the performance of the proposed SASE is not sensitive to an inaccurate path number.

VI Conclusion

In this paper, we formulated the mmWave channel estimation as a subspace estimation problem and proposed the SASE algorithm. The SASE divides the channel estimation task into two stages: the first stage obtains the column channel subspace and, based on the acquired column subspace, the second stage estimates the row subspace with optimized training signals. By estimating the column and row subspaces sequentially, the computational complexity of the proposed SASE is reduced substantially to $\mathcal{O}(LDN_{r})$ with $D\geq N_{r}$. Our analysis showed that $\mathcal{O}(N_{t})$ channel uses are sufficient to guarantee the subspace estimation accuracy of the proposed SASE. Simulations demonstrated that the proposed SASE achieves better subspace accuracy, lower NMSE, and higher spectrum efficiency than the existing subspace methods at practical SNRs.

Appendix A Proof of Lemma 1

From the mmWave channel model in (3), when the angles $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are distinct,

$$\mathrm{rank}(\mathbf{A}_{t})=\mathrm{rank}(\mathbf{A}_{r})=L,$$

which holds because $\mathbf{A}_{t}$ and $\mathbf{A}_{r}$ are both Vandermonde matrices. Then, $\mathbf{H}_{S}=\mathbf{H}\mathbf{S}$ can be expressed as

$$\mathbf{H}_{S}=\mathbf{A}_{r}\,\mathrm{diag}(\mathbf{h})\,\mathbf{A}_{t}^{H}\mathbf{S}.$$

Combining the rank inequality of matrix products, $\mathrm{rank}(\mathbf{H}_{S})\leq\mathrm{rank}(\mathbf{H})=L$, with the lower bound

$$\mathrm{rank}(\mathbf{H}_{S})\geq\mathrm{rank}(\mathbf{A}_{r}\,\mathrm{diag}(\mathbf{h}))+\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})-L=\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S}),$$

yields $L\geq\mathrm{rank}(\mathbf{H}_{S})\geq\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})$. Therefore, in order to show $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$, namely $\mathrm{rank}(\mathbf{H}_{S})=L$, it suffices to show that $\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})=L$. Since $\mathbf{A}_{t}^{H}\mathbf{S}$ is a Vandermonde matrix, it has $\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})=L$. This completes the proof. ∎

Appendix B Proof of Lemma 2

It is trivial that the entries of $\mathbf{Y}$ follow the identical distribution $\mathcal{CN}(0,\sigma^{2})$. Therefore, it remains to show that all the entries of $\mathbf{Y}$ are independent. Since the entries are jointly Gaussian, it suffices to prove that they are uncorrelated. For any $i\neq j$ or $m\neq n$, the following holds,

$$\mathbb{E}\left[[\mathbf{Y}]_{i,m}[\mathbf{Y}]_{j,n}\right]=\mathbb{E}\left[[\mathbf{A}]_{i,:}[\mathbf{X}]_{:,m}[\mathbf{X}]_{:,n}^{H}[\mathbf{A}]_{j,:}^{H}\right]=0.$$

Therefore, the entries of $\mathbf{Y}$ are uncorrelated and thus independent, which concludes the proof. ∎

Appendix C Proof of Proposition 1

Based on the definition of $\eta_{c}(\widehat{\mathbf{W}})$ in (8), we have

$$\begin{aligned}
\sqrt{\eta_{c}(\widehat{\mathbf{W}})} &= \sqrt{\frac{\mathrm{tr}(\widehat{\mathbf{W}}^{H}\mathbf{H}\mathbf{H}^{H}\widehat{\mathbf{W}})}{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})}} = \frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}}+\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\overset{(a)}{\geq} \frac{\|\widehat{\mathbf{U}}^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} = \frac{\|\widehat{\mathbf{U}}^{H}\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\overset{(b)}{\geq} \sigma_{L}(\widehat{\mathbf{U}}^{H}\mathbf{U})-\|\widehat{\mathbf{W}}-\widehat{\mathbf{U}}\|_{2} \geq \sigma_{L}(\widehat{\mathbf{U}}^{H}\mathbf{U})-\delta_{1}, \qquad (38)
\end{aligned}$$

where the inequality $(a)$ holds from the triangle inequality, and the inequality $(b)$ comes from the fact that for $\mathbf{A}\in\mathbb{C}^{n\times n}$ with $\mathrm{rank}(\mathbf{A})=n$ and $\mathbf{B}\in\mathbb{C}^{n\times k}$, $\|\mathbf{A}\mathbf{B}\|_{F}^{2}\geq\sigma_{n}^{2}(\mathbf{A})\|\mathbf{B}\|_{F}^{2}$, which follows from $\|\mathbf{A}\mathbf{B}\|_{F}^{2}=\sum_{i=1}^{k}\|\mathbf{A}[\mathbf{B}]_{:,i}\|_{2}^{2}\geq\sum_{i=1}^{k}\sigma_{n}^{2}(\mathbf{A})\|[\mathbf{B}]_{:,i}\|_{2}^{2}=\sigma_{n}^{2}(\mathbf{A})\|\mathbf{B}\|_{F}^{2}$. This concludes the proof of the inequality in (19).
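The matrix inequality invoked in step $(b)$ can be spot-checked numerically; the random test matrices and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 8
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # square, full rank a.s.
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

sigma_min = np.linalg.svd(A, compute_uv=False)[-1]  # sigma_n(A)
lhs = np.linalg.norm(A @ B, "fro")
rhs = sigma_min * np.linalg.norm(B, "fro")
print(lhs >= rhs)  # ||AB||_F >= sigma_n(A) ||B||_F
```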

Then, letting $\delta_{1}\rightarrow 0$ in (19) and taking the expectation of the squares of both sides of (38) gives

$$\mathbb{E}\left[\eta_{c}(\widehat{\mathbf{W}})\right]\geq\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{U}}^{H}\mathbf{U})\right]\overset{(c)}{\geq}\left(1-\frac{2N_{r}(\sigma^{2}\sigma_{L}^{2}(\mathbf{H}_{S})+m\sigma^{4})}{\sigma_{L}^{4}(\mathbf{H}_{S})}\right)_{+}, \qquad (39)$$

where the inequality $(c)$ holds from Theorem 1, which concludes the proof. ∎

Appendix D Proof of Proposition 2

Recall that the row subspace matrix $\widehat{\mathbf{V}}$ is given by the right singular matrix of $\widehat{\mathbf{Q}}=\bar{\mathbf{Q}}+\bar{\mathbf{N}}$ in (25), and the elements of $\bar{\mathbf{N}}$ are i.i.d. with each entry distributed as $\mathcal{CN}(0,\sigma^{2})$ according to Lemma 2. Thus, Theorem 1 applies, which gives

$$\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{V}}^{H}\mathbf{V})\right]\geq\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{\mathbf{Q}})}\right)_{+}. \qquad (40)$$

Then, based on the subspace accuracy metric in (9), we have

$$\begin{aligned}
\sqrt{\eta_{r}(\widehat{\mathbf{F}})} &= \sqrt{\frac{\mathrm{tr}(\widehat{\mathbf{F}}^{H}\mathbf{H}^{H}\mathbf{H}\widehat{\mathbf{F}})}{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})}} = \frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}}+\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\geq \frac{\|\mathbf{H}\widehat{\mathbf{V}}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} = \frac{\|\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}\widehat{\mathbf{V}}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\geq \sigma_{L}(\widehat{\mathbf{V}}^{H}\mathbf{V})-\|\widehat{\mathbf{F}}-\widehat{\mathbf{V}}\|_{2} \geq \sigma_{L}(\widehat{\mathbf{V}}^{H}\mathbf{V})-\delta_{2}. \qquad (41)
\end{aligned}$$

Thus, the inequality (27) is proved. Moreover, under the condition $\delta_{2}\rightarrow 0$, taking the expectation of the squares of both sides of (41) yields

$$\mathbb{E}\left[\eta_{r}(\widehat{\mathbf{F}})\right]\geq\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{V}}^{H}\mathbf{V})\right]\overset{(a)}{\geq}\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{\mathbf{Q}})}\right)_{+},$$

where the inequality $(a)$ holds from (40). This concludes the proof of the row estimation accuracy bound in (28). ∎

References

  • [1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, 2013.
  • [2] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 436–453, April 2016.
  • [3] E. Torkildson, U. Madhow, and M. Rodwell, “Indoor millimeter wave MIMO: Feasibility and performance,” IEEE Trans. Wireless Commun., vol. 10, no. 12, pp. 4150–4160, 2011.
  • [4] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Trans. Commun., vol. 61, no. 10, pp. 4391–4403, 2013.
  • [5] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
  • [6] J. Wang, Z. Lan, C. Pyo, T. Baykas, C. Sum, M. A. Rahman, J. Gao, R. Funada, F. Kojima, H. Harada, and S. Kato, “Beam codebook based beamforming protocol for multi-Gbps millimeter-wave WPAN systems,” IEEE J. Sel. Areas Commun., vol. 27, no. 8, pp. 1390–1399, Oct 2009.
  • [7] O. E. Ayach, R. W. Heath, S. Abu-Surra, S. Rajagopal, and Z. Pi, “The capacity optimality of beam steering in large millimeter wave MIMO systems,” in 2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2012, pp. 100–104.
  • [8] J. Lee, G. T. Gil, and Y. H. Lee, “Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, June 2016.
  • [9] S. Srivastava, A. Mishra, A. Rajoriya, A. K. Jagannatham, and G. Ascheid, “Quasi-static and time-selective channel estimation for block-sparse millimeter wave hybrid MIMO systems: Sparse Bayesian learning (SBL) based approaches,” IEEE Trans. Signal Process., vol. 67, no. 5, pp. 1251–1266, March 2019.
  • [10] W. Zhang, T. Kim, D. J. Love, and E. Perrins, “Leveraging the restricted isometry property: Improved low-rank subspace decomposition for hybrid millimeter-wave systems,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5814–5827, Nov 2018.
  • [11] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, ser. STOC ’13.   New York, NY, USA: ACM, 2013, pp. 665–674.
  • [12] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010.
  • [13] X. Li, J. Fang, H. Li, and P. Wang, “Millimeter wave channel estimation via exploiting joint sparse and low-rank structures,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1123–1133, Feb 2018.
  • [14] W. Zhang, T. Kim, G. Xiong, and S.-H. Leung, “Leveraging subspace information for low-rank matrix reconstruction,” Signal Process., vol. 163, pp. 123 – 131, 2019.
  • [15] W. Zhang, T. Kim, and D. Love, “Sparse subspace decomposition for millimeter wave MIMO channel estimation,” in 2016 IEEE Global Communications Conference: Signal Processing for Communications (Globecom2016 SPC), Washington, USA, Dec. 2016.
  • [16] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Subspace estimation and decomposition for large millimeter-wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 528–542, April 2016.
  • [17] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Top. Signal Process., vol. 8, no. 5, pp. 831–846, Oct 2014.
  • [18] R. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. PP, no. 99, pp. 1–1, 2016.
  • [19] J. Brady, N. Behdad, and A. M. Sayeed, “Beamspace MIMO for millimeter-wave communications: System architecture, modeling, analysis, and measurements,” IEEE Trans. Antennas Propag., vol. 61, no. 7, pp. 3814–3827, 2013.
  • [20] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684–702, June 2003.
  • [21] S. Haghighatshoar and G. Caire, “Massive MIMO channel subspace estimation from low-dimensional projections,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 303–318, Jan 2017.
  • [22] X. Zhang, A. F. Molisch, and S.-Y. Kung, “Variable-phase-shift-based RF-baseband codesign for MIMO antenna selection,” IEEE Trans. Signal Process., vol. 53, no. 11, pp. 4091–4103, Nov 2005.
  • [23] N. El Karoui, “Tracy–widom limit for the largest eigenvalue of a large class of complex sample covariance matrices,” Ann. Probab., vol. 35, no. 2, pp. 663–714, 03 2007.
  • [24] X. Yu, J. C. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 485–500, April 2016.
  • [25] T. T. Cai and A. Zhang, “Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics,” Ann. Statist., vol. 46, no. 1, pp. 60–89, 02 2018.
  • [26] R. Durrett, Probability: theory and examples.   Cambridge University Press, 2010.
  • [27] F. Wei, “Upper bound for intermediate singular values of random matrices,” Journal of Mathematical Analysis and Applications, vol. 445, no. 2, pp. 1530–1547, 2017.
  • [28] Y. Dagan, G. Kur, and O. Shamir, “Space lower bounds for linear prediction in the streaming model,” in Proceedings of the Thirty-Second Conference on Learning Theory, ser. Proceedings of Machine Learning Research, A. Beygelzimer and D. Hsu, Eds., vol. 99.   Phoenix, USA: PMLR, 25–28 Jun 2019, pp. 929–954.
  • [29] G. Golub and C. Van Loan, Matrix Computations, ser. Johns Hopkins Studies in the Mathematical Sciences.   Johns Hopkins University Press, 2013.
  • [30] T. Bai, A. Alkhateeb, and R. W. Heath, “Coverage and capacity of millimeter-wave cellular networks,” IEEE Commun. Mag., vol. 52, no. 9, pp. 70–77, Sep. 2014.
  • [31] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,” IEEE Commun. Mag., vol. 49, no. 6, pp. 101–107, June 2011.