
A Sequential Subspace Method for Millimeter Wave MIMO Channel Estimation

Wei Zhang, Taejoon Kim, and Shu-Hung Leung. W. Zhang is with the Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China (e-mail: wzhang237-c@my.cityu.edu.hk). T. Kim is with the Department of Electrical Engineering and Computer Science, The University of Kansas, KS 66045, USA (e-mail: taejoonkim@ku.edu). S.-H. Leung is with the State Key Laboratory of Terahertz and Millimeter Waves and the Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China (e-mail: eeeugshl@cityu.edu.hk).
Abstract

Data transmission over the millimeter wave (mmWave) band in fifth-generation wireless networks aims to support very high speed wireless communications. A substantial increase in spectrum efficiency for mmWave transmission can be achieved by advanced hybrid analog-digital precoding, for which accurate channel state information (CSI) is the key. Rather than estimating the entire channel matrix, it is now well understood that directly estimating subspace information, which involves fewer parameters, provides enough information to design transceivers. However, the large channel use overhead and the associated computational complexity of existing channel subspace estimation techniques are major obstacles to deploying the subspace approach for channel estimation. In this paper, we propose a sequential two-stage subspace estimation method that resolves the overhead issues and provides accurate subspace information. The sequential approach avoids manipulating the entire high-dimensional training signal, which greatly reduces the computational complexity. Specifically, in the first stage, the proposed method samples the columns of the channel matrix to estimate its column subspace. Then, based on the obtained column subspace, it optimizes the training signals to estimate the row subspace. For a channel with $N_r$ receive antennas and $N_t$ transmit antennas, our analysis shows that the proposed technique requires only $\mathcal{O}(N_t)$ channel uses, while providing a guarantee of subspace estimation accuracy. Theoretical analysis shows that the similarity between the estimated subspace and the true subspace is linearly related to the signal-to-noise ratio (SNR), i.e., $\mathcal{O}(\text{SNR})$, at high SNR, and quadratically related to the SNR, i.e., $\mathcal{O}(\text{SNR}^2)$, at low SNR.
Simulation results show that the proposed sequential subspace method can provide improved subspace accuracy, normalized mean squared error, and spectrum efficiency over existing methods.

Index Terms:
Channel estimation, compressed sensing, millimeter wave communication, multi-input multi-output, subspace estimation.

I Introduction

Wireless communications using the millimeter wave (mmWave) band, which occupies the 30–300 GHz frequencies, address the current scarcity of wireless broadband spectrum and enable high speed transmission in fifth-generation (5G) wireless networks [1]. Owing to the short wavelength, it is possible to employ large-scale antenna arrays with a small form factor [2, 3, 4]. To reduce power consumption and hardware complexity, mmWave systems exploit a hybrid analog-digital multiple-input multiple-output (MIMO) architecture operating with a limited number of radio frequency (RF) chains [2]. Under perfect channel state information (CSI), it has been shown that hybrid precoding can achieve nearly the same performance as fully-digital precoding [2, 3, 5]. In practice, accurate CSI must be estimated via channel training in order to enable effective precoding for robust mmWave MIMO transmission. However, extracting accurate CSI in mmWave MIMO poses new challenges because the limited number of RF chains restricts the observability of the channel and greatly increases the channel use overhead.

To reduce the channel use overhead, initial works focused on beam alignment techniques [6, 7] utilizing beam search codebooks. Exploiting the fact that mmWave propagation exhibits low-rank characteristics, recent studies formulated the channel estimation task as a sparse signal reconstruction problem [8, 9] or a low-rank matrix reconstruction problem [10, 11, 12, 13, 14, 15]. Drawing on sparse signal reconstruction, orthogonal matching pursuit (OMP) [8] and sparse Bayesian learning (SBL) [9] were applied to estimate the sparse mmWave channel in the angular domain. Alternatively, if the channel is rank-sparse, it is possible to directly extract sufficient channel subspace information for the precoder design [16, 10, 11]. These subspace-based methods employ the Arnoldi iteration [16] to estimate the channel subspaces, and matrix completion [10, 11] to estimate the low-rank mmWave channel information.

Though the sparse signal reconstruction [8, 9] and matrix completion [10, 11] techniques reduce the channel use overhead compared to traditional beam alignment techniques, their training sounders [8, 9, 10, 11] are pre-designed and high-dimensional, so these approaches suffer from prohibitive computational complexity as the array size grows. To reduce the computational complexity, adaptive training techniques have been investigated in [4, 16, 17], where the training sounders are adaptively designed based on feedback or two-way training. However, these adaptive training techniques cannot guarantee performance in terms of mean squared error (MSE) and/or subspace estimation accuracy. Moreover, the techniques in [4, 16, 17] introduce additional channel use overhead due to the required feedback and two-way training.

To resolve the feedback overhead while maintaining the benefit of adaptive training, in this paper we present a two-stage subspace estimation approach that sequentially estimates the column and row subspaces of the mmWave MIMO channel. Compared to the existing channel estimation techniques in [8, 9, 10, 11], the training sounders of the proposed approach are adaptively designed to reduce the channel use overhead and computational complexity. Moreover, the proposed approach is open-loop; unlike prior adaptive training techniques [4, 16, 17], it requires neither feedback nor two-way channel sounding. The main contributions of this paper are as follows:

  • We propose a two-stage subspace estimation technique called the sequential and adaptive subspace estimation (SASE) method. In the proposed SASE, the column and row subspaces are estimated sequentially. Specifically, in the first stage, we sample a small fraction of the columns of the channel matrix to obtain an estimate of its column subspace. In the second stage, the row subspace of the channel is estimated based on the obtained column subspace. In particular, using the column subspace estimated in the first stage, the receive training sounders of the second stage are optimized to reduce the number of channel uses. Compared to existing works with fixed training sounders, where the entire high-dimensional training signal is processed to obtain the CSI, the proposed adaptation ensures that the dimension of the signals processed in each stage is much smaller than that of the entire training signal, greatly reducing the computational complexity. Thus, the proposed SASE has much lower computational complexity than the existing methods.

  • We analyze the subspace estimation accuracy, which guarantees the performance of the proposed SASE technique. Through extensive analysis, it is shown that the subspace estimation accuracy of the SASE is linearly related to the signal-to-noise ratio (SNR), i.e., $\mathcal{O}(\text{SNR})$, at high SNR, and quadratically related to the SNR, i.e., $\mathcal{O}(\text{SNR}^2)$, at low SNR. Moreover, simulation results show that the proposed SASE improves estimation accuracy over the prior art.

  • After obtaining the estimated column and row subspaces, an efficient method is developed for estimating the high-dimensional but low-rank channel matrix. Specifically, given the subspaces estimated by the proposed SASE, the mmWave channel estimation task reduces to solving a low-dimensional least squares problem, whose computational cost is much lower. Simulation results show that the proposed channel estimation method achieves lower normalized mean squared error and higher spectrum efficiency than the existing methods.

This paper is organized as follows. In Section II, we introduce the mmWave MIMO system model. In Section III, the proposed SASE is developed and analyzed. The channel use overhead, computational complexity, and an extension of the proposed SASE are discussed in Section IV. Finally, the simulation results and concluding remarks are provided in Sections V and VI, respectively.

Notation: Bold lowercase and uppercase letters denote vectors and matrices, respectively. ${\mathbf{A}}^{T}$, ${\mathbf{A}}^{H}$, ${\mathbf{A}}^{-1}$, $|{\mathbf{A}}|$, $\|{\mathbf{A}}\|_{F}$, $\mathrm{tr}({\mathbf{A}})$, and $\|{\mathbf{a}}\|_{2}$ are, respectively, the transpose, conjugate transpose, inverse, determinant, Frobenius norm, and trace of ${\mathbf{A}}$, and the $l_{2}$-norm of ${\mathbf{a}}$. $[{\mathbf{A}}]_{:,i}$, $[{\mathbf{A}}]_{i,:}$, and $[{\mathbf{A}}]_{i,j}$ are, respectively, the $i$th column, the $i$th row, and the $(i,j)$th entry of ${\mathbf{A}}$. $\mathrm{vec}({\mathbf{A}})$ stacks the columns of ${\mathbf{A}}$ into a column vector. $\mathrm{diag}({\mathbf{a}})$ denotes a square diagonal matrix with vector ${\mathbf{a}}$ on the main diagonal. $\sigma_{L}({\mathbf{A}})$ denotes the $L$th largest singular value of matrix ${\mathbf{A}}$. ${\mathbf{I}}_{M}\in{\mathbb{R}}^{M\times M}$ is the identity matrix. $\mathbf{1}_{M,N}\in{\mathbb{R}}^{M\times N}$, $\mathbf{0}_{M}\in{\mathbb{R}}^{M\times 1}$, and $\mathbf{0}_{M,N}\in{\mathbb{R}}^{M\times N}$ are the all-one matrix, the zero vector, and the zero matrix, respectively. $\mathrm{col}({\mathbf{A}})$ denotes the subspace spanned by the columns of ${\mathbf{A}}$. The operator $(\cdot)_{+}$ denotes $\max\{0,\cdot\}$, and $\otimes$ denotes the Kronecker product.

II MmWave MIMO System Model

Figure 1: The mmWave MIMO channel sounding model

II-A Channel Sounding Model

The mmWave MIMO channel sounding model is shown in Fig. 1, where the transmitter and receiver are equipped with $N_t$ and $N_r$ antennas, respectively. There are $N_{RF}\geq 2$ and $M_{RF}\geq 2$ RF chains at the transmitter and receiver, respectively. Without loss of generality, we assume that $N_t$ is an integer multiple of $N_{RF}$ and $N_r$ is an integer multiple of $M_{RF}$. In the considered mmWave channel sounding framework, one sounding symbol is transmitted over a unit time interval from the transmitter, which is defined as one channel use. The system employs $K$ channel uses for channel sounding. The received signal ${\mathbf{y}}_{(k)}\in{\mathbb{C}}^{M_{RF}\times 1}$ at the $k$th channel use is given by

$${\mathbf{y}}_{(k)}={\mathbf{W}}_{(k)}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+{\mathbf{W}}_{(k)}^{H}{\mathbf{n}}_{(k)},\quad k=1,\ldots,K, \qquad (1)$$

where ${\mathbf{W}}_{(k)}={\mathbf{W}}_{A,k}{\mathbf{W}}_{D,k}\in{\mathbb{C}}^{N_r\times M_{RF}}$ is the receive sounder, composed of the receive analog sounder ${\mathbf{W}}_{A,k}\in{\mathbb{C}}^{N_r\times M_{RF}}$ and the receive digital sounder ${\mathbf{W}}_{D,k}\in{\mathbb{C}}^{M_{RF}\times M_{RF}}$ in series; ${\mathbf{f}}_{(k)}={\mathbf{F}}_{A,k}{\mathbf{F}}_{D,k}{\mathbf{s}}_{k}\in{\mathbb{C}}^{N_t\times 1}$ is the transmit sounder, composed of the transmit analog sounder ${\mathbf{F}}_{A,k}\in{\mathbb{C}}^{N_t\times N_{RF}}$ and the transmit digital sounder ${\mathbf{F}}_{D,k}\in{\mathbb{C}}^{N_{RF}\times N_{RF}}$ in series with the transmitted sounding signal ${\mathbf{s}}_{k}$; and ${\mathbf{n}}_{(k)}\in{\mathbb{C}}^{N_r\times 1}$ is the noise.

Since the transmitted sounding signal ${\mathbf{s}}_{k}$ is included in ${\mathbf{f}}_{(k)}$, for convenience we let ${\mathbf{s}}_{k}=\frac{1}{\sqrt{N_{RF}}}[1,\ldots,1]^{T}\in{\mathbb{R}}^{N_{RF}\times 1}$, which enables us to focus on the design of ${\mathbf{F}}_{A,k}$ and ${\mathbf{F}}_{D,k}$. It is worth noting that the analog sounders are constrained to be constant modulus, that is, $|[{\mathbf{W}}_{A,k}]_{i,j}|=1/\sqrt{N_r}$ and $|[{\mathbf{F}}_{A,k}]_{i,j}|=1/\sqrt{N_t}$, $\forall i,j$. Without loss of generality, we assume the power of the transmit sounder is one, that is, $\|{\mathbf{f}}_{(k)}\|_{2}^{2}=1$. The noise ${\mathbf{n}}_{(k)}$ is an independent zero-mean complex Gaussian vector with covariance matrix $\sigma^{2}{\mathbf{I}}_{N_r}$. Due to the unit power of the transmit sounder, we define the signal-to-noise ratio (SNR) as $1/\sigma^{2}$.¹ The details of designing the receive and transmit sounders to facilitate the channel estimation are discussed in Section III.

¹Here, the SNR is the ratio of the transmitted sounder's power to the noise power, which is a common practice in the channel estimation literature [4, 8, 9, 10, 16, 17].

To model the point-to-point sparse mmWave MIMO channel, we assume there are $L$ clusters with $L\ll\min\{N_r,N_t\}$, each constituting a propagation path. The channel model can be expressed as [18, 19]

$${\mathbf{H}}=\sqrt{\frac{N_r N_t}{L}}\sum_{l=1}^{L}h_{l}\,{\mathbf{a}}_{r}(\theta_{r,l}){\mathbf{a}}_{t}^{H}(\theta_{t,l}), \qquad (2)$$

where ${\mathbf{a}}_{r}(\theta_{r,l})\in{\mathbb{C}}^{N_r\times 1}$ and ${\mathbf{a}}_{t}(\theta_{t,l})\in{\mathbb{C}}^{N_t\times 1}$ are the array response vectors of the uniform linear arrays (ULAs) at the receiver and transmitter, respectively. We extend this to a channel model with 2D uniform planar arrays (UPAs) in Section IV-C. In particular, ${\mathbf{a}}_{r}(\theta_{r,l})$ and ${\mathbf{a}}_{t}(\theta_{t,l})$ are expressed as

$${\mathbf{a}}_{r}(\theta_{r,l})=\frac{1}{\sqrt{N_r}}[1,e^{-j\frac{2\pi}{\lambda}d\sin\theta_{r,l}},\cdots,e^{-j\frac{2\pi}{\lambda}d(N_r-1)\sin\theta_{r,l}}]^{T},$$
$${\mathbf{a}}_{t}(\theta_{t,l})=\frac{1}{\sqrt{N_t}}[1,e^{-j\frac{2\pi}{\lambda}d\sin\theta_{t,l}},\cdots,e^{-j\frac{2\pi}{\lambda}d(N_t-1)\sin\theta_{t,l}}]^{T},$$

where $\lambda$ is the wavelength, $d=0.5\lambda$ is the antenna spacing, $\theta_{r,l}$ and $\theta_{t,l}$ are the angle of arrival (AoA) and angle of departure (AoD) of the $l$th path, uniformly distributed in $[-\pi/2,\pi/2)$, and $h_{l}\sim{\mathcal{C}}{\mathcal{N}}(0,\sigma_{h,l}^{2})$ is the complex gain of the $l$th path.

The channel model in (2) can be rewritten as

$${\mathbf{H}}={\mathbf{A}}_{r}\mathrm{diag}({\mathbf{h}}){\mathbf{A}}_{t}^{H}, \qquad (3)$$

where ${\mathbf{A}}_{r}=[{\mathbf{a}}_{r}(\theta_{r,1}),\ldots,{\mathbf{a}}_{r}(\theta_{r,L})]\in{\mathbb{C}}^{N_r\times L}$, ${\mathbf{A}}_{t}=[{\mathbf{a}}_{t}(\theta_{t,1}),\ldots,{\mathbf{a}}_{t}(\theta_{t,L})]\in{\mathbb{C}}^{N_t\times L}$, and ${\mathbf{h}}=[h_{1},\cdots,h_{L}]^{T}\in{\mathbb{C}}^{L\times 1}$. The channel estimation task is to obtain an estimate of ${\mathbf{H}}$, i.e., $\widehat{{\mathbf{H}}}$, from ${\mathbf{y}}_{(k)}$, ${\mathbf{W}}_{(k)}$, and ${\mathbf{f}}_{(k)}$, $k=1,\ldots,K$ in (1).
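To make the geometric channel model concrete, the following sketch builds ${\mathbf{H}}={\mathbf{A}}_{r}\mathrm{diag}({\mathbf{h}}){\mathbf{A}}_{t}^{H}$ as in (3), with the scaling of (2), assuming half-wavelength spacing $d=0.5\lambda$ and unit-variance path gains. It is an illustration under our own naming (`ula_response`, `mmwave_channel`), not code from the paper.

```python
import numpy as np

def ula_response(n, theta):
    # Array response vector of an n-element half-wavelength ULA;
    # with d = 0.5*lambda the phase term (2*pi/lambda)*d*sin(theta) = pi*sin(theta).
    return np.exp(-1j * np.pi * np.arange(n) * np.sin(theta)) / np.sqrt(n)

def mmwave_channel(nr, nt, L, rng):
    # Draw L paths with uniform AoA/AoD in [-pi/2, pi/2) and unit-variance
    # complex Gaussian gains, then assemble H = sqrt(nr*nt/L) * A_r diag(h) A_t^H.
    theta_r = rng.uniform(-np.pi / 2, np.pi / 2, L)
    theta_t = rng.uniform(-np.pi / 2, np.pi / 2, L)
    h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
    Ar = np.stack([ula_response(nr, t) for t in theta_r], axis=1)
    At = np.stack([ula_response(nt, t) for t in theta_t], axis=1)
    return np.sqrt(nr * nt / L) * Ar @ np.diag(h) @ At.conj().T
```

A draw with $L=3$ paths yields a rank-3 matrix regardless of the $N_r\times N_t$ dimensions, which is precisely the low-rank structure the subspace approach exploits.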

II-B Performance Evaluation of Channel Estimation

To evaluate the channel estimation performance, the spectrum efficiency achieved by utilizing the channel estimate $\widehat{{\mathbf{H}}}$ is discussed in the following. Conventionally, the precoder $\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{N_t\times N_d}$ and combiner $\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_r\times N_d}$ are designed based on the estimate $\widehat{{\mathbf{H}}}$, where $N_d$ is the number of transmitted data streams with $N_d\leq\min\{N_{RF},M_{RF}\}$. Here, when evaluating the channel estimation performance, it is assumed that the number of transmitted data streams equals the number of dominant paths, i.e., $N_d=L$. After the design of the precoder and combiner, the received signal for data transmission is given by

$${\mathbf{y}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}{\mathbf{s}}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}, \qquad (4)$$

where the signal follows ${\mathbf{s}}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{L},\frac{1}{L}{\mathbf{I}}_{L})$ and ${\mathbf{n}}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. It is worth noting that (4) is for data transmission, while (1) is for channel sounding. The spectrum efficiency achieved by $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ in (4) is defined in [20] as

$$R=\log_{2}\left|{\mathbf{I}}_{L}+\frac{1}{\sigma^{2}L}{\mathbf{R}}_{n}^{-1}{\mathbf{H}}_{e}{\mathbf{H}}_{e}^{H}\right|, \qquad (5)$$

where ${\mathbf{H}}_{e}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{L\times L}$ and ${\mathbf{R}}_{n}=\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{L\times L}$. In this work, we assume that the precoder and combiner are unitary, such that $\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}={\mathbf{I}}_{L}$ and $\widehat{{\mathbf{F}}}^{H}\widehat{{\mathbf{F}}}={\mathbf{I}}_{L}$. Under this assumption, we have ${\mathbf{R}}_{n}={\mathbf{I}}_{L}$ in (5).

It is worth noting that the spectrum efficiency in (5) is invariant to right rotations of the precoder and combiner: substituting $\widetilde{{\mathbf{F}}}=\widehat{{\mathbf{F}}}{\mathbf{R}}_{\mathbf{F}}$ and $\widetilde{{\mathbf{W}}}=\widehat{{\mathbf{W}}}{\mathbf{R}}_{\mathbf{W}}$ into (5), where ${\mathbf{R}}_{\mathbf{F}}\in{\mathbb{C}}^{L\times L}$ and ${\mathbf{R}}_{\mathbf{W}}\in{\mathbb{C}}^{L\times L}$ are unitary matrices, does not change the spectrum efficiency. Thus, $R$ in (5) is a function of the subspaces spanned by the precoder and combiner, i.e., $\mathrm{col}(\widehat{{\mathbf{F}}})$ and $\mathrm{col}(\widehat{{\mathbf{W}}})$. Moreover, the highest spectrum efficiency is achieved when $\mathrm{col}(\widehat{{\mathbf{F}}})$ and $\mathrm{col}(\widehat{{\mathbf{W}}})$ equal the row and column subspaces of ${\mathbf{H}}$, respectively.

Apart from the spectrum efficiency achieved by the signal model in (4), we consider the effective SNR at the receiver,

$$\gamma=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\sigma^{2}\|\widehat{{\mathbf{W}}}\|_{F}^{2}}=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\sigma^{2}L}. \qquad (6)$$

The received SNR $\gamma$ in (6) has the same rotation invariance property as the spectrum efficiency. In other words, $\gamma$ in (6) is a function of the estimated column and row subspaces. The maximum of $\gamma$ is also achieved when $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ span the column and row subspaces of ${\mathbf{H}}$, respectively.
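The rotation invariance of (5) and (6) can be checked numerically. The sketch below (our own helper names, not the paper's code) evaluates $R$ and $\gamma$ for given $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$; rotating both by the same unitary matrix leaves $R$ unchanged.

```python
import numpy as np

def spectrum_efficiency(H, W, F, sigma2):
    # R = log2 |I_L + (1/(sigma^2 L)) R_n^{-1} H_e H_e^H| as in (5),
    # with H_e = W^H H F and R_n = W^H W.
    L = F.shape[1]
    He = W.conj().T @ H @ F
    Rn = W.conj().T @ W
    M = np.eye(L) + np.linalg.solve(Rn, He @ He.conj().T) / (sigma2 * L)
    return np.real(np.log2(np.linalg.det(M)))

def effective_snr(H, W, F, sigma2):
    # gamma in (6); for a unitary W the denominator reduces to sigma^2 * L.
    num = np.linalg.norm(W.conj().T @ H @ F, 'fro') ** 2
    return num / (sigma2 * np.linalg.norm(W, 'fro') ** 2)
```

Because $R$ depends on $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ only through the subspaces they span, any right rotation by a unitary matrix produces the same value.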

Figure 2: Illustration of SASE Algorithm

Inspired by the definition in (6), in this paper the accuracy of subspace estimation is defined as the ratio of the power captured by the transceiver matrices [21] $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ to the power of the channel,

$$\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})=\frac{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}. \qquad (7)$$

Similarly, the measures for the accuracy of the column and row subspace estimates, i.e., $\eta_{c}(\widehat{{\mathbf{W}}})$ and $\eta_{r}(\widehat{{\mathbf{F}}})$, are respectively defined as the ratios of the power captured by $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ to the power of the channel:

$$\eta_{c}(\widehat{{\mathbf{W}}})=\frac{\mathrm{tr}(\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{H}}^{H}\widehat{{\mathbf{W}}})}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}, \qquad (8)$$
$$\eta_{r}(\widehat{{\mathbf{F}}})=\frac{\mathrm{tr}(\widehat{{\mathbf{F}}}^{H}{\mathbf{H}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}})}{\mathrm{tr}({\mathbf{H}}^{H}{\mathbf{H}})}. \qquad (9)$$

Moreover, $\eta_{c}$ and $\eta_{r}$ are also rotation invariant. When the value of $\eta_{c}$ or $\eta_{r}$ is close to one, the corresponding $\widehat{{\mathbf{W}}}$ or $\widehat{{\mathbf{F}}}$ can be treated as an accurate subspace estimate.
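The accuracy measures (7)–(9) translate directly into code. The sketch below uses our own helper names; when $\widehat{{\mathbf{W}}}$ and $\widehat{{\mathbf{F}}}$ span the true column and row subspaces of a rank-$L$ channel, all three measures equal one.

```python
import numpy as np

def subspace_accuracy(H, W, F):
    # eta in (7): fraction of channel power captured jointly by W and F.
    num = np.linalg.norm(W.conj().T @ H @ F, 'fro') ** 2
    return num / np.real(np.trace(H.conj().T @ H))

def column_accuracy(H, W):
    # eta_c in (8): fraction of channel power captured by W alone.
    num = np.real(np.trace(W.conj().T @ H @ H.conj().T @ W))
    return num / np.real(np.trace(H.conj().T @ H))

def row_accuracy(H, F):
    # eta_r in (9): fraction of channel power captured by F alone.
    num = np.real(np.trace(F.conj().T @ H.conj().T @ H @ F))
    return num / np.real(np.trace(H.conj().T @ H))
```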

The proposed SASE algorithm is illustrated in Fig. 2. It consists of two stages: column subspace estimation and row subspace estimation. In particular, the training sounders of the second stage are optimized by fully adapting them to the estimated column subspace, which reduces the number of channel uses and improves the estimation accuracy.

III Sequential and Adaptive Subspace Estimation

III-A Estimate the Column Subspace

In this subsection, we present the design of the transmit and receive sounders along with the method for obtaining the column subspace of the mmWave channel. To begin with, the following lemma shows that, under the mmWave channel model in (3), the column subspaces of ${\mathbf{H}}$ and the sub-matrix ${\mathbf{H}}_{S}$ are equivalent.

Lemma 1

Let ${\mathbf{H}}_{S}={\mathbf{H}}{\mathbf{S}}\in{\mathbb{C}}^{N_r\times m}$ be the sub-matrix consisting of the first $m$ columns of ${\mathbf{H}}$ with $m\geq L$, where ${\mathbf{S}}$ is expressed as

$${\mathbf{S}}=\begin{bmatrix}{\mathbf{I}}_{m}\\ \mathbf{0}_{N_t-m,m}\end{bmatrix}\in{\mathbb{R}}^{N_t\times m}.$$

For the mmWave channel model in (3), if all the angles $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are distinct, the column subspaces of ${\mathbf{H}}$ and ${\mathbf{H}}_{S}$ are equivalent, i.e., $\mathrm{col}({\mathbf{H}}_{S})=\mathrm{col}({\mathbf{H}})$.

Proof:

See Appendix A. ∎

Remark 1

Because $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are continuous random variables in $[-\pi/2,\pi/2)$, they are distinct almost surely (i.e., with probability 1).

Lemma 1 reveals that, since $\mathrm{col}({\mathbf{H}}_{S})=\mathrm{col}({\mathbf{H}})$, it suffices to sample the first $m$ columns of ${\mathbf{H}}$, i.e., ${\mathbf{H}}_{S}$, to obtain the column subspace of ${\mathbf{H}}$, which reduces the number of channel uses. However, the mmWave hybrid MIMO architecture cannot directly access the entries of ${\mathbf{H}}$ due to the analog array constraints. This can be overcome by adopting the technique proposed in [16]. Specifically, to sample the $i$th column of ${\mathbf{H}}$, i.e., $[{\mathbf{H}}]_{:,i}$, the transmitter constructs the transmit sounder ${\mathbf{f}}_{(i)}={\mathbf{e}}_{i}\in{\mathbb{C}}^{N_t\times 1}$, where ${\mathbf{e}}_{i}$ is the $i$th column of ${\mathbf{I}}_{N_t}$. This is possible because any precoder vector can be generated with $N_{RF}\geq 2$ RF chains [22]. To be more specific, there exist ${\mathbf{F}}_{A,i}$, ${\mathbf{F}}_{D,i}$, and ${\mathbf{s}}_{i}$ such that ${\mathbf{e}}_{i}={\mathbf{F}}_{A,i}{\mathbf{F}}_{D,i}{\mathbf{s}}_{i}$:

$${\mathbf{e}}_{i}=\underbrace{\frac{1}{\sqrt{N_t}}\begin{bmatrix}1&1&\cdots\\ \vdots&\vdots&\cdots\\ 1&1&\cdots\\ 1&-1&\cdots\\ 1&1&\cdots\\ \vdots&\vdots&\cdots\\ 1&1&\cdots\end{bmatrix}}_{\triangleq{\mathbf{F}}_{A,i}}\underbrace{\begin{bmatrix}\frac{\sqrt{N_{RF}N_t}}{2}&0&\cdots&0\\ -\frac{\sqrt{N_{RF}N_t}}{2}&0&\cdots&0\\ 0&0&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&0\end{bmatrix}}_{\triangleq{\mathbf{F}}_{D,i}}\underbrace{\frac{1}{\sqrt{N_{RF}}}\begin{bmatrix}1\\ 1\\ \vdots\\ 1\end{bmatrix}}_{\triangleq{\mathbf{s}}_{i}},$$

where ${\mathbf{F}}_{A,i}=\frac{1}{\sqrt{N_t}}\mathbf{1}_{N_t,N_{RF}}$ except for $[{\mathbf{F}}_{A,i}]_{i,2}=-\frac{1}{\sqrt{N_t}}$; ${\mathbf{F}}_{D,i}=\mathbf{0}_{N_{RF},N_{RF}}$ except for $[{\mathbf{F}}_{D,i}]_{1,1}=\frac{\sqrt{N_{RF}N_t}}{2}$ and $[{\mathbf{F}}_{D,i}]_{2,1}=-\frac{\sqrt{N_{RF}N_t}}{2}$; and ${\mathbf{s}}_{i}=\frac{1}{\sqrt{N_{RF}}}[1,\ldots,1]^{T}\in{\mathbb{R}}^{N_{RF}\times 1}$.
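This factorization can be checked numerically. The sketch below (the helper name `sounder_for_ei` is ours) constructs ${\mathbf{F}}_{A,i}$, ${\mathbf{F}}_{D,i}$, and ${\mathbf{s}}_{i}$ exactly as specified and verifies that their product recovers ${\mathbf{e}}_{i}$ while ${\mathbf{F}}_{A,i}$ keeps constant-modulus entries.

```python
import numpy as np

def sounder_for_ei(i, nt, nrf):
    # F_A is all ones/sqrt(nt) except a sign flip at row i of the second column;
    # F_D routes the difference of the first two RF chains into the first stream;
    # s feeds all RF chains equally. Their product is the i-th standard basis vector.
    FA = np.ones((nt, nrf)) / np.sqrt(nt)
    FA[i, 1] = -1.0 / np.sqrt(nt)
    FD = np.zeros((nrf, nrf))
    FD[0, 0] = np.sqrt(nrf * nt) / 2
    FD[1, 0] = -np.sqrt(nrf * nt) / 2
    s = np.ones(nrf) / np.sqrt(nrf)
    return FA, FD, s
```

The cancellation works because the two analog columns are identical except at row $i$, so their difference is zero everywhere except at the $i$th entry; the digital sounder scales that difference to exactly one.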

At the receiver side, we collect the receive sounders of $N_r/M_{RF}$ channel uses to form the full-rank matrix

$${\mathbf{M}}=[{\mathbf{W}}_{(i,1)},{\mathbf{W}}_{(i,2)},\cdots,{\mathbf{W}}_{(i,N_r/M_{RF})}]\in{\mathbb{C}}^{N_r\times N_r}, \qquad (10)$$

where ${\mathbf{W}}_{(i,j)}\in{\mathbb{C}}^{N_r\times M_{RF}}$, $j=1,\ldots,N_r/M_{RF}$, denotes the $j$th receive sounder corresponding to the transmit sounder ${\mathbf{e}}_{i}$. To satisfy the analog constraint that the entries of the analog sounders have constant modulus, we let the matrix ${\mathbf{M}}$ in (10) be the discrete Fourier transform (DFT) matrix. Specifically, the analog and digital receive sounders associated with ${\mathbf{W}}_{(i,j)}$ in (10) are expressed as

$${\mathbf{W}}_{(i,j)}=\underbrace{[{\mathbf{M}}]_{:,(j-1)M_{RF}+1:jM_{RF}}}_{\text{analog sounder}}\underbrace{{\mathbf{I}}_{M_{RF}}}_{\text{digital sounder}}.$$

Thus, the received signal ${\mathbf{y}}_{(i,j)}\in{\mathbb{C}}^{M_{RF}\times 1}$ under the transmit sounder ${\mathbf{e}}_{i}$ and receive sounder ${\mathbf{W}}_{(i,j)}$ is expressed as

$${\mathbf{y}}_{(i,j)}={\mathbf{W}}_{(i,j)}^{H}{\mathbf{H}}{\mathbf{e}}_{i}+{\mathbf{W}}_{(i,j)}^{H}{\mathbf{n}}_{(i,j)},$$

where ${\mathbf{n}}_{(i,j)}\in{\mathbb{C}}^{N_r\times 1}$ is the noise vector with ${\mathbf{n}}_{(i,j)}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. Then we stack the observations of $N_r/M_{RF}$ channel uses as ${\mathbf{y}}_{i}=[{\mathbf{y}}_{(i,1)}^{T},\cdots,{\mathbf{y}}_{(i,N_r/M_{RF})}^{T}]^{T}\in{\mathbb{C}}^{N_r\times 1}$:

$$\underbrace{\begin{bmatrix}{\mathbf{y}}_{(i,1)}\\ {\mathbf{y}}_{(i,2)}\\ \vdots\\ {\mathbf{y}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}}_{\triangleq{\mathbf{y}}_{i}}=\underbrace{\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}\\ {\mathbf{W}}_{(i,2)}^{H}\\ \vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}\end{bmatrix}}_{\triangleq{\mathbf{M}}^{H}}\underbrace{{\mathbf{H}}{\mathbf{e}}_{i}}_{\triangleq[{\mathbf{H}}]_{:,i}}+\underbrace{\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}{\mathbf{n}}_{(i,1)}\\ {\mathbf{W}}_{(i,2)}^{H}{\mathbf{n}}_{(i,2)}\\ \vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{n}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}}_{\triangleq\tilde{{\mathbf{n}}}_{i}}={\mathbf{M}}^{H}[{\mathbf{H}}]_{:,i}+\tilde{{\mathbf{n}}}_{i}, \qquad (11)$$

where $\tilde{{\mathbf{n}}}_{i}\in{\mathbb{C}}^{N_r\times 1}$ is the effective noise vector after stacking, whose covariance matrix is

$${\mathbb{E}}[\tilde{{\mathbf{n}}}_{i}\tilde{{\mathbf{n}}}_{i}^{H}]=\sigma^{2}\begin{bmatrix}{\mathbf{W}}_{(i,1)}^{H}{\mathbf{W}}_{(i,1)}&\cdots&{\mathbf{W}}_{(i,1)}^{H}{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}\\ \vdots&\ddots&\vdots\\ {\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{W}}_{(i,1)}&\cdots&{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}^{H}{\mathbf{W}}_{(i,\frac{N_r}{M_{RF}})}\end{bmatrix}. \qquad (12)$$

Because the DFT matrix ${\mathbf{M}}$ in (10) satisfies ${\mathbf{M}}^{H}{\mathbf{M}}={\mathbf{M}}{\mathbf{M}}^{H}={\mathbf{I}}_{N_r}$, the following holds:

$${\mathbf{W}}_{(i,j)}^{H}{\mathbf{W}}_{(i,k)}=\begin{cases}{\mathbf{I}}_{M_{RF}},&j=k,\\ \mathbf{0}_{M_{RF}},&j\neq k.\end{cases} \qquad (15)$$

Substituting (15) into (12), we can verify that ${\mathbb{E}}[\tilde{{\mathbf{n}}}_{i}\tilde{{\mathbf{n}}}_{i}^{H}]=\sigma^{2}{\mathbf{I}}_{N_r}$, and more precisely, $\tilde{{\mathbf{n}}}_{i}\sim{\mathcal{C}}{\mathcal{N}}(\mathbf{0}_{N_r},\sigma^{2}{\mathbf{I}}_{N_r})$. Moreover, denoting $\widetilde{{\mathbf{N}}}=[\tilde{{\mathbf{n}}}_{1},\cdots,\tilde{{\mathbf{n}}}_{m}]\in{\mathbb{C}}^{N_r\times m}$, it is straightforward that the entries of $\widetilde{{\mathbf{N}}}$ are independent and identically distributed (i.i.d.) as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$. For convenience, we denote $\widetilde{{\mathbf{Y}}}_{S}=[{\mathbf{y}}_{1},\cdots,{\mathbf{y}}_{m}]\in{\mathbb{C}}^{N_r\times m}$, where ${\mathbf{y}}_{i}$ is defined in (11). Then, we apply the DFT to the collected observations $\widetilde{{\mathbf{Y}}}_{S}$ and obtain ${\mathbf{Y}}_{S}={\mathbf{M}}\widetilde{{\mathbf{Y}}}_{S}\in{\mathbb{C}}^{N_r\times m}$ as

$${\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S}, \qquad (16)$$

where ${\mathbf{N}}_{S}={\mathbf{M}}\widetilde{{\mathbf{N}}}\in{\mathbb{C}}^{N_r\times m}$ and ${\mathbf{H}}_{S}=[{\mathbf{H}}]_{:,1:m}\in{\mathbb{C}}^{N_r\times m}$. Before discussing the noise part ${\mathbf{N}}_{S}$ in (16), the following preliminary lemma gives the distribution of the entries of a product of matrices.

Lemma 2

Given a semi-unitary matrix ${\mathbf{A}}\in{\mathbb{C}}^{d\times N}$ with ${\mathbf{A}}{\mathbf{A}}^{H}={\mathbf{I}}_{d}$, and a random matrix ${\mathbf{X}}\in{\mathbb{C}}^{N\times m}$ with i.i.d. entries distributed as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$, the product ${\mathbf{Y}}={\mathbf{A}}{\mathbf{X}}\in{\mathbb{C}}^{d\times m}$ also has i.i.d. entries distributed as ${\mathcal{C}}{\mathcal{N}}(0,\sigma^{2})$.

Proof:

See Appendix B. ∎

Therefore, considering the noise part in (16), i.e., 𝐍S=𝐌𝐍~{\mathbf{N}}_{S}={\mathbf{M}}\widetilde{{\mathbf{N}}}, where 𝐌{\mathbf{M}} is unitary and 𝐍~\widetilde{{\mathbf{N}}} has i.i.d. 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}) entries, the conclusion of Lemma 2 can be applied, which verifies that the entries of 𝐍S{\mathbf{N}}_{S} in (16) are i.i.d. as 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}).
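Lemma 2 can be checked numerically. The following Python sketch (illustrative only; the matrix sizes and noise variance are arbitrary choices) applies a semi-unitary DFT-based matrix to an i.i.d. complex Gaussian matrix and verifies that the empirical entry variance and the cross-row correlation behave as the lemma predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, m, sigma2 = 4, 16, 4000, 2.0

# Semi-unitary A (d x N): the first d rows of the unitary DFT matrix.
A = (np.fft.fft(np.eye(N)) / np.sqrt(N))[:d, :]
assert np.allclose(A @ A.conj().T, np.eye(d), atol=1e-10)

# X with i.i.d. CN(0, sigma2) entries: real/imag parts each N(0, sigma2/2).
X = rng.normal(scale=np.sqrt(sigma2 / 2), size=(N, m)) \
    + 1j * rng.normal(scale=np.sqrt(sigma2 / 2), size=(N, m))

Y = A @ X
emp_var = np.mean(np.abs(Y) ** 2)            # should be close to sigma2
cross = np.mean(Y[0, :] * np.conj(Y[1, :]))  # cross-row correlation, near zero
```

Since the rows of the semi-unitary matrix are orthonormal, the transformed rows remain uncorrelated with unchanged variance, which is exactly the statement of Lemma 2.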

Given the expression in (16), the column subspace estimation problem is formulated as,

𝐔^=argmax𝐔Nr×L𝐔H𝐘SF2subject to𝐔H𝐔=𝐈L,\displaystyle\widehat{{\mathbf{U}}}=\mathop{\mathrm{argmax}}\limits_{{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L}}\left\|{\mathbf{U}}^{H}{\mathbf{Y}}_{S}\right\|_{F}^{2}~{}\text{subject to}~{}{\mathbf{U}}^{H}{\mathbf{U}}={\mathbf{I}}_{L}, (17)

where one of the optimal solutions of (17) can be obtained by taking the dominant LL left singular vectors of 𝐘S{\mathbf{Y}}_{S}. Here, the number of paths, LL, is assumed to be known a priori. In practice, it is possible to estimate LL by comparing the singular values of 𝐘S{\mathbf{Y}}_{S} [23]. Because 𝐘S=𝐇S+𝐍S{\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S} and rank(𝐇S)=L\mathop{\mathrm{rank}}({\mathbf{H}}_{S})=L, there will be LL singular values of 𝐘S{\mathbf{Y}}_{S} whose magnitudes clearly dominate the other singular values. Alternatively, we can set it to LsupL_{\text{sup}}, which is an upper bound on the number of dominant paths such that LLsupL\leq L_{\text{sup}}.²Due to the limited number of RF chains, the dimension of the channel subspaces for data transmission is less than min{MRF,NRF}\min\{M_{RF},N_{RF}\}. Thus, if the path number estimate is larger than min{MRF,NRF}\min\{M_{RF},N_{RF}\}, we set it to min{MRF,NRF}\min\{M_{RF},N_{RF}\}.
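The singular-value-gap argument above can be illustrated with a short Python sketch; the dimensions, the rank-LL construction of 𝐇S{\mathbf{H}}_{S}, and the noise level are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
Nr, m, L, sigma = 32, 8, 3, 1e-3

# Rank-L sampled block H_S plus i.i.d. noise, mimicking (16).
H_S = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
      (rng.standard_normal((L, m)) + 1j * rng.standard_normal((L, m)))
Y_S = H_S + sigma * (rng.standard_normal((Nr, m)) + 1j * rng.standard_normal((Nr, m)))

s = np.linalg.svd(Y_S, compute_uv=False)
# The L dominant singular values stand out; detect L at the largest relative gap.
L_hat = int(np.argmax(s[:-1] / s[1:])) + 1
U_hat = np.linalg.svd(Y_S)[0][:, :L_hat]       # estimated column subspace
```

At this low noise level the gap between the LLth and (L+1)(L+1)th singular values is large, so the gap detector recovers the true rank.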

Now, we design the receive combiner 𝐖^\widehat{{\mathbf{W}}} in (4) for data transmission to approximate the estimated 𝐔^Nr×L\widehat{{\mathbf{U}}}\in{\mathbb{C}}^{N_{r}\times L} in (17). Specifically, we design the analog combiner 𝐖^ANr×MRF\widehat{{\mathbf{W}}}_{A}\in{\mathbb{C}}^{N_{r}\times M_{RF}} and digital combiner 𝐖^DMRF×L\widehat{{\mathbf{W}}}_{D}\in{\mathbb{C}}^{M_{RF}\times L} at the receiver by solving the following problem

(𝐖^A,𝐖^D)=argmin𝐖A,𝐖D𝐔^𝐖A𝐖DF,\displaystyle\left(\widehat{{\mathbf{W}}}_{A},\widehat{{\mathbf{W}}}_{D}\right)=\mathop{\mathrm{argmin}}_{{\mathbf{W}}_{A},{\mathbf{W}}_{D}}\|\widehat{{\mathbf{U}}}-{\mathbf{W}}_{A}{\mathbf{W}}_{D}\|_{F},
subject to |[𝐖A]i,j|=1Nr.\displaystyle\text{subject to~{}~{}}\left|[{\mathbf{W}}_{A}]_{i,j}\right|=\frac{1}{\sqrt{N_{r}}}. (18)

The problem above can be solved by using the OMP algorithm [5] or the alternating minimization method [24]. The designed receive combiner is given by 𝐖^=𝐖^A𝐖^DNr×L\widehat{{\mathbf{W}}}=\widehat{{\mathbf{W}}}_{A}\widehat{{\mathbf{W}}}_{D}\in{\mathbb{C}}^{N_{r}\times L} with 𝐖^H𝐖^=𝐈L\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{W}}}={\mathbf{I}}_{L}. The methods in [5, 24] have been shown to guarantee near-optimal performance, i.e., col(𝐖^)col(𝐔^)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}})\approx\mathop{\mathrm{col}}(\widehat{{\mathbf{U}}}). The details of our column subspace estimation algorithm are summarized in Algorithm 1.
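As an illustration of solving (18), the following Python sketch implements a simple OMP-style greedy selection over a DFT dictionary, whose columns have the required constant modulus 1/Nr1/\sqrt{N_{r}}. It is a simplified stand-in for the algorithms of [5, 24], not a reproduction of them.

```python
import numpy as np

def hybrid_approx_omp(U_hat, M_RF):
    """OMP-style greedy sketch of (18): select M_RF unit-modulus DFT columns
    as the analog combiner, then solve least squares for the digital part."""
    Nr, L = U_hat.shape
    D = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)   # dictionary, |entries| = 1/sqrt(Nr)
    idx, R = [], U_hat.copy()
    for _ in range(M_RF):
        corr = np.sum(np.abs(D.conj().T @ R) ** 2, axis=1)
        corr[idx] = -1.0                       # never reselect a column
        idx.append(int(np.argmax(corr)))
        W_A = D[:, idx]
        W_D = np.linalg.lstsq(W_A, U_hat, rcond=None)[0]
        R = U_hat - W_A @ W_D                  # residual after this selection
    Q, _ = np.linalg.qr(W_A @ W_D)             # re-orthonormalize the product
    return W_A, W_D, Q
```

When 𝐔^\widehat{{\mathbf{U}}} happens to lie in the span of a few dictionary columns, the greedy selection recovers it exactly; in general it only approximates the subspace, which is why the residual δ1\delta_{1} appears in Proposition 1.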

In general, col(𝐖^)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}}) is not equal to the column subspace of 𝐇{\mathbf{H}}, i.e., col(𝐔)\mathop{\mathrm{col}}({\mathbf{U}}) with 𝐔Nr×L{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L}, due to the noise 𝐍S{\mathbf{N}}_{S} in (16). To analyze the column subspace accuracy ηc(𝐖^)\eta_{c}(\widehat{{\mathbf{W}}}) defined in (8), we introduce the theorem [25] below.

Theorem 1 ([25])

Suppose 𝐗M×N(MN){\mathbf{X}}\in{\mathbb{C}}^{M\times N}(M\geq N) is of rank-rr, and 𝐗^=𝐗+𝐍\widehat{{\mathbf{X}}}={\mathbf{X}}+{\mathbf{N}}, where [𝐍]i,j[{\mathbf{N}}]_{i,j} is i.i.d. with zero mean and unit variance (not necessarily Gaussian). Let the compact SVD of 𝐗{\mathbf{X}} be

𝐗=𝐔𝚺𝐕H,\displaystyle{\mathbf{X}}={\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H},

where 𝐔M×r{\mathbf{U}}\in{\mathbb{C}}^{M\times r}, 𝐕N×r{\mathbf{V}}\in{\mathbb{C}}^{N\times r}, and 𝚺r×r\mathbf{\Sigma}\in{\mathbb{C}}^{r\times r}. We assume the singular values in 𝚺\mathbf{\Sigma} are in descending order, i.e., σ1(𝐗)σr(𝐗)\sigma_{1}({\mathbf{X}})\geq\cdots\geq\sigma_{r}({\mathbf{X}}). Similarly, we partition the SVD of 𝐗^\widehat{{\mathbf{X}}} as

𝐗^=[𝐔^𝐔^][𝚺^1𝟎𝟎𝚺^2][𝐕^H𝐕^H],\displaystyle\widehat{{\mathbf{X}}}=\begin{bmatrix}\widehat{{\mathbf{U}}}&\widehat{{\mathbf{U}}}_{\perp}\end{bmatrix}\begin{bmatrix}\widehat{\mathbf{\Sigma}}_{1}&\mathbf{0}\\ \mathbf{0}&\widehat{\mathbf{\Sigma}}_{2}\end{bmatrix}\begin{bmatrix}\widehat{{\mathbf{V}}}^{H}\\ \widehat{{\mathbf{V}}}_{\perp}^{H}\end{bmatrix},

where 𝐔^M×r\widehat{{\mathbf{U}}}\in{\mathbb{C}}^{M\times r}, 𝐔^M×(Mr)\widehat{{\mathbf{U}}}_{\perp}\in{\mathbb{C}}^{M\times(M-r)}, 𝐕^N×r\widehat{{\mathbf{V}}}\in{\mathbb{C}}^{N\times r}, 𝐕^N×(Nr)\widehat{{\mathbf{V}}}_{\perp}\in{\mathbb{C}}^{N\times(N-r)}, 𝚺^1r×r\widehat{\mathbf{\Sigma}}_{1}\in{\mathbb{C}}^{r\times r}, and 𝚺^2(Mr)×(Nr)\widehat{\mathbf{\Sigma}}_{2}\in{\mathbb{C}}^{(M-r)\times(N-r)}. Then, there exists a constant C>0C>0 such that

𝔼[σr2(𝐔H𝐔^)](1CM(σr2(𝐗)+N)σr4(𝐗))+,\displaystyle{\mathbb{E}}\left[\sigma_{r}^{2}({\mathbf{U}}^{H}\widehat{{\mathbf{U}}})\right]\geq\left(1-\frac{CM(\sigma_{r}^{2}({\mathbf{X}})+N)}{\sigma_{r}^{4}({\mathbf{X}})}\right)_{+},
𝔼[σr2(𝐕H𝐕^)](1CN(σr2(𝐗)+M)σr4(𝐗))+,\displaystyle{\mathbb{E}}\left[\sigma_{r}^{2}({\mathbf{V}}^{H}\widehat{{\mathbf{V}}})\right]\geq\left(1-\frac{CN(\sigma_{r}^{2}({\mathbf{X}})+M)}{\sigma_{r}^{4}({\mathbf{X}})}\right)_{+},

where the expectation is taken over the random noise 𝐍{\mathbf{N}}. In particular, when the noise is i.i.d. 𝒞𝒩(0,1){\mathcal{C}}{\mathcal{N}}(0,1), it has C=2C=2.

Algorithm 1 Column subspace estimation
1:  Input: channel dimension: NrN_{r}, NtN_{t}; number of RF chains at receiver: MRFM_{RF}; channel paths: LL; parameter: mm.
2:  Initialization: channel use index k=1k=1.
3:  for i=1i=1 to mm do
4:     Set transmit sounder as 𝐟(i)=𝐞i{\mathbf{f}}_{(i)}={\mathbf{e}}_{i}.
5:     for j=1j=1 to Nr/MRFN_{r}/M_{RF} do
6:        Design receive training sounder as 𝐖(i,j)=[𝐌]:,(j1)MRF+1:jMRF𝐈MRF{\mathbf{W}}_{(i,j)}=[{\mathbf{M}}]_{:,(j-1)M_{RF}+1:jM_{RF}}{\mathbf{I}}_{M_{RF}}.
7:        Obtain the received signal 𝐲(i,j)=𝐖(i,j)H𝐇𝐟(i)+𝐖(i,j)H𝐧(i,j){\mathbf{y}}_{(i,j)}\!\!\!\!=\!\!\!\!{\mathbf{W}}_{(i,j)}^{H}{\mathbf{H}}{\mathbf{f}}_{(i)}\!\!+\!\!{\mathbf{W}}_{(i,j)}^{H}{\mathbf{n}}_{(i,j)}.
8:        Update k=k+1k=k+1.
9:     end for
10:     𝐲i=[𝐲(i,1)T,,𝐲(i,Nr/MRF)T]T{\mathbf{y}}_{i}=\left[{\mathbf{y}}_{(i,1)}^{T},\cdots,{\mathbf{y}}^{T}_{(i,N_{r}/M_{RF})}\right]^{T}.
11:  end for
12:  𝐘S=𝐌[𝐲1,,𝐲m]{\mathbf{Y}}_{S}={\mathbf{M}}\left[{\mathbf{y}}_{1},\cdots,{\mathbf{y}}_{m}\right].
13:  Column subspace 𝐔^\widehat{{\mathbf{U}}} is obtained by the dominant LL left singular vectors of 𝐘S{\mathbf{Y}}_{S}.
14:  Design 𝐖^\widehat{{\mathbf{W}}} based on 𝐔^\widehat{{\mathbf{U}}} by solving (18).
15:  Output: Column subspace estimation 𝐖^\widehat{{\mathbf{W}}}.
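A minimal end-to-end simulation of Algorithm 1 is sketched below in Python. The rank-LL channel is generated as a product of Gaussian factors purely for illustration, in place of the mmWave model, and all parameter values are example choices.

```python
import numpy as np

rng = np.random.default_rng(2)
Nr, Nt, M_RF, L, m, sigma = 16, 24, 4, 2, 4, 0.01

# Illustrative rank-L channel (stand-in for the mmWave model).
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))

M = np.fft.fft(np.eye(Nr)) / np.sqrt(Nr)       # unitary DFT matrix

# Stage 1 (Algorithm 1): sound the first m columns of H.
ys = []
for i in range(m):
    f = np.zeros(Nt)
    f[i] = 1.0                                 # transmit sounder e_i
    blocks = []
    for j in range(Nr // M_RF):
        W_ij = M[:, j * M_RF:(j + 1) * M_RF]   # receive sounder block
        n = sigma * (rng.standard_normal(Nr) + 1j * rng.standard_normal(Nr))
        blocks.append(W_ij.conj().T @ (H @ f + n))
    ys.append(np.concatenate(blocks))
Y_S = M @ np.column_stack(ys)                  # undo the DFT combining, as in (16)

U_hat = np.linalg.svd(Y_S)[0][:, :L]           # dominant L left singular vectors
U = np.linalg.svd(H)[0][:, :L]
eta_c = np.linalg.norm(U_hat.conj().T @ U) ** 2 / L
```

Stacking the per-block observations and multiplying by 𝐌{\mathbf{M}} reproduces 𝐘S=𝐇S+𝐍S{\mathbf{Y}}_{S}={\mathbf{H}}_{S}+{\mathbf{N}}_{S}, and at this noise level the captured-power ratio is close to one.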

We have the following proposition for the accuracy of the column subspace estimation in Algorithm 1.

Proposition 1

If the Euclidean distance 𝐖^𝐔^Fδ1\|\widehat{{\mathbf{W}}}-\widehat{{\mathbf{U}}}\|_{F}\leq\delta_{1} in (18), then the accuracy of the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} obtained from Algorithm 1 is lower bounded as

ηc(𝐖^)σL(𝐔^H𝐔)δ1,\displaystyle\sqrt{\eta_{c}(\widehat{{\mathbf{W}}})}\geq\sigma_{L}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})-\delta_{1}, (19)

where 𝐔Nr×L{\mathbf{U}}\in{\mathbb{C}}^{N_{r}\times L} is the matrix composed of LL dominant left singular vectors of 𝐇{\mathbf{H}}. In particular, if δ10\delta_{1}\rightarrow 0, we have

𝔼[ηc(𝐖^)]\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\!\!\!\! \displaystyle\geq σL2(𝐔^H𝐔)\displaystyle\!\!\!\!\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) (20)
\displaystyle\geq (12Nr(σ2σL2(𝐇S)+mσ4)σL4(𝐇S))+,\displaystyle\!\!\!\!\left(1-\frac{2N_{r}(\sigma^{2}\sigma_{L}^{2}({\mathbf{H}}_{S})+m\sigma^{4})}{\sigma_{L}^{4}({\mathbf{H}}_{S})}\right)_{+},

where the σL(𝐇S)\sigma_{L}({\mathbf{H}}_{S}) is the LLth largest singular value of 𝐇S{\mathbf{H}}_{S}.

Proof:

See Appendix C. ∎

From (20), the larger the value of mm, the more accurate the column subspace estimation. Thus, when more columns are used for the column subspace estimation, the estimated column subspace becomes more reliable. In particular, when the noise level is low such that σL2(𝐇S)mσ2\sigma_{L}^{2}({\mathbf{H}}_{S})\!\!\gg\!m\sigma^{2} in (20), we have

𝔼[ηc(𝐖^)](12Nrσ2σL2(𝐇S))+.\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\geq\left(1-\frac{2N_{r}\sigma^{2}}{\sigma_{L}^{2}({\mathbf{H}}_{S})}\right)_{+}.

It means that the column subspace estimation accuracy is linearly related to the value of σ2/σL2(𝐇S)\sigma^{2}\!\!/\!\sigma_{L}^{2}({\mathbf{H}}_{S}), i.e., 𝒪(SNR){\mathcal{O}}(\text{SNR}). On the other hand, when the noise level is high such that σL2(𝐇S)mσ2\sigma_{L}^{2}({\mathbf{H}}_{S})\!\ll\!m\sigma^{2}, the bound in (20) can be written as

𝔼[ηc(𝐖^)](12Nrmσ4σL4(𝐇S))+.\displaystyle\mathbb{E}\left[\eta_{c}(\widehat{{\mathbf{W}}})\right]\geq\left(1-\frac{2N_{r}m\sigma^{4}}{\sigma_{L}^{4}({\mathbf{H}}_{S})}\right)_{+}.

At low SNR, the column subspace estimation accuracy is quadratically related to σ4/σL4(𝐇S)\sigma^{4}/\sigma_{L}^{4}({\mathbf{H}}_{S}), i.e., 𝒪(SNR2){\mathcal{O}}(\text{SNR}^{2}).

Remark 2

When the number of paths, LL, increases, the value of σL(𝐇S)\sigma_{L}({\mathbf{H}}_{S}) in (20) decreases, which can be interpreted as follows. When m,Nrm,N_{r}\!\rightarrow\!\infty, the entries in 𝐇SNr×m{\mathbf{H}}_{S}\!\in\!{\mathbb{C}}^{N_{r}\times m} can be approximated as standard Gaussian r.v.s [26]. Moreover, it has been shown in [27, 28] that the LLth largest singular value satisfies σL(𝐇S)Nr+1LNr\sigma_{L}({\mathbf{H}}_{S})\!\propto\!\frac{N_{r}+1-L}{\sqrt{N_{r}}} with high probability. As a result, the accuracy of the column subspace estimation decreases as LL increases, according to (20) of Proposition 1.

III-B Estimate the Row Subspace

In this subsection, we present how to learn the row subspace by leveraging the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}}. Because we have already sampled the first mm columns of 𝐇{\mathbf{H}} in the first stage, we only need to sample the remaining NtmN_{t}-m columns to estimate the row subspace as shown in Fig. 2.

At the kkth channel use of the second stage, we observe the (m+k)(m+k)th column of 𝐇{\mathbf{H}}, k=1,,(Ntm)k=1,\ldots,(N_{t}-m). To achieve this, we employ the transmit sounder as

𝐟(k)=𝐞m+k.\displaystyle{\mathbf{f}}_{(k)}={\mathbf{e}}_{m+k}. (21)

For the receive sounder, given the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} in the first stage, we simply let the receive sounder of the second stage be 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L}.³Because the estimated column subspace of the first stage is 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L}, the dimension of the receive sounder in the second stage is Nr×L{N_{r}\times L} rather than Nr×MRF{N_{r}\times M_{RF}} as in (1). It is worth noting that 𝐖^\widehat{{\mathbf{W}}} is readily applicable to the hybrid precoding architecture since 𝐖^\widehat{{\mathbf{W}}} is obtained from (18). Therefore, under the transmit sounder 𝐟(k){\mathbf{f}}_{(k)} in (21) and the receive sounder 𝐖^\widehat{{\mathbf{W}}} in (18), the observation 𝐲(k)L×1{\mathbf{y}}_{(k)}\in{\mathbb{C}}^{L\times 1} at the receiver is given by

𝐲(k)\displaystyle{\mathbf{y}}_{(k)} =\displaystyle= 𝐖^H𝐇𝐟(k)+𝐖^H𝐧(k)\displaystyle\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)} (22)
=\displaystyle= 𝐖^H[𝐇]:,m+k+𝐖^H𝐧(k),\displaystyle\widehat{{\mathbf{W}}}^{H}[{\mathbf{H}}]_{:,m+k}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)},

where 𝐧(k)Nr×1{\mathbf{n}}_{(k)}\in{\mathbb{C}}^{N_{r}\times 1} is the noise vector with 𝐧(k)𝒞𝒩(𝟎Nr,σ2𝐈Nr){\mathbf{n}}_{(k)}\sim{\mathcal{C}}{\mathcal{N}}(\bm{0}_{N_{r}},\sigma^{2}{\mathbf{I}}_{N_{r}}). Then, the observations k=1,,(Ntm)k=1,\ldots,(N_{t}-m) in (22) are packed into a matrix 𝐐^CL×(Ntm)\widehat{{\mathbf{Q}}}_{C}\in{\mathbb{C}}^{L\times(N_{t}-m)} as

𝐐^C\displaystyle\widehat{{\mathbf{Q}}}_{C} =\displaystyle= [𝐲(1),𝐲(2),,𝐲(Ntm)]\displaystyle[{\mathbf{y}}_{(1)},{\mathbf{y}}_{(2)},\cdots,{\mathbf{y}}_{(N_{t}-m)}] (23)
=\displaystyle= 𝐖^H(𝐇C+𝐍C),\displaystyle\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{C}+{\mathbf{N}}_{C}),

where 𝐇C=[[𝐇]:,m+1,,[𝐇]:,Nt]Nr×(Ntm){\mathbf{H}}_{C}=\left[[{\mathbf{H}}]_{:,m+1},\ldots,[{\mathbf{H}}]_{:,N_{t}}\right]\in{\mathbb{C}}^{N_{r}\times(N_{t}-m)}, and 𝐍C=[𝐧(1),,𝐧(Ntm)]Nr×(Ntm){\mathbf{N}}_{C}=[{\mathbf{n}}_{(1)},\ldots,{\mathbf{n}}_{(N_{t}-m)}]\in{\mathbb{C}}^{N_{r}\times(N_{t}-m)}.

Algorithm 2 Row subspace estimation
1:  Input: channel dimension: NrN_{r}, NtN_{t}; channel paths: LL; estimated column subspace: 𝐖^\widehat{{\mathbf{W}}}; observations of first stage: 𝐘S{\mathbf{Y}}_{S}; parameter: mm.
2:  Set the receive training sounder as 𝐖^\widehat{{\mathbf{W}}}.
3:  for k=k= 1 to (Ntm)(N_{t}-m) do
4:     Set the transmit training sounder as 𝐟(k)=𝐞m+k{\mathbf{f}}_{(k)}={\mathbf{e}}_{m+k}.
5:     Obtain the received signal:
6:     𝐲(k)=𝐖^H𝐇𝐟(k)+𝐖^H𝐧(k){\mathbf{y}}_{(k)}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}{\mathbf{f}}_{(k)}+\widehat{{\mathbf{W}}}^{H}{\mathbf{n}}_{(k)}.
7:  end for
8:  Stack all the observations as in (23):
9:  𝐐^C=[𝐲(1),𝐲(2),,𝐲(Ntm)]\widehat{{\mathbf{Q}}}_{C}=[{\mathbf{y}}_{(1)},{\mathbf{y}}_{(2)},\cdots,{\mathbf{y}}_{(N_{t}-m)}].
10:  Calculate 𝐐^\widehat{{\mathbf{Q}}}: 𝐐^=[𝐖^H𝐘S,𝐐^C]\widehat{{\mathbf{Q}}}=\left[\widehat{{\mathbf{W}}}^{H}{\mathbf{Y}}_{S},\widehat{{\mathbf{Q}}}_{C}\right].
11:  Row subspace matrix 𝐕^\widehat{{\mathbf{V}}} is obtained by the dominant LL right singular vectors of 𝐐^\widehat{{\mathbf{Q}}}.
12:  Design 𝐅^\widehat{{\mathbf{F}}} based on 𝐕^\widehat{{\mathbf{V}}} by solving (26).
13:  Output: row subspace estimation 𝐅^\widehat{{\mathbf{F}}}.
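The second stage can be sketched similarly. In the following illustrative Python snippet, the stage-1 combiner 𝐖^\widehat{{\mathbf{W}}} is replaced by the true dominant left singular vectors for simplicity (an idealizing assumption), so that only the row subspace estimation of Algorithm 2 is exercised; all sizes are example values.

```python
import numpy as np

rng = np.random.default_rng(3)
Nr, Nt, L, m, sigma = 16, 24, 2, 4, 0.01

# Illustrative rank-L channel.
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))
U, S, Vh = np.linalg.svd(H)
W_hat = U[:, :L]          # stand-in for the stage-1 combiner (assumed ideal here)

# Stage-1 observations of the first m columns, as in (16).
Y_S = H[:, :m] + sigma * (rng.standard_normal((Nr, m)) + 1j * rng.standard_normal((Nr, m)))

# Stage 2 (Algorithm 2): sound the remaining Nt - m columns through W_hat.
N_C = sigma * (rng.standard_normal((Nr, Nt - m)) + 1j * rng.standard_normal((Nr, Nt - m)))
Q_C = W_hat.conj().T @ (H[:, m:] + N_C)

Q_hat = np.hstack([W_hat.conj().T @ Y_S, Q_C])    # L x Nt, as in (25)
V_hat = np.linalg.svd(Q_hat)[2].conj().T[:, :L]   # dominant L right singular vectors

V = Vh.conj().T[:, :L]
eta_r = np.linalg.norm(V_hat.conj().T @ V) ** 2 / L
```

With an accurate stage-1 combiner and low noise, the row subspace accuracy is close to one, consistent with the bound in (28).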

In addition, given the receive sounder 𝐖^\widehat{{\mathbf{W}}} and observations 𝐘S{\mathbf{Y}}_{S} of the first stage in (16), we define 𝐐^SL×m\widehat{{\mathbf{Q}}}_{S}\in{\mathbb{C}}^{L\times m} as,

𝐐^S=𝐖^H𝐘S=𝐖^H(𝐇S+𝐍S).\displaystyle\widehat{{\mathbf{Q}}}_{S}=\widehat{{\mathbf{W}}}^{H}{\mathbf{Y}}_{S}=\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{S}+{\mathbf{N}}_{S}). (24)

Combining (24) and (23) yields 𝐐^L×Nt\widehat{{\mathbf{Q}}}\in{\mathbb{C}}^{L\times N_{t}} expressed as,

𝐐^\displaystyle\widehat{{\mathbf{Q}}} =\displaystyle= [𝐐^S,𝐐^C]\displaystyle\left[\widehat{{\mathbf{Q}}}_{S},\widehat{{\mathbf{Q}}}_{C}\right] (25)
=\displaystyle= [𝐖^H(𝐇S+𝐍S),𝐖^H(𝐇C+𝐍C)]\displaystyle\left[\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{S}+{\mathbf{N}}_{S}),\widehat{{\mathbf{W}}}^{H}({\mathbf{H}}_{C}+{\mathbf{N}}_{C})\right]
=\displaystyle= 𝐖^H𝐇𝐐¯+𝐖^H𝐍𝐍¯,\displaystyle\underbrace{\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}}_{\triangleq\bar{{\mathbf{Q}}}}+\underbrace{\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}}_{\triangleq\bar{{\mathbf{N}}}},

where 𝐍=[𝐍S,𝐍C]Nr×Nt{\mathbf{N}}=[{\mathbf{N}}_{S},{\mathbf{N}}_{C}]\!\!\in{\mathbb{C}}^{N_{r}\times N_{t}}, 𝐇=[𝐇S,𝐇C]Nr×Nt{\mathbf{H}}=[{\mathbf{H}}_{S},{\mathbf{H}}_{C}]\in{\mathbb{C}}^{N_{r}\times N_{t}}, 𝐐¯=𝐖^H𝐇L×Nt\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\in{\mathbb{C}}^{L\times N_{t}}, and 𝐍¯=𝐖^H𝐍L×Nt\bar{{\mathbf{N}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}\in{\mathbb{C}}^{L\times N_{t}}. Meanwhile, since 𝐖^\widehat{{\mathbf{W}}} is semi-unitary and the entries in 𝐍{{\mathbf{N}}} are i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}), according to Lemma 2, the entries in 𝐍¯\bar{{\mathbf{N}}} are also i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\sigma^{2}).

Now, given the expression 𝐐^\widehat{{\mathbf{Q}}} in (25), the row subspace estimation problem is formulated as,

𝐕^=argmax𝐕Nt×L𝐐^𝐕F2subject to𝐕H𝐕=𝐈L,\displaystyle\widehat{{\mathbf{V}}}=\mathop{\mathrm{argmax}}\limits_{{\mathbf{V}}\in{\mathbb{C}}^{N_{t}\times L}}\|\widehat{{\mathbf{Q}}}{\mathbf{V}}\|_{F}^{2}~{}\text{subject to}~{}{\mathbf{V}}^{H}{\mathbf{V}}={\mathbf{I}}_{L},

where the estimated row subspace matrix 𝐕^Nt×L\widehat{{\mathbf{V}}}\in{\mathbb{C}}^{N_{t}\times L} is obtained as the dominant LL right singular vectors of 𝐐^\widehat{{\mathbf{Q}}}. Similarly, in order to design the precoder 𝐅^\widehat{{\mathbf{F}}} in (4) for data transmission, we need to approximate the estimated row subspace matrix 𝐕^\widehat{{\mathbf{V}}} under the hybrid precoding architecture. Specifically, we design the analog precoder 𝐅^ANt×NRF\widehat{{\mathbf{F}}}_{A}\in{\mathbb{C}}^{N_{t}\times N_{RF}}and digital precoder 𝐅^DNRF×L\widehat{{\mathbf{F}}}_{D}\in{\mathbb{C}}^{N_{RF}\times L} by solving the following problem

(𝐅^A,𝐅^D)=argmin𝐅A,𝐅D𝐕^𝐅A𝐅DF,\displaystyle\left(\widehat{{\mathbf{F}}}_{A},\widehat{{\mathbf{F}}}_{D}\right)=\mathop{\mathrm{argmin}}_{{\mathbf{F}}_{A},{\mathbf{F}}_{D}}\|\widehat{{\mathbf{V}}}-{\mathbf{F}}_{A}{\mathbf{F}}_{D}\|_{F},
subject to |[𝐅A]i,j|=1Nt.\displaystyle\text{subject to~{}~{}}\left|[{\mathbf{F}}_{A}]_{i,j}\right|=\frac{1}{\sqrt{N_{t}}}. (26)

Therefore, the transmit precoder is given by 𝐅^=𝐅^A𝐅^DNt×L\widehat{{\mathbf{F}}}=\widehat{{\mathbf{F}}}_{A}\widehat{{\mathbf{F}}}_{D}\in{\mathbb{C}}^{N_{t}\times L} with 𝐅^H𝐅^=𝐈L\widehat{{\mathbf{F}}}^{H}\widehat{{\mathbf{F}}}={\mathbf{I}}_{L}. Similarly, the method for solving (26) in [5] can guarantee col(𝐅^)col(𝐕^)\text{col}(\widehat{{\mathbf{F}}})\approx\text{col}(\widehat{{\mathbf{V}}}). The details of our row subspace estimation algorithm are shown in Algorithm 2. We have the following proposition on the accuracy of the row subspace estimated by Algorithm 2.

Proposition 2

If the Euclidean distance 𝐅^𝐕^Fδ2\|\widehat{{\mathbf{F}}}-\widehat{{\mathbf{V}}}\|_{F}\leq\delta_{2} in (26), then the accuracy of the estimated row subspace matrix 𝐅^\widehat{{\mathbf{F}}} obtained from Algorithm 2 is lower bounded as

ηr(𝐅^)σL(𝐕^H𝐕)δ2,\displaystyle\sqrt{\eta_{r}(\widehat{{\mathbf{F}}})}\geq\sigma_{L}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}})-\delta_{2}, (27)

where 𝐕Nt×L{\mathbf{V}}\in{\mathbb{C}}^{N_{t}\times L} is the matrix composed of the LL dominant right singular vectors of 𝐇{\mathbf{H}}. In particular, if δ20\delta_{2}\rightarrow 0, we have

𝔼[ηr(𝐅^)]\displaystyle\mathbb{E}\left[\eta_{r}(\widehat{{\mathbf{F}}})\right]\!\!\!\! \displaystyle\geq σL2(𝐕^H𝐕)\displaystyle\!\!\!\!\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}) (28)
\displaystyle\geq (12Nt(σ2σL2(𝐐¯)+Lσ4)σL4(𝐐¯))+,\displaystyle\!\!\!\!\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{{\mathbf{Q}}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{{\mathbf{Q}}})}\right)_{+},

where σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) is the LLth largest singular value of 𝐐¯\bar{{\mathbf{Q}}} in (25).

Proof:

See Appendix D. ∎

Similar to the column subspace estimation, the row subspace accuracy increases linearly with the SNR, i.e., 𝒪(SNR)\mathcal{O}(\text{SNR}), at high SNR, and quadratically with the SNR, i.e., 𝒪(SNR2){\mathcal{O}}(\text{SNR}^{2}), at low SNR. Also, the accuracy of the row subspace estimation decreases with the number of paths, LL. As the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) in (28) grows, the row subspace estimate becomes more accurate. Moreover, considering 𝐐¯=𝐖^H𝐇\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}, it is intuitive that the estimated column subspace matrix 𝐖^\widehat{{\mathbf{W}}} affects the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}), and hence the accuracy of the row subspace estimation. Specifically, when col(𝐖^)=col(𝐔)\mathop{\mathrm{col}}(\widehat{{\mathbf{W}}})=\mathop{\mathrm{col}}({{\mathbf{U}}}), we have σL(𝐐¯)=σL(𝐇)\sigma_{L}(\bar{\mathbf{Q}})=\sigma_{L}({\mathbf{H}}), which is the maximum possible value. In the following, we further discuss the relationship between σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) and σL(𝐇)\sigma_{L}({\mathbf{H}}).

With the SVD of 𝐇{\mathbf{H}}, i.e., 𝐇=𝐔𝚺𝐕H{\mathbf{H}}={\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}, we have 𝐐¯=𝐖^H𝐇=𝐖^H𝐔𝚺𝐕H\bar{{\mathbf{Q}}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}=\widehat{{\mathbf{W}}}^{H}{\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}. Then, the following relationship is true due to the singular value product inequality,

σL(𝐐¯)\displaystyle\sigma_{L}(\bar{{\mathbf{Q}}}) \displaystyle\geq σL(𝐖^H𝐔)σL(𝚺𝐕H)\displaystyle\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}})\sigma_{L}(\mathbf{\Sigma}{\mathbf{V}}^{H}) (29)
=\displaystyle= σL(𝐖^H𝐔)σL(𝐇).\displaystyle\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}})\sigma_{L}({\mathbf{H}}).

Therefore, σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) is lower bounded by the product of the LLth largest singular values of 𝐖^H𝐔\widehat{{\mathbf{W}}}^{H}{\mathbf{U}} and 𝐇{\mathbf{H}}. As the estimate of the column subspace becomes accurate, σL(𝐖^H𝐔)\sigma_{L}(\widehat{{\mathbf{W}}}^{H}{\mathbf{U}}) approaches one. As a result, the value of σL(𝐐¯)\sigma_{L}(\bar{{\mathbf{Q}}}) becomes approximately equal to σL(𝐇)\sigma_{L}({\mathbf{H}}), further enhancing the row subspace estimation. The inequality in (29) reveals that the column subspace estimation affects the accuracy of the row subspace estimation.
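The inequality (29) can be verified numerically. In the sketch below, a perturbed and re-orthonormalized version of 𝐔{\mathbf{U}} serves as a hypothetical stand-in for 𝐖^\widehat{{\mathbf{W}}}; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
Nr, Nt, L = 12, 10, 3

# Rank-L channel and its dominant left singular vectors.
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) @ \
    (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))
U = np.linalg.svd(H)[0][:, :L]

# A slightly perturbed, re-orthonormalized combiner standing in for W_hat.
P = U + 0.1 * (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L)))
W_hat = np.linalg.qr(P)[0]

Q_bar = W_hat.conj().T @ H
lhs = np.linalg.svd(Q_bar, compute_uv=False)[L - 1]          # sigma_L(Q_bar)
rhs = np.linalg.svd(W_hat.conj().T @ U, compute_uv=False)[L - 1] * \
      np.linalg.svd(H, compute_uv=False)[L - 1]              # product lower bound
```

Since 𝐖^H𝐔\widehat{{\mathbf{W}}}^{H}{\mathbf{U}} is a square L×LL\times L matrix here, the singular value product inequality guarantees the bound holds exactly.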

Given the estimated column subspace 𝐖^\widehat{{\mathbf{W}}} in Algorithm 1 and row subspace 𝐅^\widehat{{\mathbf{F}}} in Algorithm 2, the following lemma shows the subspace estimation accuracy of the proposed SASE, i.e., η(𝐖^,𝐅^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}) defined in (7).

Lemma 3

If we assume δ10\delta_{1}\rightarrow 0 and δ20\delta_{2}\rightarrow 0 in (18) and (26), the subspace estimation accuracy defined in (7) associated with 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}} is lower bounded as

η(𝐖^,𝐅^)σL2(𝐔^H𝐔)σL2(𝐕^H𝐕).\displaystyle\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})\geq\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}). (30)
Proof:

Using the definition of η(𝐖^,𝐕^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{V}}}) in (7), we have the following expressions,

η(𝐖^,𝐅^)\displaystyle\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}) =\displaystyle= 𝐖^H𝐇𝐅^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}\widehat{{\mathbf{F}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
=(a)\displaystyle\overset{(a)}{=} 𝐔^H𝐇𝐕^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{U}}}^{H}{\mathbf{H}}\widehat{{\mathbf{V}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
=\displaystyle= 𝐔^H𝐔𝚺𝐕H𝐕^F2/tr(𝐇H𝐇)\displaystyle{\|\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}\mathbf{\Sigma}{\mathbf{V}}^{H}\widehat{{\mathbf{V}}}\|_{F}^{2}}/{\mathop{\mathrm{tr}}({\mathbf{H}}^{H}{\mathbf{H}})}
(b)\displaystyle\overset{(b)}{\geq} σL2(𝐔^H𝐔)σL2(𝐕^H𝐕),\displaystyle\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}})\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}),

where the equality (a)(a) holds for δ10\delta_{1}\rightarrow 0 and δ20\delta_{2}\rightarrow 0, and the inequality (b)(b) follows from the singular value product inequality. ∎

Lemma 3 shows that the power captured by 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}} is lower bounded by the product of σL2(𝐔^H𝐔)\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) and σL2(𝐕^H𝐕)\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}). These two factors correspond to the two stages of the proposed SASE, namely the column subspace estimation and the row subspace estimation, respectively. Ideally, when col(𝐔^)=col(𝐔)\text{col}(\widehat{{\mathbf{U}}})=\text{col}({{\mathbf{U}}}) and col(𝐕^)=col(𝐕)\text{col}(\widehat{{\mathbf{V}}})=\text{col}({{\mathbf{V}}}), we have η(𝐖^,𝐅^)=1\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}})=1. Even with estimation errors, the proposed SASE can still achieve a nearly optimal η(𝐖^,𝐅^)\eta(\widehat{{\mathbf{W}}},\widehat{{\mathbf{F}}}). This is because σL2(𝐔^H𝐔)\sigma_{L}^{2}(\widehat{{\mathbf{U}}}^{H}{\mathbf{U}}) and σL2(𝐕^H𝐕)\sigma_{L}^{2}(\widehat{{\mathbf{V}}}^{H}{\mathbf{V}}) are close to one according to the bounds provided in (20) and (28), respectively.

III-C Channel Estimation Based on the Estimated Subspaces

In this subsection, we introduce a channel estimation method based on the estimated column subspace 𝐖^Nr×L\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{N_{r}\times L} and row subspace 𝐅^Nt×L\widehat{{\mathbf{F}}}\in{\mathbb{C}}^{N_{t}\times L}. Let the channel estimate be expressed as

𝐇^=𝐖^𝐑^𝐅^H,\displaystyle\widehat{{\mathbf{H}}}=\widehat{{\mathbf{W}}}\widehat{{\mathbf{R}}}\widehat{{\mathbf{F}}}^{H}, (31)

where 𝐑^L×L\widehat{{\mathbf{R}}}\in{\mathbb{C}}^{L\times L}. Now, given 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}}, it only remains to obtain 𝐑^\widehat{{\mathbf{R}}} in an optimal manner.

Recalling the column subspace estimation in Section III-A and row subspace estimation in Section III-B, the corresponding received signals are expressed as

𝐘S\displaystyle{\mathbf{Y}}_{S} =\displaystyle= 𝐇S+𝐍S\displaystyle{\mathbf{H}}_{S}+{\mathbf{N}}_{S}
𝐐^C\displaystyle\widehat{{\mathbf{Q}}}_{C} =\displaystyle= 𝐖^H𝐇C+𝐖^H𝐍C.\displaystyle\widehat{{\mathbf{W}}}^{H}{\mathbf{H}}_{C}+\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}_{C}.

It is worth noting that the entries in 𝐍S{\mathbf{N}}_{S} and 𝐖^H𝐍C\widehat{{\mathbf{W}}}^{H}{\mathbf{N}}_{C} are both i.i.d. with distribution 𝒞𝒩(0,σ2){\mathcal{C}}{\mathcal{N}}(0,\!\sigma^{2}). Based on the expression of 𝐇^\widehat{{\mathbf{H}}} in (31), the maximum likelihood estimate of 𝐑^\widehat{{\mathbf{R}}} in (31) can be obtained through the following least squares problem,

min𝐑L×L𝐘S𝐇^SF2+𝐐^C𝐖^H𝐇^CF2\displaystyle\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\min\limits_{{\mathbf{R}}\in{\mathbb{C}}^{L\times L}}\|{\mathbf{Y}}_{S}-\widehat{{\mathbf{H}}}_{S}\|_{F}^{2}+\|\widehat{{\mathbf{Q}}}_{C}-\widehat{{\mathbf{W}}}^{H}\widehat{{\mathbf{H}}}_{C}\|_{F}^{2}
subject to𝐇^S=[𝐖^𝐑𝐅^H]:,1:m,𝐇^C=[𝐖^𝐑𝐅^H]:,m+1:Nt.\displaystyle\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\text{subject to}~{}\widehat{{\mathbf{H}}}_{S}\!\!=\!\![\widehat{{\mathbf{W}}}{\mathbf{R}}\widehat{{\mathbf{F}}}^{H}]_{:,1:m},~{}\!\!\widehat{{\mathbf{H}}}_{C}\!\!=\!\![\widehat{{\mathbf{W}}}{\mathbf{R}}\widehat{{\mathbf{F}}}^{H}]_{:,m+1:N_{t}}\!. (32)

Before discussing how to solve the problem in (32), for convenience, we define

𝐫\displaystyle{\mathbf{r}} =\displaystyle= vec(𝐑)L2×1,\displaystyle\mathop{\mathrm{vec}}({\mathbf{R}})\in{\mathbb{C}}^{L^{2}\times 1},
𝐲S\displaystyle{\mathbf{y}}_{S} =\displaystyle= vec(𝐘S)mNr×1,\displaystyle\mathop{\mathrm{vec}}({\mathbf{Y}}_{S})\in{\mathbb{C}}^{mN_{r}\times 1},
𝐪^C\displaystyle\widehat{{\mathbf{q}}}_{C} =\displaystyle= vec(𝐐^C)(Ntm)L×1,\displaystyle\mathop{\mathrm{vec}}(\widehat{{\mathbf{Q}}}_{C})\in{\mathbb{C}}^{(N_{t}-m)L\times 1},
𝐀1\displaystyle{\mathbf{A}}_{1} =\displaystyle= ([𝐅^]:,1:mH)T𝐖^mNr×L2,\displaystyle([\widehat{{\mathbf{F}}}]_{:,1:m}^{H})^{T}\otimes\widehat{{\mathbf{W}}}\in{\mathbb{C}}^{mN_{r}\times L^{2}},
𝐀2\displaystyle{\mathbf{A}}_{2} =\displaystyle= ([𝐅^]:,m+1:NtH)T𝐈L(Ntm)L×L2.\displaystyle([\widehat{{\mathbf{F}}}]_{:,m+1:N_{t}}^{H})^{T}\otimes{\mathbf{I}}_{L}\in{\mathbb{C}}^{(N_{t}-m)L\times L^{2}}.

Using the definitions above, the minimization problem in (32) can be rewritten as

min𝐫L2×1𝐲S𝐀1𝐫22+𝐪^C𝐀2𝐫22.\displaystyle\min\limits_{{\mathbf{r}}\in{\mathbb{C}}^{L^{2}\times 1}}\left\|{\mathbf{y}}_{S}\!-\!{\mathbf{A}}_{1}{\mathbf{r}}\right\|_{2}^{2}\!+\!\left\|\widehat{{\mathbf{q}}}_{C}\!-\!{\mathbf{A}}_{2}{\mathbf{r}}\right\|_{2}^{2}. (33)

The following lemma provides the solution of problem (33).

Lemma 4

Given the problem below

min𝐫L2×1𝐲S𝐀1𝐫22+𝐪^C𝐀2𝐫22,\displaystyle\min\limits_{{\mathbf{r}}\in{\mathbb{C}}^{L^{2}\times 1}}\left\|{\mathbf{y}}_{S}-{\mathbf{A}}_{1}{\mathbf{r}}\right\|_{2}^{2}\!+\!\left\|\widehat{{\mathbf{q}}}_{C}\!-\!{\mathbf{A}}_{2}{\mathbf{r}}\right\|_{2}^{2},

the optimal solution is given by

𝐫^=(𝐀1H𝐀1+𝐀2H𝐀2)1(𝐀1H𝐲S+𝐀2H𝐪^C).\displaystyle\widehat{{\mathbf{r}}}=({\mathbf{A}}_{1}^{H}{\mathbf{A}}_{1}+{\mathbf{A}}_{2}^{H}{\mathbf{A}}_{2})^{-1}({\mathbf{A}}_{1}^{H}{\mathbf{y}}_{S}+{\mathbf{A}}_{2}^{H}\widehat{{\mathbf{q}}}_{C}). (34)
Proof:

The problem is convex with respect to 𝐫{\mathbf{r}}. Thus, the optimal solution can be obtained by setting the first order derivative of the objective function to zero as

𝐀1H(𝐀1𝐫𝐲S)+𝐀2H(𝐀2𝐫𝐪^C)=𝟎.\displaystyle{\mathbf{A}}_{1}^{H}({\mathbf{A}}_{1}{\mathbf{r}}-{\mathbf{y}}_{S})+{\mathbf{A}}_{2}^{H}({\mathbf{A}}_{2}{\mathbf{r}}-\widehat{{\mathbf{q}}}_{C})=\mathbf{0}. (35)

The solution of (35) is exactly the result in (34), which concludes the proof. ∎

It is worth noting that after we have obtained the column and row subspace estimates, i.e., 𝐖^\widehat{{\mathbf{W}}} and 𝐅^\widehat{{\mathbf{F}}}, the channel estimation reduces to computing 𝐫^=vec(𝐑^)\widehat{{\mathbf{r}}}=\mathop{\mathrm{vec}}(\widehat{{\mathbf{R}}}) in (34). Since the dimension of 𝐑^\widehat{{\mathbf{R}}} is much lower than that of 𝐇{\mathbf{H}}, the channel estimation complexity is substantially reduced, as shown in Lemma 4.
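The closed-form solution (34) can be validated with a small noiseless example. The sketch below builds 𝐀1{\mathbf{A}}_{1} and 𝐀2{\mathbf{A}}_{2} from their Kronecker definitions and recovers a randomly drawn 𝐑{\mathbf{R}} exactly; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
Nr, Nt, L, m = 8, 12, 2, 3

# Semi-unitary W_hat (Nr x L) and F_hat (Nt x L), and a ground-truth R.
W_hat = np.linalg.qr(rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L)))[0]
F_hat = np.linalg.qr(rng.standard_normal((Nt, L)) + 1j * rng.standard_normal((Nt, L)))[0]
R_true = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
H_hat = W_hat @ R_true @ F_hat.conj().T

# Noiseless observations matching (16) and (23).
y_S = H_hat[:, :m].flatten(order="F")                  # vec of the first m columns
q_C = (W_hat.conj().T @ H_hat[:, m:]).flatten(order="F")

# A1, A2 per their definitions; note ([F^H]_{:,1:m})^T = conj(F[:m, :]).
A1 = np.kron(np.conj(F_hat[:m, :]), W_hat)
A2 = np.kron(np.conj(F_hat[m:, :]), np.eye(L))

# Closed-form solution (34).
G = A1.conj().T @ A1 + A2.conj().T @ A2
r_hat = np.linalg.solve(G, A1.conj().T @ y_S + A2.conj().T @ q_C)
R_hat = r_hat.reshape((L, L), order="F")
```

Because the normal-equation matrix is positive definite here, the noiseless recovery is exact; with noise, (34) returns the least squares fit.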

IV Discussion of Algorithm

In this section, we analyze the complexity of the proposed SASE method in terms of the channel use overhead and computational complexity. Moreover, we discuss the application of the SASE in other channel scenarios.

IV-A Channel Use Overhead

TABLE I: Channel Uses of Algorithms
Algorithms Number of Channel Uses
SASE mNr/MRF+(Ntm){mN_{r}}/{M_{RF}}+(N_{t}-m)
MF [11] 𝒪(L(Nr+Nt)/MRF)\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}})
SD [10] 𝒪(L(Nr+Nt)/MRF)\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}})
Arnoldi [16] 2qNr/MRF+2qNt/NRF2qN_{r}/{M_{RF}}+2qN_{t}/{N_{RF}}
OMP [8] 𝒪(Lln(G2)/MRF)\mathcal{O}(L\ln(G^{2})/{M_{RF}})
SBL [9] 𝒪(Lln(G2)/MRF)\mathcal{O}(L\ln(G^{2})/{M_{RF}})
ACE [17] s2L3logs(Nm/L)/MRFs^{2}L^{3}\mathrm{log}_{s}(N_{m}/L)/{M_{RF}}

Considering the channel uses in each stage, the total number of channel uses for the SASE is given by

KSASE\displaystyle K_{\text{SASE}} =\displaystyle= mNr/MRF+(Ntm).\displaystyle{mN_{r}}/{M_{RF}}+(N_{t}-m). (36)

Therefore, the number of channel uses grows linearly with the channel dimension, i.e., 𝒪(Nt)\mathcal{O}(N_{t}). In particular, when we let m=Lm=L, the number of channel uses in (36) becomes LNr/MRF+(NtL){LN_{r}}/{M_{RF}}+(N_{t}-L). Considering that each channel use contributes MRFM_{RF} observations in the first stage and LL observations in the second stage, the total number of observations is LNr+L(NtL)LN_{r}+L(N_{t}-L), which equals the degrees of freedom of a rank\mathop{\mathrm{rank}}-LL matrix 𝐇Nr×Nt{\mathbf{H}}\in{\mathbb{C}}^{N_{r}\times N_{t}} [29].
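For concreteness, the counts in (36) can be computed as follows; the parameter values are examples only.

```python
def channel_uses_sase(Nr, Nt, M_RF, m):
    """K_SASE from (36): m*Nr/M_RF uses in stage 1, Nt - m in stage 2."""
    return m * Nr // M_RF + (Nt - m)

def observations_sase(Nr, Nt, L):
    """Total observations when m = L: M_RF per stage-1 use, L per stage-2 use,
    matching the L*Nr + L*(Nt - L) degrees of freedom of a rank-L matrix."""
    return L * Nr + L * (Nt - L)

# Example: Nr = Nt = 64, M_RF = 4, m = L = 3 (illustrative values).
print(channel_uses_sase(64, 64, 4, 3), observations_sase(64, 64, 3))
```

The first count grows as 𝒪(Nt)\mathcal{O}(N_{t}) in the channel dimension, consistent with the discussion above.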

The numbers of channel uses of the proposed SASE and other benchmarks [8, 9, 17, 10, 11, 16] are compared in Table I. For the angle estimation methods in [8, 9, 17], the number of required channel uses for the OMP [8] and SBL [9] is KOMP=KSBL=𝒪(Lln(G2)/MRF)K_{\text{OMP}}=K_{\text{SBL}}=\mathcal{O}(L\ln(G^{2})/{M_{RF}}), where GG is the number of grids with Gmax{Nr,Nt}G\geq\max\{N_{r},N_{t}\}. The number of channel uses for adaptive channel estimation (ACE) [17] is KACE=s2L3logs(Nm/L)/MRFK_{\text{ACE}}=s^{2}L^{3}\mathrm{log}_{s}(N_{m}/L)/{M_{RF}}, where 2π/Nm2\pi/N_{m} with Nmmax{Nr,Nt}N_{m}\geq\max\{N_{r},N_{t}\} is the desired angle resolution for the ACE, and ss is the number of beamforming vectors in each stage of the ACE. For the subspace estimation methods in [10, 11, 16], the numbers of required channel uses for subspace decomposition (SD) [10] and matrix factorization (MF) [11] are KSD=KMF=𝒪(L(Nr+Nt)/MRF)K_{\text{SD}}=K_{\text{MF}}=\mathcal{O}(L(N_{r}+N_{t})/{M_{RF}}), while the Arnoldi approach [16] requires KArnoldi=2qNr/MRF+2qNt/NRFK_{\text{Arnoldi}}=2qN_{r}/{M_{RF}}+2qN_{t}/{N_{RF}} channel uses, where qLq\geq L. Because the number of estimated parameters of the angle estimation methods, such as OMP, SBL, and ACE, is smaller than that of the proposed SASE, they require slightly fewer channel uses than the SASE. Nevertheless, the proposed SASE consumes fewer channel uses than the existing subspace estimation methods [10, 11, 16], as shown in Table I.

IV-B Computational Complexity

For the proposed SASE, the computational complexity of the first stage comes from the SVD of $\mathbf{Y}_{S}$, which is $\mathcal{O}(m^{2}N_{r})$ [28]. The complexity of the second stage is dominated by the design of $\widehat{\mathbf{W}}$ in (18), which is $\mathcal{O}(LDN_{r})$, where $D\geq N_{r}$ denotes the cardinality of an over-complete dictionary. Hence, the overall complexity of the proposed SASE algorithm is $\mathcal{O}(m^{2}N_{r}+LDN_{r})=\mathcal{O}(LDN_{r})$. The computational complexities of the benchmarks, i.e., the angle estimation methods OMP [8], SBL [9], and ACE [17], along with the subspace estimation methods Arnoldi [16], SD [10], and MF [11], are compared in Table II, where $K$ denotes the number of channel uses. For a fair comparison, $K$ is assumed to be equal across the benchmarks. As can be seen from Table II, the proposed SASE has the lowest computational complexity.

TABLE II: Computational Complexity of Algorithms
Algorithms Computational Complexity
SASE $\mathcal{O}(LDN_{r})$
MF [11] $\mathcal{O}(KM_{RF}L^{2}(N_{r}^{2}+N_{t}^{2}))$
SD [10] $\mathcal{O}(KM_{RF}L^{2}(N_{r}^{2}+N_{t}^{2}))$
Arnoldi [16] $\mathcal{O}(K^{2}M_{RF}^{2}/(N_{r}+N_{t}))$
OMP [8] $\mathcal{O}(LKM_{RF}G^{2})$
SBL [9] $\mathcal{O}(G^{6})$
ACE [17] $\mathcal{O}(KM_{RF}^{2}DN_{r}/(sL)+KN_{RF}^{2}DN_{t}/(sL))$

IV-C Extension of SASE

In this subsection, we extend the proposed SASE to the 2D mmWave channel model with UPAs. There are $N_{cl}$ clusters, and each cluster is composed of $N_{ray}$ rays. For this model, the mmWave channel matrix is expressed as [30, 31, 5]

$$\mathbf{H}=\sqrt{\frac{N_{r}N_{t}}{N_{cl}N_{ray}}}\sum_{i=1}^{N_{cl}}\sum_{j=1}^{N_{ray}}h_{ij}\,\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})\,\mathbf{a}_{t}^{H}(\phi^{t}_{ij},\theta^{t}_{ij}), \qquad (37)$$

where $h_{ij}$ represents the complex gain associated with the $j$th path of the $i$th cluster. The vectors $\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})\in\mathbb{C}^{N_{r}\times 1}$ and $\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})\in\mathbb{C}^{N_{t}\times 1}$ are the receive and transmit array response vectors, where $\phi^{r}_{ij}$ ($\phi^{t}_{ij}$) and $\theta^{r}_{ij}$ ($\theta^{t}_{ij}$) denote the azimuth and elevation angles at the receiver (transmitter). Specifically, $\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})$ and $\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})$ are expressed as

$$\mathbf{a}_{r}(\phi^{r}_{ij},\theta^{r}_{ij})=\frac{1}{\sqrt{N_{r}}}\left[1,\cdots,e^{j\frac{2\pi}{\lambda}d(m_{r}\sin\phi^{r}_{ij}\sin\theta^{r}_{ij}+n_{r}\cos\theta^{r}_{ij})},\cdots,e^{j\frac{2\pi}{\lambda}d((\sqrt{N_{r}}-1)\sin\phi^{r}_{ij}\sin\theta^{r}_{ij}+(\sqrt{N_{r}}-1)\cos\theta^{r}_{ij})}\right]^{T},$$
$$\mathbf{a}_{t}(\phi^{t}_{ij},\theta^{t}_{ij})=\frac{1}{\sqrt{N_{t}}}\left[1,\cdots,e^{j\frac{2\pi}{\lambda}d(m_{t}\sin\phi^{t}_{ij}\sin\theta^{t}_{ij}+n_{t}\cos\theta^{t}_{ij})},\cdots,e^{j\frac{2\pi}{\lambda}d((\sqrt{N_{t}}-1)\sin\phi^{t}_{ij}\sin\theta^{t}_{ij}+(\sqrt{N_{t}}-1)\cos\theta^{t}_{ij})}\right]^{T},$$

where $d$ and $\lambda$ are the antenna spacing and the wavelength, respectively, and $0\leq m_{r},n_{r}<\sqrt{N_{r}}$ and $0\leq m_{t},n_{t}<\sqrt{N_{t}}$ are the antenna indices in the 2D plane.

For the channel model in (37), it is worth noting that the rank of $\mathbf{H}$ is at most $N_{cl}N_{ray}$. Using similar derivations to the proof of Lemma 1, we can verify that when $m\geq N_{cl}N_{ray}$, the sub-matrix $\mathbf{H}_{S}=[\mathbf{H}]_{:,1:m}\in\mathbb{C}^{N_{r}\times m}$ satisfies $\mathrm{rank}(\mathbf{H}_{S})=\mathrm{rank}(\mathbf{H})$. Therefore, it is possible to sample the first $m$ columns of $\mathbf{H}$ in (37) to obtain the column subspace information, and to sample the remaining columns to obtain the row subspace information. This means that the proposed SASE can be extended directly to the channel model given in (37).
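The claimed rank equivalence can be spot-checked numerically. The sketch below builds a UPA channel per (37), assuming half-wavelength spacing ($d=\lambda/2$), square arrays, angles drawn uniformly at random, and $N_{cl}=N_{ray}=2$; the seed and draw distributions are illustrative, not the paper's setup. It then verifies $\mathrm{rank}(\mathbf{H}_S)=\mathrm{rank}(\mathbf{H})$ for the tightest case $m=N_{cl}N_{ray}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def upa_response(N, phi, theta, d_over_lam=0.5):
    """UPA array response on a sqrt(N) x sqrt(N) grid (N assumed a perfect square)."""
    n_side = int(np.sqrt(N))
    m, n = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    phase = 2 * np.pi * d_over_lam * (m * np.sin(phi) * np.sin(theta) + n * np.cos(theta))
    return np.exp(1j * phase).reshape(-1) / np.sqrt(N)

Nr, Nt, Ncl, Nray = 36, 144, 2, 2
H = np.zeros((Nr, Nt), dtype=complex)
for _ in range(Ncl * Nray):
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    ar = upa_response(Nr, *rng.uniform(-np.pi / 2, np.pi / 2, 2))
    at = upa_response(Nt, *rng.uniform(-np.pi / 2, np.pi / 2, 2))
    H += h * np.outer(ar, at.conj())
H *= np.sqrt(Nr * Nt / (Ncl * Nray))

m = Ncl * Nray                  # sample the first m >= Ncl*Nray columns
H_S = H[:, :m]
# For generic (distinct) angles, both ranks equal Ncl*Nray.
print(np.linalg.matrix_rank(H), np.linalg.matrix_rank(H_S))
```

Larger $m$ only makes the sub-matrix better conditioned, so the rank equality continues to hold.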

In summary, the proposed SASE can be applied to other channel models as long as the channel matrix $\mathbf{H}$ experiences sparse propagation and $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$. Moreover, because the proposed SASE is an open-loop framework, it can be easily extended to multiuser MIMO downlink scenarios.

V Simulation Results

In this section, we evaluate the performance of the proposed SASE algorithm by simulation.

V-A Simulation Setup

In the simulation, the numbers of receive and transmit antennas are $N_{r}=36$ and $N_{t}=144$, respectively, and the numbers of RF chains at the receiver and transmitter are $M_{RF}=6$ and $N_{RF}=8$, respectively. Without loss of generality, it is assumed that the variance of the complex gain of the $l$th path is $\sigma_{h,l}^{2}=1,\forall l$. We consider three subspace-based channel estimation methods as benchmarks, i.e., SD [10], MF [11], and Arnoldi [16], where SD and MF aim to recover the low-rank mmWave channel matrix, and Arnoldi estimates the dominant singular subspaces of the mmWave channel. For a fair comparison, the considered benchmarks estimate the subspace rather than parameters such as the angles of the paths.

V-B Numerical Results

Figure 3: $\mathrm{rank}(\mathbf{H}_{S})$ versus $m$ ($N_{t}=144$; $N_{r}=36$; $L=4$)

In order to evaluate the subspace accuracy of different methods, we compute the subspace accuracy $\eta(\widehat{\mathbf{W}},\widehat{\mathbf{F}})$ in (7), the column subspace accuracy $\eta_{c}(\widehat{\mathbf{W}})$ in (8), and the row subspace accuracy $\eta_{r}(\widehat{\mathbf{F}})$ in (9) for comparison. We also evaluate the normalized mean squared error (NMSE) and the spectrum efficiency. The NMSE is defined as $\text{NMSE}=\mathbb{E}[\|\mathbf{H}-\widehat{\mathbf{H}}\|_{F}^{2}/\|\mathbf{H}\|_{F}^{2}]$, where $\widehat{\mathbf{H}}$ denotes the channel estimate. In particular, the channel estimate of the SASE is obtained by the method derived in Section III-C. The spectrum efficiency in (5) is calculated with the combiner $\widehat{\mathbf{W}}$ and precoder $\widehat{\mathbf{F}}$, which are designed according to the precoding techniques in [5] using the channel estimate $\widehat{\mathbf{H}}$.
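For concreteness, these metrics can be sketched as below. Following the derivations in Appendices C and D, $\eta_c$ and $\eta_r$ are taken as the fraction of the channel's energy captured by the estimated column and row subspaces; the rank-$L$ test channel and the seed are illustrative assumptions. The true singular subspaces give $\eta_c=\eta_r=1$ and zero NMSE.

```python
import numpy as np

rng = np.random.default_rng(1)

def eta_c(W_hat, H):
    # Column subspace accuracy (8): channel energy captured by span(W_hat).
    return np.linalg.norm(W_hat.conj().T @ H, "fro")**2 / np.linalg.norm(H, "fro")**2

def eta_r(F_hat, H):
    # Row subspace accuracy (9): channel energy captured by span(F_hat).
    return np.linalg.norm(H @ F_hat, "fro")**2 / np.linalg.norm(H, "fro")**2

def nmse(H, H_hat):
    return np.linalg.norm(H - H_hat, "fro")**2 / np.linalg.norm(H, "fro")**2

# Random rank-L test channel.
Nr, Nt, L = 36, 144, 4
H = (rng.standard_normal((Nr, L)) + 1j * rng.standard_normal((Nr, L))) \
    @ (rng.standard_normal((L, Nt)) + 1j * rng.standard_normal((L, Nt)))

U, s, Vh = np.linalg.svd(H, full_matrices=False)
W, F = U[:, :L], Vh[:L, :].conj().T          # true L-dimensional subspaces
print(eta_c(W, H), eta_r(F, H), nmse(H, (W @ W.conj().T) @ H))
```

An imperfect estimate (e.g., a perturbed $\widehat{\mathbf{W}}$) yields $\eta_c<1$, which is what Figs. 4 and 5 quantify versus SNR.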

Figure 4: The subspace accuracy versus SNR (dB) when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $K=244$: (a) Column subspace accuracy $\eta_{c}$, (b) Row subspace accuracy $\eta_{r}$, (c) Subspace accuracy $\eta$.

V-B1 Equivalence of Subspace

It is worth noting that the column subspace estimation in Section III-A relies on the subspace equivalence between $\mathbf{H}_{S}$ and $\mathbf{H}$ in (16). Fig. 3 illustrates the rank of $\mathbf{H}_{S}$ for different $m$. In this simulation, we set $L=4$ and $m\in\{1L,2L,\ldots,10L\}$. It can be seen in Fig. 3 that the rank of $\mathbf{H}_{S}$ is equal to $L$, i.e., equal to the rank of $\mathbf{H}$, for all values of $m\geq L$. This validates that $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$.

V-B2 Performance versus Signal-to-Noise Ratio

In Fig. 4 and Fig. 5, we compare the performance of the proposed SASE algorithm with the SD, MF, and Arnoldi methods versus SNR. The number of paths is set to $L=4$. For a fair comparison, the numbers of channel uses for the benchmarks are kept approximately equal, i.e., $K=244$.

Figure 5: The channel estimation performance versus SNR (dB) when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $K=244$: (a) NMSE, (b) Spectrum efficiency.

In Fig. 4(a), the column subspace accuracy $\eta_{c}$ of the proposed SASE is compared with the benchmarks. As we can see, the SASE and SD methods achieve nearly identical column subspace accuracy, and both outperform the MF and Arnoldi. This indicates that sampling the sub-matrix $\mathbf{H}_{S}$ of the channel $\mathbf{H}$ provides a robust column subspace estimate. In Fig. 4(b), the row subspace accuracy $\eta_{r}$ versus SNR is plotted. The proposed SASE outperforms the others, which verifies that adapting the receiver sounders to the column subspace improves the accuracy of the row subspace estimation. In Fig. 4(c), the subspace accuracy $\eta$ defined in (7) is evaluated. The proposed SASE achieves the most accurate subspace estimation among all the methods. For the SASE, MF, and SD, nearly optimal subspace estimation, i.e., $\eta\approx 1$, is achieved in the high SNR region ($10\,\text{dB}\sim 20\,\text{dB}$). Since the performance of the Arnoldi highly depends on the number of available channel uses, its accuracy degrades and saturates at high SNR due to the limited channel uses ($K=244$); the ideal performance of the Arnoldi relies on a large number of channel uses or enough RF chains [16].

In Fig. 5(a), the NMSE of the proposed SASE decreases as the SNR increases. It has similar characteristics to that of the MF, but is much lower. The NMSE of the SD is almost constant in the low SNR region and decreases in the higher SNR region. Overall, the SASE outperforms the SD when $\text{SNR}\geq -15\,\text{dB}$. In Fig. 5(b), the spectrum efficiency of the SASE is plotted, together with the curve for perfect CSI with fully digital precoding for comparison. The proposed SASE achieves nearly optimal spectrum efficiency among all the methods. It is observed that the spectrum efficiency of the SASE follows a different trend from the NMSE in Fig. 5(a), while it has similar characteristics to the subspace accuracy in Fig. 4(c). The evaluation validates the effectiveness of the SASE in providing good spectrum efficiency through channel estimation.

Figure 6: The channel estimation performance versus the number of channel uses $K$ when $N_{t}=144$, $N_{r}=36$, $L=4$, $M_{RF}=6$, $N_{RF}=8$, $\text{SNR}=5\,\text{dB},20\,\text{dB}$: (a) Subspace accuracy $\eta$, (b) Spectrum efficiency.

V-B3 Performance versus Number of Channel Uses

In Fig. 6, we show the channel estimation performance of the SASE for different numbers of channel uses. The simulation setting is $L=4$ and $\text{SNR}=5, 20\,\text{dB}$. The value of $m$ in (36) is taken from $\{4,8,\cdots,48\}$ and, accordingly, the number of channel uses is $K\in\{164,184,\cdots,384\}$.

Figure 7: The channel estimation performance versus the number of paths $L$ when $N_{t}=144$, $N_{r}=36$, $M_{RF}=6$, $N_{RF}=8$, $K=244$, $\text{SNR}=5\,\text{dB},20\,\text{dB}$: (a) Subspace accuracy $\eta$, (b) Spectrum efficiency.

Fig. 6(a) shows the subspace estimation performance versus the number of channel uses. As the number of channel uses increases, the subspace accuracy of all the methods increases monotonically. It is worth noting that when $K=164$ ($m=4$), the subspace accuracy of the SASE is slightly lower than that of the SD. This is because only $m=L=4$ columns are sampled for the column subspace estimation, which slightly affects the column subspace accuracy of the SASE. Nevertheless, when $m\geq 8$, i.e., $K\geq 184$, the SASE attains the most accurate subspace estimation, i.e., $\eta\approx 1$, among all the methods. In particular, at moderate SNR, i.e., $\text{SNR}=5\,\text{dB}$, the SASE clearly outperforms the other methods. This means that the SASE requires fewer channel uses to provide a robust subspace estimate.

Fig. 6(b) shows the spectrum efficiency versus the number of channel uses. The curve for perfect CSI with fully digital precoding is also plotted for comparison. Again, the SASE achieves nearly optimal spectrum efficiency compared to the other methods. The performance gap between the SASE and the other methods is more noticeable at $\text{SNR}=5\,\text{dB}$. In particular, as seen in the figure, when $K\geq 244$, the gap between the SASE and the perfect-CSI curve at $\text{SNR}=5\,\text{dB}$ is less than $1.5$ bits/s/Hz.

V-B4 Performance versus Number of Paths

In Fig. 7, we evaluate the estimation performance of the SASE for different numbers of paths, $L$. The number of channel uses is $K=244$ and $\text{SNR}=5, 20\,\text{dB}$. Due to the limited number of channel uses, the Arnoldi method cannot perform the channel estimation for $L\geq 5$. Thus, we only show the performance of the Arnoldi for $L\leq 4$.

In Fig. 7(a), the subspace accuracy $\eta$ of different methods versus the number of paths, $L$, is illustrated. As we can see, the SASE, SD, and MF achieve more accurate subspace estimation than the Arnoldi. The Arnoldi exhibits a sharp decrease in accuracy for $L>2$, meaning that it can provide a good channel estimate only for $L\leq 2$ with $K=244$ channel uses. When $\text{SNR}=5\,\text{dB}$, the SASE outperforms the other methods. When the SNR is high, i.e., $\text{SNR}=20\,\text{dB}$, the subspace accuracy of the proposed SASE decreases slightly with the number of paths $L$, which verifies our discussion about the effect of $L$ in Remark 2 of Section III.

Figure 8: The channel estimation performance with inaccurate path information when $N_{t}=144$; $N_{r}=36$; $L=4$; $M_{RF}=6$; $N_{RF}=8$; $K=244$: (a) Subspace accuracy $\eta$, (b) NMSE, (c) Spectrum efficiency.

In Fig. 7(b), the spectrum efficiency versus the number of paths, $L$, is shown. Apart from the Arnoldi, the spectrum efficiency achieved by the SASE, MF, and SD increases with the number of paths. When the SNR is high, i.e., $\text{SNR}=20\,\text{dB}$, the SASE, MF, and SD achieve nearly optimal performance. When the SNR is moderate, i.e., $\text{SNR}=5\,\text{dB}$, the proposed SASE achieves the highest spectrum efficiency among all the methods. Moreover, for the SASE, MF, and SD, the performance gap with the perfect-CSI curve widens as $L$ increases. Nevertheless, the spectrum efficiency of the SASE remains closer to the perfect-CSI curve than the other methods, which implies that the SASE leverages the limited number of paths in mmWave channels more effectively.

V-B5 Performance with Inaccurate Path Information

Thus far, we have assumed the number of paths, $L$, is known a priori. In Fig. 8, we evaluate the performance of the SASE when accurate path information is not available. As discussed in Section III-A, we utilize an upper bound on the number of paths for simplicity, letting $L_{\text{sup}}\in\{5,6\}$ while $L=4$. (If $L_{\text{sup}}\geq L$ is utilized for the SASE, the estimated subspaces become $\widehat{\mathbf{W}}\in\mathbb{C}^{N_{r}\times L_{\text{sup}}}$ and $\widehat{\mathbf{F}}\in\mathbb{C}^{N_{t}\times L_{\text{sup}}}$; for a fair comparison, we choose the dominant $L$ modes in $\widehat{\mathbf{W}}$ and $\widehat{\mathbf{F}}$ when evaluating the performance.) For a clear illustration, we also evaluate the performance of the proposed SASE using a lower bound on the number of paths, i.e., $L_{\text{inf}}=3$. As can be seen in Fig. 8, compared to the case of $L_{\text{inf}}=3$, using the upper bound $L_{\text{sup}}\in\{5,6\}$ achieves performance similar to that with the accurate path information $L=4$. In particular, it is noted in Fig. 8(a) and Fig. 8(b) that the estimation performance with $L_{\text{sup}}$ is slightly worse than that with accurate path information when the SNR is high, while it is marginally better when the SNR is low.
This is because using inaccurate path information $L_{\text{sup}}$ with $L_{\text{sup}}\geq L$ does not affect the column subspace estimation but, according to Proposition 2, yields a worse row subspace estimate at high SNR and a more accurate one at low SNR. (If $L_{\text{sup}}$ is utilized for the SASE, the row subspace accuracy in Proposition 2 is bounded as $\mathbb{E}[\eta_{r}(\widehat{\mathbf{F}})]\geq\left(1-2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L_{\text{sup}}\sigma^{4})/\sigma_{L}^{4}(\bar{\mathbf{Q}})\right)_{+}$; the statements can be verified by analyzing this bound.) Nevertheless, overall, the performance of the proposed SASE is not sensitive to an inaccurate path number.

VI Conclusion

In this paper, we formulated the mmWave channel estimation as a subspace estimation problem and proposed the SASE algorithm. The SASE divides the channel estimation task into two stages: the first stage obtains the column channel subspace and, based on the acquired column subspace, the second stage estimates the row subspace with optimized training signals. By estimating the column and row subspaces sequentially, the computational complexity of the proposed SASE is reduced substantially to $\mathcal{O}(LDN_{r})$ with $D\geq N_{r}$. Our analysis showed that $\mathcal{O}(N_{t})$ channel uses are sufficient to guarantee the subspace estimation accuracy of the proposed SASE. Simulations demonstrated that the proposed SASE achieves better subspace accuracy, lower NMSE, and higher spectrum efficiency than the existing subspace methods at practical SNRs.

Appendix A Proof of Lemma 1

From the mmWave channel model in (3), when the angles $\{\theta_{t,l}\}_{l=1}^{L}$ and $\{\theta_{r,l}\}_{l=1}^{L}$ are distinct,

$$\mathrm{rank}(\mathbf{A}_{t})=\mathrm{rank}(\mathbf{A}_{r})=L,$$

which holds because $\mathbf{A}_{t}$ and $\mathbf{A}_{r}$ are both Vandermonde matrices. Then, $\mathbf{H}_{S}=\mathbf{H}\mathbf{S}$ can be expressed as

$$\mathbf{H}_{S}=\mathbf{A}_{r}\,\mathrm{diag}(\mathbf{h})\,\mathbf{A}_{t}^{H}\mathbf{S}.$$

Combining the rank inequality of matrix products, $\mathrm{rank}(\mathbf{H}_{S})\leq\mathrm{rank}(\mathbf{H})=L$, with the lower bound

$$\mathrm{rank}(\mathbf{H}_{S})\geq\mathrm{rank}(\mathbf{A}_{r}\,\mathrm{diag}(\mathbf{h}))+\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})-L=\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S}),$$

yields $L\geq\mathrm{rank}(\mathbf{H}_{S})\geq\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})$. Therefore, in order to show $\mathrm{col}(\mathbf{H}_{S})=\mathrm{col}(\mathbf{H})$, namely $\mathrm{rank}(\mathbf{H}_{S})=L$, it suffices to show that $\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})=L$. Since $\mathbf{A}_{t}^{H}\mathbf{S}$ is a Vandermonde matrix, it has $\mathrm{rank}(\mathbf{A}_{t}^{H}\mathbf{S})=L$. This completes the proof. ∎

Appendix B Proof of Lemma 2

It is trivial that the entries of $\mathbf{Y}$ follow the identical distribution $\mathcal{CN}(0,\sigma^{2})$. Therefore, it remains to show that all the entries of $\mathbf{Y}$ are independent. Since the entries are jointly Gaussian, it suffices to prove that they are uncorrelated. For any $i\neq j$ or $m\neq n$, the following holds,

$$\mathbb{E}\left[[\mathbf{Y}]_{i,m}[\mathbf{Y}]_{j,n}\right]=\mathbb{E}\left[[\mathbf{A}]_{i,:}[\mathbf{X}]_{:,m}[\mathbf{X}]_{:,n}^{H}[\mathbf{A}]_{j,:}^{H}\right]=0.$$

Therefore, the entries of $\mathbf{Y}$ are uncorrelated and thus independent, which concludes the proof. ∎

Appendix C Proof of Proposition 1

Based on the definition of $\eta_{c}(\widehat{\mathbf{W}})$ in (8), we have

$$\begin{aligned}
\sqrt{\eta_{c}(\widehat{\mathbf{W}})} &= \sqrt{\frac{\mathrm{tr}(\widehat{\mathbf{W}}^{H}\mathbf{H}\mathbf{H}^{H}\widehat{\mathbf{W}})}{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})}} = \frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}}+\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\overset{(a)}{\geq} \frac{\|\widehat{\mathbf{U}}^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} = \frac{\|\widehat{\mathbf{U}}^{H}\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|(\widehat{\mathbf{W}}-\widehat{\mathbf{U}})^{H}\mathbf{H}\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\overset{(b)}{\geq} \sigma_{L}(\widehat{\mathbf{U}}^{H}\mathbf{U})-\|\widehat{\mathbf{W}}-\widehat{\mathbf{U}}\|_{2} \geq \sigma_{L}(\widehat{\mathbf{U}}^{H}\mathbf{U})-\delta_{1}, \qquad (38)
\end{aligned}$$

where the inequality $(a)$ holds from the triangle inequality, and the inequality $(b)$ comes from the fact that for $\mathbf{A}\in\mathbb{C}^{n\times n}$ with $\mathrm{rank}(\mathbf{A})=n$ and $\mathbf{B}\in\mathbb{C}^{n\times k}$, $\|\mathbf{A}\mathbf{B}\|_{F}^{2}\geq\sigma_{n}^{2}(\mathbf{A})\|\mathbf{B}\|_{F}^{2}$, which follows from $\|\mathbf{A}\mathbf{B}\|_{F}^{2}=\sum_{i=1}^{k}\|\mathbf{A}[\mathbf{B}]_{:,i}\|_{2}^{2}\geq\sum_{i=1}^{k}\sigma_{n}^{2}(\mathbf{A})\|[\mathbf{B}]_{:,i}\|_{2}^{2}=\sigma_{n}^{2}(\mathbf{A})\|\mathbf{B}\|_{F}^{2}$. This concludes the proof of the inequality in (19).
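The matrix inequality invoked in step $(b)$ can be spot-checked numerically; the random test matrices and dimensions below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 5, 8
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # square, full rank a.s.
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

sigma_min = np.linalg.svd(A, compute_uv=False)[-1]  # sigma_n(A)
lhs = np.linalg.norm(A @ B, "fro")
rhs = sigma_min * np.linalg.norm(B, "fro")
print(lhs >= rhs)  # ||AB||_F >= sigma_n(A) ||B||_F
```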

Then, letting $\delta_{1}\rightarrow 0$ in (19) and taking the expectation of the squares of both sides of (38) gives

$$\mathbb{E}\left[\eta_{c}(\widehat{\mathbf{W}})\right]\geq\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{U}}^{H}\mathbf{U})\right]\overset{(c)}{\geq}\left(1-\frac{2N_{r}(\sigma^{2}\sigma_{L}^{2}(\mathbf{H}_{S})+m\sigma^{4})}{\sigma_{L}^{4}(\mathbf{H}_{S})}\right)_{+}, \qquad (39)$$

where the inequality $(c)$ holds from Theorem 1, which concludes the proof. ∎

Appendix D Proof of Proposition 2

Recall that the row subspace matrix $\widehat{\mathbf{V}}$ is given by the right singular matrix of $\widehat{\mathbf{Q}}=\bar{\mathbf{Q}}+\bar{\mathbf{N}}$ in (25), and the elements of $\bar{\mathbf{N}}$ are i.i.d. with each entry distributed as $\mathcal{CN}(0,\sigma^{2})$ according to Lemma 2. Thus, Theorem 1 applies, which gives

$$\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{V}}^{H}\mathbf{V})\right]\geq\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{\mathbf{Q}})}\right)_{+}. \qquad (40)$$

Then, based on the subspace accuracy metric in (9), we have

$$\begin{aligned}
\sqrt{\eta_{r}(\widehat{\mathbf{F}})} &= \sqrt{\frac{\mathrm{tr}(\widehat{\mathbf{F}}^{H}\mathbf{H}^{H}\mathbf{H}\widehat{\mathbf{F}})}{\mathrm{tr}(\mathbf{H}^{H}\mathbf{H})}} = \frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}}+\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\geq \frac{\|\mathbf{H}\widehat{\mathbf{V}}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} = \frac{\|\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}\widehat{\mathbf{V}}\|_{F}}{\|\mathbf{H}\|_{F}}-\frac{\|\mathbf{H}(\widehat{\mathbf{F}}-\widehat{\mathbf{V}})\|_{F}}{\|\mathbf{H}\|_{F}} \\
&\geq \sigma_{L}(\widehat{\mathbf{V}}^{H}\mathbf{V})-\|\widehat{\mathbf{F}}-\widehat{\mathbf{V}}\|_{2} \geq \sigma_{L}(\widehat{\mathbf{V}}^{H}\mathbf{V})-\delta_{2}. \qquad (41)
\end{aligned}$$

Thus, the inequality (27) is proved. Moreover, under the condition $\delta_{2}\rightarrow 0$, taking the expectation of the squares of both sides of (41) yields

$$\mathbb{E}\left[\eta_{r}(\widehat{\mathbf{F}})\right]\geq\mathbb{E}\left[\sigma_{L}^{2}(\widehat{\mathbf{V}}^{H}\mathbf{V})\right]\overset{(a)}{\geq}\left(1-\frac{2N_{t}(\sigma^{2}\sigma_{L}^{2}(\bar{\mathbf{Q}})+L\sigma^{4})}{\sigma_{L}^{4}(\bar{\mathbf{Q}})}\right)_{+},$$

where the inequality $(a)$ holds from (40). This concludes the proof of the row estimation accuracy bound in (28). ∎

References

  • [1] T. S. Rappaport, S. Sun, R. Mayzus, H. Zhao, Y. Azar, K. Wang, G. N. Wong, J. K. Schulz, M. Samimi, and F. Gutierrez, “Millimeter wave mobile communications for 5G cellular: It will work!” IEEE Access, vol. 1, pp. 335–349, 2013.
  • [2] R. W. Heath, N. González-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 436–453, April 2016.
  • [3] E. Torkildson, U. Madhow, and M. Rodwell, “Indoor millimeter wave MIMO: Feasibility and performance,” IEEE Trans. Wireless Commun., vol. 10, no. 12, pp. 4150–4160, 2011.
  • [4] S. Hur, T. Kim, D. J. Love, J. V. Krogmeier, T. A. Thomas, and A. Ghosh, “Millimeter wave beamforming for wireless backhaul and access in small cell networks,” IEEE Trans. Commun., vol. 61, no. 10, pp. 4391–4403, 2013.
  • [5] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
  • [6] J. Wang, Z. Lan, C. Pyo, T. Baykas, C. Sum, M. A. Rahman, J. Gao, R. Funada, F. Kojima, H. Harada, and S. Kato, “Beam codebook based beamforming protocol for multi-Gbps millimeter-wave WPAN systems,” IEEE J. Sel. Areas Commun., vol. 27, no. 8, pp. 1390–1399, Oct 2009.
  • [7] O. E. Ayach, R. W. Heath, S. Abu-Surra, S. Rajagopal, and Z. Pi, “The capacity optimality of beam steering in large millimeter wave MIMO systems,” in 2012 IEEE 13th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2012, pp. 100–104.
  • [8] J. Lee, G. T. Gil, and Y. H. Lee, “Channel estimation via orthogonal matching pursuit for hybrid MIMO systems in millimeter wave communications,” IEEE Trans. Commun., vol. 64, no. 6, pp. 2370–2386, June 2016.
  • [9] S. Srivastava, A. Mishra, A. Rajoriya, A. K. Jagannatham, and G. Ascheid, “Quasi-static and time-selective channel estimation for block-sparse millimeter wave hybrid MIMO systems: Sparse Bayesian learning (SBL) based approaches,” IEEE Trans. Signal Process., vol. 67, no. 5, pp. 1251–1266, March 2019.
  • [10] W. Zhang, T. Kim, D. J. Love, and E. Perrins, “Leveraging the restricted isometry property: Improved low-rank subspace decomposition for hybrid millimeter-wave systems,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5814–5827, Nov 2018.
  • [11] P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proceedings of the Forty-fifth Annual ACM Symposium on Theory of Computing, ser. STOC ’13.   New York, NY, USA: ACM, 2013, pp. 665–674.
  • [12] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Rev., vol. 52, no. 3, pp. 471–501, 2010.
  • [13] X. Li, J. Fang, H. Li, and P. Wang, “Millimeter wave channel estimation via exploiting joint sparse and low-rank structures,” IEEE Trans. Wireless Commun., vol. 17, no. 2, pp. 1123–1133, Feb 2018.
  • [14] W. Zhang, T. Kim, G. Xiong, and S.-H. Leung, “Leveraging subspace information for low-rank matrix reconstruction,” Signal Process., vol. 163, pp. 123 – 131, 2019.
  • [15] W. Zhang, T. Kim, and D. Love, “Sparse subspace decomposition for millimeter wave MIMO channel estimation,” in 2016 IEEE Global Communications Conference: Signal Processing for Communications (Globecom2016 SPC), Washington, USA, Dec. 2016.
  • [16] H. Ghauch, T. Kim, M. Bengtsson, and M. Skoglund, “Subspace estimation and decomposition for large millimeter-wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 528–542, April 2016.
  • [17] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Top. Signal Process., vol. 8, no. 5, pp. 831–846, Oct 2014.
  • [18] R. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. PP, no. 99, pp. 1–1, 2016.
  • [19] J. Brady, N. Behdad, and A. M. Sayeed, “Beamspace MIMO for millimeter-wave communications: System architecture, modeling, analysis, and measurements,” IEEE Trans. Antennas Propag., vol. 61, no. 7, pp. 3814–3827, 2013.
  • [20] A. Goldsmith, S. A. Jafar, N. Jindal, and S. Vishwanath, “Capacity limits of MIMO channels,” IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 684–702, June 2003.
  • [21] S. Haghighatshoar and G. Caire, “Massive MIMO channel subspace estimation from low-dimensional projections,” IEEE Trans. Signal Process., vol. 65, no. 2, pp. 303–318, Jan 2017.
  • [22] X. Zhang, A. F. Molisch, and S.-Y. Kung, “Variable-phase-shift-based RF-baseband codesign for MIMO antenna selection,” IEEE Trans. Signal Process., vol. 53, no. 11, pp. 4091–4103, Nov 2005.
  • [23] N. El Karoui, “Tracy–widom limit for the largest eigenvalue of a large class of complex sample covariance matrices,” Ann. Probab., vol. 35, no. 2, pp. 663–714, 03 2007.
  • [24] X. Yu, J. C. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems,” IEEE J. Sel. Top. Signal Process., vol. 10, no. 3, pp. 485–500, April 2016.
  • [25] T. T. Cai and A. Zhang, “Rate-optimal perturbation bounds for singular subspaces with applications to high-dimensional statistics,” Ann. Statist., vol. 46, no. 1, pp. 60–89, 02 2018.
  • [26] R. Durrett, Probability: theory and examples.   Cambridge University Press, 2010.
  • [27] F. Wei, “Upper bound for intermediate singular values of random matrices,” Journal of Mathematical Analysis and Applications, vol. 445, no. 2, pp. 1530–1547, 2017.
  • [28] Y. Dagan, G. Kur, and O. Shamir, “Space lower bounds for linear prediction in the streaming model,” in Proceedings of the Thirty-Second Conference on Learning Theory, ser. Proceedings of Machine Learning Research, A. Beygelzimer and D. Hsu, Eds., vol. 99.   Phoenix, USA: PMLR, 25–28 Jun 2019, pp. 929–954.
  • [29] G. Golub and C. Van Loan, Matrix Computations, ser. Johns Hopkins Studies in the Mathematical Sciences.   Johns Hopkins University Press, 2013.
  • [30] T. Bai, A. Alkhateeb, and R. W. Heath, “Coverage and capacity of millimeter-wave cellular networks,” IEEE Commun. Mag., vol. 52, no. 9, pp. 70–77, Sep. 2014.
  • [31] Z. Pi and F. Khan, “An introduction to millimeter-wave mobile broadband systems,” IEEE Commun. Mag., vol. 49, no. 6, pp. 101–107, June 2011.