
A General Approach to Fully Linearize the Power Amplifiers in mMIMO with Less Complexity

Ganesh Prasad, Member, IEEE, Håkan Johansson, Senior Member, IEEE, and Rabul Hussain Laskar

G. Prasad and H. Johansson are with the Division of Communication Systems, Department of Electrical Engineering, Linköping University, 581 83 Linköping, Sweden (e-mail: {ganesh.prasad, hakan.johansson}@liu.se). R. H. Laskar is with the Department of Electronics and Communication Engineering, National Institute of Technology Silchar, 788 010 Silchar, India (e-mail: rhlaskar@ece.nits.ac.in).

Abstract

A radio frequency (RF) power amplifier (PA) amplifies the message signal to a higher power so that it can be transmitted to a distant receiver. Because the PA typically behaves nonlinearly at high transmit power, digital predistortion (DPD), which pre-inverts the nonlinearity, is used to linearize it. However, in a massive MIMO (mMIMO) transmitter, a single DPD is not sufficient to fully linearize hundreds of PAs, and assigning a separate DPD to each PA is complex and uneconomical. In this work, we address these challenges with a proposed low-complexity DPD (LC-DPD) scheme. We first describe a fully-featured DPD (FF-DPD) scheme that linearizes multiple PAs and examine its complexity. From it, we derive the LC-DPD scheme, which can adaptively linearize the PAs as required. The coefficients in both schemes are learned using algorithms based on the indirect learning architecture with the recursive prediction error method (ILA-RPEM), chosen because it is adaptive and free from matrix inversion operations. Furthermore, for the LC-DPD structure, we propose three algorithms based on the correlation of its common coefficients with its distinct coefficients. Lastly, the performance of the algorithms is quantified using numerical results.

Index Terms:

Digital predistortion, massive MIMO, direct learning architecture, indirect learning architecture, recursive prediction error method.

I Introduction

In wireless transmitters, radio frequency (RF) power amplifiers (PAs) amplify the modulated signals for long-distance transmission. However, signals amplified near the saturation region of the PAs suffer in-band and out-of-band nonlinear distortion [1]. The distortion can be reduced by backing off the peak power of the signals, but this reduces the efficiency of the PAs. Therefore, preprocessing such as digital predistortion (DPD) is applied to the transmit signals before the PAs so that the resulting signals remain linear up to the saturation region. Over the past decade, many works have addressed the linearization of multiple PAs in transmitters such as massive MIMO (mMIMO) transmitters. However, they linearize the output in a particular beamforming direction rather than linearizing all the PAs, because linearizing each PA requires a separate DPD block along with its driving RF chain. Owing to this high complexity, per-PA linearization is not suitable for an economical mMIMO transmitter. To address this, we propose a general approach that fully linearizes all the PAs with less complexity. We also discuss in detail the fundamentals behind the challenges and the procedure to tackle them.

I-A Related Works

DPD preprocessing applies the inverse of the PA nonlinearity to mitigate the nonlinearities in the desired transmit signal [2]. In the state of the art, linear parametric models are mostly used for the DPD [3]. One method to identify the DPD coefficients is least squares (LS), owing to its fast convergence [4, 5, 6]. Despite its mathematical simplicity, its computational complexity is high because it involves inverting large matrices whose size corresponds to the large number of DPD coefficients to be estimated. Many works have therefore proposed algorithms that reduce the complexity of LS-based DPD identification [7, 8, 9, 10]. For example, the size of the matrix is reduced by normalizing the DPD basis functions (BFs) and then pruning them [7]. Also, under a stationary random process assumption, the time-varying matrix associated with the DPD coefficients is replaced by a constant covariance matrix [8]. Further, in an iterative LS-based algorithm, the samples of the DPD coefficients (or the size of the matrix) can be reduced by considering the correlation in the observation errors between two iterations [9]. In addition, the matrix size can be reduced using eigenvalue decomposition and principal component analysis (PCA), which decrease the order of the memory polynomial model of the DPD [10, 11]. In eigenvalue decomposition, the number of DPD coefficients is reduced by retaining only the dominant eigenvectors, whereas in PCA the reduction is achieved by converting the correlated BFs of the DPD into uncorrelated BFs.

Although the above techniques help reduce the size of the matrices, the required number of DPD coefficients is still large for time-varying and highly nonlinear PAs, which leads to undesirably large matrix operations. Recursive algorithms such as least mean squares (LMS) [12], recursive least squares (RLS) [13, 14], and the recursive prediction error method (RPEM) [15] are therefore computationally more tractable, at the cost of slower convergence to the desired optimal value of the variables. Using LMS, the DPD adjusts its coefficients to minimize the mean square error (MSE) between the PA output and the desired signal. The coefficients are updated using the stochastic gradient descent method, which minimizes the instantaneous error in each iteration. However, LMS is quite unstable and very sensitive to the step size of the update [16]. In conventional LS estimation, a batch of input and output data samples of the PA is used to update the DPD coefficients. In RLS, by contrast, the LS estimate is expressed recursively through a set of equations, and the coefficients are updated each time a new input-output sample is obtained. To discount the influence of older samples, RLS uses an exponential weighting known as the forgetting factor. The forgetting factor trades off precision against convergence speed, and a low value makes the estimate fluctuate strongly with noise. RPEM therefore improves on this by letting the forgetting factor vary with time [17]. In existing works, these adaptive algorithms are mostly applied to two types of DPD learning architectures: (i) the direct learning architecture (DLA) [18, 19] and (ii) the indirect learning architecture (ILA) [13, 14]. DLA performs better in the presence of noise at the output of the PA, whereas ILA converges faster [20]. Therefore, ILA is widely used for the identification of the DPD. In our proposed work, we likewise use the RPEM algorithm in an ILA for the DPD identification. Next, we describe the state of the art for the linearization of multi-antenna transmitters.

In multi-antenna systems such as MIMO or mMIMO, although the PAs in the transmitter are of the same type, in practice they exhibit different nonlinearities owing to their sensitivity to process, supply voltage, and temperature (PVT) [21, 22]. Therefore, a single DPD cannot linearize all the PAs [23], and ideally each PA requires a separate DPD [24]. However, this ideal case entails undesirably high complexity in hardware implementation and in processing, and it is not even feasible for an mMIMO transmitter where hundreds of PAs need to be linearized. Instead of linearizing all PAs, a single resultant PA, whose output is the sum of the outputs of the PAs, can be linearized [25, 26]. However, this addresses only the average nonlinearity of the PAs, so none of them is fully linearized. Alternatively, instead of the sum, the beam-oriented (BO) output of the PAs in a given direction can be linearized using a single DPD [27, 28, 29]. As this addresses the nonlinearity of the BO output only in the desired direction (main lobe), the PAs are again not fully linearized. Consequently, the outputs in other directions are not linearized, which yields nonlinear sidelobes whose power level is typically only $10$ dB below that of the linear main lobe [30]. This can be improved by frequently updating the DPD for different directions, and the number of DPD coefficients per update can be reduced using pruning algorithms [31]. However, frequent updating is not reliable for online operation and leads to high computational complexity. Moreover, at any given time the DPD is identified for one particular BO direction and the PAs are not fully linearized, which still leaves nonlinear sidelobes comparable to the main lobe. If a similar distribution of nonlinearities over the PAs is assumed, the sidelobes can be reduced by optimally adjusting the amplitude of the phase shifters in the BO output [32]. However, in general, this cannot provide full linearization of the PAs. The performance towards full linearization can be improved by adding an extra tuning box to each PA. The tuning boxes compensate for the nonlinear differences between the PAs so that the resultant nonlinearity is the same for each PA; the PAs are then fully linearized using a single DPD [33, 34, 35]. Nonetheless, to compensate for the differences, each tuning box is modeled by a polynomial model that must be identified using a learning algorithm, so its complexity is approximately that of incorporating a separate DPD for each PA. In contrast to DPD-only operation, the sidelobes in the BO output can be linearized more reliably using two layers of operations: DPD training followed by optimization of the post-weighting coefficients, which are multiplied by the DPD output signal and distributed to the inputs of the respective PAs [36, 37]. In a simplified analysis, different post-weighting coefficients are assigned to each PA, but within the branch of a PA the same post-weighting coefficient is multiplied by all the BFs of the DPD. Because of this low degree of freedom per branch, the post-weighting linearization of the PAs is less reliable [36]. Also, distributing different signals to the branches of the PAs requires a separate RF chain for each PA, which gives high complexity in an mMIMO transmitter. Later, in our work [37], we adopted an adaptive post-weighting architecture that increases the degree of freedom (DOF) per branch and reduces the number of RF chains required. Still, because the post-weighting coefficients are optimized for a discrete range of directions, the PAs are not fully linearized.

I-B Motivation and Key Contribution

As described earlier, the PAs in a multi-antenna transmitter can be fully linearized by identifying a separate DPD for each PA [24]. However, this leads to high complexity both in the structure and in the computation needed to learn the coefficients. Moreover, distributing the predistorted signals requires a separate RF chain for each PA. Motivated by this, we propose a general approach using a low-complexity DPD (LC-DPD) structure that approximates separate DPD identification while reducing the number of RF chains as required in mMIMO transmitters. The key contribution of this work is four-fold. (i) First, we deduce the reduction in the number of coefficients for a given type of PAs in a subarray from measurement data and from the numerical result obtained for a system setting. Then, we propose a fully-featured DPD (FF-DPD) scheme to fully linearize the PAs in a subarray and describe its complexity in terms of the number of multipliers, adders, and RF chains. (ii) Using the FF-DPD structure, we derive the less complex and non-trivial LC-DPD structure. The number of coefficients in it is reduced according to a geometric sequence, and the corresponding coefficients are represented in block vector form. Based on the geometric sequence, we derive expressions for the number of multipliers, adders, and RF chains, which are significantly reduced and thereby lower the complexity. (iii) Next, to train the coefficients of the two schemes, we propose four algorithms based on the indirect learning architecture with the recursive prediction error method (ILA-RPEM): one for the FF-DPD scheme and three for the LC-DPD scheme. Apart from the structural complexity of the FF-DPD, we also describe the computational complexity of its training. The performance of the three algorithms for the LC-DPD is determined by the correlation of its common coefficients with the distinct coefficients in the structure. It is also shown that the complexities of the three algorithms are lower than that of the algorithm for FF-DPD. Further, for the various operations in the four algorithms, we define four operators and describe their properties. (iv) Lastly, we obtain numerical results in terms of power spectral density (PSD) and error vector magnitude (EVM) using the algorithms for the two schemes and draw insights by comparing their performance.

II Structures for Full Linearization

In this section, first, we describe the ideal structure for the predistortion to fully linearize the multiple PAs. Thereafter, we derive a low-complexity structure that approximates the full linearization.

Figure 1: An ideal structure to linearize a subarray of $S=4$ PAs.

Consider a subarray of $S$ PAs in an mMIMO transmitter, as shown in Fig. 1 (where $S=4$). Ideally, a separate DPD is applied to each PA for full linearization. As each DPD output signal is different, a separate RF chain is employed for each. The resulting signals are then phase shifted by the analog phase shifters (analog beamforming weights) $\{w_{l}\}$, $l\in\{1,\cdots,S\}$, to obtain the BO output of the PAs in a specific direction. Based on the general memory polynomial (GMP) [3], the $l$th DPD output $x_{l}(n)$ for the input message $s(n)$ can be expressed as:

$$x_{l}(n)=\sum_{p=0}^{P_{l}-1}\sum_{m=0}^{M_{l}-1}\phi_{p,m}^{l}\,s(n-m)|s(n-m)|^{p}, \tag{1}$$

where $\phi_{p,m}^{l}$ is the coefficient of the BF $s(n-m)|s(n-m)|^{p}$ of $p$th power and $m$th delay. Eq. (1) represents the most general model, in which the memory length $M_{l}$ and the polynomial order $P_{l}$ depend on the $l$th PA. The outputs of the $S$ DPDs in the subarray are represented by the vector $\bm{X}=[x_{1},\cdots,x_{S}]^{T}$ (for convenience, the time index of the signals is omitted). Further, $x_{l}$ is multiplied by the beamforming weight $w_{l}$ and fed to the respective $l$th nonlinear PA. The output $y_{l}(n)$ of the PA can be expressed as:

$$y_{l}(n)=f_{non}^{l}(w_{l}x_{l}(n)), \tag{2}$$

where $f_{non}^{l}(\cdot)$ represents the nonlinear function of the $l$th PA. For the $S$ PAs in the subarray, the output vector is $\bm{Y}=[y_{1},\cdots,y_{S}]^{T}$. Nevertheless, implementing the general architecture in Fig. 1 to completely linearize all the PAs is highly complex, because the different sets of BFs and coefficients for each of the DPDs require many delays, multipliers, and adders. Further, the computational complexity of the iterative learning algorithm that identifies the coefficients of each DPD is undesirably high. Also, the number of RF chains equals the number of PAs $S$ in the subarray, which is not economical for an mMIMO transmitter.
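To make Eq. (1) concrete, the following is a minimal Python sketch of one DPD output sample. The function name `memory_polynomial` and the choice to treat samples before the start of the record as zero are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def memory_polynomial(s: Sequence[complex],
                      phi: Sequence[Sequence[complex]],
                      n: int) -> complex:
    """Evaluate x_l(n) = sum_p sum_m phi[p][m] * s(n-m) * |s(n-m)|**p, as in Eq. (1).

    phi[p][m] holds the coefficient of the BF with power p and delay m.
    Assumption: samples with index n - m < 0 are treated as zero.
    """
    x = 0j
    for p, row in enumerate(phi):
        for m, coeff in enumerate(row):
            if n - m >= 0:
                u = s[n - m]
                x += coeff * u * abs(u) ** p
    return x


# Usage: P_l = 2 powers and M_l = 2 delays for a short complex input record.
s = [1 + 1j, 2 + 0j, 0.5 - 0.5j]
phi = [[1.0, 0.1], [0.05, 0.0]]
x2 = memory_polynomial(s, phi, 2)
```

Multiplying the result by $w_{l}$ and passing it through a model of $f_{non}^{l}(\cdot)$ would then give $y_{l}(n)$ as in (2).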

To simplify the structure, we first analyze the values of the identified DPD coefficients for the given PAs in a subarray. In Fig. 2, we plot the real and imaginary parts of the identified coefficients of $8$ DPDs that fully linearize the respective $S=8$ traveling-wave tube (TWT) PAs based on the Saleh model [38], which have different AM/AM and AM/PM nonlinearities as described in Section V. Here, $P_{l}=5$ and $M_{l}=5$, $\forall l\in\{1,\cdots,S\}$. The DPDs are trained using the adaptive ILA-RPEM algorithm, which is described in detail in the following sections. From the figure, it can be observed that, out of the $25$ BFs, only the coefficients of the BFs with indices in the set $\mathcal{I}=\{4,5,9,10,14,15,19,20,24,25\}$ are non-zero. Further, the index set $\mathcal{I}$ of the BFs with non-zero coefficients is the same for all PAs, because the PAs are of the same type. (Note that in the supplementary file of [39], from the measured outputs of 16 HMC943APM5E PA ICs for an input signal at $28.5$ GHz, the nonlinearities of all the PAs are identified using the coefficients of the same BFs. Therefore, they can be linearized using DPD coefficients of the same BFs.) Also, the deviation in the value of a coefficient across different PAs is larger for coefficients of larger value than for coefficients of smaller value. For example, the mean deviations for the indices $5$ and $10$ are $0.0249$ and $4.0248\times 10^{-4}$ for the real part and $0.0248$ and $5.7523\times 10^{-4}$ for the imaginary part, respectively. Thus, the coefficients with larger values dominate in the linearization of the $S$ PAs. Based on these observations, we next reduce the number of coefficients in the two proposed DPD schemes.
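For readers unfamiliar with the Saleh model mentioned above, the sketch below shows its usual AM/AM and AM/PM form as one possible instance of $f_{non}^{l}(\cdot)$. The default parameter values are the ones commonly quoted for Saleh's TWT model and are used here only as an assumption; the per-PA settings of this paper are given in its Section V and are not reproduced here. The function name `saleh_pa` is hypothetical.

```python
import cmath


def saleh_pa(u: complex,
             alpha_a: float = 2.1587, beta_a: float = 1.1517,
             alpha_p: float = 4.0033, beta_p: float = 9.1040) -> complex:
    """Memoryless Saleh model applied to one complex baseband sample.

    AM/AM: A(r)   = alpha_a * r    / (1 + beta_a * r**2)
    AM/PM: Phi(r) = alpha_p * r**2 / (1 + beta_p * r**2)   (radians)
    The default parameters are illustrative assumptions, not the paper's values.
    """
    r = abs(u)
    if r == 0.0:
        return 0j
    amplitude = alpha_a * r / (1.0 + beta_a * r * r)
    phase_shift = alpha_p * r * r / (1.0 + beta_p * r * r)
    return amplitude * cmath.exp(1j * (cmath.phase(u) + phase_shift))


# Usage: different PAs of the same type could be mimicked by perturbing the parameters.
y_nominal = saleh_pa(0.3 + 0.1j)
y_perturbed = saleh_pa(0.3 + 0.1j, alpha_a=2.0, beta_p=9.5)
```

Small perturbations of the four parameters are one simple way to emulate PAs of the same type with slightly different AM/AM and AM/PM curves.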

II-A Fully-Featured DPD (FF-DPD)

As described earlier in this section, the predistortion signals for a given type of PAs can be obtained using the same set of BFs based on GMP (cf. Fig. 2). Thus, in FF-DPD, we consider the same set of BFs of order $P$ and memory length $M$ for all the PAs in the subarray. The $l$th output $x_{l}$ of the FF-DPD can therefore again be obtained using (1) after substituting $P_{l}=P$ and $M_{l}=M$. The total number of BFs in the set $\{s(n-m)|s(n-m)|^{p}\}$ for $p\in\{0,\cdots,P-1\}$ and $m\in\{0,\cdots,M-1\}$ is $PM$. However, as observed in Fig. 2, only some of the $PM$ BFs have nonzero DPD coefficients for a given type of PAs. Therefore, in general, we represent the $Q$ BFs with nonzero coefficients as a vector $\bm{\Psi}=[\psi_{1},\cdots,\psi_{Q}]^{T}$ and their respective nonzero coefficients for the $l$th PA as $\bm{\Phi}_{l}=[\phi_{l,1},\cdots,\phi_{l,Q}]^{T}$, where $\psi_{i}$ is the $i$th BF (a function of $s(n)$, written $\psi_{i}(s(n))$) and $\phi_{l,i}$ is its nonzero coefficient for $i\in\{1,\cdots,Q\}$. Using $\bm{\Psi}$ and $\bm{\Phi}_{l}$, $x_{l}$ in (1) can be expressed in vector form as:

$$x_{l}=\bm{\Phi}_{l}^{T}\bm{\Psi}. \tag{3}$$

Using $x_{l}$, the output $y_{l}$ of the $l$th PA is obtained by the same process as in (2): $x_{l}$ is multiplied by the phase shifter $w_{l}$ and fed to the PA to get $y_{l}$, which yields the output vector $\bm{Y}$. Moreover, the coefficient vectors for the $S$ PAs in the subarray can be collected in a block vector as $\bm{\Phi}=[\bm{\Phi}_{1}^{T},\cdots,\bm{\Phi}_{S}^{T}]^{T}$. From Fig. 3(a), $S$ coefficients are multiplied by each BF. Thus, for $Q$ BFs, the FF-DPD structure has $QS$ coefficients. The number of multipliers $N_{m}^{F}$ in the structure equals the number of coefficients, i.e., $N_{m}^{F}=QS$. Further, using the structure of FF-DPD in Fig. 3(a), the number of adders $N_{a}^{F}$ can be determined as follows (in this work, the number of adders for the different structures is determined under the assumption that an adder has two inputs and one output). Each predistorted output ($x_{l}$; $l\in\{1,2,3,4\}$) is the sum of the products of the $Q$ coefficients and their respective BFs. Therefore, generating each output requires $(Q-1)$ adders, i.e., one less than the number of coefficients. Thus, for $S$ outputs, the total number of adders is $N_{a}^{F}=(Q-1)S$. Moreover, the number of RF chains $N_{RF}^{F}$ equals the number of PAs, i.e., $N_{RF}^{F}=S$. For instance, Fig. 3(a) depicts the FF-DPD for $S=4$ and $Q=4$. It has $N_{m}^{F}=16$, $N_{a}^{F}=12$, and $N_{RF}^{F}=4$. Comparing Fig. 3(a) with Fig. 1, the ideal structure for the linearization of the subarray is the same as the structure of FF-DPD, except that the same set of BFs is used for all PAs in the latter. Thus, the complexity of FF-DPD is still high in terms of multipliers, adders, and RF chains. These complexities can be reduced using LC-DPD, which is described next.
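The FF-DPD counts stated above can be checked with a few lines of Python. This is only a sketch of the formulas $N_{m}^{F}=QS$, $N_{a}^{F}=(Q-1)S$, and $N_{RF}^{F}=S$; the function name `ff_dpd_complexity` is hypothetical.

```python
from typing import Dict


def ff_dpd_complexity(S: int, Q: int) -> Dict[str, int]:
    """Structural complexity of FF-DPD for S PAs and Q basis functions.

    Assumes two-input adders, as stated in the text.
    """
    return {
        "multipliers": Q * S,      # one multiplier per coefficient
        "adders": (Q - 1) * S,     # Q products summed per output
        "rf_chains": S,            # one RF chain per PA
    }


# Usage: the configuration of Fig. 3(a), S = 4 and Q = 4.
counts = ff_dpd_complexity(S=4, Q=4)
```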

II-B Low-Complexity DPD (LC-DPD)

Using LC-DPD, we reduce the complexity of fully linearizing the PAs as follows. As described earlier (cf. Fig. 2), the coefficient of a dominant BF deviates more across the different PAs in the generation of the predistorted signals. Therefore, to reduce the number of coefficients in LC-DPD, the BFs in the vector $\bm{\Psi}=[\psi_{1},\cdots,\psi_{Q}]^{T}$ are arranged in decreasing order of their dominance, and more coefficients are multiplied by a more dominant BF than by a less dominant one to generate the predistorted signals. Thus, unlike in FF-DPD, the number of coefficients multiplied by each BF in LC-DPD adapts to its dominance. Specifically, we decrease the number of coefficients according to a geometric sequence as follows.

$$\left\{\begin{array}{ll}\{n_{1}Sr^{\nu},\cdots,n_{g}Sr^{(\nu+g-1)}\},&\text{Case I}\\ \{n_{1}Sr^{\nu},\cdots,n_{g-1}Sr^{(\nu+g-2)},n_{g}\times 1\},&\text{Case II}\end{array}\right. \tag{6}$$

where $\nu\in\mathbb{Z}^{+}=\{0,1,\cdots\}$, $\sum_{i=1}^{g}n_{i}=Q$ with $n_{i}\in\mathbb{P}=\{1,2,\cdots\}$, and the cases are: Case I: $Sr^{(\nu+g-1)}\geq 1$, and Case II: $\{Sr^{(\nu+g-2)}\geq 1\}\wedge\{Sr^{(\nu+g-1)}<1\}$. According to the sequence in (6), the $Q$ BFs are divided into $g$ groups, where each of the $n_{i}$ BFs in the $i$th group is multiplied by $Sr^{(\nu+i-1)}$ coefficients; thus, the total number of coefficients in the $i$th group is $n_{i}Sr^{(\nu+i-1)}$. Over the groups, the number of coefficients multiplied per BF decreases as a geometric sequence with common ratio $r$ $(<1)$. The sequence in Case II is the same as in Case I, except that each BF in the last group $g$ is multiplied by one coefficient, because $Sr^{(\nu+g-1)}<1$ and the number of coefficients per BF cannot be a fraction. Using this sequence, we now define the coefficient vector $\bm{\overline{\Phi}}$ of the LC-DPD.
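The grouping rule in (6) can be sketched in Python as below. Exact rational arithmetic is used so that the Case I and Case II tests are not affected by rounding. The function name `coefficients_per_group` and the error handling are assumptions for illustration; the sketch also assumes, as the text implicitly does, that $Sr^{(\nu+i-1)}$ is an integer whenever it is at least one.

```python
from fractions import Fraction
from typing import List


def coefficients_per_group(S: int, r: Fraction, nu: int, n: List[int]) -> List[int]:
    """Return [n_1*c_1, ..., n_g*c_g], where c_i = S * r**(nu + i - 1) as in (6).

    In Case II (c_g < 1) each BF of the last group gets exactly one coefficient.
    """
    g = len(n)
    sequence = []
    for i in range(1, g + 1):
        c = S * r ** (nu + i - 1)          # coefficients per BF in group i
        if c < 1:
            if i != g:
                raise ValueError("only the last group may fall below one coefficient per BF")
            c = Fraction(1)                # Case II
        if c.denominator != 1:
            raise ValueError("S * r**(nu + i - 1) must be an integer")
        sequence.append(n[i - 1] * int(c))
    return sequence


# Usage: S = 4, r = 1/2, nu = 1, with two groups of two BFs each (Case I).
seq_case1 = coefficients_per_group(4, Fraction(1, 2), 1, [2, 2])
# A Case II configuration: the third group would get 1/2 coefficient per BF, so it gets one.
seq_case2 = coefficients_per_group(4, Fraction(1, 2), 1, [2, 1, 3])
```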

Definition 1.

For the sequence of the number of coefficients multiplied by different BFs as expressed in (6), the coefficient vector $\bm{\overline{\Phi}}$ for the LC-DPD can be represented as:


$$\bm{\overline{\Phi}}=\left[\bm{\overline{\Phi}}_{1}^{T},\cdots,\bm{\overline{\Phi}}_{g}^{T}\right]^{T}, \tag{7a}$$

$$\begin{aligned}\bm{\overline{\Phi}}_{i}=\Big[&\phi_{1,(\sigma_{i}+1)},\ \phi_{\left(1+r^{-(\nu+i-1)}\right),(\sigma_{i}+1)},\ \cdots,\ \phi_{\left(1+\left(Sr^{(\nu+i-1)}-1\right)r^{-(\nu+i-1)}\right),(\sigma_{i}+1)},\ \cdots,\\ &\phi_{1,(\sigma_{i}+n_{i})},\ \phi_{\left(1+r^{-(\nu+i-1)}\right),(\sigma_{i}+n_{i})},\ \cdots,\ \phi_{\left(1+\left(Sr^{(\nu+i-1)}-1\right)r^{-(\nu+i-1)}\right),(\sigma_{i}+n_{i})}\Big]^{T},\end{aligned} \tag{7b}$$

$$\bm{\overline{\Phi}}_{g}=\left[\phi_{1,(\sigma_{g}+1)},\phi_{1,(\sigma_{g}+2)},\cdots,\phi_{1,(\sigma_{g}+n_{g})}\right]^{T};\ \text{for Case II}, \tag{7c}$$

where $\bm{\overline{\Phi}}_{i}$ is given by (7b) for $i\in\{1,\cdots,g\}$ in both cases, except that $\bm{\overline{\Phi}}_{g}$ for Case II is given by (7c). Here, $\sigma_{i}=\sum_{j=1}^{i-1}n_{j}$ and $\phi_{l,q}$ is the coefficient that is multiplied by the $q$th BF $\psi_{q}$ to contribute to the $l$th predistorted output of the LC-DPD.

Example: For the LC-DPD in Fig. 3(b), where $S=4$ and $Q=4$, the BFs in the vector $\bm{\Psi}=[\underbrace{\psi_{1},\psi_{2}}_{n_{1}=2},\underbrace{\psi_{3},\psi_{4}}_{n_{2}=2}]^{T}$ are divided into $g=2$ groups, each containing two BFs. For $\nu=1$ and $r=1/2$, the sequence of the number of coefficients is $\{n_{1}Sr^{\nu},n_{2}Sr^{\nu+1}\}=\{4,2\}$; thus, the total number of coefficients is $4+2=6$. Here, each BF in the first group is assigned $Sr^{\nu}=2$ coefficients, while each BF in the second group is assigned $Sr^{\nu+1}=1$ coefficient, as depicted in the figure. Further, Case I in (6) is satisfied for the LC-DPD coefficient vector $\bm{\overline{\Phi}}$; thus, using (7) with $\sigma_{1}=0$ and $\sigma_{2}=2$, the coefficient vector is $\bm{\overline{\Phi}}=[\underbrace{\phi_{1,1},\phi_{3,1},\phi_{1,2},\phi_{3,2}}_{\text{elements of }\bm{\overline{\Phi}}_{1}},\underbrace{\phi_{1,3},\phi_{1,4}}_{\text{elements of }\bm{\overline{\Phi}}_{2}}]^{T}$. $\Box$
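The index pattern of Definition 1 can be generated programmatically. The sketch below lists the index pairs $(l,q)$ of the coefficients $\phi_{l,q}$ in the order of (7a)-(7c); the function name `lc_dpd_indices` is hypothetical, and the code assumes that $Sr^{(\nu+i-1)}$ and $r^{-(\nu+i-1)}$ are integers wherever they are used.

```python
from fractions import Fraction
from typing import List, Tuple


def lc_dpd_indices(S: int, r: Fraction, nu: int, n: List[int]) -> List[Tuple[int, int]]:
    """List the (l, q) index pairs of the LC-DPD coefficient vector per (7a)-(7c)."""
    indices = []
    sigma = 0                                   # sigma_i = n_1 + ... + n_{i-1}
    for i in range(1, len(n) + 1):
        count = S * r ** (nu + i - 1)           # coefficients per BF in group i
        if count < 1:                           # Case II: a single coefficient, l = 1
            count, step = 1, 0
        else:
            count = int(count)
            step = int((1 / r) ** (nu + i - 1))  # spacing r^{-(nu+i-1)} of the output index l
        for q in range(sigma + 1, sigma + n[i - 1] + 1):
            for k in range(count):
                indices.append((1 + k * step, q))
        sigma += n[i - 1]
    return indices


# Usage: the setting of the example (S = 4, nu = 1, r = 1/2, n_1 = n_2 = 2).
pairs = lc_dpd_indices(4, Fraction(1, 2), 1, [2, 2])
```

For this setting the list reproduces the ordering $\phi_{1,1},\phi_{3,1},\phi_{1,2},\phi_{3,2},\phi_{1,3},\phi_{1,4}$ of the example.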

Furthermore, the total number of coefficients (or multipliers) in $\bm{\overline{\Phi}}$ , the number of adders, and the number of RF chains in the structure of LC-DPD can be determined using Lemma 1 as described below.

Lemma 1.

Using the sequence in (6), the total number of coefficients (or multipliers), $N_{m}^{L}$ in $\bm{\overline{\Phi}}$ can be determined as in (8c). Further, for a special case when $n_{i}=n_{j}=n$ ; $\forall i\neq j$ , $N_{m,eq.}^{L}$ is expressed in (8f). Moreover, the number of adders, $N_{a}^{L}$ and the number of RF chains, $N_{RF}^{L}$ are given by (9).


$$N_{m}^{L}=\left\{\begin{array}{ll}Sr^{\nu}\sum_{i=1}^{g}n_{i}r^{i-1}\ \big\|\ \sum_{i=1}^{g}n_{i}=Q;&\text{Case I}\\ Sr^{\nu}\sum_{i=1}^{g-1}n_{i}r^{i-1}+n_{g};&\text{Case II},\end{array}\right. \tag{8c}$$

$$N_{m,eq.}^{L}=\left\{\begin{array}{ll}nSr^{\nu}(1-r^{g})/(1-r);&\text{Case I}\\ nSr^{\nu}(1-r^{(g-1)})/(1-r)+n_{g};&\text{Case II}.\end{array}\right. \tag{8f}$$

$$N_{a}^{L}=N_{m}^{L}-\left\lceil Sr^{(\nu+g-1)}\right\rceil,\quad N_{RF}^{L}=Sr^{\nu}. \tag{9}$$

Proof:

The expression for $N_{m}^{L}$ in (8c) follows directly as the sum of the terms of the sequences in (6) for the two cases. Substituting the condition of the special case, $n_{i}=n_{j}=n$; $\forall i\neq j$, into (8c) and factoring out $n$ gives a geometric series with common ratio $r$, which simplifies to $N_{m,eq.}^{L}$ in (8f). The number of adders $N_{a}^{L}$ in the LC-DPD structure can be determined as follows. As described for the structure of FF-DPD, if the coefficients of the respective BFs used to generate a predistorted output are all distinct, the number of adders used is $Q-1$, i.e., one less than the number of coefficients. This can also be observed for the output $x_{1}$ of the LC-DPD in Fig. 3(b). For the output $x_{3}$, however, the coefficients for the BFs $\psi_{1}$ and $\psi_{2}$ are distinct, while the coefficients for $\psi_{3}$ and $\psi_{4}$ are equal to the respective coefficients for $x_{1}$. We find that generating $x_{3}$ requires $2$ adders, which equals the number of distinct coefficients. Based on this, the total number of adders $N_{a}^{L}$ in (9) equals the total number of coefficients (or multipliers) minus the number of predistorted outputs that use a completely distinct set of coefficients, given by $Sr^{(\nu+g-1)}$. For Case II in (6), $Sr^{(\nu+g-1)}<1$ and only one output is generated by a completely distinct set of coefficients. Hence, $\lceil Sr^{(\nu+g-1)}\rceil$ represents the number of distinct sets of coefficients, and $N_{a}^{L}$ in (9) holds for both cases. Moreover, the number of RF chains $N_{RF}^{L}$ depends on the number of coefficients multiplied by each BF in the first group, i.e., $N_{RF}^{L}=Sr^{\nu}$ as expressed in (9). For instance, in Fig. 3(b), each of $\psi_{1}$ and $\psi_{2}$ is assigned $4\times(1/2)^{1}=2$ coefficients; thus, $N_{RF}^{L}=2$. ∎
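For reference, the geometric-series step used in the proof for Case I with $n_{i}=n$ can be written out explicitly, followed by a numerical check for the setting of Fig. 3(b) ($n=2$, $S=4$, $\nu=1$, $r=1/2$, $g=2$):

$$N_{m,eq.}^{L}=Sr^{\nu}\sum_{i=1}^{g}n\,r^{i-1}=nSr^{\nu}\sum_{i=0}^{g-1}r^{i}=nSr^{\nu}\,\frac{1-r^{g}}{1-r},$$

$$N_{m,eq.}^{L}=2\cdot 4\cdot\frac{1}{2}\cdot\frac{1-(1/2)^{2}}{1-1/2}=4\cdot\frac{3/4}{1/2}=6,$$

which agrees with the total $4+2=6$ obtained directly from the sequence in (6).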

If we compare the LC-DPD to FF-DPD in Fig. 3, the number of multipliers, adders, and RF chains in LC-DPD are reduced by the factors, $N_{m}^{F}/N_{m}^{L}=2.67$ , $N_{a}^{F}/N_{a}^{L}=2.4$ , and $N_{RF}^{F}/N_{RF}^{L}=2$ , respectively.
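The counts in Lemma 1 and the reduction factors quoted above can be reproduced with the following Python sketch. The function names are hypothetical, and the code assumes that only the last group can fall below one coefficient per BF (Case II), as in (6).

```python
import math
from fractions import Fraction
from typing import List, Tuple


def lc_dpd_complexity(S: int, r: Fraction, nu: int, n: List[int]) -> Tuple[int, int, int]:
    """Return (N_m^L, N_a^L, N_RF^L) following (8c) and (9)."""
    g = len(n)
    # Coefficients per BF in each group; Case II clamps the last group to one.
    per_bf = [max(S * r ** (nu + i), Fraction(1)) for i in range(g)]
    n_mult = sum(n_i * c for n_i, c in zip(n, per_bf))
    n_add = n_mult - math.ceil(S * r ** (nu + g - 1))
    n_rf = S * r ** nu
    return int(n_mult), int(n_add), int(n_rf)


def ff_dpd_complexity(S: int, Q: int) -> Tuple[int, int, int]:
    """Return (N_m^F, N_a^F, N_RF^F) for the FF-DPD structure."""
    return Q * S, (Q - 1) * S, S


# Usage: the structures of Fig. 3 (S = 4, Q = 4, nu = 1, r = 1/2, n_1 = n_2 = 2).
lc = lc_dpd_complexity(4, Fraction(1, 2), 1, [2, 2])
ff = ff_dpd_complexity(4, 4)
ratios = [f / l for f, l in zip(ff, lc)]   # reduction factors for multipliers, adders, RF chains
```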

III Training Based on ILA-RPEM: Part I

We now describe the ILA-RPEM based learning for the FF-DPD and LC-DPD schemes to fully linearize a subarray of PAs. In Part I, we focus on learning with the FF-DPD structure: we first describe the learning of the FF-DPD coefficients, and then realize the learning of the LC-DPD coefficients by utilizing the structure of FF-DPD. In Part II, by contrast, the learning fully exploits the structure of LC-DPD.

Fig. 4 shows a general ILA to linearize a subarray of PAs using the RPEM algorithm. Here, the message $s(n)$ is fed to the FF-DPD (or LC-DPD) block with coefficient vector $\bm{\Phi}$ (or $\bm{\overline{\Phi}}$), which generates a vector $\bm{X}$ of predistorted signals using (3). These signals are then fed to the respective PAs to obtain the output vector $\bm{Y}$ using (2). To minimize the nonlinearities in $\bm{Y}$, in the feedback loop, $\bm{Y}$ is first scaled by the inverse gain $1/G$ of the PAs to get $\bm{Y}^{{}^{\prime}}=[y_{1}^{{}^{\prime}}(n)\cdots y_{S}^{{}^{\prime}}(n)]^{T}$, where $y_{l}^{{}^{\prime}}(n)=(1/G)y_{l}(n)$, and is then fed to the training block. Based on RPEM, the block estimates the FF-DPD (or LC-DPD) coefficient vector $\bm{\tilde{\Phi}}$ (or $\bm{\tilde{\overline{\Phi}}}$), where $\bm{\tilde{\Phi}}$ is defined in the same way as $\bm{\Phi}$ in Section II-A for the FF-DPD structure, and $\bm{\tilde{\overline{\Phi}}}$ is defined as $\bm{\overline{\Phi}}$ in (7) for the LC-DPD structure. The estimate is then copied to the FF-DPD (or LC-DPD) block, i.e., $\bm{\Phi}=\bm{\tilde{\Phi}}$ (or $\bm{\overline{\Phi}}=\bm{\tilde{\overline{\Phi}}}$), which again generates $\bm{X}$ followed by $\bm{Y}$. The process repeats until $\bm{\Phi}$ (or $\bm{\overline{\Phi}}$) converges, at which point the FF-DPD (or LC-DPD) is trained to fully linearize the PAs.

III-A Linearization of a PA using ILA-RPEM

We study the processing behind the linearization of the $l$th PA of the subarray using the ILA-RPEM in Fig. 4 as follows. Using the input $s(n)$, the predistorted output $x_{l}(n)$ and then the PA output $y_{l}(n)$ are obtained using (3) and (2), respectively. To capture the inverse of the nonlinear behavior of the PA, the scaled PA output $y_{l}^{{}^{\prime}}(n)$ is fed to the RPEM based learning block, which generates the postdistorted signal $\tilde{x}_{l}(n)$, similar to (3), as:

$$\tilde{x}_{l}(n)=\bm{\tilde{\Phi}}_{l}^{T}\bm{\Psi}_{l}^{{}^{\prime}}, \tag{10}$$

where $\bm{\tilde{\Phi}}_{l}$ is the $l$th vector of the block vector $\bm{\tilde{\Phi}}$ and $\bm{\Psi}_{l}^{{}^{\prime}}$ is the vector of BFs defined using the same polynomial terms as in $\bm{\Psi}$; the only difference is that $y_{l}^{{}^{\prime}}(n)$, instead of $s(n)$, is the input to the BFs. Thus, the $i$th element of the vector $\bm{\Psi}_{l}^{{}^{\prime}}$ is $\psi_{l,i}^{{}^{\prime}}=\psi_{i}(y_{l}^{{}^{\prime}}(n))$. The goal of the RPEM algorithm is to iteratively minimize the difference between the postdistorted signal $\tilde{x}_{l}$ in (10) and the predistorted signal $x_{l}$ in (3) by optimizing $\bm{\tilde{\Phi}}_{l}$. At the end of each iteration, $\bm{\tilde{\Phi}}_{l}$ is copied to $\bm{\Phi}_{l}$, i.e., $\bm{\Phi}_{l}=\bm{\tilde{\Phi}}_{l}$, and in this way the algorithm tries to capture the inverse of the nonlinear characteristics of the PA. Thus, using the estimated value $\bm{\widehat{\Phi}}_{l}$ at convergence, the FF-DPD generates the optimal predistorted signal $\widehat{x}_{l}$ to linearize the $l$th PA.

We now describe the process that minimizes the difference $e_{l}(n)=x_{l}(n)-\tilde{x}_{l}(n)$ in each iteration. The cost function $\mathcal{L}(\bm{\tilde{\Phi}}_{l})$, based on the average power of $e_{l}(n)$ over a long horizon, is defined as:

\displaystyle\mathcal{L}(\bm{\tilde{\Phi}}_{l})=\textstyle\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\mathbb{E}\left[|e_{l}(n)|^{2}\right],

(11)

From [40], $\mathcal{L}(\bm{\tilde{\Phi}}_{l})$ in (11) can be minimized using the negative gradient of $e_{l}(n)$ with respect to $\bm{\tilde{\Phi}}_{l}$. It can be obtained as:

\displaystyle-\frac{\mathrm{d}e_{l}(n)}{\mathrm{d}\bm{\tilde{\Phi}}_{l}}=\frac{\mathrm{d}\tilde{x}_{l}(n)}{\mathrm{d}\bm{\tilde{\Phi}}_{l}}=\frac{\mathrm{d}\bm{\tilde{\Phi}}_{l}^{T}\bm{\Psi}^{{}^{\prime}}_{l}}{\mathrm{d}\bm{\tilde{\Phi}}_{l}}=\bm{\Psi}^{{}^{\prime}}_{l}.

(12)

Using the gradient in (12), the training block performs the below computations in (13) based on RPEM to get the trained coefficients $\bm{\tilde{\Phi}}_{l}^{(n)}$ for the $l$ th PA in the $n$ th iteration [40].


$\displaystyle\!\!\!\!e_{l}(n)$	$\displaystyle=x_{l}(n)-\tilde{x}_{l}(n),\!\!\!\!\!\!\!$	(13a)
$\displaystyle\!\!\!\!\xi_{l}^{(n)}$	$\displaystyle=\rho\xi_{l}^{(n-1)}+1-\rho,\!\!\!\!\!\!\!$	(13b)
$\displaystyle\!\!\!\!Z_{l}^{(n)}$	$\displaystyle=\bm{\Psi}_{l}^{{}^{\prime}T}(n)P_{l}^{(n-1)}{\bm{\Psi}_{l}^{{}^{\prime}}}^{*}(n)+\xi_{l}^{(n)},\!\!\!\!\!\!\!$	(13c)
$\displaystyle\!\!\!\!P_{l}^{(n)}$	$\displaystyle=(P_{l}^{(n-1)}-P_{l}^{(n-1)}{\bm{\Psi}_{l}^{{}^{\prime}}}^{*}(n){Z_{l}^{(n)}}^{-1}\bm{\Psi}_{l}^{{}^{\prime}T}(n)P_{l}^{(n-1)})/\xi_{l}^{(n)},\!\!\!\!\!\!\!$	(13d)
$\displaystyle\!\!\!\!\bm{\tilde{\Phi}}_{l}^{(n)}$	$\displaystyle=\bm{\tilde{\Phi}}_{l}^{(n-1)}+P_{l}^{(n)}{\bm{\Psi}_{l}^{{}^{\prime}}}^{*}(n)e_{l}(n),\!\!\!\!\!\!\!$	(13e)

Here, $e_{l}(n)$ is computed in (13a). Thereafter, the forgetting factor $\xi_{l}$ is determined recursively in (13b) using its value in the previous iteration and the rate of growth $\rho$. Its initial value is $\xi_{l}^{(0)}=\lambda_{0}$, and $\xi_{l}$ grows exponentially to $1$ with the iterations. Using $\xi_{l}$, the BF vector $\bm{\Psi}_{l}^{{}^{\prime}}$, and the covariance matrix $P_{l}$, the scalar $Z_{l}$ is computed in (13c). The initial value of the covariance matrix is $P_{l}^{(0)}=\mu\bm{I}$, where $\bm{I}$ is the identity matrix and $\mu$ is a constant. Finally, the matrix $P_{l}$ is updated in (13d), and $\bm{\tilde{\Phi}}_{l}$ is updated recursively in (13e) at the end of the iteration. Moreover, because $Z_{l}$ is a scalar, RPEM is free from the costly matrix inversions required by, e.g., LS estimation. Using the above study of the linearization of the $l$th PA, we next realize the full linearization of the PAs of a subarray.
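The recursion in (13) can be sketched in a few lines of NumPy. The snippet below is a minimal illustration and not the authors' implementation: the function name `rpem_step`, the two-term odd-order basis, the target vector `phi_true`, and the values chosen for $\rho$, $\lambda_{0}$, and $\mu$ are all assumptions made for the demonstration. To keep it self-contained, the predistorter and PA of Fig. 4 are replaced by a synthetic signal that is linear in the coefficients, so the example only exercises the update equations.

```python
import numpy as np

def rpem_step(phi, P, xi, psi, x, rho):
    """One RPEM update for a single PA, following (13a)-(13e)."""
    e = x - phi @ psi                            # (13a), with x_tilde from (10)
    xi = rho * xi + 1.0 - rho                    # (13b)
    P_psi = P @ psi.conj()                       # P^(n-1) times conj(psi)
    Z = psi @ P_psi + xi                         # (13c): scalar, so no matrix inverse
    P = (P - np.outer(P_psi, psi @ P) / Z) / xi  # (13d)
    phi = phi + (P @ psi.conj()) * e             # (13e)
    return phi, P, xi, e

# Toy run (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
Q, rho, lam0, mu = 2, 0.99, 0.99, 1e4
phi_true = np.array([1.0 + 0.2j, -0.3 + 0.1j])  # hypothetical target coefficients
phi = np.zeros(Q, dtype=complex)                # phi_tilde^(0)
P = mu * np.eye(Q, dtype=complex)               # P^(0) = mu * I
xi = lam0                                       # xi^(0) = lambda_0

for n in range(500):
    y = rng.uniform(0.0, 1.0) * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi))
    psi = np.array([y, y * abs(y) ** 2])        # assumed basis functions
    x = phi_true @ psi                          # stand-in for x_l(n)
    phi, P, xi, e = rpem_step(phi, P, xi, psi, x, rho)
```

Because $Z_{l}$ is a scalar, the only division in the loop is by a number, which is the property the text highlights when comparing RPEM with LS estimation.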

III-B FF-DPD to Linearize a Subarray using ILA-RPEM

As described earlier, the structure of FF-DPD in Fig. 3(a) is similar to assigning the individual DPD in Fig. 1 to each of the PAs. Its complexity is reduced only by utilizing the same set of $Q$ BFs for a given type of PAs. Therefore, using FF-DPD, the full linearization of $S$ PAs through ILA-RPEM is the same as the parallel linearization of each of the $S$ PAs using the process described for the $l$th PA (cf. Section III-A). For the parallel operations, the different parameters are arranged in matrix form as follows. We define $\bm{E}=[e_{1},\cdots,e_{S}]^{T}$, $\bm{\xi}\triangleq\text{diag}(\xi_{1},\cdots,\xi_{S})$, $\bm{Z}\triangleq\text{diag}(Z_{1},\cdots,Z_{S})$, $\bm{\Upsilon}\triangleq\text{diag}(\bm{\Psi}_{1}^{{}^{\prime}},\cdots,\bm{\Psi}_{S}^{{}^{\prime}})$, $\bm{P}\triangleq\text{diag}(P_{1},\cdots,P_{S})$, and $\bm{\Xi}\triangleq\text{diag}(\xi_{1}\bm{I}_{Q},\cdots,\xi_{S}\bm{I}_{Q})$. Here, $\text{diag}(\cdot)$ is the (block) diagonal matrix constructed from the input scalar, vector, or matrix elements.

Algorithm 1 Iterative estimation of coefficients for FF-DPD.

1: Input: The values of $\rho$, $\lambda_{0}$, $\mu$, and $\bm{\tilde{\Phi}}^{(0)}$
2: Output: The estimated coefficient vector $\bm{\widehat{\Phi}}$
3: $\bm{P}^{(0)}=\text{diag}(\underbrace{\mu\bm{I}_{Q},\cdots,\mu\bm{I}_{Q}}_{S\text{ times}})$
4: $\bm{\xi}^{(0)}=\lambda_{0}\bm{I}_{S}$
5: Assign $\bm{\Phi}^{(0)}=\bm{\tilde{\Phi}}^{(0)}$ and compute $\bm{X}(1)$ using (3), then determine $\bm{\Upsilon}(1)$ using the outputs $\bm{Y}(1)$ of the PAs
6: $\bm{\tilde{X}}(1)=\bm{\Upsilon}(1)^{T}\bm{\tilde{\Phi}}^{(0)}$
7: $n=1$
8: repeat
9:    $\bm{E}(n)=\bm{X}(n)-\bm{\tilde{X}}(n)$
10:   $\bm{\xi}^{(n)}=\rho\bm{\xi}^{(n-1)}+\bm{I}_{S}-\rho\bm{I}_{S}$
11:   $\bm{Z}^{(n)}=\bm{\Upsilon}^{T}(n)\bm{P}^{(n-1)}\bm{\Upsilon}^{*}(n)+\bm{\xi}^{(n)}$
12:   $\bm{P}^{(n)}=(\bm{P}^{(n-1)}-\bm{P}^{(n-1)}\bm{\Upsilon}^{*}(n){\bm{Z}^{(n)}}^{-1}\bm{\Upsilon}^{T}(n)\bm{P}^{(n-1)}){\bm{\Xi}^{(n)}}^{-1}$
13:   $\bm{\tilde{\Phi}}^{(n)}=\bm{\tilde{\Phi}}^{(n-1)}+\bm{P}^{(n)}\bm{\Upsilon}^{*}(n)\bm{E}(n)$
14:   Assign $\bm{\Phi}^{(n)}=\bm{\tilde{\Phi}}^{(n)}$ and compute $\bm{X}(n+1)$ using (3), then find $\bm{\Upsilon}(n+1)$ using the outputs $\bm{Y}(n+1)$ of the PAs
15:   $\bm{\tilde{X}}(n+1)=\bm{\Upsilon}(n+1)^{T}\bm{\tilde{\Phi}}^{(n)}$
16:   $n=n+1$
17: until $\bm{\tilde{\Phi}}^{(n)}$ converges
18: $\bm{\widehat{\Phi}}=\bm{\tilde{\Phi}}^{(n)}$

In Algorithm 1, the values are assigned to the independent parameters $\rho$ , $\lambda_{0}$ , and $\mu$ . Then, the initial values of the covariance matrix $\bm{P}$ and the forgetting factor $\bm{\xi}$ are computed in Steps 3 and 4. Thereafter, the calculations from Step 9 to Step 15 are repeated until the coefficient vector $\bm{\tilde{\Phi}}$ converges. Lastly, we obtain the optimal coefficient vector $\bm{\widehat{\tilde{\Phi}}}$ in Step 18 which is copied to the FF-DPD, i.e., $\bm{\widehat{\Phi}}=\bm{\widehat{\tilde{\Phi}}}$ .

Performance

If we examine Step 13 of Algorithm 1, $\bm{\Phi}$ is iteratively estimated using the correlation matrix $\bm{P}$. Therefore, the coefficients in the vector $\bm{\Phi}_{l}$ of the block vector $\bm{\Phi}$ are correlated with each other to provide the optimal predistorted signal $x_{l}$ that linearizes the $l$th PA. Since each PA is provided with a separate predistorted signal, FF-DPD gives the best performance.

Complexity

As matrix multiplication dominates the complexity of an algorithm [41], we consider Steps 11, 12, and 13 to determine the per-iteration computational complexity of Algorithm 1. In Step 11, the matrices $\bm{\Upsilon}$ and $\bm{P}$ have the sizes $QS\times S$ and $QS\times QS$, respectively. The computational complexity of $\bm{\Upsilon}^{T}\bm{P}$ is $O(SQSQS)=O(Q^{2}S^{3})$. The resulting matrix $\bm{\Upsilon}^{T}\bm{P}$ has the size $S\times QS$ and is further multiplied by $\bm{\Upsilon}^{*}$ with complexity $O(SQSS)=O(QS^{3})$. Thus, the total complexity of Step 11 is $O(Q^{2}S^{3}+QS^{3})$. Similarly, the complexities of Steps 12 and 13 are $O(2Q^{3}S^{3}+2Q^{2}S^{3}+QS^{3})$ and $O(Q^{2}S^{3}+QS^{2})$, respectively.$^{5}$ Among these operations, the highest order term is $Q^{3}S^{3}$; therefore, the per-iteration computational complexity of the algorithm is $O(Q^{3}S^{3})$.

$^{5}$Note that the algorithm is still free from matrix inverse operations: although $\bm{Z}$ and $\bm{\Xi}$ are matrices, they are diagonal, so their inverses are obtained by simply inverting their diagonal elements.

Lemma 2.

Using property of diagonal matrix multiplication, complexity of Algorithm 1 is reduced by a factor of $S^{2}$ .

Proof:

The direct multiplication of diagonal matrices is not computationally efficient due to the redundant multiplications of $0$ entries. For example, two diagonal matrices of size $2\times 2$ satisfy $\text{diag}(a_{1},a_{2})\text{diag}(b_{1},b_{2})=\text{diag}(a_{1}b_{1},a_{2}b_{2})$, where $a_{i}$ and $b_{i}$, $i\in\{1,2\}$, are scalars. With conventional matrix multiplication, the total number of multiplications is $2^{3}$, whereas diagonal matrix multiplication requires only the $2$ multiplications of the diagonal entries. Thus, there are $6$ redundant multiplications in the former method. The same property applies if $a_{i}$ and $b_{i}$ are matrices, provided that their sizes are compatible with the product $a_{i}b_{i}$. If we employ it in the matrix multiplication $\bm{\Upsilon}^{T}\bm{P}\bm{\Upsilon}^{*}$ of Step 11, there are $2S$ multiplications of the respective diagonal blocks (two for each of the $S$ PAs). For the $l$th PA, for example, the product is $\bm{\Psi}_{l}^{{}^{\prime}T}P_{l}{\bm{\Psi}_{l}^{{}^{\prime}}}^{*}$, which requires $O(Q^{2}+Q)$ multiplications. Thus, the total number of multiplications is $O(Q^{2}S+QS)$. Similarly, the computational complexities of Steps 12 and 13 are $O(2Q^{3}S+2Q^{2}S+QS)$ and $O(Q^{2}S+QS)$, respectively. Thus, the per-iteration computational complexity is $O(Q^{3}S)$, which is $S^{2}$ times less than that of the conventional method. ∎
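A small numerical check of Lemma 2 for Step 11 is sketched below. The sizes $S=3$ and $Q=4$, the random test data, and the simple choice of the blocks $P_{l}$ are assumptions, and the helper names `z_dense` and `z_blockwise` are hypothetical. The multiplication counts follow the dominant terms used in the text.

```python
import numpy as np

def z_dense(Upsilon, P, xi):
    """Step 11 with the full (block-diagonal) matrices."""
    return Upsilon.T @ P @ Upsilon.conj() + np.diag(xi)

def z_blockwise(psis, Ps, xi):
    """Same diagonal entries, computed one PA (one diagonal block) at a time."""
    return np.array([psi @ P_l @ psi.conj() + xi_l
                     for psi, P_l, xi_l in zip(psis, Ps, xi)])

rng = np.random.default_rng(1)
S, Q = 3, 4
psis = rng.standard_normal((S, Q)) + 1j * rng.standard_normal((S, Q))
Ps = [np.eye(Q) * (l + 1.0) for l in range(S)]   # simple per-PA blocks P_l
xi = np.full(S, 0.99)

# Assemble Upsilon (QS x S) and P (QS x QS) as block-diagonal matrices.
Upsilon = np.zeros((Q * S, S), dtype=complex)
P = np.zeros((Q * S, Q * S), dtype=complex)
for l in range(S):
    Upsilon[l * Q:(l + 1) * Q, l] = psis[l]
    P[l * Q:(l + 1) * Q, l * Q:(l + 1) * Q] = Ps[l]

Z_full = z_dense(Upsilon, P, xi)
Z_diag = z_blockwise(psis, Ps, xi)

# Dominant multiplication counts for Step 11, as in the text.
dense_mults = Q**2 * S**3 + Q * S**3     # conventional product
block_mults = S * (Q**2 + Q)             # block-by-block product
```

Both routes give the same diagonal matrix $\bm{Z}$, while the count ratio `dense_mults / block_mults` equals $S^{2}$ for this step, in line with the lemma.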

III-C Realization of Learning for LC-DPD using FF-DPD

To learn the LC-DPD coefficient vector $\bm{\overline{\Phi}}$ , we exploit the learning process for FF-DPD. In this regard, vector $\bm{\overline{\Phi}}$ is converted into FF-DPD coefficient vector $\bm{\Phi}$ and vice versa. To realize the conversions mathematically, we define two operators $\bm{\mathfrak{M}}_{1}$ and $\bm{\mathfrak{M}}_{2}$ .

Definition 2 (A Linear Operator $\bm{\mathfrak{M}}_{1}$ ).

The function $f_{1}(\cdot)$ that transforms the shape of the coefficient vector $\bm{\overline{\Phi}}$ into the shape of $\bm{\Phi}$ as expressed in (14a) is a linear operator $\bm{\mathfrak{M}}_{1}$ as defined in (14b).


		$\displaystyle\!\!\!\!\!\bm{\Phi}=f_{1}(\bm{\overline{\Phi}})=\bm{\mathfrak{M}}_{1}\bm{\overline{\Phi}},\!\!$		(14a)
		$\displaystyle\!\!\!\!\!\bm{\mathfrak{M}}_{1}\triangleq\begin{bmatrix}\bm{M}_{11}&\cdots&\bm{M}_{1g}\\ \vdots&\ddots&\vdots\\ \bm{M}_{S1}&\cdots&\bm{M}_{Sg}\end{bmatrix}\!\!,\bm{M}_{ij}=\begin{bmatrix}m_{11}^{ij}&\cdots&m_{1L_{j}}^{ij}\\ \vdots&\ddots&\vdots\\ m_{Q1}^{ij}&\cdots&m_{QL_{j}}^{ij}\end{bmatrix}\!\!$		(14b)

where $L_{j}=n_{j}Sr^{(\nu+j-1)}$ , $m_{uv}^{ij}\in\{0,1\}$ for $u\in\{1,\cdots,Q\}$ , $v\in\{1,\cdots,L_{j}\}$ , $i\in\{1,\cdots,S\}$ and $j\in\{1,\cdots,g\}$ . Here, $m_{uv}^{ij}=1$ indicates that after performing the operation in (14a), the $v$ th element of vector $\bm{\overline{\Phi}}_{j}$ is assigned to the $u$ th element of vector $\bm{\Phi}_{i}$ . Furthermore, the operator $\bm{\mathfrak{M}}_{1}$ has the following two properties.

(i)

In each row vector of the matrix $\bm{\mathfrak{M}}_{1}$ , only one element is $1$ and the remaining elements are $0$ .
(ii)

The sum of the elements in the $z$ th column vector of the operator depicts the repetition of $z$ th element of vector $\bm{\overline{\Phi}}$ in the vector $\bm{\Phi}$ . Also, if the $z$ th element of $\bm{\overline{\Phi}}$ lies in the $j$ th vector $\bm{\overline{\Phi}}_{j}$ of the block vector $\bm{\overline{\Phi}}$ , then the number of repetition of $z$ th coefficient of $\bm{\overline{\Phi}}$ in $\bm{\Phi}$ is $r^{-(\nu+j-1)}$ which is same as the sum of the elements of the $z$ th column vector of $\bm{\mathfrak{M}}_{1}$ .

Example: From Fig. 3(b), the block vector $\bm{\overline{\Phi}}=[\bm{\overline{\Phi}}_{1}^{T},\bm{\overline{\Phi}}_{2}^{T}]^{T}$ , where $\bm{\overline{\Phi}}_{1}=[\overline{\phi}_{1,1},\overline{\phi}_{3,1},\overline{\phi}_{1,2},\overline{\phi}_{3,2}]^{T}$ , and $\bm{\overline{\Phi}}_{2}=[\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ . Here, the parameters for LC-DPD are: $\nu=1$ , $r=1/2$ , $g=2$ , $n_{1}=n_{2}=2$ , $Q=4$ , and $S=4$ . To reshape $\bm{\overline{\Phi}}$ into the shape of $\bm{\Phi}=[\bm{\Phi}_{1}^{T},\bm{\Phi}_{2}^{T},\bm{\Phi}_{3}^{T},\bm{\Phi}_{4}^{T}]^{T}$ as for FF-DPD in Fig. 3(a), we use the operator $\mathfrak{M}_{1}$ as given below, where $\bm{\Phi}_{i}=[\phi_{i,1},\phi_{i,2},\phi_{i,3},\phi_{i,4}]^{T}$ for $i\in\{1,2,3,4\}$ .

\displaystyle\bm{\mathfrak{M}}_{1}=\begin{bmatrix}\bm{M}_{11}&\bm{M}_{12}\\ \bm{M}_{21}&\bm{M}_{22}\\ \bm{M}_{31}&\bm{M}_{32}\\ \bm{M}_{41}&\bm{M}_{42}\end{bmatrix}=\footnotesize\left[\begin{array}[]{@{}c|c@{}}\begin{matrix}[0.7]1&0&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\\ 1&0\\ 0&1\end{matrix}\\ \hline\cr\begin{matrix}[0.7]1&0&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\\ 1&0\\ 0&1\end{matrix}\\ \hline\cr\begin{matrix}[0.7]0&1&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\\ 1&0\\ 0&1\end{matrix}\\ \hline\cr\begin{matrix}[0.7]0&1&0&0\\ 0&0&0&1\\ 0&0&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\\ 1&0\\ 0&1\end{matrix}\end{array}\right]

(19)

In (19), $\bm{\mathfrak{M}}_{1}$ satisfies the property (i) as each of its row vector contains one element as $1$ and remaining are $0$ . Next, to verify the property (ii), for instance, the sum of the elements of the $5$ th $(z=5)$ column vector is $4$ which entails that $5$ th coefficient $\overline{\phi}_{1,3}$ of $\bm{\overline{\Phi}}$ repeats $4$ times in $\bm{\Phi}$ . It can also be determined using the expression $r^{-(\nu+j-1)}$ where $j=2$ as the 5th $(z=5)$ column vector lies in the 2nd $(j=2)$ group and it gives $4$ after substitution of the value of the parameters. Besides, the block vector $\bm{\Phi}=[\bm{\Phi}_{1}^{T},\bm{\Phi}_{2}^{T},\bm{\Phi}_{3}^{T},\bm{\Phi}_{4}^{T}]^{T}$ is obtained using the computation in (14a), where $\bm{\Phi}_{1}=[\overline{\phi}_{1,1},\overline{\phi}_{1,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ , $\bm{\Phi}_{2}=[\overline{\phi}_{1,1},\overline{\phi}_{1,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ , $\bm{\Phi}_{3}=[\overline{\phi}_{3,1},\overline{\phi}_{3,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ , and $\bm{\Phi}_{4}=[\overline{\phi}_{3,1},\overline{\phi}_{3,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ . $\Box$
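The example above can be reproduced numerically. The following sketch builds $\bm{\mathfrak{M}}_{1}$ of (19) from an index list and checks properties (i) and (ii). The index list `src` and the numeric encoding of $\overline{\phi}_{i,u}$ as $10i+u$ are conveniences introduced here for readability, not part of the original formulation.

```python
import numpy as np

# Row k of M1 selects the element of Phi_bar that is copied to element k of Phi.
# Phi_bar order: [phi_11, phi_31, phi_12, phi_32, phi_13, phi_14] (0-based indices 0..5).
src = [0, 2, 4, 5,   # Phi_1
       0, 2, 4, 5,   # Phi_2
       1, 3, 4, 5,   # Phi_3
       1, 3, 4, 5]   # Phi_4
M1 = np.zeros((16, 6))
M1[np.arange(16), src] = 1.0

row_sums = M1.sum(axis=1)   # property (i): exactly one 1 per row
col_sums = M1.sum(axis=0)   # property (ii): repetition count of each coefficient

# Encode phi_bar_{i,u} as the number 10*i + u to make the result easy to read.
Phi_bar = np.array([11, 31, 12, 32, 13, 14], dtype=float)
Phi = M1 @ Phi_bar          # operation (14a)
```

Here `col_sums` evaluates to 2 for the four coefficients of the first group and 4 for the two coefficients of the second group, matching $r^{-(\nu+j-1)}$ with $r=1/2$ and $\nu=1$.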

Definition 3 (A Linear Operator $\bm{\mathfrak{M}}_{2}$ ).

The function $f_{2}(\cdot)$ that transforms the shape of the coefficient vector $\bm{\Phi}$ into the shape of $\bm{\overline{\Phi}}$ as expressed in (20a), where some of the elements of $\bm{\overline{\Phi}}$ are the average of some of elements of $\bm{\Phi}$ , is a linear operator $\bm{\mathfrak{M}}_{2}$ as defined in (20b).


		$\displaystyle\!\!\!\!\!\bm{\overline{\Phi}}=f_{2}(\bm{\Phi})=\bm{\mathfrak{M}}_{2}\bm{\Phi},\!\!$		(20a)
		$\displaystyle\!\!\!\!\!\bm{\mathfrak{M}}_{2}\triangleq\begin{bmatrix}\bm{M}_{11}&\cdots&\bm{M}_{1S}\\ \vdots&\ddots&\vdots\\ \bm{M}_{g1}&\cdots&\bm{M}_{gS}\end{bmatrix}\!\!,\bm{M}_{ij}=\begin{bmatrix}m_{11}^{ij}&\cdots&m_{1Q}^{ij}\\ \vdots&\ddots&\vdots\\ m_{L_{i}1}^{ij}&\cdots&m_{L_{i}Q}^{ij}\end{bmatrix}\!\!$		(20b)
		$\displaystyle\!\!\!\!\!\left\{\begin{array}[]{cc}m_{uv_{1}}^{ij_{1}}=\cdots=m_{uv_{N_{i}}}^{ij_{N_{i}}}=1/{N_{i}};&\text{for }\overline{\phi}_{i,u}=1/{N_{i}}\sum_{t=1}^{N_{i}}\phi_{j_{t},v_{t}},\\ m_{uv}^{ij}=0;&\text{ Otherwise},\end{array}\right.\!\!$		(20e)

where $L_{i}=n_{i}Sr^{(\nu+i-1)}$, $N_{i}=r^{-(\nu+i-1)}$ is the number of FF-DPD coefficients averaged into one coefficient of the $i$th group, $u\in\{1,\cdots,L_{i}\}$, $v,v_{1},v_{2},\cdots\in\{1,\cdots,Q\}$, $i\in\{1,\cdots,g\}$, and $j\in\{1,\cdots,S\}$. The values of the elements of the matrix $\bm{\mathfrak{M}}_{2}$ are determined using (20e) based on the relationship between the elements of the vectors $\bm{\overline{\Phi}}$ and $\bm{\Phi}$. Besides, the operator $\bm{\mathfrak{M}}_{2}$ has the following two properties.

(i)

In each row vector of the matrix $\bm{\mathfrak{M}}_{2}$ , the sum of the elements is $1$ .
(ii)

Each column vector of $\bm{\mathfrak{M}}_{2}$ has only one nonzero element which takes the value $1/{N_{i}}$ .

Example: If we consider Fig. 3 for this example, from Fig. 3(a), the block vector $\bm{\Phi}$ for FF-DPD is expressed as $\bm{\Phi}=[\bm{\Phi}_{1}^{T},\bm{\Phi}_{2}^{T},\bm{\Phi}_{3}^{T},\bm{\Phi}_{4}^{T}]^{T}$ , where $\bm{\Phi}_{j}=[\phi_{j,1},\phi_{j,2},\phi_{j,3},\phi_{j,4}]^{T}$ for $S=4$ and $Q=4$ . To transform $\bm{\Phi}$ into the shape of $\bm{\overline{\Phi}}=[\bm{\overline{\Phi}}_{1}^{T},\bm{\overline{\Phi}}_{2}^{T}]^{T}$ as for the LC-DPD in Fig. 3(b), we use the operator $\mathfrak{M}_{2}$ as given below, where $\bm{\overline{\Phi}}_{1}=[\overline{\phi}_{1,1},\overline{\phi}_{3,1},\overline{\phi}_{1,2},\overline{\phi}_{3,2}]^{T}$ , and $\bm{\overline{\Phi}}_{2}=[\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ .

\displaystyle\bm{\mathfrak{M}}_{2}=\footnotesize\left[\begin{array}[]{@{}c|c|c|c@{}}\begin{matrix}[1.3]\frac{1}{2}&0&0&0\\ 0&0&0&0\\ 0&\frac{1}{2}&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[1.3]\frac{1}{2}&0&0&0\\ 0&0&0&0\\ 0&\frac{1}{2}&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[1.3]0&0&0&0\\ \frac{1}{2}&0&0&0\\ 0&0&0&0\\ 0&\frac{1}{2}&0&0\end{matrix}&\begin{matrix}[1.3]0&0&0&0\\ \frac{1}{2}&0&0&0\\ 0&0&0&0\\ 0&\frac{1}{2}&0&0\end{matrix}\\ \hline\cr\begin{matrix}[1.3]0&0&\frac{1}{4}&0\\ 0&0&0&\frac{1}{4}\end{matrix}&\begin{matrix}[1.3]0&0&\frac{1}{4}&0\\ 0&0&0&\frac{1}{4}\end{matrix}&\begin{matrix}[1.3]0&0&\frac{1}{4}&0\\ 0&0&0&\frac{1}{4}\end{matrix}&\begin{matrix}[1.3]0&0&\frac{1}{4}&0\\ 0&0&0&\frac{1}{4}\end{matrix}\end{array}\right]

(23)

In (23), $\bm{\mathfrak{M}}_{2}$ satisfies property (i) as the sum of the elements in each of its row vectors is $1$. Further, in each column vector, only one element is nonzero and its value is $1/{N_{i}}$. For instance, in the second column vector, the third element is nonzero and, for it, the parameter $i=1$. So, $N_{1}=(1/2)^{-(1+1-1)}=2$ and the element equals $1/N_{1}=1/2$. Thus, property (ii) is also satisfied by $\bm{\mathfrak{M}}_{2}$. Moreover, after performing the operation in (20a), the relationship between the elements of $\bm{\overline{\Phi}}$ and $\bm{\Phi}$ can be expressed as: $\overline{\phi}_{1,1}=(\phi_{1,1}+\phi_{2,1})/2$, $\overline{\phi}_{3,1}=(\phi_{3,1}+\phi_{4,1})/2$, $\overline{\phi}_{1,2}=(\phi_{1,2}+\phi_{2,2})/2$, $\overline{\phi}_{3,2}=(\phi_{3,2}+\phi_{4,2})/2$, $\overline{\phi}_{1,3}=(\phi_{1,3}+\phi_{2,3}+\phi_{3,3}+\phi_{4,3})/4$, and $\overline{\phi}_{1,4}=(\phi_{1,4}+\phi_{2,4}+\phi_{3,4}+\phi_{4,4})/4$. $\Box$
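As a quick check of this example, the sketch below assembles $\bm{\mathfrak{M}}_{2}$ of (23) from the averaging relationships listed above and verifies the two properties. The list `groups` and the test vector `Phi` are illustrative choices made here; the 0-based index $4(j-1)+(v-1)$ assumes that $\bm{\Phi}$ is stacked PA by PA, as in the example.

```python
import numpy as np

# For each element of Phi_bar (row of M2), the 0-based positions in Phi that are averaged.
# Phi is stacked PA by PA, so phi_{j,v} sits at index 4*(j-1) + (v-1).
groups = [
    [0, 4],           # phi_bar_{1,1} = (phi_{1,1} + phi_{2,1}) / 2
    [8, 12],          # phi_bar_{3,1} = (phi_{3,1} + phi_{4,1}) / 2
    [1, 5],           # phi_bar_{1,2} = (phi_{1,2} + phi_{2,2}) / 2
    [9, 13],          # phi_bar_{3,2} = (phi_{3,2} + phi_{4,2}) / 2
    [2, 6, 10, 14],   # phi_bar_{1,3}: average over all four PAs
    [3, 7, 11, 15],   # phi_bar_{1,4}: average over all four PAs
]
M2 = np.zeros((6, 16))
for row, cols in enumerate(groups):
    M2[row, cols] = 1.0 / len(cols)

row_sums = M2.sum(axis=1)                          # property (i)
nonzeros_per_column = np.count_nonzero(M2, axis=0)  # property (ii)

Phi = np.arange(16, dtype=float)   # arbitrary FF-DPD coefficients for the check
Phi_bar = M2 @ Phi                 # operation (20a)
```

With this test vector, `Phi_bar` contains the pairwise and four-way averages, e.g., its first entry is $(0+4)/2=2$ and its fifth entry is $(2+6+10+14)/4=8$.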

Algorithm 2 Estimation of coefficients for LC-DPD (Method-I).

1: Input: The values of $\rho$, $\lambda_{0}$, $\mu$, $\bm{\mathfrak{M}}_{1}$, $\bm{\mathfrak{M}}_{2}$, and $\bm{\tilde{\overline{\Phi}}}^{(0)}$
2: Output: The estimated coefficient vector $\bm{\widehat{\overline{\Phi}}}$
3: Determine the initial values $\bm{P}^{(0)}$ and $\bm{\xi}^{(0)}$ using Steps 3 and 4 of Algorithm 1
4: Find $\bm{\tilde{\Phi}}^{(0)}=\mathfrak{M}_{1}\bm{\tilde{\overline{\Phi}}}^{(0)}$, then assign $\bm{\Phi}^{(0)}=\bm{\tilde{\Phi}}^{(0)}$ and compute $\bm{X}(1)$, $\bm{\Upsilon}(1)$, and $\bm{\tilde{X}}(1)$ as in Steps 5 and 6 of Algorithm 1
5: $n=1$
6: repeat
7:    Compute $\bm{E}(n)$, $\bm{\xi}^{(n)}$, $\bm{Z}^{(n)}$, $\bm{P}^{(n)}$, and $\bm{\tilde{\Phi}}^{(n)}$ from Step 9 to Step 13 of Algorithm 1
8:    Using operator $\bm{\mathfrak{M}}_{2}$ in (20b), compute $\bm{\tilde{\overline{\Phi}}}^{(n)}=\bm{\mathfrak{M}}_{2}\bm{\tilde{\Phi}}^{(n)}$
9:    Then, using $\bm{\mathfrak{M}}_{1}$ in (14b), compute $\bm{\tilde{\Phi}}^{(n)}=\bm{\mathfrak{M}}_{1}\bm{\tilde{\overline{\Phi}}}^{(n)}$
10:   Assign $\bm{\Phi}^{(n)}=\bm{\tilde{\Phi}}^{(n)}$ and compute $\bm{X}(n+1)$ and $\bm{\Upsilon}(n+1)$ as in Step 14 of Algorithm 1
11:   $\bm{\tilde{X}}(n+1)=\bm{\Upsilon}(n+1)^{T}\bm{\tilde{\Phi}}^{(n)}$
12:   $n=n+1$
13: until $\bm{\tilde{\overline{\Phi}}}^{(n)}$ converges
14: $\bm{\widehat{\overline{\Phi}}}=\bm{\tilde{\overline{\Phi}}}^{(n)}$

Now, using Algorithm 2, we train the coefficient vector $\bm{\tilde{\overline{\Phi}}}$ of LC-DPD as follows. Apart from the parameters $\rho$, $\lambda_{0}$, $\mu$, and $\bm{\tilde{\overline{\Phi}}}^{(0)}$, the operators $\bm{\mathfrak{M}}_{1}$ and $\bm{\mathfrak{M}}_{2}$ are also provided as inputs to the algorithm. As we realize the training of $\bm{\tilde{\overline{\Phi}}}$ by exploiting the training of $\bm{\tilde{\Phi}}$, the steps of Algorithm 2 are the same as those of Algorithm 1 except for Steps 4, 8, and 9. In Step 4, using the operator $\mathfrak{M}_{1}$, $\bm{\tilde{\overline{\Phi}}}^{(0)}$ is converted into $\bm{\tilde{\Phi}}^{(0)}$ to compute the other initial values of the parameters for the FF-DPD structure. Steps 8 and 9, in turn, force the learned FF-DPD coefficients in $\bm{\tilde{\Phi}}$ to incorporate the repetitive characteristics of the LC-DPD coefficients in $\bm{\tilde{\overline{\Phi}}}$ in each iteration. The forward process in Step 8 takes the average of those coefficients of $\bm{\tilde{\Phi}}$ that have to be repeated in the LC-DPD structure and assigns it to a coefficient of $\bm{\tilde{\overline{\Phi}}}$ (cf. example of Definition 3). In the backward process in Step 9, this coefficient of $\bm{\tilde{\overline{\Phi}}}$ is again repeated in $\bm{\tilde{\Phi}}$ (cf. example of Definition 2). Thus, although the learning is based on FF-DPD, the repetitive coefficients in $\bm{\tilde{\Phi}}$ are forced to have equal values in each iteration. Finally, after convergence, the algorithm returns the estimated LC-DPD coefficient vector $\bm{\widehat{\overline{\Phi}}}$.
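The effect of Steps 8 and 9 can be illustrated on the $S=4$, $Q=4$ example of Fig. 3. The sketch below is only an illustration of the two operator applications, not of the full algorithm: the random vector `Phi_tilde` stands in for the output of Step 7, and constructing $\bm{\mathfrak{M}}_{2}$ as the column-normalized transpose of $\bm{\mathfrak{M}}_{1}$ is a shortcut that is assumed to hold for this example (it reproduces (23)).

```python
import numpy as np

# M1 of (19): row k copies element src[k] of Phi_bar into element k of Phi.
src = [0, 2, 4, 5, 0, 2, 4, 5, 1, 3, 4, 5, 1, 3, 4, 5]
M1 = np.zeros((16, 6))
M1[np.arange(16), src] = 1.0
# For this example, averaging the repeated entries gives M2 = D^{-1} M1^T,
# where D holds the repetition counts (column sums of M1).
M2 = M1.T / M1.sum(axis=0)[:, None]

rng = np.random.default_rng(2)
Phi_tilde = rng.standard_normal(16) + 1j * rng.standard_normal(16)  # stand-in for Step 7

Phi_bar_tilde = M2 @ Phi_tilde        # Step 8: average the coefficients to be shared
Phi_tilde_new = M1 @ Phi_bar_tilde    # Step 9: repeat the shared coefficients
```

After the round trip, PAs 1 and 2 (and PAs 3 and 4) hold identical coefficient vectors, and the third and fourth coefficients are common to all four PAs. Applying the two steps a second time leaves `Phi_tilde_new` unchanged, since $\bm{\mathfrak{M}}_{2}\bm{\mathfrak{M}}_{1}$ is the identity here.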

Performance and Complexity

In Algorithm 2, the LC-DPD coefficient vector $\bm{\overline{\Phi}}$ is trained by exploiting the training of the FF-DPD coefficient vector $\bm{\Phi}$, where, using the operator $\mathfrak{M}_{2}$, the common coefficients in $\bm{\overline{\Phi}}$ are obtained by averaging some of the coefficients in $\bm{\Phi}$ (cf. example of Definition 3). However, the common coefficients obtained after averaging lose their correlation with the distinct coefficients. Therefore, the predistorted signals generated from them are not as effective as those of FF-DPD in linearizing the PAs; thus, the performance is lower. The complexity of the algorithm is described as follows. Although the operators $\mathfrak{M}_{1}$ and $\mathfrak{M}_{2}$ are represented as matrices in (14b) and (20b) to analyze the operations mathematically, in practice they involve only assignment and averaging operations whose complexities are negligible compared to the matrix multiplications. Therefore, the dominant operations in Algorithm 2 are the same as in Algorithm 1; thus, its complexity per iteration is $O(Q^{3}S)$. To enhance the performance and to reduce the complexity, we propose improved algorithms in the next section.

IV Training Based on ILA-RPEM: Part II

To reduce the complexity of the algorithm, we need to train the LC-DPD coefficient vector $\bm{\overline{\Phi}}$ by exploiting only the structure of LC-DPD instead of enforcing its training through the FF-DPD, because the length of the vector $\bm{\overline{\Phi}}$ as given by (8) is less than that of $\bm{\Phi}$. Thus, training based only on the length of the vector $\bm{\overline{\Phi}}$ reduces the sizes of the matrices $\bm{P}$ and $\bm{\Upsilon}$ in the dominant matrix multiplications. Based on this, we propose two algorithms.

In order to train $\bm{\overline{\Phi}}$ by completely exploiting the LC-DPD structure, we first need to represent $\bm{\overline{\Phi}}$ in a suitable form, i.e., as another block vector in which each vector consists of the coefficients that are multiplied by the BFs to generate a predistorted signal that is distributed to a subgroup of PAs.$^{6}$

$^{6}$In an LC-DPD assisted subarray, the $S$ PAs of the subarray are divided into $\overline{\sigma}_{g}=Sr^{\nu}$ subgroups, where each subgroup consists of $n_{PA}=S/\overline{\sigma}_{g}=r^{-\nu}$ PAs. Thus, the LC-DPD structure generates $\overline{\sigma}_{g}$ predistorted signals and distributes them to the respective subgroups.

Definition 4 (Reshape of $\bm{\overline{\Phi}}$ as $\bm{\overline{\Phi}^{{}^{\prime}}}$ ).

To generate the predistorted signals using the LC-DPD coefficient vector $\bm{\overline{\Phi}}$ , it can be reshaped as the block vector $\bm{\overline{\Phi}^{{}^{\prime}}}$ , given by:

\displaystyle\bm{\overline{\Phi}}^{{}^{\prime}}=[\underbrace{\bm{\overline{\Phi}}_{1}^{{}^{\prime}T},\cdots,\bm{\overline{\Phi}}_{\overline{\sigma}_{1}}^{{}^{\prime}T}}_{1st\text{ gr., each of len. }J_{1}},\cdots,\underbrace{\bm{\overline{\Phi}}_{(\overline{\sigma}_{g-1}+1)}^{{}^{\prime}T},\cdots,\bm{\overline{\Phi}}_{\overline{\sigma}_{g}}^{{}^{\prime}T}}_{gth\text{ gr., each of len. }J_{g}}]^{T},

(24)

where $\bm{\overline{\Phi}}_{i}^{{}^{\prime}}$ is the $i$ th vector in $\bm{\overline{\Phi}}^{{}^{\prime}}$ . Here, the grouping of the vectors is based on their lengths, i.e., the vectors in a group have same length. The total number of groups is $g$ which is equal to the number of vectors in $\bm{\overline{\Phi}}$ (cf. (7)). In the $j$ th group, the number of vectors is $T_{j}$ and each of the vector is of $J_{j}$ length as shown in (24). Besides, $\overline{\sigma}_{j}=\sum_{q=1}^{j}T_{q}$ . Further, $T_{j}$ and $J_{j}$ are given by:


$\displaystyle T_{j}$	$\displaystyle=Sr^{\nu+g-j}\left[1-r\tilde{u}\left(r^{1-j}-1\right)\right],$	(25a)
$\displaystyle J_{j}$	$\displaystyle=Q-\sum_{q=1}^{j-1}n_{g-q+1}\tilde{u}\left(r^{1-j}-1\right),$	(25b)

where $j\in{\{1,\cdots,g\}}$ , $r<1$ and $\tilde{u}(x)=1$ for $x>0$ ; otherwise, $\tilde{u}(x)=0$ .

Example: Again, we consider the instance of the LC-DPD structure with $S=4$, $Q=4$, $r=1/2$, and $\nu=1$ in Fig. 3(b) to reshape $\bm{\overline{\Phi}}$ to $\bm{\overline{\Phi}^{{}^{\prime}}}$. From (7), $\bm{\overline{\Phi}}=[\bm{\overline{\Phi}}_{1}^{T},\bm{\overline{\Phi}}_{2}^{T}]^{T}$ where $\bm{\overline{\Phi}}_{1}=[\underbrace{\overline{\phi}_{1,1},\overline{\phi}_{3,1},\overline{\phi}_{1,2},\overline{\phi}_{3,2}}_{L_{1}=4}]^{T}$, $\bm{\overline{\Phi}}_{2}=[\underbrace{\overline{\phi}_{1,3},\overline{\phi}_{1,4}}_{L_{2}=2}]^{T}$, $n_{1}=n_{2}=2$, and $g=2$. Using (24), its reshaped form $\bm{\overline{\Phi}^{{}^{\prime}}}$, used to generate the predistorted signals, is given as: $\bm{\overline{\Phi}^{{}^{\prime}}}=[\underbrace{\bm{\overline{\Phi}}_{1}^{{}^{\prime}T}}_{1\text{st}\text{ gr.}},\underbrace{\bm{\overline{\Phi}}_{2}^{{}^{\prime}T}}_{2\text{nd}\text{ gr.}}]^{T}$. Here, the number of groups is $g=2$ and, substituting the parameter values in (25), we get $T_{1}=T_{2}=1$, $J_{1}=4$, and $J_{2}=2$. Thus, $\bm{\overline{\Phi}}_{1}^{{}^{\prime}}=[\overline{\phi}_{1,1},\overline{\phi}_{1,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ and $\bm{\overline{\Phi}}_{2}^{{}^{\prime}}=[\overline{\phi}_{3,1},\overline{\phi}_{3,2}]^{T}$. $\Box$
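The values in this example can be reproduced by evaluating (25a) and (25b) directly. The sketch below uses exact rational arithmetic for $r$; the function names `u_step` and `group_sizes` are hypothetical, and the list `n` holds $[n_{1},\cdots,n_{g}]$ with $n_{1}=n_{2}=2$ as in the example of Definition 2. It also compares $\sum_{j}T_{j}J_{j}$ with the length $\sum_{j}L_{j}$ of $\bm{\overline{\Phi}}$, where $L_{j}=n_{j}Sr^{(\nu+j-1)}$.

```python
from fractions import Fraction

def u_step(x):
    """u~(x) = 1 for x > 0, else 0."""
    return 1 if x > 0 else 0

def group_sizes(S, Q, r, nu, n):
    """Return ([T_1..T_g], [J_1..J_g]) from (25a) and (25b); n = [n_1, ..., n_g]."""
    g = len(n)
    T, J = [], []
    for j in range(1, g + 1):
        u = u_step(r ** (1 - j) - 1)
        T.append(S * r ** (nu + g - j) * (1 - r * u))
        # n[g - q] is n_{g-q+1} in the paper's 1-based notation
        J.append(Q - sum(n[g - q] for q in range(1, j)) * u)
    return T, J

S, Q, r, nu, n = 4, 4, Fraction(1, 2), 1, [2, 2]
T, J = group_sizes(S, Q, r, nu, n)

# Length of Phi_bar' versus the length of Phi_bar (sum of L_j = n_j * S * r^(nu+j-1)).
len_reshaped = sum(t * jj for t, jj in zip(T, J))
len_original = sum(n_j * S * r ** (nu + j) for j, n_j in enumerate(n))
```

For the parameters of Fig. 3(b), this yields $T_{1}=T_{2}=1$, $J_{1}=4$, $J_{2}=2$, and both lengths equal $6$.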

Corollary 1.

As $\bm{\overline{\Phi}^{{}^{\prime}}}$ is a reshaped version of $\bm{\overline{\Phi}}$, the elements of the former vector are the same as those of the latter; thus, the lengths of the two vectors are equal. The length of $\bm{\overline{\Phi}^{{}^{\prime}}}$ is $\sum_{i=1}^{g}T_{i}J_{i}$; hence, from the length of $\bm{\overline{\Phi}}$ in (8c), $N_{m}^{L}=\sum_{i=1}^{g}T_{i}J_{i}$.

The above corollary can be proved by substituting $Q=\sum_{i=1}^{g}n_{i}$ in (25) followed by simplifying $\sum_{i=1}^{g}T_{i}J_{i}$ , it gives $N_{m}^{L}$ in (8c). Furthermore, to reshape $\bm{\overline{\Phi}}$ into $\bm{\overline{\Phi}^{{}^{\prime}}}$ , we define a linear operator $\bm{\mathfrak{M}}_{3}$ as below.

Definition 5 (A Linear Operator $\bm{\mathfrak{M}}_{3}$ ).

A linear operator $\bm{\mathfrak{M}}_{3}$ is defined as in (26b) which is used to reshape the LC-DPD coefficient vector $\bm{\overline{\Phi}}$ into the vector $\bm{\overline{\Phi}^{{}^{\prime}}}$ as given by (24).


		$\displaystyle\bm{\overline{\Phi}^{{}^{\prime}}}=\bm{\mathfrak{M}}_{3}\bm{\overline{\Phi}},$		(26a)
		$\displaystyle\bm{\mathfrak{M}}_{3}=\begin{bmatrix}\bm{M}_{11}&\cdots&\bm{M}_{1g}\\ \vdots&\ddots&\vdots\\ \bm{M}_{\overline{\sigma}_{1}1}&\cdots&\bm{M}_{\overline{\sigma}_{1}g}\\ \vdots&\ddots&\vdots\\ \bm{M}_{(\overline{\sigma}_{g-1}+1)1}&\cdots&\bm{M}_{(\overline{\sigma}_{g-1}+1)g}\\ \vdots&\ddots&\vdots\\ \bm{M}_{\overline{\sigma}_{g}1}&\cdots&\bm{M}_{\overline{\sigma}_{g}g}\end{bmatrix},$		(26b)
		$\displaystyle\bm{M}_{ij}=\begin{bmatrix}m_{11}^{ij}&\cdots&m_{1L_{j}}^{ij}\\ \vdots&\ddots&\vdots\\ m_{J_{\tau}1}^{ij}&\cdots&m_{J_{\tau}L_{j}}^{ij}\end{bmatrix},$		(26c)

where $T_{\tau}$ and $J_{\tau}$ are given by (25) for $\tau\in\{1,\cdots,g\}$, $L_{j}=n_{j}Sr^{(\nu+j-1)}$, $m_{uv}^{ij}\in\{0,1\}$ for $i\in\{\overline{\sigma}_{\tau-1}+1,\cdots,\overline{\sigma}_{\tau}\}$ $(\overline{\sigma}_{0}=0)$, $j\in\{1,\cdots,g\}$, $u\in\{1,\cdots,J_{\tau}\}$, and $v\in\{1,\cdots,L_{j}\}$. Moreover, the sum of the elements in each row vector and in each column vector of $\bm{\mathfrak{M}}_{3}$ is $1$.

Example: Again, we consider the example for Fig. 3(b) to reshape $\bm{\overline{\Phi}}=[\bm{\overline{\Phi}}_{1}^{T},\bm{\overline{\Phi}}_{2}^{T}]^{T}$ into $\bm{\overline{\Phi}^{{}^{\prime}}}=[\bm{\overline{\Phi}}_{1}^{{}^{\prime}T},\bm{\overline{\Phi}}_{2}^{{}^{\prime}T}]^{T}$ , where $\bm{\overline{\Phi}}_{1}=[\overline{\phi}_{1,1},\overline{\phi}_{3,1},\overline{\phi}_{1,2},\overline{\phi}_{3,2}]^{T}$ , $\bm{\overline{\Phi}}_{2}=[\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ , $\bm{\overline{\Phi}}_{1}^{{}^{\prime}}=[\overline{\phi}_{1,1},\overline{\phi}_{1,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ , and $\bm{\overline{\Phi}}_{2}^{{}^{\prime}}=[\overline{\phi}_{3,1},\overline{\phi}_{3,2}]^{T}$ . For it, the operator $\bm{\mathfrak{M}}_{3}$ is given by:

\displaystyle\bm{\mathfrak{M}}_{3}=\begin{bmatrix}\bm{M}_{11}&\bm{M}_{12}\\ \bm{M}_{21}&\bm{M}_{22}\end{bmatrix}=\footnotesize\left[\begin{array}[]{@{}c|c@{}}\begin{matrix}[0.7]1&0&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\\ 1&0\\ 0&1\end{matrix}\\ \hline\cr\begin{matrix}[0.7]0&1&0&0\\ 0&0&0&1\end{matrix}&\begin{matrix}[0.7]0&0\\ 0&0\end{matrix}\end{array}\right]

(29)

$\Box$

Corollary 2.

The operator $\bm{\mathfrak{M}}_{3}$ is always a square matrix as it is used to reshape the vector $\bm{\overline{\Phi}}$ into the vector $\bm{\overline{\Phi}^{{}^{\prime}}}$ using the same elements. Also, the column vectors of $\bm{\mathfrak{M}}_{3}$ are unit vectors that are orthogonal to each other. Therefore, they form an orthonormal basis of the space $\mathbb{R}^{N_{m}^{L}}$. Furthermore, the inverse of $\bm{\mathfrak{M}}_{3}$ is its transpose, i.e., $\bm{\mathfrak{M}}_{3}^{-1}=\bm{\mathfrak{M}}_{3}^{T}$ [42]. Hence, from (26a), using $\bm{\mathfrak{M}}_{3}^{-1}$, $\bm{\overline{\Phi}^{{}^{\prime}}}$ can be reshaped back to $\bm{\overline{\Phi}}$ as: $\bm{\overline{\Phi}}=\bm{\mathfrak{M}}_{3}^{T}\bm{\overline{\Phi}^{{}^{\prime}}}$.
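Corollary 2 can be checked numerically on the example of (29). In the sketch below, the index list `perm` and the numeric encoding of $\overline{\phi}_{i,u}$ as $10i+u$ are conveniences introduced here; the matrix itself is the permutation matrix given in (29).

```python
import numpy as np

# Position in Phi_bar of each element of Phi_bar', for the example of Fig. 3(b):
# Phi_bar  = [phi_11, phi_31, phi_12, phi_32, phi_13, phi_14]
# Phi_bar' = [phi_11, phi_12, phi_13, phi_14, phi_31, phi_32]
perm = [0, 2, 4, 5, 1, 3]
M3 = np.eye(6)[perm]            # row k of M3 is the unit vector e_{perm[k]}

Phi_bar = np.array([11, 31, 12, 32, 13, 14], dtype=float)  # 10*i + u encodes phi_bar_{i,u}
Phi_bar_prime = M3 @ Phi_bar          # reshape as in (26a)
Phi_bar_back = M3.T @ Phi_bar_prime   # reshape back using the transpose
```

Since `M3` only reorders entries, `M3.T @ M3` is the identity, so the reshaping costs no arithmetic in practice and can be implemented as an index permutation.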

Algorithm 3 Estimation of coefficients for LC-DPD (Method-II).

1: Input: The values of $\rho$, $\lambda_{0}$, $\mu$, $\bm{\mathfrak{M}}_{1}$, $\bm{\mathfrak{M}}_{2}$, $\bm{\mathfrak{M}}_{3}$, and $\bm{\tilde{\overline{\Phi}}}^{(0)}$
2: Output: The estimated coefficients $\bm{\widehat{\overline{\Phi}}}$
3: $\bm{\overline{P}}^{(0)}=\text{diag}(\underbrace{\mu\bm{I}_{J_{1}},\cdots,\mu\bm{I}_{J_{1}}}_{T_{1}\text{ times}},\cdots,\underbrace{\mu\bm{I}_{J_{g}},\cdots,\mu\bm{I}_{J_{g}}}_{T_{g}\text{ times}})$
4: $\bm{\overline{\xi}}^{(0)}=\lambda_{0}\bm{I}_{\overline{\sigma}_{g}}$
5: Operate $\bm{\tilde{\Phi}}^{(0)}=\bm{\mathfrak{M}}_{1}\bm{\tilde{\overline{\Phi}}}^{(0)}$ and assign $\bm{\Phi}^{(0)}=\bm{\tilde{\Phi}}^{(0)}$
6: Obtain $\bm{X}(1)$ using (3) followed by $\bm{Y}(1)$ using (2), then compute $\bm{\Psi}^{{}^{\prime}}(1)$ and $\bm{\Upsilon}(1)$ similar to Step 5 of Algorithm 1
7: Compute $\bm{{\overline{\Psi}^{{}^{\prime}}}}(1)=\bm{\mathfrak{M}}_{3}(\bm{\mathfrak{M}}_{2}\bm{\Psi}^{{}^{\prime}}(1))$ and $\bm{\overline{\Upsilon}}(1)$
8: Compute $\bm{\tilde{X}}(1)=\bm{\Upsilon}(1)^{T}\bm{\tilde{\Phi}}^{(0)}$ and $\bm{{\tilde{\overline{\Phi}}^{{}^{\prime}}}}^{(0)}=\mathfrak{M}_{3}\bm{\tilde{\overline{\Phi}}}^{(0)}$
9: $n=1$
10: repeat
11:   $\bm{E}(n)=\bm{X}(n)-\bm{\tilde{X}}(n)$
12:   $\bm{\overline{E}}(n)=\bm{\mathfrak{M}}_{3}(\bm{\mathfrak{M}}_{2}(\bm{E}(n))\bigotimes\bm{1}_{Q})$
13:   $\bm{\overline{\xi}}^{(n)}=\rho\bm{\overline{\xi}}^{(n-1)}+\bm{I}_{\overline{\sigma}_{g}}-\rho\bm{I}_{\overline{\sigma}_{g}}$
14:   $\bm{\overline{Z}}^{(n)}=\bm{\overline{\Upsilon}}^{T}(n)\bm{\overline{P}}^{(n-1)}\bm{\overline{\Upsilon}}^{*}(n)+\bm{\overline{\xi}}^{(n)}$
15:   $\bm{\overline{P}}^{(n)}=(\bm{\overline{P}}^{(n-1)}-\bm{\overline{P}}^{(n-1)}\bm{\overline{\Upsilon}}^{*}(n){\bm{\overline{Z}}^{(n)}}^{-1}\bm{\overline{\Upsilon}}^{T}(n)\bm{\overline{P}}^{(n-1)}){\bm{\overline{\Xi}}^{(n)}}^{-1}$
16:   $\bm{{\tilde{\overline{\Phi}}^{{}^{\prime}}}}^{(n)}=\bm{{\tilde{\overline{\Phi}}^{{}^{\prime}}}}^{(n-1)}+(\bm{\overline{P}}^{(n)}\bm{\overline{\Upsilon}}^{*}(n)\bm{1}_{\overline{\sigma}_{g}})\bigodot\bm{\overline{E}}(n)$
17:   Operate $\bm{\tilde{\overline{\Phi}}}^{(n)}=\mathfrak{M}_{3}^{T}\bm{{\tilde{\overline{\Phi}}^{{}^{\prime}}}}^{(n)}$ followed by $\bm{\tilde{\Phi}}^{(n)}=\bm{\mathfrak{M}}_{1}\bm{\tilde{\overline{\Phi}}}^{(n)}$, then assign $\bm{\Phi}^{(n)}=\bm{\tilde{\Phi}}^{(n)}$
18:   Using the obtained $\bm{X}(n+1)$ followed by $\bm{Y}(n+1)$ and then $\bm{\Psi}^{{}^{\prime}}(n+1)$, find $\bm{\Upsilon}(n+1)$, $\bm{{\overline{\Psi}^{{}^{\prime}}}}(n+1)=\bm{\mathfrak{M}}_{3}(\bm{\mathfrak{M}}_{2}\bm{\Psi}^{{}^{\prime}}(n+1))$, and $\bm{\overline{\Upsilon}}(n+1)$
19:   $\bm{\tilde{X}}(n+1)=\bm{\Upsilon}(n+1)^{T}\bm{\tilde{\Phi}}^{(n)}$
20:   $n=n+1$
21: until $\bm{\tilde{\overline{\Phi}}}^{(n)}$ converges
22: $\bm{\widehat{\overline{\Phi}}}=\bm{\tilde{\overline{\Phi}}}^{(n)}$

Now, we utilize the operator $\bm{\mathfrak{M}}_{3}$ to train the coefficient vector $\bm{\overline{\Phi}}$ in Algorithm 3. In this ILA-RPEM based algorithm, the sizes of the matrices and vectors used in the training are determined by the shape of the vector $\bm{\overline{\Phi}^{{}^{\prime}}}$ in (24). They are defined as: the forgetting matrix $\bm{\overline{\xi}}\triangleq\text{diag}(\xi_{1},\cdots,\xi_{\overline{\sigma}_{g}})$, $\bm{\overline{Z}}\triangleq\text{diag}(Z_{1},\cdots,Z_{\overline{\sigma}_{g}})$, $\bm{\overline{P}}\triangleq\text{diag}(P_{1},\cdots,P_{\overline{\sigma}_{g}})$, and $\bm{\overline{\Xi}}\triangleq\text{diag}(\xi_{1}\bm{I}_{J_{1}},\cdots,\xi_{\overline{\sigma}_{1}}\bm{I}_{J_{1}},\cdots,\xi_{(\overline{\sigma}_{g-1}+1)}\bm{I}_{J_{g}},\cdots,\xi_{\overline{\sigma}_{g}}\bm{I}_{J_{g}})$. To determine the parameter $\bm{\overline{\Upsilon}}$, we first use the outputs $\bm{Y}$ of the PAs to obtain the block vector $\bm{\Psi}^{{}^{\prime}}=[\bm{\Psi}_{1}^{{}^{\prime}T},\cdots,\bm{\Psi}_{S}^{{}^{\prime}T}]^{T}$, where $\bm{\Psi}_{l}^{{}^{\prime}}=[\psi_{l,1}^{{}^{\prime}},\cdots,\psi_{l,Q}^{{}^{\prime}}]^{T}$. From $\bm{\Psi}^{{}^{\prime}}$, the block vector $\bm{\overline{\Psi^{{}^{\prime}}}}$ for LC-DPD is obtained using the operator $\bm{\mathfrak{M}}_{2}$ as: $\bm{\overline{\Psi^{{}^{\prime}}}}=\bm{\mathfrak{M}}_{2}\bm{\Psi}^{{}^{\prime}}$. Here, $\bm{\overline{\Psi^{{}^{\prime}}}}=[\bm{\overline{\Psi^{{}^{\prime}}}}_{1}^{T},\cdots,\bm{\overline{\Psi^{{}^{\prime}}}}_{g}^{T}]^{T}$, and $\bm{\overline{\Psi^{{}^{\prime}}}}_{i}$ can be represented similarly to $\bm{\overline{\Phi}}_{i}$ in (1) for $i\in\{1,\cdots,g\}$, with $\phi$ replaced by $\psi^{{}^{\prime}}$. Further, using the operator $\bm{\mathfrak{M}}_{3}$, $\bm{\overline{\Psi^{{}^{\prime}}}}$ is reshaped into $\bm{\overline{\Psi}^{{}^{\prime}}}$ as: $\bm{\overline{\Psi}^{{}^{\prime}}}=\bm{\mathfrak{M}}_{3}\bm{\overline{\Psi^{{}^{\prime}}}}$.
Now, using the vectors in the block vector $\bm{\overline{\Psi}^{{}^{\prime}}}$, $\bm{\overline{\Upsilon}}$ can be expressed as $\bm{\overline{\Upsilon}}=\text{diag}(\bm{\overline{\Psi}^{{}^{\prime}}}_{1},\cdots,\bm{\overline{\Psi}^{{}^{\prime}}}_{\overline{\sigma}_{g}})$; this procedure is used to compute $\bm{\overline{\Upsilon}}$ from $\bm{Y}$ in Steps 7 and 18 of Algorithm 3. The process in Algorithm 3 can now be described as follows. The inputs to the algorithm are the same as in Algorithm 2, along with the operator $\bm{\mathfrak{M}}_{3}$. In the first two steps, the correlation matrix $\bm{\overline{P}}$ and the forgetting matrix $\bm{\overline{\xi}}$ are initialized. In Steps 5 to 8, the initial values of $\bm{\overline{\Upsilon}}$ and the postdistorted signal vector $\bm{\tilde{X}}$ are determined. Thereafter, similar to Algorithms 1 and 2, the iterative steps are followed to obtain the converged value of $\bm{\widehat{\overline{\Phi}}}$. In the loop, the operators $\bigotimes$ and $\bigodot$ represent the Kronecker and Hadamard products, respectively.
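Because $\bm{\overline{P}}$, $\bm{\overline{Z}}$, and $\bm{\overline{\xi}}$ are (block) diagonal, Steps 13 to 16 of Algorithm 3 decouple into one independent update per block. The following is a minimal NumPy sketch of such a per-block update, not the authors' implementation. The function name `rpem_block_step`, the synthetic regressors, and the treatment of a single block in isolation (in the full algorithm the error is formed from the complete postdistorter output and mapped to the blocks through $\bm{\mathfrak{M}}_{2}$ and $\bm{\mathfrak{M}}_{3}$) are assumptions made for illustration.

```python
import numpy as np

def rpem_block_step(phi, P, xi, psi, x, rho):
    """One per-block update mirroring Steps 11 and 13-16 of Algorithm 3.

    phi : (J,) complex coefficient block
    P   : (J, J) block of the correlation matrix
    xi  : scalar forgetting factor of this block
    psi : (J,) regressor block (one column of Upsilon-bar)
    x   : scalar target sample
    rho : growth rate of the forgetting factor
    """
    e = x - psi @ phi                          # Step 11: error with old coefficients
    xi = rho * xi + 1.0 - rho                  # Step 13: forgetting factor update
    z = psi @ P @ psi.conj() + xi              # Step 14: scalar Z for this block
    Pg = P @ psi.conj()
    P = (P - np.outer(Pg, psi @ P) / z) / xi   # Step 15: correlation matrix update
    phi = phi + (P @ psi.conj()) * e           # Step 16: coefficient update
    return phi, P, xi


# Hypothetical usage: recover a known block from noise-free synthetic data.
rng = np.random.default_rng(0)
J = 4
phi_true = rng.standard_normal(J) + 1j * rng.standard_normal(J)
phi = np.zeros(J, dtype=complex)
P = 0.2 * np.eye(J, dtype=complex)             # mu = 0.2
xi = 0.99                                      # lambda_0 = 0.99
for _ in range(5000):
    psi = rng.standard_normal(J) + 1j * rng.standard_normal(J)
    phi, P, xi = rpem_block_step(phi, P, xi, psi, psi @ phi_true, rho=0.95)
print(np.linalg.norm(phi - phi_true))
```

Working block by block avoids forming the full diagonal matrices explicitly, which is the structural reason the dominant cost scales with the number of blocks rather than with the full coefficient length.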

Performance and Complexity

In Algorithm 3, the block vector $\bm{\overline{\Phi}}$ is reshaped as the block vector $\bm{{\overline{\Phi}}^{{}^{\prime}}}$ so that the coefficients in each of its vectors $\bm{{\overline{\Phi}}^{{}^{\prime}}}_{i}$ are correlated through the correlation matrix $\bm{\overline{P}}$ during training (cf. Step 16 of Algorithm 3). However, the common coefficients are correlated only with the distinct coefficients in the first $\overline{\sigma}_{1}=T_{1}$ vectors of the block vector $\bm{{\overline{\Phi}}^{{}^{\prime}}}$ (cf. (24)). For example, in Fig. 3(b), the coefficients $\overline{\phi}_{1,3}$, $\overline{\phi}_{1,4}$ are shared by the coefficients $\overline{\phi}_{1,1}$, $\overline{\phi}_{1,2}$ and $\overline{\phi}_{3,1}$, $\overline{\phi}_{3,2}$ to generate the predistorted signals $x_{1}$ and $x_{2}$, respectively. But, using Algorithm 3, $\overline{\phi}_{1,3}$, $\overline{\phi}_{1,4}$ are correlated only with $\overline{\phi}_{1,1}$, $\overline{\phi}_{1,2}$; thus, the predistorted signal $x_{1}$ linearizes the $1$st subgroup of PAs well, whereas $x_{2}$ is less effective in linearizing the $2$nd subgroup. Therefore, the predistorted signals generated using the coefficients in the first $T_{1}$ vectors of $\bm{{\overline{\Phi}}^{{}^{\prime}}}$ are optimal for linearizing the respective subgroups of PAs, but the algorithm provides a lower performance for the remaining PAs. Therefore, next, we propose an algorithm that establishes the correlation of the common coefficients with all the distinct coefficients. Besides, the complexity of Algorithm 3 in an iteration can be determined using the dominant matrix multiplications in Steps 14, 15, and 16. As the matrices $\bm{\overline{\Upsilon}}$, $\bm{\overline{P}}$, $\bm{\overline{Z}}$, and $\bm{\overline{\Xi}}$ are diagonal, the respective complexities of the three steps are: $O(Q^{2}T_{1}+QT_{1})$, $O(2Q^{3}T_{1}+2Q^{2}T_{1}+QT_{1})$, and $O(Q^{2}T_{1}+QT_{1})$.
Based on the dominant term, the complexity of Algorithm 3 is $O(Q^{3}T_{1})$. As $T_{1}<S$, the complexity of Algorithm 3 is lower than that of Algorithms 1 and 2. Next, to correlate the common coefficients in the LC-DPD structure with all the remaining distinct coefficients, we first define a sequence of linear operators as below.

Definition 6 (A Sequence of Linear Operators for the Back and Forth Operations).

To establish the correlation of the common coefficients with the $r^{-(g-1)}$ sets of distinct coefficients in the LC-DPD structure, a sequence of $r^{-(g-1)}$ operators is defined in (30a), where the $t$th operator $\bm{\mathfrak{M}}_{4,t}$ in the sequence is given by (30b) and its $ij$th matrix element $\bm{M}_{ij,t}$ is expressed in (30c).


		$\displaystyle\bm{\mathfrak{M}}_{4}=\{\bm{\mathfrak{M}}_{4,1},\bm{\mathfrak{M}}_{4,2},\cdots,\bm{\mathfrak{M}}_{4,r^{-(g-1)}}\}$		(30a)
		$\displaystyle\bm{\mathfrak{M}}_{4,t}=\begin{bmatrix}\bm{M}_{11,t}&\cdots&\bm{M}_{1g,t}\\ \vdots&\ddots&\vdots\\ \bm{M}_{T_{1}1,t}&\cdots&\bm{M}_{T_{1}g,t}\\ \bm{M}_{(T_{1}+1)1,t}&\cdots&\bm{M}_{(T_{1}+1)g,t}\end{bmatrix}$		(30b)
		$\displaystyle\bm{M}_{ij,t}=\begin{bmatrix}m_{11,t}^{ij}&\cdots&m_{1L_{j},t}^{ij}\\ \vdots&\ddots&\vdots\\ m_{\overline{L}_{i}1,t}^{ij}&\cdots&m_{\overline{L}_{i}L_{j},t}^{ij}\end{bmatrix},$		(30c)
		$\displaystyle\bm{\Phi}_{tr.,t}^{(n+1)}=\text{trunc}(\bm{\mathfrak{M}}_{4,t}\bm{\overline{\Phi}}^{(n)},Q),$		(30d)
		$\displaystyle\bm{\overline{\Phi}}^{(n+1)}=\bm{\mathfrak{M}}_{4,t}^{T}\text{merge}(\bm{\Phi}_{tr.,t}^{(n+1)},\bm{\mathfrak{M}}_{4,t}\bm{\overline{\Phi}}^{(n)},Q)$		(30e)

where $t\in\{1,\cdots,r^{-(g-1)}\}$, $\overline{L}_{i}=Q$ for $i\in\{1,\cdots,T_{1}\}$, and $\overline{L}_{i}=N_{m}^{L}-T_{1}Q$ for $i=T_{1}+1$. Further, $L_{j}=n_{j}Sr^{(\nu+j-1)}$ and $m_{uv,t}^{ij}\in\{0,1\}$ for $i\in\{1,\cdots,T_{1}+1\}$, $j\in\{1,\cdots,g\}$, $u\in\{1,\cdots,\overline{L}_{i}\}$, and $v\in\{1,\cdots,L_{j}\}$. Again, like $\bm{\mathfrak{M}}_{3}$ in (26b), the sum of the elements in each row and in each column of $\bm{\mathfrak{M}}_{4,t}$ is $1$. Further, from Corollary 2, $\bm{\mathfrak{M}}_{4,t}^{-1}=\bm{\mathfrak{M}}_{4,t}^{T}$. The common coefficients are correlated with the $t$th set of distinct coefficients in the vector $\bm{\Phi}_{tr.,t}$, which is obtained using the $t$th operator in (30d). Here, first, $\bm{\mathfrak{M}}_{4,t}$ is multiplied by $\bm{\overline{\Phi}}$; then, using the truncation function $\text{trunc}(\bm{\mathfrak{M}}_{4,t}\bm{\overline{\Phi}},Q)$, the first $Q$ elements of the vector $\bm{\mathfrak{M}}_{4,t}\bm{\overline{\Phi}}$ are extracted to get $\bm{\Phi}_{tr.,t}$. The reverse operation, i.e., the conversion of $\bm{\Phi}_{tr.,t}$ back to $\bm{\overline{\Phi}}$, is performed using (30e), where $\text{merge}(\bm{a},\bm{b},Q)$ overwrites the first $Q$ elements of $\bm{b}$ with the vector $\bm{a}$ of length $Q$.

Example: For the instance of the LC-DPD structure with $S=4$, $Q=4$, $r=1/2$, and $\nu=1$ in Fig. 3(b), $r^{-(g-1)}=(1/2)^{-(2-1)}=2$; thus, from (30a), $\bm{\mathfrak{M}}_{4}=\{\bm{\mathfrak{M}}_{4,1},\bm{\mathfrak{M}}_{4,2}\}$. As described earlier, the coefficients $\overline{\phi}_{1,3}$, $\overline{\phi}_{1,4}$ are shared by $\overline{\phi}_{1,1}$, $\overline{\phi}_{1,2}$ and $\overline{\phi}_{3,1}$, $\overline{\phi}_{3,2}$. From (30d), we can find the vectors $\bm{\Phi}_{tr.,1}=[\overline{\phi}_{1,1},\overline{\phi}_{1,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ and $\bm{\Phi}_{tr.,2}=[\overline{\phi}_{3,1},\overline{\phi}_{3,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ from $\bm{\overline{\Phi}}=[\overline{\phi}_{1,1},\overline{\phi}_{3,1},\overline{\phi}_{1,2},\overline{\phi}_{3,2},\overline{\phi}_{1,3},\overline{\phi}_{1,4}]^{T}$ using the operators $\bm{\mathfrak{M}}_{4,1}$ and $\bm{\mathfrak{M}}_{4,2}$ of the form (30b), which are:

		$\displaystyle\bm{\mathfrak{M}}_{4,1}=\left[\begin{array}{cccc|cc}1&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1\\ \hline 0&1&0&0&0&0\\ 0&0&0&1&0&0\end{array}\right],\;\;\bm{\mathfrak{M}}_{4,2}=\left[\begin{array}{cccc|cc}0&1&0&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1\\ \hline 1&0&0&0&0&0\\ 0&0&1&0&0&0\end{array}\right].$		(35)

Also, using this example, we can realize (30e). $\Box$
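The back and forth operations in (30d) and (30e) can be checked numerically on this example. The sketch below is only illustrative: the helper names `perm_from_order`, `trunc`, and `merge` are hypothetical, `merge` is assumed to overwrite the first $Q$ entries, and each coefficient $\overline{\phi}_{i,j}$ is encoded by the integer formed from its subscripts so that the reordering is easy to follow.

```python
import numpy as np

def perm_from_order(order):
    """Return the 0/1 matrix whose k-th row selects element order[k]."""
    n = len(order)
    M = np.zeros((n, n), dtype=int)
    M[np.arange(n), order] = 1
    return M

def trunc(v, Q):
    """Keep the first Q elements, as in (30d)."""
    return v[:Q].copy()

def merge(a, b, Q):
    """Overwrite the first Q elements of b with a, as used in (30e)."""
    out = b.copy()
    out[:Q] = a
    return out

Q = 4
# Entries encode the subscripts of phi-bar: 11 stands for phi_{1,1}, etc.
phi_bar = np.array([11, 31, 12, 32, 13, 14])
M41 = perm_from_order([0, 2, 4, 5, 1, 3])   # rows of M_{4,1} in (35)
M42 = perm_from_order([1, 3, 4, 5, 0, 2])   # rows of M_{4,2} in (35)

phi_tr1 = trunc(M41 @ phi_bar, Q)           # forward operation, t = 1
phi_tr2 = trunc(M42 @ phi_bar, Q)           # forward operation, t = 2

# Pretend one training step changed the two distinct coefficients of set t = 2.
phi_tr2_new = phi_tr2 + np.array([100, 100, 0, 0])
# Backward operation (30e): merge, then undo the permutation with the transpose.
phi_bar_new = M42.T @ merge(phi_tr2_new, M42 @ phi_bar, Q)

print(phi_tr1, phi_tr2, phi_bar_new)
```

Since each operator is a permutation matrix, its transpose undoes the reordering, so only the entries touched by the update change in $\bm{\overline{\Phi}}$.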

Algorithm 4 Estimation of coefficients for LC-DPD (Method-III).

1: The values of $\rho$, $\lambda_{0}$, $\mu$, $\bm{\mathfrak{M}}_{1}$, $\bm{\mathfrak{M}}_{2}$, $\bm{\mathfrak{M}}_{4}$, $\bm{\tilde{\overline{\Phi}}}^{(0)}$, and $\mathcal{N}$

2: The estimated coefficients $\bm{\widehat{\overline{\Phi}}}$

3: $\bm{P}_{t}^{(0)}=\text{diag}(\underbrace{\mu\bm{I}_{Q},\cdots,\mu\bm{I}_{Q}}_{T_{1}=Sr^{(\nu+g-1)}})$; $t\in\{1,\cdots,r^{-(g-1)}\}$

4: $\bm{\xi}_{t}^{(0)}=\lambda_{0}\bm{I}_{T_{1}}$; $t\in\{1,\cdots,r^{-(g-1)}\}$

5: Operate $\bm{\tilde{\Phi}}^{(0)}=\bm{\mathfrak{M}}_{1}\bm{\tilde{\overline{\Phi}}}^{(0)}$ and assign $\bm{\Phi}^{(0)}=\bm{\tilde{\Phi}}^{(0)}$

6: Using obtained $\bm{X}(1)$ followed by $\bm{Y}(1)$ and then $\bm{\Psi}^{{}^{\prime}}(1)$, find $\bm{\Upsilon}(1)$, $\bm{\overline{\Psi^{{}^{\prime}}}}(1)=\bm{\mathfrak{M}}_{2}\bm{\Psi}^{{}^{\prime}}(1)$, and $\bm{\overline{\Upsilon}}(1)$

7: Compute $\bm{\tilde{X}}(1)=\bm{\Upsilon}(1)^{T}\bm{\tilde{\Phi}}^{(0)}$

8: $n=0$

9: repeat

10: $t=1$

11: repeat

12: repeat

13: Operate $\bm{\tilde{\Phi}}^{(n)}=\bm{\mathfrak{M}}_{1}\bm{\tilde{\overline{\Phi}}}^{(n)}$ and assign $\bm{\Phi}^{(n)}=\bm{\tilde{\Phi}}^{(n)}$

14: Using obtained $\bm{X}(n+1)$ followed by $\bm{Y}(n+1)$ and then $\bm{\Psi}^{{}^{\prime}}(n+1)$, find $\bm{\Upsilon}(n+1)$, $\bm{\overline{\Psi^{{}^{\prime}}}}(n+1)=\bm{\mathfrak{M}}_{2}\bm{\Psi}^{{}^{\prime}}(n+1)$, $\bm{\Psi}^{{}^{\prime}}_{t}(n+1)=\text{trunc}(\bm{\mathfrak{M}}_{4,t}\bm{\overline{\Psi^{{}^{\prime}}}}(n+1),Q)$, $\bm{\Upsilon}_{t}(n+1)$, and $\bm{\Phi}_{tr.,t}^{(n)}=\text{trunc}(\bm{\mathfrak{M}}_{4,t}\bm{\tilde{\overline{\Phi}}}^{(n)},Q)$

15: $\bm{\tilde{X}}(n+1)=\bm{\Upsilon}(n+1)^{T}\bm{\tilde{\Phi}}^{(n)}$

16: $\bm{E}(n+1)=\bm{X}(n+1)-\bm{\tilde{X}}(n+1)$

17: $\bm{\overline{E}}(n+1)=\bm{\mathfrak{M}}_{2}(\bm{E}(n+1)\bigotimes\bm{1}_{Q})$

18: $\bm{E}_{t}(n+1)=\bm{\mathfrak{M}}_{4,t}\bm{\overline{E}}(n+1)$

19: $\bm{\xi}_{t}^{(n+1)}=\rho\bm{\xi}_{t}^{(n)}+\bm{I}_{T_{1}}-\rho\bm{I}_{T_{1}}$

20: $\bm{Z}_{t}^{(n+1)}=\bm{\Upsilon}_{t}^{T}(n+1)\bm{P}_{t}^{(n)}\bm{\Upsilon}_{t}^{*}(n+1)+\bm{\xi}_{t}^{(n+1)}$

21: $\bm{P}_{t}^{(n+1)}=(\bm{P}_{t}^{(n)}-\bm{P}_{t}^{(n)}\bm{\Upsilon}_{t}^{*}(n+1){\bm{Z}_{t}^{(n+1)}}^{-1}\bm{\Upsilon}_{t}^{T}(n+1)\bm{P}_{t}^{(n)}){\bm{\Xi}_{t}^{(n+1)}}^{-1}$

22: $\bm{\Phi}_{tr.,t}^{(n+1)}=\bm{\Phi}_{tr.,t}^{(n)}+(\bm{P}_{t}^{(n+1)}\bm{\Upsilon}_{t}^{*}(n+1)\bm{1}_{T_{1}})\bigodot\bm{E}_{t}(n+1)$

23: $\bm{\tilde{\overline{\Phi}}}^{(n+1)}=\bm{\mathfrak{M}}_{4,t}^{T}\text{merge}(\bm{\Phi}_{tr.,t}^{(n+1)},\bm{\mathfrak{M}}_{4,t}\bm{\tilde{\overline{\Phi}}}^{(n)},Q)$

24: $n=n+1$

25: until $n\;\%\;\mathcal{N}==0$

26: $t=t+1$

27: until $t>r^{-(g-1)}$

28: until $\bm{\tilde{\overline{\Phi}}}^{(n)}$ converges

29: $\bm{\widehat{\overline{\Phi}}}=\bm{\tilde{\overline{\Phi}}}^{(n)}$

Algorithm 4 describes how the sequence of operators $\bm{\mathfrak{M}}_{4}$ is used to correlate the common coefficients with the remaining distinct coefficients. Apart from the input parameters of Algorithm 2, the sequence $\bm{\mathfrak{M}}_{4}$ and the number $\mathcal{N}$ (the number of iterations spent on each set of distinct coefficients, as explained below) are provided as inputs to Algorithm 4. The algorithm first initializes the correlation matrix $\bm{P}_{t}$ and the forgetting matrix $\bm{\xi}_{t}$ for $t\in\{1,\cdots,r^{-(g-1)}\}$. In Steps 5, 6, and 7, similar to the earlier algorithms, it determines the initial values of $\bm{\tilde{X}}$ and $\bm{\overline{\Upsilon}}$. Thereafter, three nested loops are entered. In Steps 13 and 14 of the innermost loop, the algorithm determines $\bm{\Upsilon}_{t}$ and $\bm{\Phi}_{tr.,t}$ using (30d) in the $n$th iteration. Steps 15 to 22 update $\bm{\Phi}_{tr.,t}$ in the current iteration. Then, $\bm{\Phi}_{tr.,t}$ is converted back to $\bm{\tilde{\overline{\Phi}}}$ using (30e). This process repeats for $\mathcal{N}$ iterations to correlate the common coefficients with the $t$th set of distinct coefficients. Thereafter, $t$ increases by unity to establish the correlation of the common coefficients with the next set of distinct coefficients. Thus, the two inner loops repeat until $t>r^{-(g-1)}$, at which point the algorithm has completed one cycle of correlating the common coefficients with all sets of distinct coefficients. The outermost loop repeats this cycle until $\bm{\tilde{\overline{\Phi}}}$ converges.
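The order in which the three nested loops visit the sets of distinct coefficients can be sketched as a small generator. This is a control-flow illustration only, under stated assumptions: the name `schedule` and its arguments are hypothetical, the convergence test of the outermost loop is replaced by a fixed number of cycles, and the coefficient update itself (Steps 13 to 23) is omitted.

```python
def schedule(num_sets, iters_per_set, num_cycles):
    """Yield (n, t) pairs in the order the innermost loop of Algorithm 4 runs.

    num_sets      : r^{-(g-1)}, the number of sets of distinct coefficients
    iters_per_set : the number N of consecutive iterations spent on one set
    num_cycles    : stand-in for the 'until converges' test (assumption)
    """
    n = 0
    for _ in range(num_cycles):              # outermost loop (Steps 9-28)
        for t in range(1, num_sets + 1):     # middle loop over sets (Steps 11-27)
            while True:                      # innermost loop (Steps 12-25)
                yield n, t                   # Steps 13-23 would update set t here
                n += 1                       # Step 24
                if n % iters_per_set == 0:   # Step 25
                    break


# Hypothetical usage: two sets, three iterations per set, one full cycle.
visits = list(schedule(num_sets=2, iters_per_set=3, num_cycles=1))
print(visits)
```

With $\mathcal{N}=1$ the schedule alternates between the sets at every iteration, which is the setting used for the complexity comparison.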

Performance and Complexity

Algorithm 4 performs better than Algorithms 2 and 3 because the common coefficients are correlated with all the distinct coefficients in the vector $\bm{\overline{\Phi}}$. Therefore, the predistorted signal vector $\bm{X}$ generated using it gives a better linearization of the PAs in the subarray. For a fair comparison of this algorithm with the earlier algorithms in terms of computational complexity, we assign $\mathcal{N}=1$. The complexity of a correlation cycle depends on the dominant matrix multiplications in Steps 20, 21, and 22. For the correlation of the common coefficients with the $r^{-(g-1)}$ sets of distinct coefficients, the complexities of the three steps are: $O(Q^{2}T_{1}+QT_{1})r^{-(g-1)}$, $O(2Q^{3}T_{1}+2Q^{2}T_{1}+QT_{1})r^{-(g-1)}$, and $O(Q^{2}T_{1}+QT_{1})r^{-(g-1)}$. Thus, considering the dominant term, the complexity is $O(Q^{3}T_{1})r^{-(g-1)}$. Taking $r^{-(g-1)}$ inside the big O, the complexity can be approximated as: $O(Q^{3}T_{1})r^{-(g-1)}\approx O(Q^{3}\overline{\sigma}_{g})$. Since $T_{1}<\overline{\sigma}_{g}<S$, the complexity of Algorithm 4 is greater than that of Algorithm 3, but lower than that of Algorithms 1 and 2.
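As a rough worked instance of these dominant terms, consider the setup of Section V-A ($S=8$, $Q=10$, $g=2$, $\nu=1$, $r=1/2$). The constants hidden by the big-O notation are ignored, and the $Q^{3}S$ figure for Algorithms 1 and 2 is an assumption inferred from the comparison $T_{1}<S$ made for Algorithm 3, so the numbers are only indicative:

		$\displaystyle T_{1}=Sr^{(\nu+g-1)}=8\,(1/2)^{2}=2,\qquad r^{-(g-1)}=(1/2)^{-1}=2,$
		$\displaystyle Q^{3}T_{1}=2000\;\text{(Algorithm 3)},\qquad Q^{3}T_{1}r^{-(g-1)}=4000\;\text{(Algorithm 4)},\qquad Q^{3}S=8000\;\text{(Algorithms 1 and 2, assumed)}.$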

V Numerical Results

V-A Evaluation Environment

To evaluate the performance of the proposed schemes, we consider a subarray of $S=8$ PAs. The PAs follow the Saleh model as in [43]. The parameters $\alpha_{a,i}$ and $\beta_{a,i}$ for the AM/AM distortion and $\alpha_{\phi,i}$ and $\beta_{\phi,i}$ for the AM/PM distortion are given by: $\alpha_{a,i}=0.9445+0.1u_{a,i}$, $\beta_{a,i}=0.5138+0.1v_{a,i}$, $\alpha_{\phi,i}=4.0033+u_{\phi,i}$, and $\beta_{\phi,i}=9.1040+v_{\phi,i}$, where $u_{a,i}$, $v_{a,i}$, $u_{\phi,i}$, and $v_{\phi,i}$ are uniformly distributed over $[0,1]$ for $i\in\{1,\cdots,S\}$. The GMP used for a DPD has order $P=5$, and each order has memory length $M=5$. However, as described using Fig. 2, only the BFs with indices in the set $\mathcal{I}$ have nonzero coefficients; thus $Q=10$. Further, for the LC-DPD scheme, the BFs are arranged in decreasing order of their dominance. The arrangement is represented using the indices of the BFs as: $\{4,5,14,15,19,20,24,25,9,10\}$. Moreover, for the LC-DPD structure, the geometric sequence in (6) has the following parameter values: $g=2$, $n_{1}=4$, $n_{2}=6$, $\nu=1$, and $r=1/2$. The bandwidth of the input signal $s(n)$ is $4$ MHz. To gain insight into the linearization from the obtained results, the in-band average powers of the power spectral density (PSD) of the input signal $s(n)$ and of the outputs of the PAs, $y_{l}(n)$; $l\in\{1,\cdots,S\}$, are normalized to $0$ dB. Moreover, for the algorithms based on ILA-RPEM, the input parameters are set as: $\lambda_{0}=0.99$, $\mu=0.2$, and $\rho=0.95$. The linearization of the PAs is quantified using the error vector magnitudes (EVMs) of their outputs with respect to the reference message signal $s(n)$, computed as: $\text{EVM}_{PA_{l}}=\sqrt{\sum_{n}|y_{l}(n)-s(n)|^{2}/\sum_{n}|s(n)|^{2}}$; $l\in\{1,\cdots,S\}$. The simulations are performed using MATLAB/Simulink${}^{\text{\textregistered}}$.
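The PA parameter draws and the EVM metric described above can be sketched in a few lines of NumPy. This is an illustration rather than the simulation code used for the results. It assumes the standard memoryless Saleh form, AM/AM $\alpha_{a}\rho/(1+\beta_{a}\rho^{2})$ and AM/PM $\alpha_{\phi}\rho^{2}/(1+\beta_{\phi}\rho^{2})$ in radians for input amplitude $\rho$ (the variant in [43] may differ). The names `saleh_pa` and `evm` are hypothetical, the test signal is a random stand-in for $s(n)$, and the $0$ dB power normalization and the DPD itself are omitted.

```python
import numpy as np

def saleh_pa(x, alpha_a, beta_a, alpha_phi, beta_phi):
    """Memoryless Saleh model applied to complex baseband samples x."""
    r2 = np.abs(x) ** 2
    gain = alpha_a / (1.0 + beta_a * r2)            # AM/AM divided by amplitude
    phase = alpha_phi * r2 / (1.0 + beta_phi * r2)  # AM/PM in radians (assumed)
    return gain * x * np.exp(1j * phase)

def evm(y, s):
    """EVM of y with respect to the reference s (as a fraction, not percent)."""
    return np.sqrt(np.sum(np.abs(y - s) ** 2) / np.sum(np.abs(s) ** 2))

rng = np.random.default_rng(1)
S = 8
# Per-PA parameters drawn as described in the text.
alpha_a = 0.9445 + 0.1 * rng.uniform(0, 1, S)
beta_a = 0.5138 + 0.1 * rng.uniform(0, 1, S)
alpha_phi = 4.0033 + rng.uniform(0, 1, S)
beta_phi = 9.1040 + rng.uniform(0, 1, S)

# Hypothetical stand-in for the message signal s(n).
s = 0.1 * (rng.standard_normal(1024) + 1j * rng.standard_normal(1024))
evms = [evm(saleh_pa(s, alpha_a[i], beta_a[i], alpha_phi[i], beta_phi[i]), s)
        for i in range(S)]
print(evms)
```

Because every PA receives its own parameter draw, the uncorrected EVMs differ from PA to PA, which is the spread that a single shared DPD cannot remove.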

V-B Performance Comparison

In Fig. 5, the different DPD schemes are compared in terms of linearization of the PAs in the subarray. It can be observed in Fig. 5(a) that FF-DPD gives the best performance in linearizing all the PAs, whereas single-DPD gives the worst. This is because, in FF-DPD, each PA has a separate DPD to linearize it, whereas in single-DPD all the PAs are linearized using a single DPD. Also, from the bar plot in Fig. 5(e), with FF-DPD all the PAs have almost the same EVM, around $3.04\%$, whereas with single-DPD the EVMs of the PAs differ and the maximum value goes up to $33.31\%$. Comparing the three LC-DPD schemes in Figs. 5(b), 5(c), and 5(d), the linearization performance of LC-DPD I is the lowest, because its coefficients are determined using the structure of FF-DPD, where the correlation of the common coefficients with the distinct coefficients is weakest. Therefore, although it performs better than single-DPD, none of the PAs is linearized properly. The average EVMs for the first four and the next four PAs are $17.99\%$ and $11.56\%$, respectively. In the LC-DPD II scheme, the structure of LC-DPD is completely exploited, but the common coefficients are correlated only with the distinct coefficients for the first two PAs. Therefore, the linearization of these PAs is the same as with FF-DPD, with an average EVM of $2.71\%$, while the linearization of the remaining $6$ PAs is poorer, with an average EVM of $12.09\%$. In LC-DPD III, the common coefficients are partially correlated with each of the distinct coefficients; therefore, its overall performance is better than that of the previous two schemes. The average EVM values of the LC-DPD I, LC-DPD II, and LC-DPD III schemes are $14.77\%$, $9.74\%$, and $8.98\%$, respectively.
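The reported group averages can be cross-checked against the overall averages. Assuming each overall figure is the unweighted mean over the $S=8$ PAs (an assumption, since the averaging is not stated explicitly):

		$\displaystyle\text{LC-DPD I: }\frac{4(17.99)+4(11.56)}{8}=14.775\%,\qquad\text{LC-DPD II: }\frac{2(2.71)+6(12.09)}{8}=9.745\%,$

which agree with the reported $14.77\%$ and $9.74\%$ to within the rounding of the group averages.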

VI Conclusion

In this work, we have proposed two schemes, FF-DPD and LC-DPD, to fully linearize the PAs in a subarray of an mMIMO transmitter. Although FF-DPD provides the best performance, it has high complexity. Using the structure of FF-DPD, we derive a less complex LC-DPD. For the two schemes, four algorithms based on ILA-RPEM are described, and their performances and complexities are investigated. From the obtained results, we find that FF-DPD almost fully linearizes the PAs, with an average EVM of $3.04\%$. The three algorithms for LC-DPD have lower computational complexity, but their EVM performance is worse than that of the algorithm for FF-DPD. Furthermore, among the three algorithms for LC-DPD, the third algorithm (LC-DPD III) provides the best performance, as it better correlates the common coefficients with the distinct coefficients of the scheme.

References

[1] J. Kenney and A. Leke, “Design considerations for multicarrier CDMA base station power amplifiers,” Microw. J., vol. 42, no. 2, pp. 76–83, Feb. 1999.
[2] A. Katz, J. Wood, and D. Chokola, “The evolution of PA linearization: From classic feedforward and feedback through analog and digital predistortion,” IEEE Microw. Mag., vol. 17, no. 2, pp. 32–40, Feb. 2016.
[3] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, “A generalized memory polynomial model for digital predistortion of RF power amplifiers,” IEEE Trans. Signal Process., vol. 54, no. 10, pp. 3852–3860, Oct. 2006.
[4] A. N. D’Andrea, V. Lottici, and R. Reggiannini, “RF power amplifier linearization through amplitude and phase predistortion,” IEEE Trans. Commun., vol. 44, no. 11, pp. 1477–1484, Nov. 1996.
[5] Y. Liu, W. Pan, S. Shao, and Y. Tang, “A general digital predistortion architecture using constrained feedback bandwidth for wideband power amplifiers,” IEEE Trans. Microw. Theory Tech., vol. 63, no. 5, pp. 1544–1555, Feb. 2015.
[6] Z. Wang, W. Chen, G. Su, F. M. Ghannouchi, Z. Feng, and Y. Liu, “Low feedback sampling rate digital predistortion for wideband wireless transmitters,” IEEE Trans. Microw. Theory Tech., vol. 64, no. 11, pp. 3528–3539, Nov. 2016.
[7] S. Zhang, W. Chen, F. M. Ghannouchi, and Y. Chen, “An iterative pruning of 2-D digital predistortion model based on normalized polynomial terms,” in Proc. IEEE MTT-S Int. Microw. Symp. Digest, Seattle, WA, USA, Jun. 2013, pp. 1–4.
[8] Z. Wang, W. Chen, G. Su, F. M. Ghannouchi, Z. Feng, and Y. Liu, “Low computational complexity digital predistortion based on direct learning with covariance matrix,” IEEE Trans. Microw. Theory Tech., vol. 65, no. 11, pp. 4274–4284, Nov. 2017.
[9] L. Guan and A. Zhu, “Optimized low-complexity implementation of least squares based model extraction for digital predistortion of RF power amplifiers,” IEEE Trans. Microw. Theory Tech., vol. 60, no. 3, pp. 594–603, Mar. 2012.
[10] P. L. Gilabert, G. Montoro, D. López, N. Bartzoudis, E. Bertran, M. Payaro, and A. Hourtane, “Order reduction of wideband digital predistorters using principal component analysis,” in Proc. IEEE MTT-S Int. Microw. Symp. Digest, Seattle, WA, USA, Jun. 2013, pp. 1–7.
[11] R. N. Braithwaite, “Wide bandwidth adaptive digital predistortion of power amplifiers using reduced order memory correction,” in Proc. IEEE MTT-S Int. Microw. Symp. Digest, Atlanta, GA, USA, Jun. 2008, pp. 1517–1520.
[12] J. Swaminathan, P. Kumar, and M. Vinoth, “Performance analysis of LMS filter in linearization of different memoryless non linear power amplifier models,” in Proc. Int. Conf. Advances Computing, Commun. Control, Berlin, Heidelberg, Jan. 2013, pp. 459–464.
[13] P. M. Suryasarman and A. Springer, “A comparative analysis of adaptive digital predistortion algorithms for multiple antenna transmitters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 5, pp. 1412–1420, May 2015.
[14] D. R. Morgan, Z. Ma, and L. Ding, “Reducing measurement noise effects in digital predistortion of RF power amplifiers,” in Proc. IEEE ICC, vol. 4, Anchorage, AK, USA, May 2003, pp. 2436–2439.
[15] L. Gan and E. Abd-Elrady, “Digital predistortion of memory polynomial systems using direct and indirect learning architectures,” in Proc. Int. Conf. IASTED, vol. 654, p. 802.
[16] B. Mohr, W. Li, and S. Heinen, “Analysis of digital predistortion architectures for direct digital-to-RF transmitter systems,” in Proc. IEEE Int. Midwest Symp. Circuits Syst., Boise, ID, USA, Aug. 2012, pp. 650–653.
[17] T. Söderström and P. Stoica, System identification. Prentice-Hall International, 1989.
[18] D. Zhou and V. E. DeBrunner, “Novel adaptive nonlinear predistorters based on the direct learning algorithm,” IEEE Trans. Signal Process., vol. 55, no. 1, pp. 120–133, Jan. 2006.
[19] H. Paaso and A. Mammela, “Comparison of direct learning and indirect learning predistortion architectures,” in Proc. IEEE Int. Symp. Wireless Commun. Syst., Reykjavik, Iceland, Oct. 2008, pp. 309–313.
[20] J. Chani-Cahuana, P. N. Landin, C. Fager, and T. Eriksson, “Iterative learning control for RF power amplifier linearization,” IEEE Trans. Microw. Theory Tech., vol. 64, no. 9, pp. 2778–2789, Sep. 2016.
[21] H. Chauhan, V. Kvartenko, and M. Onabajo, “A tuning technique for temperature and process variation compensation of power amplifiers with digital predistortion,” in Proc. IEEE North Atlantic Test Workshop, Providence, RI, USA, May 2016, pp. 38–45.
[22] E. Jarvinen, S. Kalajo, and M. Matilainen, “Bias circuits for GaAs HBT power amplifiers,” in Proc. IEEE MTT-S Int. Microw. Symps. Digest, vol. 1, Phoenix, AZ, USA, May 2001, pp. 507–510.
[23] E. Ng, Y. Beltagy, P. Mitran, and S. Boumaiza, “Single-input single-output digital predistortion of power amplifier arrays in millimeter wave RF beamforming transmitters,” in Proc. IEEE Int. Microw. Symp.-IMS, Philadelphia, PA, USA, Jun. 2018, pp. 481–484.
[24] K. Hausmair, P. N. Landin, U. Gustavsson, C. Fager, and T. Eriksson, “Digital predistortion for multi-antenna transmitters affected by antenna crosstalk,” IEEE Trans. Microw. Theory Tech., vol. 66, no. 3, pp. 1524–1535, Mar. 2018.
[25] S. Choi and E.-R. Jeong, “Digital predistortion based on combined feedback in MIMO transmitters,” IEEE Commun. Lett., vol. 16, no. 10, pp. 1572–1575, Oct. 2012.
[26] Q. Luo, X.-W. Zhu, C. Yu, and W. Hong, “Single-receiver over-the-air digital predistortion for massive MIMO transmitters with antenna crosstalk,” IEEE Trans. Microw. Theory Tech., vol. 68, no. 1, pp. 301–315, Jan. 2019.
[27] X. Liu, Q. Zhang, W. Chen, H. Feng, L. Chen, F. M. Ghannouchi, and Z. Feng, “Beam-oriented digital predistortion for 5G massive MIMO hybrid beamforming transmitters,” IEEE Trans. Microw. Theory Tech., vol. 66, no. 7, pp. 3419–3432, Jul. 2018.
[28] C. Tarver, A. Balatsoukas-Stimming, C. Studer, and J. R. Cavallaro, “OFDM-based beam-oriented digital predistortion for massive MIMO,” in Proc. IEEE Int. Symp. Circuits Syst., Daegu, Korea, May 2021, pp. 1–5.
[29] X. Liu, W. Chen, L. Chen, F. M. Ghannouchi, and Z. Feng, “Power scalable beam-oriented digital predistortion for compact hybrid massive MIMO transmitters,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 12, pp. 4994–5006, Aug. 2020.
[30] C. Yu, J. Jing, H. Shao, Z. H. Jiang, P. Yan, X.-W. Zhu, W. Hong, and A. Zhu, “Full-angle digital predistortion of 5G millimeter-wave massive MIMO transmitters,” IEEE Trans. Microw. Theory Tech., vol. 67, no. 7, pp. 2847–2860, Jun. 2019.
[31] A. Brihuega, M. Abdelaziz, L. Anttila, M. Turunen, M. Allén, T. Eriksson, and M. Valkama, “Piecewise digital predistortion for mmWave active antenna arrays: Algorithms and measurements,” IEEE Trans. Microw. Theory Tech., vol. 68, no. 9, pp. 4000–4017, Sep. 2020.
[32] N. Tervo, B. Khan, J. P. Aikio, O. Kursu, M. Jokinen, M. E. Leinonen, M. Sonkki, T. Rahkonen, and A. Pärssinen, “Combined sidelobe reduction and omnidirectional linearization of phased array by using tapered power amplifier biasing and digital predistortion,” IEEE Trans. Microw. Theory Tech., vol. 69, no. 9, pp. 4284–4299, Sep. 2021.
[33] C. Yu, J. Jing, H. Shao, Z. H. Jiang, P. Yan, X.-W. Zhu, W. Hong, and A. Zhu, “Full-angle digital predistortion of 5G millimeter-wave massive MIMO transmitters,” IEEE Trans. Microw. Theory Tech., vol. 67, no. 7, pp. 2847–2860, Jul. 2019.
[34] P. Diao, L. Zhang, L. Tao, Y. Zhang, Y. Yi, H. Liu, and D. Zhao, “Full-angle digital predistortion technique for 5G millimeter-wave integrated phased array,” in Proc. IEEE MTT-S Int. Wireless Symp., vol. 1, Harbin, China, Aug. 2022, pp. 1–3.
[35] J. Zhao, P. Liu, L. Zhai, and F. Yang, “A novel digital predistortion based on flexible characteristic detection for 5G massive MIMO transmitters,” IEEE Microw. Wireless Compon. Lett., vol. 32, no. 4, pp. 363–366, Apr. 2021.
[36] J. Yan, H. Wang, and J. Shen, “Novel post-weighting digital predistortion structures for hybrid beamforming systems,” IEEE Commun. Lett., vol. 25, no. 12, pp. 3980–3984, Dec. 2021.
[37] G. Prasad and H. Johansson, “A low-complexity post-weighting predistorter in a mMIMO transmitter under crosstalk,” arXiv preprint arXiv:2304.05795, 2023.
[38] A. A. Saleh, “Frequency-independent and frequency-dependent nonlinear models of TWT amplifiers,” IEEE Trans. Commun., vol. 29, no. 11, pp. 1715–1720, Nov. 1981.
[39] A. Brihuega, L. Anttila, M. Abdelaziz, T. Eriksson, F. Tufvesson, and M. Valkama, “Digital predistortion for multiuser hybrid MIMO at mmWaves,” IEEE Trans. Signal Process., vol. 68, pp. 3603–3618, May 2020.
[40] L. Ljung and T. Söderström, Theory and practice of recursive identification. MIT press, 1983.
[41] Y. Li, S.-L. Hu, J. Wang, and Z.-H. Huang, “An introduction to the computational complexity of matrix multiplication,” Journal of the Operations Research Society of China, vol. 8, pp. 29–43, Dec. 2020.
[42] G. Strang, “Linear algebra and its applications 4th ed.” 2012.
[43] C. Liu, W. Feng, Y. Chen, C.-X. Wang, and N. Ge, “Optimal beamforming for hybrid satellite terrestrial networks with nonlinear PA and imperfect CSIT,” IEEE Wireless Commun. Lett., vol. 9, no. 3, pp. 276–280, Mar. 2019.
