
User Association and Interference Management in Massive MIMO HetNets

Qiaoyang Ye, Ozgun Y. Bursalioglu, Haralabos C. Papadopoulos, Constantine Caramanis and Jeffrey G. Andrews

Q. Ye, C. Caramanis and J. G. Andrews are with WNCG, The University of Texas at Austin, Austin, TX, USA. O. Y. Bursalioglu and H. C. Papadopoulos are with Docomo Innovations Inc, Palo Alto, CA, USA. Email: qye@utexas.edu, {obursalioglu, hpapadopoulos}@docomoinnovations.com, constantine@utexas.edu, jandrews@ece.utexas.edu. Manuscript last revised: September 28, 2025.
Abstract

Two key traits of 5G cellular networks are much higher base station (BS) densities (especially for low-power BSs) and the use of massive MIMO at these BSs. This paper explores how massive MIMO can be used to jointly maximize the offloading gains and minimize the interference challenges that arise from adding small cells. We consider two interference management approaches: joint transmission (JT) with local precoding, where users are served simultaneously by multiple BSs without requiring channel state information exchanges among cooperating BSs, and resource blanking, where some macro BS resources are left blank to reduce the interference in the small cell downlink. A key advantage of massive MIMO is channel hardening, which makes it possible to predict instantaneous rates a priori. This allows us to develop a unified framework in which resource allocation is cast as a network utility maximization (NUM) problem, and to demonstrate large gains in cell-edge rates based on the NUM solution. We propose an efficient dual subgradient based algorithm that converges towards the NUM solution. A scheduling scheme is also proposed to approach the NUM solution. Simulations show more than a 2x rate gain for 10th percentile users relative to an optimal association without interference management.

I Introduction

Smart densification and heterogeneity in base station (BS) deployments, together with massive MIMO, are considered two of the most important technologies in 5G cellular systems [1, 2, 3] (Footnote 1). The massive MIMO regime refers to the setting in which the number of antennas at a BS is significantly larger than the number of users simultaneously served by that BS [4, 5, 6]. In this paper, we consider various aspects, such as user association, load balancing, scheduling, and interference management, for future networks that combine these technologies in massive MIMO deployments.

Footnote 1: As higher-frequency spectrum becomes available, large arrays become practical even for small cells. For example, in the 3.5 GHz band, a 36-antenna array (arranged on a square grid with half-wavelength separation) can be implemented on a 26 cm × 26 cm surface. The required area shrinks further as the carrier frequency increases.
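As a quick sanity check of the 26 cm array figure quoted above, the sketch below computes the side length of a square array. It assumes (our convention, not stated in the text) that each antenna occupies a $\lambda/2 \times \lambda/2$ cell, so an $n \times n$ grid spans $n\lambda/2$ per side, and it rounds the speed of light to $3 \times 10^8$ m/s.

```python
import math

SPEED_OF_LIGHT = 3.0e8  # m/s, rounded


def square_array_side_cm(num_antennas: int, carrier_hz: float) -> float:
    """Side length (cm) of a sqrt(N) x sqrt(N) array, one lambda/2 cell per antenna."""
    n = math.isqrt(num_antennas)
    if n * n != num_antennas:
        raise ValueError("num_antennas must be a perfect square")
    wavelength_m = SPEED_OF_LIGHT / carrier_hz
    return n * (wavelength_m / 2) * 100  # n cells of lambda/2 each, in cm


for f_ghz in (3.5, 10.0, 28.0):
    side = square_array_side_cm(36, f_ghz * 1e9)
    print(f"{f_ghz:>5} GHz: {side:.1f} cm per side")
```

At 3.5 GHz this gives about 25.7 cm per side. Counting only the five inter-element gaps instead would give about 21.4 cm, so the exact figure depends on the edge convention.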

I-A Motivation and Related Work

Conventionally, mobile user equipment (UEs) are served by the BS providing the largest signal-to-interference-plus-noise ratio (SINR) or the largest received power [7], referred to as max-SINR association in this paper. In heterogeneous networks (HetNets), however, different types of BSs can have large disparities in transmit power, so max-SINR association results in heavily congested macrocell BSs and lightly loaded low-power small cell BSs. This leads to very inefficient use of the available time-frequency resources and strongly motivates load balancing, which in effect means pushing some UE traffic onto lightly loaded small cells even if this reduces the SINRs of the offloaded UEs by many dBs [8].

Load Balancing. Several approaches have been used to study load balancing in HetNets, including stochastic geometry [9, 10], game theory [11], and system-level simulations [12, 13]. Meanwhile, in industry, proactive load balancing is accomplished by biasing UE association towards the small cells [12, 13] (Footnote 2). Our initial study on load balancing [14] formulated a network utility maximization (NUM) problem for user association in HetNets with single-antenna BSs, where resources are allocated equally among users in the same cell. Equal resource allocation can be suboptimal if user association happens on a much slower time scale than the channel variations. In general, the user association and scheduling (resource allocation) problems are coupled, and it is quite difficult to optimize them jointly.

Footnote 2: Biasing refers to artificially adding a bias value (e.g., 10 dB) to the received signal power from the small cell layer at UEs.
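To make the biasing rule concrete, the minimal sketch below associates each UE with the BS that maximizes its biased received power, with the bias added only to small cell BSs. The received powers, the two-BS layout, and the function name `associate` are hypothetical illustrations, not values or code from this paper.

```python
def associate(rx_power_dbm, is_small_cell, bias_db=0.0):
    """Return, for each UE, the index of the BS maximizing biased received power.

    rx_power_dbm[k][j]: received power at UE k from BS j (dBm).
    is_small_cell[j]:   True if BS j belongs to the small cell layer.
    bias_db:            bias added to small cell powers only (ties -> lowest index).
    """
    choices = []
    for powers in rx_power_dbm:
        biased = [p + (bias_db if small else 0.0)
                  for p, small in zip(powers, is_small_cell)]
        choices.append(max(range(len(biased)), key=biased.__getitem__))
    return choices


# BS 0 is a macro BS, BS 1 a small cell BS (hypothetical powers in dBm).
rx = [[-70.0, -60.0],   # UE 0: already prefers the small cell
      [-66.0, -72.0],   # UE 1: 6 dB stronger from the macro
      [-58.0, -80.0]]   # UE 2: close to the macro
small = [False, True]
print(associate(rx, small, bias_db=0.0))   # [1, 0, 0]
print(associate(rx, small, bias_db=10.0))  # [1, 1, 0]
```

With a 10 dB bias, UE 1 is pushed onto the small cell even though its received power from that cell is 6 dB lower, which is exactly the SINR sacrifice that load balancing accepts.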

Massive MIMO. A key benefit of massive MIMO is that the extra diversity afforded by the large antenna array averages out the fast fading, so the instantaneous rate stabilizes around its long-term mean, which changes on slow time scales. As shown in [15], instantaneous rates can be predicted with peak-rate proxies that do not depend on the particular scheduling instance. This property allows user association and scheduling to be decoupled, which [15] exploits to achieve near-optimal load balancing in massive MIMO HetNets with cellular transmission (where data for each user is transmitted from a single BS).
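The hardening effect can be illustrated numerically. The sketch below assumes i.i.d. Rayleigh fading with unit-variance entries (an illustrative assumption) and estimates the spread of the normalized link gain $\|\mathbf{h}\|^2/M$. Under this model $\|\mathbf{h}\|^2$ is Gamma distributed with mean and variance $M$, so the standard deviation of the normalized gain should shrink roughly as $1/\sqrt{M}$. It illustrates only the link gain, not the rate proxies of [15].

```python
import random
import statistics


def normalized_gains(num_antennas, trials, rng):
    """Samples of ||h||^2 / M for h with i.i.d. CN(0, 1) entries."""
    sigma = 0.5 ** 0.5  # real and imaginary parts each have variance 1/2
    out = []
    for _ in range(trials):
        gain = sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                   for _ in range(num_antennas))
        out.append(gain / num_antennas)
    return out


rng = random.Random(0)
for m in (1, 8, 64, 256):
    s = normalized_gains(m, 1000, rng)
    # Theory for this model: mean 1, standard deviation 1/sqrt(M).
    print(f"M={m:4d}  mean={statistics.mean(s):.3f}  std={statistics.pstdev(s):.3f}")
```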

Coordinated multi-point transmission. MIMO techniques also provide the option of serving a user at high rates from multiple BSs, referred to as coordinated multi-point transmission (CoMP), which has been proposed as one of the core features of LTE-Advanced [16, 17, 18]. In this paper, the set of BSs that cooperatively serve the same user is referred to as a BS cluster. Paper [19] studies how to determine the BS clusters, while [20, 21, 22, 23, 24, 25, 26] investigate jointly optimized designs involving some of the following aspects: BS cluster selection, beamforming (e.g., coordinated beamforming and joint transmission), user scheduling, and power allocation. In studies such as [23, 24, 25, 26], the complexity of the proposed algorithms can become prohibitive as the number of antennas increases. On the other hand, an efficient suboptimal precoder for the single-cell scenario with large antenna arrays is proposed in [23].

Resource Blanking. Because the macrocells from which users are offloaded become strong interferers for these offloaded users, the increased interference eats into the gains offered by load balancing. This motivates us to consider user association and interference management jointly. Besides CoMP, another popular interference management approach is to leave some macro resource blocks (RBs) blank, similar to enhanced intercell interference coordination (eICIC) in 3GPP [27]. The key difference between RB blanking in our work and eICIC is that eICIC focuses on the time domain, while in this work blanking is applied in both the time and frequency domains. We call the RBs on which macro BSs are muted blank RBs, and the remaining RBs normal RBs. Several works have considered the joint problem of user association and RB blanking. For example, [28, 29] propose a dynamic approach that adapts the muting duty cycle to load variations, while [30, 31, 32, 33, 34] consider a more static approach.

For general multi-cell massive MIMO HetNets, the joint optimization of user association and interference management, including joint transmission and resource blanking, is still an open issue. In this work, we combine various aspects of resource allocation and interference management, including resource blanking, joint transmission, association, and user scheduling, for massive MIMO deployments. We focus on a distributed-MIMO form of CoMP that allows local precoding at each BS and does not require channel state information (CSI) exchanges among cooperating BSs [35]. We call this specific form of CoMP Local Joint Transmission (LJT). LJT allows us to develop a systematic resource allocation approach for CoMP (including cellular transmission as a special case). Other interference management approaches (e.g., [36]) can also be adopted at the cost of additional complexity and overhead (e.g., schemes with joint precoding as discussed in [18]), but such designs are beyond the scope of this paper.

Cross-layer Optimization. To study the joint user association and interference management problem, we adopt a cross-layer optimization approach, aiming to improve the rate distribution, particularly the cell-edge performance. Cross-layer optimization is quite popular in the study of resource allocation (see, e.g., [37, 38] and references therein). Among these works, besides studies with disjoint clusters (i.e., each BS belongs to at most one cluster on any RB), e.g., [22], user-specific clusters with overlapping BSs have also been considered [39, 40]. At any scheduling instant, the cluster formation method we consider can be described using the "Dynamic Cooperation Clusters (DCC)" concept of [39]. An important difference between works following the DCC scheme [39] and our work lies in how it is decided which BSs serve which users (a choice that also inherently specifies the active clusters on an RB). In the former, this selection is based on instantaneous channel gains between users and BSs, as in [40]. In particular, the resource allocation in [40, 39] (consisting of precoding design, scheduling, and power allocation) is performed on each RB to optimize, for example, a utility function of instantaneous user rates. In our setting, by contrast, massive MIMO rate hardening allows us to allocate resources over many RBs to optimize a utility function of long-term user rates. Thus, the BS cluster selection for each user is determined as a result of load balancing and resource allocation across many RBs, which are performed ahead of time at a much coarser time scale than that of an individual RB. We then present scheduling policies at a finer time scale (i.e., the RB level) to approach the optimized (coarser time scale) resource allocation (Footnote 3).

Footnote 3: Similar to [29], in this paper we also consider resource allocation over two different time scales. [29] exploits the sparsity of the interference graph of the HetNet topology to overcome the complex coupling between user scheduling and RB blanking.

I-B Contributions and Organization

In this paper, we present a novel framework for the joint optimization of user association and interference management in massive MIMO HetNets, resulting in the following main contributions.

A unified NUM problem. By exploiting the predictability of instantaneous rates, the user association and scheduling problems can be decoupled, allowing us in Sec. IV to formulate a unified convex optimization problem for resource allocation with both LJT and RB blanking. Note that in the considered LJT, the clusters are user-specific (i.e., different users can be served by different clusters). The formulated NUM problem also applies to scenarios where some bandwidth resources are explicitly reserved for macro or small cell operation, while other resources are shared by both layers. As an extension of [15], when blanking is used with cellular transmission, the optimal solutions can always be realized by a suitably designed scheduler. With LJT, on the other hand, we show that some solutions are not implementable. Naturally, the solution of the NUM problem (called the NUM solution) upper bounds the network performance and can serve as a useful benchmark.

Dual subgradient based algorithm. Sec. V presents an efficient algorithm based on the dual subgradient method, which converges towards the optimal dual variables. Because the objective function is not strictly convex, recovering the optimal primal variables from the optimal dual variables is difficult. Exploiting the structure of the solution, we formulate a small linear program (LP) to obtain the optimal primal variables.
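For readers unfamiliar with the method, the sketch below applies a generic dual (sub)gradient iteration to a deliberately simplified toy problem: a single band, cellular transmission only, log utility, and one resource constraint per BS. It is not the algorithm of Sec. V, and the rates, step size, and function names are hypothetical. It also shows where the primal-recovery difficulty comes from: for fixed prices $\lambda_j$, each user's Lagrangian term $\log(\sum_j x_{kj} r_{kj}) - \sum_j \lambda_j x_{kj}$ is linear inside the logarithm, so it is maximized by putting weight $x_{kj} = 1/\lambda_j$ on the single BS with the largest $r_{kj}/\lambda_j$, and the primal maximizer is not unique whenever two BSs tie.

```python
def best_bs(rates_k, lam):
    """Index j maximizing r[k][j] / lam[j] (ties go to the lowest index)."""
    return max(range(len(lam)), key=lambda j: rates_k[j] / lam[j])


def dual_subgradient(r, iters=2000, step=0.05, lam_min=1e-3):
    """Toy dual (sub)gradient method for
         maximize   sum_k log(sum_j x[k][j] * r[k][j])
         subject to sum_k x[k][j] <= 1 for every BS j, x >= 0.
    Returns the prices lam and each user's BS choice at the final prices.
    """
    num_bs = len(r[0])
    lam = [1.0] * num_bs
    for _ in range(iters):
        # Primal step: user k puts weight x[k][j*] = 1 / lam[j*] on its best BS.
        load = [0.0] * num_bs
        for rates_k in r:
            j = best_bs(rates_k, lam)
            load[j] += 1.0 / lam[j]
        # Dual step: the (sub)gradient of the dual w.r.t. lam[j] is 1 - load[j];
        # move against it and keep the prices strictly positive.
        lam = [max(lam_min, p - step * (1.0 - ld)) for p, ld in zip(lam, load)]
    return lam, [best_bs(rates_k, lam) for rates_k in r]


rates = [[10.0, 0.1],   # user 0 strongly prefers BS 0
         [10.0, 0.1],   # user 1 strongly prefers BS 0
         [0.1, 10.0]]   # user 2 strongly prefers BS 1
lam, assoc = dual_subgradient(rates)
print([round(p, 3) for p in lam], assoc)  # [2.0, 1.0] [0, 0, 1]
```

Here each user clearly prefers one BS, so each price converges to the number of users at that BS and the allocation $x_{kj} = 1/\lambda_j$ splits each BS equally among its users. With near-ties, the argmax can flip between iterations, which is the non-uniqueness that motivates a separate primal-recovery step.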

Simple scheduling scheme to approach the NUM solution. The NUM solution provides the desired resource allocation at a coarser time scale. In Sec. VI, we further present scheduling policies at a finer time scale (i.e., the RB level) that aim to approach the optimized resource allocation in the long term.
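The idea of tracking a target long-term allocation with per-RB decisions can be illustrated with a simple virtual-queue (deficit) rule. In the hypothetical sketch below, user $k$'s virtual queue grows by its target fraction $\alpha_k$ on every RB, and the users with the largest queues are scheduled. This is a simplified stand-in, not the VQ scheme of Sec. VI (which also involves the arrival rates $A_k(t)$ and the parameter $V$ listed in Table I), and it assumes the targets sum to the number of users that can be scheduled per RB.

```python
def vq_schedule(alpha, slots, users_per_rb=1):
    """Count how often each user is scheduled under a largest-virtual-queue rule.

    alpha[k]: target fraction of RBs for user k (assumed to sum to users_per_rb).
    """
    if sum(alpha) > users_per_rb + 1e-12:
        raise ValueError("targets exceed the number of schedulable users per RB")
    q = [0.0] * len(alpha)
    served = [0] * len(alpha)
    for _ in range(slots):
        q = [qk + ak for qk, ak in zip(q, alpha)]  # each queue grows by its target
        # Stable sort: ties are broken in favor of the lower user index.
        chosen = sorted(range(len(alpha)), key=lambda k: -q[k])[:users_per_rb]
        for k in chosen:
            q[k] -= 1.0
            served[k] += 1
    return served


print(vq_schedule([0.5, 0.25, 0.25], slots=400))  # [200, 100, 100]
```

In this example the queues return to zero every four RBs, so after 400 RBs the served counts match the target fractions exactly.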

Simulations in Sec. VII (Footnote 4) show that the proposed harmonized CoMP/cellular operation can provide significant gains over cellular-only massive MIMO operation [15], especially for cell-edge users. For example, in our setup the rate of the bottom (10th percentile) users is about 2.2 times that achieved by the optimal user association without interference management, which is itself much larger than that of max-SINR association. Also, the utility provided by the proposed scheduling scheme is within 90% of the utility provided by the NUM solution.

Footnote 4: Some of these results are also published in [41].

For convenience, the key notation in this paper is summarized in Table I.

TABLE I: Notation Summary

| Notation | Description |
|---|---|
| $\mathcal{B}_{\rm m}$, $\mathcal{B}_{\rm s}$, $\mathcal{B}$, $\mathcal{U}$ | set of macro BSs, small cell BSs, all BSs, and UEs, respectively |
| $M_j$ | number of antennas at BS $j$ |
| $P_j$ | transmit power of BS $j$ |
| $\mathbf{G}_j$ | channel matrix between BS $j$ and its users |
| $\mathbf{g}_{kj}$ | channel vector between BS $j$ and UE $k$ |
| $g_{kj,i}$, $h_{kj,i}$ | channel and fast fading, respectively, between the $i$th antenna of BS $j$ and UE $k$ |
| $\beta_{kj}$ | slow fading between BS $j$ and UE $k$ |
| $\mathbf{f}_{kj}$ | precoder of BS $j$ for UE $k$ |
| $w_k$, $\sigma^2$ | AWGN and variance of AWGN, respectively |
| $L$ | cluster size |
| $S_j$ | number of users that can be simultaneously served by BS $j$ in cellular transmission |
| $S_j(L)$ | number of users that can be simultaneously served by BS $j$ in a cluster of size $L$ |
| $A$ | operation option |
| $\mathcal{B}^{(A)}$ | set of active BSs in operation $A$ |
| $L_{\max}^{(A)}$ | maximal cluster size in operation $A$ |
| $\mathcal{C}$ | cluster index |
| $\mathcal{U}_{\mathcal{C}}$ | set of users served by cluster $\mathcal{C}$ |
| $y_k(t)$ | received signal at UE $k$ on RB $t$ |
| $s_k$ | transmitted signal for UE $k$ |
| $r_{k\mathcal{C}}^{(A)}$ | instantaneous rate of user $k$ from cluster $\mathcal{C}$ in Band-$A$ in the massive MIMO regime |
| $x_{k\mathcal{C}}^{(A)}$ | activity fraction of UE $k$ from cluster $\mathcal{C}$ in Band-$A$ |
| $R_k$ | long-term rate of UE $k$ |
| $\mu_A$ | fraction of RBs allocated to Band-$A$ |
| $\lambda_{AL}$ | fraction of RBs allocated to size-$L$ clusters in Band-$A$ |
| $\tilde{x}_{k\mathcal{C}}^{(A)}$ | approximate activity fraction under unique association |
| $\mathcal{C}^{*}(k)$ | the cluster that serves UE $k$ in the unique association |
| $\alpha_k$ | the desired fraction of resources for UE $k$ in the considered band |
| $\tilde{R}_k$ | assumed rate for VQ scheduling, where $\tilde{R}_k = 1/\alpha_k$ |
| $A_{\max}$, $V$ | sufficiently large parameters in the VQ scheduling scheme |
| $Q_k(t)$, $A_k(t)$ | VQ length and arrival rate of UE $k$ in the VQ scheduling scheme, respectively |
| $\rho$ | a tunable parameter characterizing how many users the BSs can schedule simultaneously |

II System Model

In this paper, we focus on best-effort traffic and consider downlink (DL) transmission in a HetNet with $J$ BSs and $K$ single-antenna users. We let $j \in \mathcal{B} = \{1, 2, \dots, J\}$ and $k \in \mathcal{U} = \{1, 2, \dots, K\}$ denote the indices of BSs and users, respectively. Without loss of generality, we focus on two-tier HetNets comprising a macro layer and a small cell layer, such as the one considered in Fig. 1. Letting $\mathcal{B}_{\rm m}$ and $\mathcal{B}_{\rm s}$ be the sets of macro and small cell BSs, respectively, the BSs belonging to $\mathcal{B}_{\rm m}$ and $\mathcal{B}_{\rm s}$ can differ in terms of transmit power, size, density, and number of antennas [1]. The number of antennas at BS $j$ is denoted by $M_j$, with $M_j \gg 1$. We assume time division duplex (TDD) operation with reciprocity-based CSI acquisition [4, 42]. Hence, each user sends a single uplink (UL) pilot to train multiple nearby BSs. In contrast to feedback-based CSI acquisition, this enables the training of large antenna arrays with overhead proportional to the number of simultaneously served users rather than the number of antennas. Moreover, it enables CoMP with practical CSI acquisition overheads. We also assume a block-fading channel model in which the channel coefficients remain constant within each RB [43, 4, 42, 5].

[Figure 1 panels: (a) Cellular; (b) Local Joint Transmission; (c) Global Joint Transmission]
Figure 1: Various MIMO transmission schemes. The color of the beam serving a user indicates where the CSI used for precoding is obtained. For both cellular and local joint transmission, beams have the same color as the BS they are emitted from, because in these cases precoding uses only locally obtained CSI. In contrast, with global joint transmission, beams are precoded jointly using CSI from all BSs, hence the beams are colored gray.

With massive MIMO, a subset of users is scheduled (in other words, is active) on each RB. As shown in Fig. 1, various transmission schemes in terms of BS-cluster options are possible in this setup. At one extreme is cellular transmission, shown in Fig. 1(a), where the coded data for any given active user is transmitted from a single BS and each BS manages interference only to the users it serves. This corresponds to having $|\mathcal{B}|$ disjoint clusters of size 1. Fig. 1(c) shows the other extreme, Global Joint Transmission (ideal/full CoMP), where all BSs serve all active users in the network and coordinate the interference to them. In this case, a single cluster serves all of the users. Transmission schemes with cluster options between these two extremes are also possible, and DCC [39] can be used to describe any of them. Fig. 1(b) shows the scheme considered in this work, where BS clusters are user-specific: different active users can be served simultaneously by different, potentially overlapping, clusters.

CoMP has some well-known advantages over cellular transmission:
(1) Performance gain at the cell edge: In CoMP, the beamforming (BF) gain becomes an intra-cluster BF gain, as the same coded data is transmitted from all BSs serving the user. Similarly, intra-cell interference mitigation is extended across the cluster of BSs serving the user. As a result, performance gains can be realized at the cell edge.

(2) Low training overhead: A single UL pilot from a user can be received at all nearby BS antennas, whether they are co-located or at different sites. Thus, in a TDD system, CSI acquisition between a user and nearby BSs need not incur additional overhead with respect to cellular transmission. The total number of users that can be served by the system depends on the number of available UL pilots, and it should be the same for cellular transmission and CoMP if both use the same UL pilots.

A major disadvantage of general CoMP [44, 45, 18] is that, compared to cellular transmission, it incurs additional overhead for CSI exchanges among cooperating BSs. To overcome this challenge, we focus on schemes that perform local precoding at each BS: the user beams (i.e., precoding vectors) on any RB for any BS $j$ are designed as if BS $j$ were performing cellular multi-user (MU)-MIMO transmission over all the users it serves. All BSs serving a user transmit the same coded data stream to the user [35], and each BS transmits the stream on a beam that is (independently) designed for the users at that BS. For instance, with linear zero-forcing beamforming (LZFBF), the beam for each user served by BS $j$ is chosen within the null space of the channels of all the other users served by BS $j$, regardless of whether additional BSs serve the user on the same RB. Owing to local precoding, BS $j$ only needs the CSI between its own antennas and the users it serves to generate the user beams. Hence, the challenge of costly CSI exchanges between BSs is eliminated.
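The local LZFBF step can be sketched in a few lines of plain Python. The sketch assumes that BS $j$ knows only the $M_j \times S_j$ matrix $\mathbf{G}_j$ of its own served users, with $[\mathbf{G}_j]_{im}$ the channel from antenna $i$ to its $m$th user, so the users observe $\mathbf{G}_j^T$ times the transmitted vector. It forms the channel-inversion beams $\mathbf{G}_j^{*}(\mathbf{G}_j^T\mathbf{G}_j^{*})^{-1}$ and normalizes each beam to unit norm. The unit-norm normalization, the random test channel, and the helper names are illustrative assumptions. No CSI from other BSs is used, which is what makes the precoding local.

```python
import random


def cn(rng):
    """One circularly symmetric complex Gaussian sample with unit variance."""
    return complex(rng.gauss(0.0, 0.5 ** 0.5), rng.gauss(0.0, 0.5 ** 0.5))


def invert(a):
    """Inverse of a small square complex matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [1.0 + 0j if c == r else 0j for c in range(n)]
           for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def local_zf_precoder(g):
    """Unit-norm ZF beams F (M x S) from the local channel matrix g (M x S).

    Users observe g^T x, so F = conj(g) (g^T conj(g))^{-1} gives g^T F = I
    before the per-column normalization, i.e., each beam nulls the other users.
    """
    num_ant, s = len(g), len(g[0])
    gram = [[sum(g[i][a] * g[i][b].conjugate() for i in range(num_ant))
             for b in range(s)] for a in range(s)]
    gram_inv = invert(gram)
    f = [[sum(g[i][b].conjugate() * gram_inv[b][m] for b in range(s))
          for m in range(s)] for i in range(num_ant)]
    for m in range(s):
        norm = sum(abs(f[i][m]) ** 2 for i in range(num_ant)) ** 0.5
        for i in range(num_ant):
            f[i][m] /= norm
    return f


rng = random.Random(1)
M, S = 8, 3                                          # antennas at BS j, users it serves
g = [[cn(rng) for _ in range(S)] for _ in range(M)]  # g[i][m]: antenna i -> user m
f = local_zf_precoder(g)
eff = [[sum(g[i][a] * f[i][m] for i in range(M)) for m in range(S)] for a in range(S)]
print(max(abs(eff[a][m]) for a in range(S) for m in range(S) if a != m))  # ~0 (rounding)
```

Each BS in a user's cluster would run the same computation on its own $\mathbf{G}_j$, so a user served by two BSs receives two independently designed beams carrying the same data stream.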

In contrast to full CoMP (Footnote 5), with local precoding of the form depicted in Fig. 1(b), the instantaneous rate is independent of the other active users' channels and depends only on the power allocation for the user streams served by the BS. By allocating BS power equally across the set of active users (Footnote 6) and fixing the scheduling set sizes for BSs belonging to common BS clusters, the instantaneous rate can be predicted a priori, independently of the other active users, which substantially reduces the complexity of the resource allocation problem.

Footnote 5: With general CoMP schemes (e.g., cluster ZFBF transmission), the instantaneous rate that a cluster of BSs provides to a user is also a function of the identity (and in fact the large-scale channel coefficients) of the other users scheduled for cluster transmission with the user, and can thus vary from slot to slot depending on the scheduling set [45]. As a result, in contrast to the LJT schemes we consider (whereby a user's instantaneous rate is independent of the identity of the active users in the slot), general CoMP schemes are not amenable to the load balancing and scheduling techniques presented in this work.

Footnote 6: Power allocation is a thoroughly studied topic in the context of MIMO in general (see, e.g., [46]). With large antenna arrays, massive MIMO systems can achieve substantially better SINRs even without power allocation optimization. For example, [45] considers equal power allocation, while Marzetta in his pioneering work [4] allocates power to users in proportion to their channel gains, which is the reverse of typical power allocation under a fairness criterion (whereby more power is typically allocated to cell-edge users). Following this trend in massive MIMO and considering the high complexity of power allocation optimization, we consider equal active user-stream power allocation at each BS. This approach simplifies the parametrization of peak-rate calculations in Section III and yields the convex NUM formulation in Section IV.

In this work, we restrict our attention to a small set of predefined scheduling set sizes; how to select these sizes depends on many factors (e.g., the number of antennas at the BSs, the network deployment, etc.) and is left to future work. Because an uplink user pilot trains all antennas at nearby BSs, we assume that BSs serving users in larger clusters (during a given scheduling slot) can schedule more users in the same slot than BSs serving users in smaller clusters or with cellular transmission. We thus let the scheduling set size of any BS $j$ depend on the size of the clusters that include BS $j$. To avoid ambiguity in the scheduling set sizes of different BSs in the same BS cluster, we enforce the following condition: on a given RB, all users served by a given BS $j$ are served in clusters of the same size. This constraint ensures that overlapping clusters on any RB are of the same size. Let $S_j$ denote the number of users that can be simultaneously served by BS $j$ in cellular transmission. We summarize the properties of the specific form of LJT considered in this work in the following definition:

Definition 1.

Admissible Local Joint Transmission Schemes (ALJTSs): An ALJTS schedules users for transmission on a sequence of RBs and satisfies the following on each RB:

(1) All users served by a given BS $j$ are served in BS clusters of the same size $L$, for some $L \geq 1$;
(2) BS $j$ in BS clusters of size $L$ serves at most $S_j(L)$ users, with $S_j \leq S_j(L) \leq L S_j$ and $M_j \gg S_j(L)$ (Footnote 7);
(3) The user beams (i.e., precoding vectors) at BS $j$ are designed as if BS $j$ were performing cellular multi-user (MU)-MIMO transmission over (at most) $S_j(L)$ users;
(4) All BSs serving an active user transmit the same coded data stream to the user. Each BS transmits this user stream on a beam that is (independently) designed for this user at that BS;
(5) Any BS $j$ shares its transmit power $P_j$ equally among the users it serves.

Footnote 7: In this work, the $S_j$'s and $S_j(L)$'s are assumed to be predefined parameters. Their impact on performance (and their choice) in practice depends on many factors (e.g., available UL pilot dimensions per RB for training in the network, spatial pilot reuse, network deployment, etc.).
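Properties (1), (2), and (5) can be checked mechanically for a single RB. The hypothetical helper below takes, for each BS, the cluster size $L$ it uses and the list of users it serves, infers each user's serving cluster, verifies that every serving BS reports the size of that cluster and serves at most $S_j(L)$ users, and then returns the equal per-user power split of property (5). Properties (3) and (4) concern precoding and data streams and are not modeled. For simplicity, the sketch assumes all BSs share the same $S_j(L)$ values and transmit power; the data reproduce RB #4 of Table II.

```python
from fractions import Fraction


def check_rb(rb, s_of_l, p_total=Fraction(1)):
    """Check ALJTS properties (1) and (2) on one RB and apply property (5).

    rb:     {bs: (cluster_size, [served users])}
    s_of_l: {L: S_j(L)}, assumed identical for all BSs; s_of_l[1] is S_j.
    Returns (True, {bs: power per user}) if admissible, else (False, reason).
    """
    s_cell = s_of_l[1]
    for size, s in s_of_l.items():
        if not s_cell <= s <= size * s_cell:
            return False, f"S_j({size}) outside [S_j, {size} S_j]"
    # The serving cluster of each user is the set of BSs that list the user.
    serving = {}
    for bs, (size, users) in rb.items():
        for u in users:
            serving.setdefault(u, set()).add(bs)
    for bs, (size, users) in rb.items():
        if len(users) > s_of_l[size]:
            return False, f"BS {bs} serves more than S_j({size}) users"
        for u in users:
            if len(serving[u]) != size:
                return False, f"user {u} at BS {bs} is not in a size-{size} cluster"
    # Property (5): each BS splits its power equally among its served users.
    return True, {bs: p_total / len(users) for bs, (size, users) in rb.items()}


# RB #4 of Table II: P_j = 1, S_j(1) = 2, S_j(2) = 3.
rb4 = {1: (2, [1, 2, 3]), 2: (2, [1, 2, 3]), 3: (1, [4, 5]), 4: (1, [6, 7])}
ok, power = check_rb(rb4, {1: 2, 2: 3})
print(ok, power)
# Invalid RB: user 1 is served by BSs 1, 2 and 3, not by a cluster of size 2.
bad = {1: (2, [1, 2]), 2: (2, [1, 2]), 3: (1, [1])}
print(check_rb(bad, {1: 2, 2: 3}))
```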

As explained in Sec. III, the properties of ALJTSs enable us to decouple the scheduling and load balancing problems, since the instantaneous active-user rates depend only on the serving BS cluster and are predictable a priori based on the given $S_j(L)$ values.

To illustrate these principles, Table II gives LJT examples that obey these rules, involving clusters of size 1 (i.e., cellular transmission) and size 2. Four BSs are considered, with $P_j = 1$, $S_j(1) = 2$, and $S_j(2) = 3$. As the table reveals, each BS on RB #1 engages in cellular transmission. On RB #2, pairs of BSs perform LJT, with each BS pair serving a triplet of users. RBs #3 and #4 provide additional, more interesting modes. On RB #3, no two users are served by the same cluster. On RB #4, BSs 1 and 2 serve users in clusters of size 2, while BSs 3 and 4 serve users with cellular transmission. Note that if orthogonal pilots are used, (at least) 8, 6, 6, and 7 uplink pilot dimensions (one dimension per user) are needed to enable RBs #1, #2, #3, and #4, respectively. Evidently, the choice of the scheduling set sizes $S_j(L)$ reflects how aggressively pilot dimensions are reused across the network (e.g., $S_j(L) = S_j$ for fully reused pilots and $S_j(L) = L S_j$ for orthogonal pilots). Inspection of Table II reveals that the first ALJTS property can be satisfied either if all clusters on an RB are of the same size (RBs #1, #2, #3) or if clusters of different sizes are disjoint across BSs (RB #4).

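TABLE II: Example of RBs enabled by LJT over 4 BSs.

| RB | Quantity | BS 1 | BS 2 | BS 3 | BS 4 |
|---|---|---|---|---|---|
| #1 | Cluster size | 1 | 1 | 1 | 1 |
| | User power | 1/2 | 1/2 | 1/2 | 1/2 |
| | Served users | 1, 2 | 3, 4 | 5, 6 | 7, 8 |
| #2 | Cluster size | 2 | 2 | 2 | 2 |
| | User power | 1/3 | 1/3 | 1/3 | 1/3 |
| | Served users | 1, 2, 3 | 1, 2, 3 | 4, 5, 6 | 4, 5, 6 |
| #3 | Cluster size | 2 | 2 | 2 | 2 |
| | User power | 1/3 | 1/3 | 1/3 | 1/3 |
| | Served users | 1, 2, 3 | 1, 4, 5 | 2, 4, 6 | 3, 5, 6 |
| #4 | Cluster size | 2 | 2 | 1 | 1 |
| | User power | 1/3 | 1/3 | 1/2 | 1/2 |
| | Served users | 1, 2, 3 | 1, 2, 3 | 4, 5 | 6, 7 |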
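The pilot-dimension counts quoted for Table II follow from counting the distinct users scheduled on each RB, assuming one orthogonal pilot dimension per user as stated in the text. The sketch below encodes the served users of Table II and also lists each user's serving cluster, which makes it easy to see that on RB #3 every user has a different cluster. The dictionary layout and function name are illustrative choices.

```python
# Served users per BS on each RB of Table II: {rb: {bs: [users]}}.
TABLE_II = {
    1: {1: [1, 2], 2: [3, 4], 3: [5, 6], 4: [7, 8]},
    2: {1: [1, 2, 3], 2: [1, 2, 3], 3: [4, 5, 6], 4: [4, 5, 6]},
    3: {1: [1, 2, 3], 2: [1, 4, 5], 3: [2, 4, 6], 4: [3, 5, 6]},
    4: {1: [1, 2, 3], 2: [1, 2, 3], 3: [4, 5], 4: [6, 7]},
}


def serving_clusters(rb):
    """Map each scheduled user to the sorted tuple of BSs that serve it."""
    clusters = {}
    for bs, users in rb.items():
        for u in users:
            clusters.setdefault(u, set()).add(bs)
    return {u: tuple(sorted(c)) for u, c in sorted(clusters.items())}


for rb_id, rb in TABLE_II.items():
    clusters = serving_clusters(rb)
    # With orthogonal pilots, each distinct scheduled user needs one pilot dimension.
    print(f"RB #{rb_id}: {len(clusters)} pilot dimensions; clusters {clusters}")
```

This reports 8, 6, 6, and 7 pilot dimensions for RBs #1 to #4, matching the counts in the text.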

Depending on the availability of bands and the preferences of the operator, in 5G HetNets different groups of BSs might be allowed to transmit jointly in clusters across groups of RBs (e.g., across different frequency bands). Each such combination of BS clusters is considered separately as a distinct entity in the resource allocation problem. Given the set of BSs $\mathcal{B}$ in the network, there are $2^{|\mathcal{B}|}-1$ possible entity/operation options. Although in principle it is straightforward to take all these options into account, for simplicity and considering the general interest in 5G HetNet deployments, we focus only on the following options: 1) macro and small cell layers may operate together; 2) only the macro layer operates; 3) only the small cell layer operates. We call these three operations shared, macro-only, and blanking operations, respectively. Let $A \in \{1, 2, 3\}$ denote the operation type, where $A = 1$, $A = 2$, and $A = 3$ denote shared, macro-only, and blanking operations, respectively. Let the RBs allocated to operation $A$ form Band-$A$, and let $\mathcal{B}^{(A)}$ be the set of BSs that can transmit in Band-$A$. We have the following cases:
1) $A = 1$: shared operation is considered for this band, where macro and small cell BSs can both transmit, i.e., $\mathcal{B}^{(1)} = \mathcal{B}$. In this band, clusters can be formed by BSs from different layers.
2) $A = 2$: macro-only operation is considered for this band, and only macro BSs can transmit, i.e., $\mathcal{B}^{(2)} = \mathcal{B}_{\rm m}$.
3) $A = 3$: blanking is applied to this band, where all of the macro BSs are muted, i.e., $\mathcal{B}^{(3)} = \mathcal{B}_{\rm s}$ (Footnote 8).

Footnote 8: In general, blanking can be applied to only a subset of macro BSs. In fact, [29] has shown gains compared to blanking all macro BSs. Our formulation can include partial blanking by considering new operation options. To simplify the exposition, in this work we confine ourselves to blanking all of the macro BSs.

Different resource allocations among operations correspond to different scenarios. For example, scenarios with $A \in \{2, 3\}$ correspond to cases where orthogonal RBs are allocated to macro and small cells, while scenarios with $A \in \{1, 3\}$ correspond to cases with eICIC. In some scenarios, the resource allocation among operations can be fixed a priori. In more flexible scenarios, it can be part of the optimization problem. Our formulation in Sec. IV applies to both cases.

To show how these transmission operations of practical interest fit within our specific LJT, we give a small example in Table III. As different bands may prefer different cluster sizes, we consider clusters of up to $L_{\max}^{(A)}$ BSs in Band-$A$. The potential BS clusters that can be active in this band are then given by the subsets of $\mathcal{B}^{(A)}$ with size less than or equal to $L_{\max}^{(A)}$ (Footnote 9). In the following, to avoid cumbersome notation, when listing potential clusters we describe them only by the cluster size and the active BS set of the corresponding band. Table III extends Table II by adding a macro BS (BS #5) and treating the other BSs as small cell BSs, i.e., $\mathcal{B}_{\rm m} = \{5\}$ and $\mathcal{B}_{\rm s} = \{1, 2, 3, 4\}$. Assume $L_{\max}^{(1)} = 2$, $L_{\max}^{(2)} = 1$, and $L_{\max}^{(3)} = 2$ in this example. RBs $b$ and $d$ are in Band-1 (shared operation), RB $e$ is in Band-2 (macro-only operation), and RBs $a$ and $c$ are in Band-3 (blanking operation). In RB $b$, each BS performs cellular transmission, while RB $d$ uses clusters of size 2 that include both macro and small cell BSs. In RB $e$, the macro BS (BS #5) serves users via cellular transmission. In RBs $a$ and $c$, only small cells serve users while the macro BS is muted. In fact, the clusters in RBs $a$ and $c$ are the same as those in RBs #1 and #3 of Table II, respectively.

Footnote 9: Many of these clusters are unnecessary. For example, clusters of geographically distant BSs need not be considered, as in a practical system no user would be assigned to them. Such practical observations can eliminate many cluster options for all RBs.

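TABLE III: Example of RBs enabled by ALJTS over 4 small cell BSs (BSs 1-4) and 1 macro BS (BS 5).

| RB | Quantity | BS 1 | BS 2 | BS 3 | BS 4 | BS 5 (macro) |
|---|---|---|---|---|---|---|
| $a$ | Cluster size | 1 | 1 | 1 | 1 | - |
| | User power | 1/2 | 1/2 | 1/2 | 1/2 | - |
| | Served users | 1, 2 | 3, 4 | 5, 6 | 7, 8 | - |
| $b$ | Cluster size | 1 | 1 | 1 | 1 | 1 |
| | User power | 1/2 | 1/2 | 1/2 | 1/2 | 1/2 |
| | Served users | 1, 2 | 3, 4 | 5, 6 | 7, 8 | 9, 10 |
| $c$ | Cluster size | 2 | 2 | 2 | 2 | - |
| | User power | 1/3 | 1/3 | 1/3 | 1/3 | - |
| | Served users | 1, 2, 3 | 1, 4, 5 | 2, 4, 6 | 3, 5, 6 | - |
| $d$ | Cluster size | 2 | 2 | 2 | 2 | 2 |
| | User power | 1/3 | 1/3 | 1/3 | 1/2 | 1/3 |
| | Served users | 1, 2, 3 | 1, 4, 5 | 2, 4, 6 | 3, 7 | 5, 6, 7 |
| $e$ | Cluster size | - | - | - | - | 1 |
| | User power | - | - | - | - | 1/2 |
| | Served users | - | - | - | - | 1, 2 |
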
Remark 1.

For any band, potential BS clusters are given by subsets (of appropriate size) of the active BS set on that band. As noted earlier, some subsets can be eliminated given the topology and constraints of the network. At the scheduling-slot time scale, the active BS clusters are determined by the scheduler, which is operated by the central controller. Fig. 2 provides a flow diagram of the interactions between different entities, such as BSs, users, and the central controller, which includes a load balancing unit and a scheduling unit. These interactions and the processing performed within each entity are discussed in Sections III, VI, and VII.
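Remark 1 and the discussion of Table III define the candidate clusters of Band-$A$ as the subsets of $\mathcal{B}^{(A)}$ with at most $L_{\max}^{(A)}$ BSs. The short sketch below enumerates them for the Table III example ($\mathcal{B}_{\rm m} = \{5\}$, $\mathcal{B}_{\rm s} = \{1,2,3,4\}$, $L_{\max}^{(1)} = 2$, $L_{\max}^{(2)} = 1$, $L_{\max}^{(3)} = 2$). Topology-based pruning, such as discarding clusters of geographically distant BSs, is not modeled, and the variable names are illustrative.

```python
from itertools import combinations

MACRO = {5}                  # B_m in the Table III example
SMALL = {1, 2, 3, 4}         # B_s in the Table III example
ACTIVE = {1: MACRO | SMALL,  # Band-1 (shared):     B^(1) = B
          2: MACRO,          # Band-2 (macro-only): B^(2) = B_m
          3: SMALL}          # Band-3 (blanking):   B^(3) = B_s
L_MAX = {1: 2, 2: 1, 3: 2}   # L_max^(A) assumed in the Table III example


def potential_clusters(active_bs, l_max):
    """All subsets of the active BS set with 1 to l_max members."""
    return [c for size in range(1, l_max + 1)
            for c in combinations(sorted(active_bs), size)]


for band in (1, 2, 3):
    clusters = potential_clusters(ACTIVE[band], L_MAX[band])
    print(f"Band-{band}: {len(clusters)} candidate clusters")
```

This yields 15, 1, and 10 candidate clusters for the shared, macro-only, and blanking bands, respectively. The count grows combinatorially with the number of BSs and with $L_{\max}^{(A)}$, which is why eliminating unnecessary clusters matters in practice.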

Figure 2: Given the resource allocation computed by the load balancer at the central controller, scheduling is performed by the scheduler unit in the central controller.

III Instantaneous and Long-term Rates

In this section, we provide proxy expressions for instantaneous rates and long-term rates (throughput) with either LZFBF or Maximum Ratio Transmission (MRT, also known as Conjugate Beamforming). We denote the transpose and conjugate transpose of matrices by $(\cdot)^{T}$ and $(\cdot)^{H}$, respectively.

On a generic RB $t$, we let $g_{kj,i}=\sqrt{\beta_{kj}}h_{kj,i}$ be the channel between the $i^{\rm th}$ transmit antenna of BS $j$ and any user $k$; it includes both the slow fading $\beta_{kj}$ and the fast fading $h_{kj,i}$. The slow fading $\beta_{kj}$ captures the combined effect of distance-based path loss and location-based shadowing. Let $\mathcal{K}_{j}(t)$ be the set of users served by BS $j$ and let $S_{j}(t)=|\mathcal{K}_{j}(t)|$ denote the number of users scheduled by BS $j$ on RB $t$. The channel matrix between BS $j$ and its active users (the users scheduled by this BS on this RB) is denoted by $\mathbf{G}_{j}$, with dimension $M_{j}\times S_{j}(t)$. The $m^{\rm th}$ column of $\mathbf{G}_{j}$ corresponds to the channel of user $k=k_{j}(m)$ for some $k\in\{1,2,\ldots,K\}$. That is, for a given $m$ we have $[\mathbf{G}_{j}]_{im}=g_{kj,i}$, with $k=k_{j}(m)$. Equivalently, in terms of the inverse mapping $m=m_{j}(k)$, for a given $k$ we have $[\mathbf{G}_{j}]_{im}=g_{kj,i}$, with $m=m_{j}(k)$.

We assume each link experiences independent Rayleigh fading, i.e., the entries of $\mathbf{h}_{kj}=[h_{kj,1},\cdots,h_{kj,M_{j}}]^{T}$ are i.i.d. complex Gaussian random variables. (Footnote 10: The instantaneous user-rate expressions in this paper assume no spatial correlation in the user channels. In principle, instantaneous user-rate expressions can be developed for spatially correlated user channels with a given spatial correlation structure by using the method of deterministic equivalents [5]. The framework presented in Secs. IV-VI, however, is not directly applicable to spatially correlated user channels, since a user's instantaneous rate would in general depend on the spatial correlation of the other scheduled user channels. Although beyond the scope of this paper, extensions to spatially correlated user channels are an area worth further investigation.) We let $\mathbf{F}_{j}$ denote the $M_{j}\times S_{j}(t)$ precoding matrix at BS $j$, whose $m^{\rm th}$ column $\mathbf{f}_{mj}$ is the beam (i.e., the precoding vector) for the $m^{\rm th}$ user of BS $j$, i.e., the user whose channel to the BS is given by the $m^{\rm th}$ column of $\mathbf{G}_{j}$. The signal symbol of user $k$ is denoted by $s_{k}$ and has unit energy. The thermal noise at user $k$ is denoted by $w_{k}$ and is assumed to be additive white Gaussian noise (AWGN) with variance $\sigma^{2}$.

We consider a scheduling policy over RBs $\{1,2,\ldots,T\}$ and assume that all the large-scale coefficients stay fixed within this period. Any such scheduling policy can be described in terms of the scheduling sets $\{\mathcal{U}_{\mathcal{C}}(t);\forall\mathcal{C},\forall t\in\{1,2,\ldots,T\}\}$, where $\mathcal{U}_{\mathcal{C}}(t)$ denotes the set of users served by cluster $\mathcal{C}$ on RB $t$, and $\forall\mathcal{C}$ ranges over all the cluster options considered for all bands, taking into account active BS sets, cluster sizes and possible cluster selections/eliminations. Without loss of generality, we assume that RB $t$ is in Band-$A$ and $\mathcal{C}\subset\mathcal{B}^{(A)}$. Thus, the received signal at user $k\in\mathcal{U}_{\mathcal{C}}(t)$ on RB $t$ is

$$y_{k}(t)=\underbrace{\sum_{j\in\mathcal{C}}\sqrt{\frac{P_{j}}{S_{j}(|\mathcal{C}|)}}\,\mathbf{g}_{kj}(t)^{H}\mathbf{f}_{m_{j}(k)j}(t)\,s_{k}}_{\text{desired}}+\underbrace{\sum_{j\in\mathcal{C}}\ \sum_{u\in\cup_{(\mathcal{C}':j\in\mathcal{C}')}\mathcal{U}_{\mathcal{C}'}(t),\;u\neq k}\sqrt{\frac{P_{j}}{S_{j}(|\mathcal{C}|)}}\,\mathbf{g}_{kj}(t)^{H}\mathbf{f}_{m_{j}(u)j}(t)\,s_{u}}_{\text{intra-cluster interference}}+\underbrace{\sum_{l\notin\mathcal{C}}\ \sum_{u\in\cup_{(\mathcal{C}':l\in\mathcal{C}')}\mathcal{U}_{\mathcal{C}'}(t)}\sqrt{\frac{P_{l}}{S_{l}(|\mathcal{C}'|)}}\,\mathbf{g}_{kl}(t)^{H}\mathbf{f}_{m_{l}(u)l}(t)\,s_{u}}_{\text{inter-cluster interference}}+\underbrace{w_{k}}_{\text{noise}}. \qquad (1)$$

Adopting LZFBF, the precoding matrix at BS $j$ is $\mathbf{F}_{j}=\mathbf{G}_{j}\left(\mathbf{G}_{j}^{H}\mathbf{G}_{j}\right)^{-1}\mathbf{A}_{j}^{1/2}$, where $\mathbf{A}_{j}$ is the matrix of normalizing coefficients. Specifically, $\mathbf{A}_{j}$ is an $S_{j}(L)\times S_{j}(L)$ diagonal matrix whose $m^{\rm th}$ diagonal element is $a_{m,m}=1/\left[\left(\mathbf{G}_{j}^{H}\mathbf{G}_{j}\right)^{-1}\right]_{m,m}$, where $[(\mathbf{G}_{j}^{H}\mathbf{G}_{j})^{-1}]_{m,m}$ denotes the element in the $m^{\rm th}$ row and $m^{\rm th}$ column of $(\mathbf{G}_{j}^{H}\mathbf{G}_{j})^{-1}$. This normalization gives every beam unit norm, and since $\mathbf{G}_{j}^{H}\mathbf{F}_{j}=\mathbf{A}_{j}^{1/2}$ is diagonal, the intra-cluster interference is 0. With MRT, the $m^{\rm th}$ column of the precoding matrix $\mathbf{F}_{j}$ is $\mathbf{f}_{mj}=\mathbf{g}_{k_{j}(m)j}/\|\mathbf{g}_{k_{j}(m)j}\|$. Note that the set of interfering BSs depends on the operation band.
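To illustrate the two precoders, the following NumPy sketch builds the LZFBF precoder $\mathbf{F}_{j}=\mathbf{G}_{j}(\mathbf{G}_{j}^{H}\mathbf{G}_{j})^{-1}\mathbf{A}_{j}^{1/2}$ and the MRT precoder for one BS, and checks that both yield unit-norm beams while only LZFBF removes the cross-user terms $\mathbf{g}_{kj}^{H}\mathbf{f}_{m_{j}(u)j}$, $u\neq k$. It is a sketch under our own assumptions: unit slow fading ($\beta_{kj}=1$), i.i.d. $\mathcal{CN}(0,1)$ fast fading, and illustrative values $M_{j}=64$, $S_{j}=8$; the function names are hypothetical.

```python
import numpy as np


def lzfbf_precoder(G: np.ndarray) -> np.ndarray:
    """F = G (G^H G)^{-1} A^{1/2}, with A diagonal and a_mm = 1/[(G^H G)^{-1}]_mm."""
    gram_inv = np.linalg.inv(G.conj().T @ G)             # (S x S)
    a = 1.0 / np.real(np.diag(gram_inv))                 # normalizing coefficients
    return G @ gram_inv @ np.diag(np.sqrt(a))            # scale each column by sqrt(a_mm)


def mrt_precoder(G: np.ndarray) -> np.ndarray:
    """Each beam is the user's own channel, normalized to unit norm."""
    return G / np.linalg.norm(G, axis=0, keepdims=True)


rng = np.random.default_rng(0)
M, S = 64, 8                                             # antennas, scheduled users
G = (rng.standard_normal((M, S)) + 1j * rng.standard_normal((M, S))) / np.sqrt(2)

for name, F in [("LZFBF", lzfbf_precoder(G)), ("MRT", mrt_precoder(G))]:
    H = G.conj().T @ F                                   # entry (k, m) is g_k^H f_m
    leak = np.abs(H - np.diag(np.diag(H))).max()         # largest cross-user term
    print(name, "unit-norm beams:", np.allclose(np.linalg.norm(F, axis=0), 1.0),
          "max cross-user term:", round(float(leak), 4))
```

The LZFBF cross-user terms vanish up to round-off, whereas the MRT ones do not, which is why (3) below carries an intra-cluster interference term and (2) does not.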

III-A Instantaneous Rate

In this paper, we assume that each BS has perfect CSI for the user terminals it serves. (Footnote 11: Massive MIMO rate calculations for a general CoMP setting with practical TDD uplink training for CSI acquisition can be found in [45]. The instantaneous rate expressions in this manuscript can be updated accordingly, taking into account the pilot contamination term and the training overhead, as a special case of [45]. This was done for the cellular case in [15].) Let $S_{j}$ denote the number of users served by BS $j$ in cellular transmission, with $S_{j}\ll M_{j}$. Under mild assumptions on fading, the user instantaneous rates on RB $t$, $r_{kj}(t)$, can be predicted a priori in the massive MIMO regime [15]. In particular, there exist deterministic quantities $\{r_{kj}\}$ such that $r_{kj}(t)\stackrel{\rm a.s.}{\rightarrow}r_{kj}$, $\forall k\in\mathcal{U}$ and $\forall j\in\mathcal{B}$, as $M_{j},S_{j}\rightarrow\infty$ with fixed $v_{j}=S_{j}/M_{j}\geq 0$ [4, 5, 42]. This convergence is very fast in $M_{j}$. Unlike general CoMP, where a user's instantaneous rate depends on the other users co-scheduled on the same RB [45], ALJTS makes a user's instantaneous rate independent of the other users in the scheduling set. Let $r_{k\mathcal{C}}^{(A)}(t)$ be the instantaneous rate of user $k$ from cluster $\mathcal{C}$ on RB $t$ in Band-$A$. There exist deterministic quantities $\{r_{k\mathcal{C}}^{(A)}\}$ such that $r_{k\mathcal{C}}^{(A)}(t)\stackrel{\rm a.s.}{\rightarrow}r_{k\mathcal{C}}^{(A)}$ as $M_{j},S_{j}(|\mathcal{C}|)\rightarrow\infty$ with $v_{j}=S_{j}(|\mathcal{C}|)/M_{j}\geq 0,\ \forall j\in\mathcal{C}$.
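Channel hardening can be checked numerically. Under i.i.d. $\mathcal{CN}(0,1)$ fading (our assumption), the LZFBF beam gain $a_{m,m}=1/[(\mathbf{G}_{j}^{H}\mathbf{G}_{j})^{-1}]_{m,m}$ is the squared norm of the projection of a user's channel onto an $(M_{j}-S_{j}+1)$-dimensional subspace, i.e., a sum of $M_{j}-S_{j}+1$ unit-mean exponential variables. Its mean is therefore $M_{j}-S_{j}+1$ and its relative spread is about $1/\sqrt{M_{j}-S_{j}+1}$, so with per-user power $P_{j}/S_{j}$ and slow fading $\beta_{kj}$ the received desired power concentrates around $P_{j}\beta_{kj}(M_{j}-S_{j}+1)/S_{j}$. The Monte Carlo sketch below, built around a hypothetical helper `zf_gain_samples`, estimates both quantities for growing $M_{j}$.

```python
import numpy as np


def zf_gain_samples(M: int, S: int, trials: int, rng: np.random.Generator) -> np.ndarray:
    """Samples of a_11 = 1/[(G^H G)^{-1}]_{11} for G with i.i.d. CN(0, 1) entries.

    With LZFBF and unit-norm beams, |g_k^H f_k|^2 = a_kk, so this is the
    effective channel gain of one scheduled user (unit slow fading).
    """
    G = (rng.standard_normal((trials, M, S))
         + 1j * rng.standard_normal((trials, M, S))) / np.sqrt(2)
    gram = np.conj(np.swapaxes(G, 1, 2)) @ G             # batched G^H G, (trials, S, S)
    inv_diag = np.real(np.linalg.inv(gram)[:, 0, 0])     # [(G^H G)^{-1}]_{11}
    return 1.0 / inv_diag


rng = np.random.default_rng(1)
S = 8
for M in (16, 64, 256):
    g = zf_gain_samples(M, S, trials=1000, rng=rng)
    print(f"M={M:3d}: mean/(M-S+1)={g.mean() / (M - S + 1):.3f}, "
          f"std/mean={g.std() / g.mean():.3f}")
```

The ratio `mean/(M-S+1)` should stay near 1 while `std/mean` shrinks as $M$ grows, which is the sense in which instantaneous rates become predictable a priori.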

Using the techniques in [47, 45], we can show that the approximate instantaneous rate of user $k$ from cluster $\mathcal{C}\subset\mathcal{B}^{(A)}$ in Band-$A$ using LZFBF is

$$r_{k\mathcal{C}}^{(A)}=\log_{2}\left(1+\frac{\sum_{j\in\mathcal{C}}\sum_{l\in\mathcal{C}}\sqrt{P_{j}P_{l}\beta_{kj}\beta_{kl}b_{j}(|\mathcal{C}|)b_{l}(|\mathcal{C}|)}}{\sigma^{2}+\sum_{l\notin\mathcal{C},\,l\in\mathcal{B}^{(A)}}P_{l}\beta_{kl}}\right), \qquad (2)$$

where $b_{j}(a)=\frac{M_{j}-S_{j}(a)+1}{S_{j}(a)}$. Similarly, the approximate instantaneous rate of user $k$ from cluster $\mathcal{C}$ in Band-$A$ using MRT is

$$r_{k\mathcal{C}}^{(A)}=\log_{2}\left(1+\frac{\sum_{j\in\mathcal{C}}\sum_{l\in\mathcal{C}}\sqrt{\frac{P_{j}P_{l}M_{j}M_{l}\beta_{kj}\beta_{kl}}{S_{j}(|\mathcal{C}|)S_{l}(|\mathcal{C}|)}}}{\sigma^{2}+I_{k\mathcal{C}}+\sum_{l\notin\mathcal{C},\,l\in\mathcal{B}^{(A)}}P_{l}\beta_{kl}}\right), \qquad (3)$$

where $I_{k\mathcal{C}}=\sum_{j\in\mathcal{C}}\frac{S_{j}(|\mathcal{C}|)-1}{S_{j}(|\mathcal{C}|)}P_{j}\beta_{kj}$ is the non-zero intra-cluster interference. We set $r_{k\mathcal{C}}^{(A)}=0$ if $\mathcal{C}\not\subset\mathcal{B}^{(A)}$, since such a cluster cannot be active in Band-$A$.
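Eqs. (2) and (3) translate directly into code. The sketch below is a hedged illustration that uses the identity $\sum_{j}\sum_{l}\sqrt{u_{j}u_{l}}=(\sum_{j}\sqrt{u_{j}})^{2}$ for the coherent numerator; the dictionary-based interface and the function names `lzfbf_rate` and `mrt_rate` are hypothetical, and `S[j]` stands for $S_{j}(|\mathcal{C}|)$.

```python
import math
from typing import Dict, Iterable


def lzfbf_rate(cluster: Iterable[int], P: Dict[int, float], beta_k: Dict[int, float],
               M: Dict[int, int], S: Dict[int, int], noise: float,
               band_bs: Iterable[int]) -> float:
    """Eq. (2). S[j] plays the role of S_j(|C|) for every BS j in the cluster."""
    cluster, band_bs = set(cluster), set(band_bs)
    if not cluster <= band_bs:
        return 0.0                                       # r = 0 when C is not inside B^(A)
    b = {j: (M[j] - S[j] + 1) / S[j] for j in cluster}
    # sum_j sum_l sqrt(P_j P_l beta_kj beta_kl b_j b_l) = (sum_j sqrt(P_j beta_kj b_j))^2
    signal = sum(math.sqrt(P[j] * beta_k[j] * b[j]) for j in cluster) ** 2
    inter = sum(P[l] * beta_k[l] for l in band_bs - cluster)
    return math.log2(1 + signal / (noise + inter))


def mrt_rate(cluster: Iterable[int], P: Dict[int, float], beta_k: Dict[int, float],
             M: Dict[int, int], S: Dict[int, int], noise: float,
             band_bs: Iterable[int]) -> float:
    """Eq. (3): MRT keeps the non-zero intra-cluster interference I_kC."""
    cluster, band_bs = set(cluster), set(band_bs)
    if not cluster <= band_bs:
        return 0.0
    signal = sum(math.sqrt(P[j] * M[j] * beta_k[j] / S[j]) for j in cluster) ** 2
    intra = sum((S[j] - 1) / S[j] * P[j] * beta_k[j] for j in cluster)
    inter = sum(P[l] * beta_k[l] for l in band_bs - cluster)
    return math.log2(1 + signal / (noise + intra + inter))


# Hypothetical cell-edge user equidistant from two small cells active in the same band.
P, beta = {1: 1.0, 2: 1.0}, {1: 1e-2, 2: 1e-2}
M, S = {1: 64, 2: 64}, {1: 8, 2: 8}
print(lzfbf_rate([1], P, beta, M, S, 1e-2, [1, 2]))      # cellular: BS 2 interferes
print(lzfbf_rate([1, 2], P, beta, M, S, 1e-2, [1, 2]))   # joint transmission by {1, 2}
```

With equal $S_j$ in both calls (an assumption made only for this comparison), the joint-transmission rate of this user clearly exceeds its cellular rate, because the former interferer now contributes coherently to the desired signal.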

Eqs. (2) and (3) assume that, $\forall j\in\mathcal{C}$, BS $j$ serves $S_{j}(|\mathcal{C}|)$ users and allocates power $P_{j}/S_{j}(|\mathcal{C}|)$ to each user. If one of the BSs serves fewer users, (2) and (3) are achievable lower bounds on the instantaneous rates.

It is worth noting that LJT improves instantaneous rates at the cell edge with respect to cellular transmission, as it replaces the intra-cell BF gain of the cellular schemes with an intra-cluster BF gain. Also, since macro BSs do not transmit in Band-3 (i.e., the blanking operation), small cell users benefit from larger SINRs in Band-3, as they receive no interference from macro BSs in this band.

III-B Long-term Rates

As discussed in [15], in massive MIMO networks with cellular transmission, users can be served (at distinct scheduling instances) by more than one BS. Similarly, users can be served by different clusters on different RBs in ALJTS. Let $x_{k\mathcal{C}}^{(A)}=\lim_{T\rightarrow\infty}\frac{|\{t:1\leq t\leq T,\ k\in\mathcal{U}_{\mathcal{C}}^{(A)}(t),\ t\textrm{ in Band-}A\}|}{T}$ denote the activity fraction of user $k$ to cluster $\mathcal{C}$ in Band-$A$, i.e., the fraction of resources allocated by cluster $\mathcal{C}$ to user $k$ in Band-$A$. The activity fraction $x_{k\mathcal{C}}^{(A)}$ is a real number giving the fraction of RBs (averaged over the many slots over which load balancing is considered) on which user $k$ is served by cluster $\mathcal{C}$ in Band-$A$. If $x_{k\mathcal{C}}^{(A)}>0$, user $k$ is served by cluster $\mathcal{C}$ in Band-$A$. As in [15], we obtain the long-term rate from the instantaneous rates and the activity fractions of the scheduling policy. In particular, in the limit $T\to\infty$, the long-term rate of user $k$ equals (Footnote 12: Convergence to the limiting expressions of interest is very quick [15].)

$$R_{k}=\sum_{A=1}^{3}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)}\\ |\mathcal{C}|\leq L_{\max}^{(A)}}}x_{k\mathcal{C}}^{(A)}\,r_{k\mathcal{C}}^{(A)}. \qquad (4)$$

IV Unified NUM Problem Formulation

While ALJTS allows for varying cluster sizes within an RB, as illustrated in Table II, in the sequel we specialize to the tractable case of practical interest in which all clusters on an RB have the same size. Consequently, the following framework allows for cluster options such as those in RBs #1-3 of Table II, but not for RB #4. We call this new, additionally constrained scheme the Uniform Cluster-Size scheme (UCS). (Footnote 13: Unlike UCS, when clusters of different sizes are allowed to operate together, great care must be taken in the scheduling design to prevent overlapping clusters of different sizes from operating in the same RB, which makes scheduling very complicated. UCS thus provides a lower bound on the performance of the more general ALJTS and can serve as a useful benchmark.)

Definition 2.

Uniform Cluster-Size Scheme (UCS): ALJTS is a UCS if

  (1) a fraction $\mu_{A}$ of RBs is allocated to Band-$A$, with $\sum_{A}\mu_{A}\leq 1$;

  (2) for each Band-$A$, a fraction $\lambda_{AL}$ of RBs is allocated to size-$L$ clusters for $1\leq L\leq L_{\max}^{(A)}$, with $\sum_{L=1}^{L_{\max}^{(A)}}\lambda_{AL}\leq\mu_{A}$;

  (3) on any RB in the $\lambda_{AL}$ fraction, the scheduled users are served by (user-dependent) clusters of the same size $L$, and these clusters are formed by BSs in $\mathcal{B}^{(A)}$;

  (4) on any RB in the $\lambda_{AL}$ fraction, each BS $j$ serves at most $S_{j}(L)$ users.

All LJT designs considered in the rest of the paper are UCSs. The RBs allocated to size-$L$ clusters in Band-$A$ comprise what we call the $L^{\rm th}$ subband of Band-$A$. The NUM problem for the UCS, which jointly optimizes the activity fractions and the subband/band allocations, is as follows:

$$\begin{aligned}
\max_{\lambda_{AL},\,x_{k\mathcal{C}}^{(A)},\,\mu_{A}}\ &\sum_{k\in\mathcal{U}}U\Big(\sum_{A=1}^{3}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|\leq L_{\max}^{(A)}}}x_{k\mathcal{C}}^{(A)}r_{k\mathcal{C}}^{(A)}\Big) && \text{(5a)}\\
\text{s.t.}\ &\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ j\in\mathcal{C},\,|\mathcal{C}|=L}}\frac{\sum_{k\in\mathcal{U}}x_{k\mathcal{C}}^{(A)}}{S_{j}(L)}\leq\lambda_{AL},\ \forall j\in\mathcal{B}^{(A)},\forall L\leq L_{\max}^{(A)},\forall A, && \text{(5b)}\\
&\sum_{\mathcal{C}:|\mathcal{C}|=L,\,\mathcal{C}\subset\mathcal{B}^{(A)}}x_{k\mathcal{C}}^{(A)}\leq\lambda_{AL},\ \forall k\in\mathcal{U},\forall L\leq L_{\max}^{(A)},\forall A, && \text{(5c)}\\
&\sum_{L=1}^{L_{\max}^{(A)}}\lambda_{AL}\leq\mu_{A},\ \forall A, && \text{(5d)}\\
&\sum_{A=1}^{3}\mu_{A}\leq 1, && \text{(5e)}\\
&x_{k\mathcal{C}}^{(A)},\lambda_{AL},\mu_{A}\geq 0,\ \forall k\in\mathcal{U},\forall\mathcal{C},\forall L\leq L_{\max}^{(A)},\forall A, && \text{(5f)}
\end{aligned}$$

where the utility function $U(\cdot)$ is continuously differentiable, monotonically increasing, and strictly concave [48]. Constraint (5b) states that the total activity fractions allocated by BS $j$ to clusters of size $L$ in Band-$A$, normalized by $S_{j}(L)$, cannot exceed $\lambda_{AL}$, i.e., the load of BS $j$ cannot exceed its available resources $\lambda_{AL}S_{j}(L)$. Recalling that a user cannot be served by multiple clusters on the same RB, (5c) states that the fraction of RBs on which user $k$ is served by clusters of size $L$ in Band-$A$ cannot exceed the fraction $\lambda_{AL}$ of RBs allocated to clusters of size $L$ in Band-$A$. (5d) ensures that the total resources allocated to the subbands of Band-$A$ do not exceed the resources allocated to that band. Finally, (5e) ensures that the resources allocated to the different bands do not exceed the total available resources.
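Constraints (5b)-(5f) can be checked mechanically for a candidate allocation. The following Python sketch is one possible checker, assuming activity fractions stored as a dictionary keyed by (user, cluster, band) and $S_{j}(L)$ stored per (BS, cluster size); the function `check_ucs_constraints` and this data layout are hypothetical. As part of (5f) it also checks that every cluster lies in its band's active BS set, which (5) enforces through its index sets.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

Key = Tuple[Hashable, FrozenSet[int], int]               # (user k, cluster C, band A)


def check_ucs_constraints(x: Dict[Key, float], lam: Dict[Tuple[int, int], float],
                          mu: Dict[int, float], S: Dict[Tuple[int, int], int],
                          bands: Dict[int, Set[int]], l_max: Dict[int, int],
                          users: Iterable[Hashable], tol: float = 1e-9) -> Dict[str, bool]:
    users = list(users)
    subbands = [(A, L) for A in bands for L in range(1, l_max[A] + 1)]

    def load(j: int, A: int, L: int) -> float:
        # Left-hand side of (5b): activity of BS j's size-L clusters in Band-A over S_j(L).
        return sum(v for (_, C, AA), v in x.items()
                   if AA == A and len(C) == L and j in C) / S[(j, L)]

    def activity(k: Hashable, A: int, L: int) -> float:
        # Left-hand side of (5c): user k's activity on the L-th subband of Band-A.
        return sum(v for (kk, C, AA), v in x.items() if kk == k and AA == A and len(C) == L)

    return {
        "5b": all(load(j, A, L) <= lam[(A, L)] + tol for (A, L) in subbands for j in bands[A]),
        "5c": all(activity(k, A, L) <= lam[(A, L)] + tol for (A, L) in subbands for k in users),
        "5d": all(sum(lam[(A, L)] for L in range(1, l_max[A] + 1)) <= mu[A] + tol for A in bands),
        "5e": sum(mu.values()) <= 1.0 + tol,
        "5f": all(v >= -tol and C <= bands[A] for (_, C, A), v in x.items())
              and all(v >= -tol for v in list(lam.values()) + list(mu.values())),
    }


# Hypothetical cellular band: BS 1 (S_1(1) = 2) serves users 0 and 1, BS 2 serves user 2.
x = {(0, frozenset({1}), 1): 0.5, (1, frozenset({1}), 1): 0.5, (2, frozenset({2}), 1): 1.0}
print(check_ucs_constraints(x, lam={(1, 1): 1.0}, mu={1: 1.0}, S={(1, 1): 2, (2, 1): 2},
                            bands={1: {1, 2}}, l_max={1: 1}, users=[0, 1, 2]))
```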

Remark 2.

The formulation (5) is quite flexible. It applies not only to scenarios that optimize the resource allocation among operations, but also to scenarios where the resources given to each operation are fixed a priori, by setting the corresponding $\mu_{A}$ values to constants in (5). The cellular transmission of [15] is recovered as a special case of (5) by keeping only Band-1 (i.e., $A=1$) with $L_{\max}^{(1)}=1$.

Any concave utility function (e.g., general $\alpha$-fair utilities [49]) can be used in the formulation (5); the resulting optimization is convex as long as the utility function is concave [50]. In this paper, among the various concave utility functions in the literature, we work with the logarithmic utility, also known as “proportional fairness” [14, 33, 15]. (Footnote 14: There are many options for the utility function, such as the (weighted) arithmetic mean, the geometric mean, and max-min fairness; each can be relevant for certain scenarios and interests. Both the arithmetic mean and max-min fairness have shortcomings [39, Chapter 1]. The geometric mean (aka proportional fairness) strikes a trade-off in user rates between the other two utility functions. The “$\log$” function in the utility ensures diminishing returns for an individual user's rate as it gets higher, which discourages the optimization from allocating high rates to a few users.) General numerical solvers (e.g., CVX) can be used to solve (5). Since CVX is not well suited to large instances [51], we instead propose in the next section an efficient algorithm that can be applied to large networks; the complexity difference between the proposed algorithm and a general numerical solver is investigated in Appendix C.

V Dual Subgradient Based Algorithm

In this section, we propose an efficient algorithm based on the dual subgradient method [50]. We let $\nu_{jL}^{(A)}$ and $\theta_{kL}^{(A)}$ be the Lagrange multipliers corresponding to (5b) and (5c), respectively. The dual problem of (5) is

$$\min_{\nu_{jL}^{(A)},\theta_{kL}^{(A)}\geq 0}\ \sum_{k\in\mathcal{U}}f_{k}\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)+g\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right),$$

where

$$f_{k}\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)=\max_{x_{k\mathcal{C}}^{(A)}\geq 0}\ \log\Big(\sum_{A=1}^{3}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|\leq L_{\max}^{(A)}}}x_{k\mathcal{C}}^{(A)}r_{k\mathcal{C}}^{(A)}\Big)-\sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|=L}}\sum_{j:j\in\mathcal{C}}\frac{\nu_{jL}^{(A)}}{S_{j}(L)}x_{k\mathcal{C}}^{(A)}-\sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\theta_{kL}^{(A)}\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\,|\mathcal{C}|=L}x_{k\mathcal{C}}^{(A)}, \qquad (6)$$

and

$$g\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)=\max_{\substack{\sum_{L=1}^{L_{\max}^{(A)}}\lambda_{AL}\leq\mu_{A},\\ \sum_{A=1}^{3}\mu_{A}\leq 1}}\ \sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\Big(\sum_{j:j\in\mathcal{B}^{(A)}}\nu_{jL}^{(A)}+\sum_{k\in\mathcal{U}}\theta_{kL}^{(A)}\Big)\lambda_{AL}. \qquad (7)$$

The constraints of (5) satisfy the Slater condition [50], and thus strong duality holds (i.e., the dual problem and the original problem (5) have the same optimal value).

V-A The Dual Subgradient Method

The optimization problem (6) has the closed-form optimal solution

$$x_{k\mathcal{C}}^{(A)}=\begin{cases}\dfrac{1}{\sum_{L:L=|\mathcal{C}|}\big(\sum_{j:j\in\mathcal{C}}\nu_{jL}^{(A)}/S_{j}(L)+\theta_{kL}^{(A)}\big)}, & \text{if }\{\mathcal{C},A\}=\{\mathcal{C}^{*},A^{*}\},\\ 0, & \text{otherwise},\end{cases} \qquad (8)$$

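where $\{\mathcal{C}^{*},A^{*}\}=\arg\max_{\mathcal{C},A}\frac{r_{k\mathcal{C}}^{(A)}}{\sum_{L:L=|\mathcal{C}|}\left(\sum_{j:j\in\mathcal{C}}\nu_{jL}^{(A)}/S_{j}(L)+\theta_{kL}^{(A)}\right)}$. Indeed, any target long-term rate is achieved most cheaply by using only the cluster-band pair with the largest rate-to-price ratio, and maximizing $\log(x\,r)$ minus the linear cost $x\cdot(\text{price})$ then gives $x=1/(\text{price})$. (Footnote 15: If multiple pairs $\{\mathcal{C}^{*},A^{*}\}$ attain the maximum, one of them is picked at random.)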

The problem (7) is an LP, and one optimal solution is (Footnote 16: If multiple $\{A,L\}$ pairs maximize $\sum_{j:j\in\mathcal{B}^{(A)}}\nu_{jL}^{(A)}+\sum_{k\in\mathcal{U}}\theta_{kL}^{(A)}$, one of them is picked at random.)

$$\lambda_{AL}=\begin{cases}1, & \text{if }\{A,L\}=\arg\max_{A',L'}\sum_{j:j\in\mathcal{B}^{(A')}}\nu_{jL'}^{(A')}+\sum_{k\in\mathcal{U}}\theta_{kL'}^{(A')},\\ 0, & \text{otherwise},\end{cases} \qquad (9)$$
$$\mu_{A}=\begin{cases}1, & \text{if }\lambda_{AL}>0\text{ in (9) for some }L,\\ 0, & \text{otherwise}.\end{cases} \qquad (10)$$

The $n^{\rm th}$ iteration of the algorithm is as follows.

  1. Update the activity fractions by (8).

  2. Update the resource allocation for the different bands and clusters by (9) and (10).

  3. Update the Lagrange multipliers by

    $$\nu_{jL}^{(A)}(n+1)=\left[\nu_{jL}^{(A)}(n)-\delta(n)\left(\lambda_{AL}(n)-\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ j\in\mathcal{C},\,|\mathcal{C}|=L}}\frac{\sum_{k\in\mathcal{U}}x_{k\mathcal{C}}^{(A)}(n)}{S_{j}(L)}\right)\right]^{+}, \qquad (11)$$

    and

    $$\theta_{kL}^{(A)}(n+1)=\left[\theta_{kL}^{(A)}(n)-\delta(n)\left(\lambda_{AL}(n)-\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\,|\mathcal{C}|=L}x_{k\mathcal{C}}^{(A)}(n)\right)\right]^{+}, \qquad (12)$$

    where $[z]^{+}=\max\{z,0\}$ keeps the multipliers nonnegative and $\delta(n)$ is the stepsize at the $n^{\rm th}$ iteration.

By adding the redundant constraints $x_{k\mathcal{C}}^{(A)}\leq 1$ and choosing an appropriate stepsize (e.g., a diminishing stepsize $\delta(n)=\frac{a}{n+b}$, where $a$ and $b$ are positive scalars), the subgradients can be bounded. This allows us to leverage Prop. 6.3.4 in [50] to show the convergence of the dual subgradient algorithm. The detailed steps of the algorithm with redundant constraints can be found in Appendix D.
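The three steps of the algorithm can be sketched in Python. This is a minimal, unoptimized illustration of (8)-(12) under our own assumptions: clusters are enumerated without pruning, $\nu$ starts at 1 and $\theta$ at 0, the stepsize is $\delta(n)=a/(n+b)$ with $a=b=1$, and a small floor `eps` on the price in (8) (not part of the paper's algorithm) avoids division by zero. The redundant constraints $x_{k\mathcal{C}}^{(A)}\leq 1$ are not included, and the function name and data layout are hypothetical.

```python
import itertools
from typing import Dict, FrozenSet, List, Set, Tuple


def dual_subgradient(rates: Dict[Tuple[int, FrozenSet[int], int], float],
                     bands: Dict[int, Set[int]], l_max: Dict[int, int],
                     S: Dict[Tuple[int, int], int], users: List[int],
                     iters: int = 200, a: float = 1.0, b: float = 1.0, eps: float = 1e-6):
    subbands = [(A, L) for A in bands for L in range(1, l_max[A] + 1)]
    pairs = [(frozenset(C), A) for (A, L) in subbands
             for C in itertools.combinations(sorted(bands[A]), L)]
    nu = {(j, L, A): 1.0 for (A, L) in subbands for j in bands[A]}
    theta = {(k, L, A): 0.0 for (A, L) in subbands for k in users}

    def price(k: int, C: FrozenSet[int], A: int) -> float:
        # Denominator of (8); the eps floor is our own numerical safeguard.
        L = len(C)
        return max(eps, sum(nu[(j, L, A)] / S[(j, L)] for j in C) + theta[(k, L, A)])

    for n in range(iters):
        # Step 1, eq. (8): each user puts x = 1/price on its best rate-to-price pair.
        x = {}
        for k in users:
            C, A = max(pairs, key=lambda p: rates.get((k, p[0], p[1]), 0.0) / price(k, *p))
            x[(k, C, A)] = 1.0 / price(k, C, A)
        # Step 2, eqs. (9)-(10): all RBs go to the subband with the largest price sum.
        best = max(subbands, key=lambda s: sum(nu[(j, s[1], s[0])] for j in bands[s[0]])
                   + sum(theta[(k, s[1], s[0])] for k in users))
        lam = {s: float(s == best) for s in subbands}
        # Step 3, eqs. (11)-(12): projected subgradient step.
        step = a / (n + b)
        for (j, L, A) in nu:
            load = sum(v for (_, C, AA), v in x.items()
                       if AA == A and len(C) == L and j in C) / S[(j, L)]
            nu[(j, L, A)] = max(0.0, nu[(j, L, A)] - step * (lam[(A, L)] - load))
        for (k, L, A) in theta:
            act = sum(v for (kk, C, AA), v in x.items() if kk == k and AA == A and len(C) == L)
            theta[(k, L, A)] = max(0.0, theta[(k, L, A)] - step * (lam[(A, L)] - act))
    return nu, theta, x, lam


# Toy instance: one BS, one band, cellular (L = 1), S_1(1) = 1, three users.
fs = frozenset({1})
nu, theta, x, lam = dual_subgradient({(k, fs, 1): r for k, r in enumerate([2.0, 3.0, 5.0])},
                                     bands={1: {1}}, l_max={1: 1}, S={(1, 1): 1},
                                     users=[0, 1, 2])
print(nu, x)
```

On this toy instance the multiplier settles at $\nu=3$ and each user receives $x=1/3$, the proportionally fair split. Because the primal iterates of a dual subgradient method can oscillate (for example, $\lambda_{AL}$ in (9) is always 0 or 1), the primal solution is better recovered from the final dual variables, as done in Sec. V-B.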

V-B Finding the Optimal Primal Solutions Given the Optimal Dual Variables

Note that the objective function of (5) is not strictly concave in the activity fractions, so (5) may have multiple optimal solutions. In this case, given the optimal dual variables, it is generally difficult to find optimal primal solutions that satisfy the KKT conditions. However, by exploiting the structure of (5) as follows, we obtain the optimal primal solutions by solving a small LP.

The optimal long-term rate $R_{k}^{*}=\sum_{A=1}^{3}\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)}}x_{k\mathcal{C}}^{*(A)}r_{k\mathcal{C}}^{(A)}$ in (5) is unique, since $\log(R_{k})$ is strictly concave in $R_{k}$. The KKT conditions of problem (5) imply

$$R_{k}^{*}\geq\frac{r_{k\mathcal{C}}^{(A)}}{\sum_{j:j\in\mathcal{C}}\nu_{j|\mathcal{C}|}^{(A)}/S_{j}(|\mathcal{C}|)+\theta_{k|\mathcal{C}|}^{(A)}},\quad\forall\mathcal{C},A, \qquad (13)$$

with equality whenever $x_{k\mathcal{C}}^{*(A)}>0$.

Thus, given the optimal dual variables, the unique optimal rate is easily obtained as $R_{k}^{*}=\max_{\mathcal{C},A}\left\{\frac{r_{k\mathcal{C}}^{(A)}}{\sum_{j:j\in\mathcal{C}}\nu_{j|\mathcal{C}|}^{(A)}/S_{j}(|\mathcal{C}|)+\theta_{k|\mathcal{C}|}^{(A)}}\right\}$. We observe from (13) that in the optimal solutions, each user has positive activity fractions $x_{k\mathcal{C}}^{(A)}$ only to clusters attaining the maximum of the right-hand side of (13). Based on this observation, we propose the following LP, whose size is reduced by only considering the activity fractions $x_{k\mathcal{C}}^{(A)}$ that can be positive according to (13).

$$\begin{aligned}
\max_{\eta,x,\lambda}\ &\eta && (14)\\
\text{s.t.}\ &\eta\leq\sum_{A=1}^{3}\sum_{\mathcal{C}\subset\mathcal{B}^{(A)}}\frac{x_{k\mathcal{C}}^{(A)}r_{k\mathcal{C}}^{(A)}}{R_{k}^{*}},\ \forall k\in\mathcal{U},\\
&\text{(5b)-(5f)}.
\end{aligned}$$
Proposition 1.

Given that $R_{k}^{*}$ is the exact optimal rate of (5), the solution of (14) is the same as the optimal solution of problem (5).

Proof.

The proof follows from techniques similar to those used in the proof of Lemma 1 in [15]. ∎

Prop. 1 implies that we can obtain the solutions of (5) given the optimal dual variables. Although the convergence of the dual subgradient algorithm can be shown by adding redundant constraints, a small gap may remain between the obtained dual variables and the optimal ones, due to numerical precision or a limit on the number of iterations. Given the well-behaved structure of (14), i.e., finite coefficients and a bounded feasible set [15], we expect the solution of (14) to be near-optimal in the presence of such a small gap.
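Following (13), the optimal rates $R_{k}^{*}$ and the candidate set of positive activity fractions can be read off from (near-)optimal dual variables before the reduced LP (14) is formed. The sketch below is a hypothetical helper, assuming rates keyed by (user, cluster, band), multipliers keyed by (BS or user, cluster size, band), and $S_{j}(L)$ keyed by (BS, cluster size); the relative tolerance `rel_tol` for declaring ties is our own choice, meant to absorb the small dual gap just discussed. It assumes every user has at least one pair with positive rate. Solving (14) itself would then require an LP solver and is not shown.

```python
import math
from typing import Dict, FrozenSet, Hashable, List, Tuple

Pair = Tuple[FrozenSet[int], int]                        # (cluster C, band A)


def rates_and_support(rates: Dict[Tuple[Hashable, FrozenSet[int], int], float],
                      nu: Dict[Tuple[int, int, int], float],
                      theta: Dict[Tuple[Hashable, int, int], float],
                      S: Dict[Tuple[int, int], int], users: List[Hashable],
                      pairs: List[Pair], rel_tol: float = 1e-6):
    """R_k^* from (13) and, per user, the pairs that may carry positive x in (14)."""
    R_star: Dict[Hashable, float] = {}
    support: Dict[Hashable, List[Pair]] = {}
    for k in users:
        ratio: Dict[Pair, float] = {}
        for C, A in pairs:
            r = rates.get((k, C, A), 0.0)
            if r <= 0.0:
                continue                                 # pair unusable for this user
            L = len(C)
            price = sum(nu[(j, L, A)] / S[(j, L)] for j in C) + theta[(k, L, A)]
            ratio[(C, A)] = math.inf if price <= 0.0 else r / price
        R_star[k] = max(ratio.values())
        # Pairs within rel_tol of the maximum are kept as candidates (ties = fractional user).
        support[k] = [p for p, v in ratio.items() if v >= R_star[k] * (1.0 - rel_tol)]
    return R_star, support


# Hypothetical user 0 with two single-BS options whose rate-to-price ratios tie.
f1, f2 = frozenset({1}), frozenset({2})
print(rates_and_support({(0, f1, 1): 2.0, (0, f2, 1): 4.0},
                        nu={(1, 1, 1): 2.0, (2, 1, 1): 4.0}, theta={(0, 1, 1): 0.0},
                        S={(1, 1): 1, (2, 1): 1}, users=[0], pairs=[(f1, 1), (f2, 1)]))
```

A user whose support contains more than one pair is a candidate fractional user; all other users contribute a single unknown to (14).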

Empirical evidence reveals that in a heavily loaded network, where constraints (5c) are inactive (i.e., $\sum_{\mathcal{C}:|\mathcal{C}|=L,\,\mathcal{C}\subset\mathcal{B}^{(A)}}x_{k\mathcal{C}}^{(A)}<\lambda_{AL}$), most users are served by a unique cluster on each subband. Insight into this observation can be obtained by examining the KKT conditions of (5) as follows.

Proposition 2.

For a given Band-$A$ and cluster size $L$, if (5c) is inactive $\forall k\in\mathcal{U}$, the number of users that are served by multiple BS clusters on the RBs allocated to the $L^{\rm th}$ subband of Band-$A$ is at most $N_{\mathcal{C}L}^{(A)}-1$, where $N_{\mathcal{C}L}^{(A)}$ is the number of clusters in the $L^{\rm th}$ subband of Band-$A$.

Proof.

See Appendix B. ∎

Prop. 2 implies that the optimal user associations in each subband are mostly unique. We call users served by more than one cluster on a subband “fractional users”. Note that Prop. 2 provides an upper bound ($N_{\mathcal{C}L}^{(A)}-1$) on the number of fractional users, while simulations show a much smaller number of fractional users (fewer than $3.5\%$ of the $K$ users in Sec. VII). Recall that the dual subgradient algorithm determines which activity fractions $x_{k\mathcal{C}}^{(A)}$ are positive, namely those of users to their cluster-band pairs $\{\mathcal{C}^{*},A^{*}\}$, while the remaining activity fractions are zero. Thus, the only unknowns to be solved for via (14) are the positive activity fractions. By Prop. 2, most users (those with a unique association) have at most one positive activity fraction on any subband. Hence, the size of (14) is significantly reduced, which makes the proposed algorithm efficient. Further details on the algorithm complexity can be found in Appendix C.

In summary, Prop. 1 establishes the optimality of the method proposed in Sec. V-B for obtaining the primal variables from the dual variables. Furthermore, the analysis of the number of iterations required for convergence and of the complexity of the proposed algorithm shows that it is efficient enough to be applied to large network instances. Unlike in the cellular case [15], it is not known a priori whether the NUM solution can be implemented by any scheduler. The implementation of NUM solutions is discussed next.

VI Scheduling

In this section, we develop scheduling policies that yield activity fractions closely matching the NUM solution. Scheduling is done independently and in parallel for each band. As seen in Fig. 2, a scheduler at a central controller can collect the needed scheduling information and schedule the users according to the proposed scheme independently for each band (i.e., for each operation option). Considering a scheduling policy for Band-$A$ and letting $L(t)$ be the cluster size on RB $t$, we define a feasible scheduling policy as follows.

Definition 3.

Feasible Schedule: A scheduling policy $\left\{\mathcal{U}_{\mathcal{C}}(t);\ \forall\mathcal{C}\subset\mathcal{B}^{(A)},|\mathcal{C}|\leq L_{\max}^{(A)},\forall t\text{ in Band-}A\right\}$ is feasible with respect to the UCS of Defn. 2 if it satisfies the following:

  (i) For each $t$, the policy assigns RB $t$ to clusters with $\mathcal{C}\subset\mathcal{B}^{(A)}$ and $|\mathcal{C}|=L(t)$ in Band-$A$; that is, for each cluster $\mathcal{C}$ with non-empty $\mathcal{U}_{\mathcal{C}}(t)$, we have $\mathcal{C}\subset\mathcal{B}^{(A)}$ and $|\mathcal{C}|=L(t)$.

  (ii) For each $t$, each user is served by at most one cluster; that is, $\sum_{\mathcal{C}\subset\mathcal{B}}\mathbbm{1}\{k\in\mathcal{U}^{(A)}_{\mathcal{C}}(t)\}\leq 1$.

  (iii) For each $t$ in Band-$A$ and each BS $j\in\mathcal{B}^{(A)}$, BS $j$ serves at most $S_{j}(L(t))$ users; that is, $\left|\cup_{\mathcal{C}:j\in\mathcal{C},\mathcal{C}\subset\mathcal{B}^{(A)}}\mathcal{U}_{\mathcal{C}}(t)\right|\leq S_{j}(L(t))$.

VI-A The Feasibility of the NUM Solution in Implementation

It is easy to verify that the $\{x_{k\mathcal{C}}^{(A)}\}$ yielded by any feasible schedule of Defn. 3 satisfy (5b)-(5f). Moreover, when $L_{\max}^{(A)}=1$ (i.e., the cellular case), there exists at least one feasible schedule that provides long-term activity fractions approaching the solution of (5) [15]. However, in the general case $L_{\max}^{(A)}>1$, this is not necessarily true. For instance, for networks with cluster combinations $\{j_{1},j_{2}\}$, $\{j_{1},j_{3}\}$ and $\{j_{2},j_{3}\}$, where $j_{1},j_{2}$ and $j_{3}$ are BS indices, there exist $\{x_{k\mathcal{C}}^{(A)}\}$ satisfying (5b)-(5f) for which no feasible schedule of Defn. 3 exists.
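To make the three-BS example concrete, the following Python sketch checks one hypothetical instance (our own choice, not necessarily the construction used in Appendix A): three BSs with $S_{j}(2)=1$, a single size-2 subband with $\lambda_{A2}=1$, and three users, each served by a different pair cluster with activity fraction $1/2$. These fractions satisfy (5b) and (5c), yet any two of the three clusters share a BS, so with $S_{j}(2)=1$ at most one user can be served per RB, and no feasible schedule can deliver the required total activity of $3/2$.

```python
from itertools import combinations
from typing import Dict, FrozenSet

S2 = 1                                                  # S_j(2) = 1 for every BS (assumption)
clusters: Dict[int, FrozenSet[int]] = {                 # user -> its size-2 cluster
    0: frozenset({1, 2}), 1: frozenset({1, 3}), 2: frozenset({2, 3})}
x = {0: 0.5, 1: 0.5, 2: 0.5}                            # target activity fractions
lam = 1.0                                               # lambda_{A2}: the whole subband


def satisfies_5b(x, clusters, S, lam) -> bool:
    bss = set().union(*clusters.values())
    return all(sum(v for k, v in x.items() if j in clusters[k]) / S <= lam + 1e-12
               for j in bss)


def max_users_per_rb(clusters, S) -> int:
    """Largest user set one RB can serve under Defn. 3 (iii): each BS serves <= S users."""
    best = 0
    users = list(clusters)
    for size in range(len(users) + 1):
        for subset in combinations(users, size):
            load: Dict[int, int] = {}
            for k in subset:
                for j in clusters[k]:
                    load[j] = load.get(j, 0) + 1
            if all(n <= S for n in load.values()):
                best = max(best, size)
    return best


print("(5b) satisfied:", satisfies_5b(x, clusters, S2, lam))
print("(5c) satisfied:", all(v <= lam for v in x.values()))
cap = max_users_per_rb(clusters, S2)
print("max users per RB:", cap, "| required total activity:", sum(x.values()))
# Every RB serves at most `cap` users, so the long-run total activity of any
# feasible schedule is at most cap * lam = 1 < 1.5.
```

Long-run activity fractions of any schedule are averages of per-RB patterns, so the per-RB cap bounds their sum; raising $S_{j}(2)$ to 2 removes the conflict in this instance.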

Theorem 1.

In UCSs with $L_{\max}^{(A)}>1$ in some Band-$A$ and with cluster combinations of the type $\{j_{1},j_{2}\}$, $\{j_{1},j_{3}\}$ and $\{j_{2},j_{3}\}$, where $j_{1},j_{2}$ and $j_{3}$ are BSs in $\mathcal{B}^{(A)}$, there exist activity fractions satisfying (5b)-(5f) that cannot be implemented by any feasible schedule of Defn. 3.

Proof.

See Appendix A. ∎

Hence, the coarse time-scale NUM problem (5) does not capture the fine time-scale constraints associated with feasible schedules. Although (5) in general only provides an upper bound on the network performance, we show next that, using the activity fractions that solve (5), we can design scheduling policies whose performance is close to the utility of the solution to (5).

VI-B Virtual Queue Based Scheduling Scheme

We next present scheduling policies for the UCS architecture, which comprises $\sum_{A=1}^{3}L_{\max}^{(A)}$ parallel schedulers, one per subband. We describe a method for scheduling users over the fraction $\lambda_{AL}>0$ of RBs dedicated to clusters of size $L$ in Band-$A$.

Given the limited number of fractional users per cluster size $L$, the scheduler approximates the optimal $\{x_{k\mathcal{C}}^{(A)}\}$ by the unique-association activity fractions $\{\tilde{x}_{k\mathcal{C}}^{(A)}\}$, given by

$$\tilde{x}_{k\mathcal{C}}^{(A)}=\begin{cases}x_{k\mathcal{C}}^{(A)} & \text{if }\mathcal{C}=\mathcal{C}^{*}(k),\\ 0 & \text{otherwise},\end{cases} \qquad (15)$$

with $\mathcal{C}^{*}(k)=\operatorname*{arg\,max}_{\mathcal{C}:\,|\mathcal{C}|=L,\,\mathcal{C}\subset\mathcal{B}^{(A)}}x_{k\mathcal{C}}^{(A)}$. Letting $\mathcal{U}_{\mathcal{C}}^{(A)}$ denote the users for which $\tilde{x}_{k\mathcal{C}}^{(A)}>0$, we have $\mathcal{U}_{\mathcal{C}}^{(A)}\cap\mathcal{U}_{\mathcal{C}'}^{(A)}=\emptyset$ for all $\mathcal{C}\neq\mathcal{C}'$ with $|\mathcal{C}|=|\mathcal{C}'|$. We also let $\mathcal{U}^{(AL)}=\cup_{\mathcal{C}:\,|\mathcal{C}|=L}\,\mathcal{U}_{\mathcal{C}}^{(A)}$ denote the set of users that receive non-zero activity fractions from clusters of size $L$ in Band-$A$. In the rest of this section, we focus on clusters $\mathcal{C}$ satisfying $|\mathcal{C}|=L$ and $\mathcal{C}\subset\mathcal{B}^{(A)}$, unless otherwise specified.
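The rounding in (15) can be sketched as follows for one subband. This hypothetical helper assumes the activity fractions of a fixed $(A,L)$ are given as a dictionary keyed by (user, cluster); ties in the arg max are broken by keeping the first cluster encountered, which is our choice. It also returns each user's desired RB fraction $\tilde{x}_{k\mathcal{C}^{*}(k)}^{(A)}/\lambda_{AL}$ used by the scheduler, and lists the users that were fractional, i.e., had more than one positive activity fraction on the subband.

```python
from typing import Dict, FrozenSet, Hashable, Tuple


def unique_association(x: Dict[Tuple[Hashable, FrozenSet[int]], float], lam: float):
    """x[(k, C)]: activity fractions on one subband (fixed A and L), lam = lambda_AL.

    Returns x_tilde per eq. (15), C*(k), the desired RB fractions
    x_tilde / lambda_AL, and the sorted list of fractional users.
    """
    best: Dict[Hashable, Tuple[FrozenSet[int], float]] = {}
    n_pos: Dict[Hashable, int] = {}
    for (k, C), v in x.items():
        if v <= 0:
            continue
        n_pos[k] = n_pos.get(k, 0) + 1
        if k not in best or v > best[k][1]:
            best[k] = (C, v)                             # ties keep the first cluster seen
    x_tilde = {(k, C): v for k, (C, v) in best.items()}
    c_star = {k: C for k, (C, _) in best.items()}
    alpha = {k: v / lam for k, (_, v) in best.items()}
    fractional = sorted(k for k, n in n_pos.items() if n > 1)
    return x_tilde, c_star, alpha, fractional


# Hypothetical size-2 subband with lambda_AL = 0.5; user 1 is a fractional user.
x = {(1, frozenset({1, 2})): 0.3, (1, frozenset({1, 3})): 0.1, (2, frozenset({2, 3})): 0.4}
print(unique_association(x, lam=0.5))
```

By construction each user keeps exactly one cluster, so the sets $\mathcal{U}_{\mathcal{C}}^{(A)}$ obtained this way are disjoint, as stated above.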

To assign user $k$ a fraction of RBs close to its desired fraction in the $L^{\rm th}$ subband of Band-$A$, i.e., $\alpha_{k}=\tilde{x}_{k\mathcal{C}^{*}(k)}^{(A)}/\lambda_{AL}$, we consider a max-min scheduling policy based on virtual queues (VQs), which assumes that user $k$ receives rate $\tilde{R}_{k}=1/\alpha_{k}$ when it is scheduled for transmission by cluster $\mathcal{C}^{*}(k)$ (i.e., $k\in\mathcal{U}^{(A)}_{\mathcal{C}^{*}(k)}(t)$). At each $t$, the cluster-size-$L$ scheduler performs a weighted sum rate maximization (WSRM) of the form [52]:

$$\max_{\tilde{\mathcal{U}}\subseteq\mathcal{U}^{(AL)}}\ \sum_{k\in\tilde{\mathcal{U}}}Q_{k}(t)\tilde{R}_{k}, \tag{16a}$$
$$\text{s.t.}\ \sum_{k\in\tilde{\mathcal{U}}}\mathbbm{1}\{j\in\mathcal{C}^{*}(k)\}\leq S_{j}(L),\ \ \forall j\in\mathcal{B}, \tag{16b}$$
where the weight $Q_{k}(t)$ is the VQ length of user $k$ at time $t$. For max-min fairness [52], $Q_{k}(t)$ is updated by $Q_{k}(t+1)=\max\{0,Q_{k}(t)-\tilde{R}_{k}(t)\}+A_{k}(t)$, where
$$\tilde{R}_{k}(t)=\begin{cases}\tilde{R}_{k}&\text{if user $k$ is scheduled at time $t$},\\ 0&\text{otherwise},\end{cases} \tag{16c}$$
$$A_{k}(t)=\begin{cases}A_{\max}&\text{if }V>\sum_{k^{\prime}}Q_{k^{\prime}}(t),\\ 0&\text{otherwise},\end{cases} \tag{16d}$$

with $A_{\max}$ and $V$ chosen sufficiently large [52]. Note that in the absence of constraints (16b), the max-min scheduler (16) schedules user $k$ its desired fraction of RBs, $\alpha_{k}$.
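The virtual rate $\tilde{R}_{k}=1/\alpha_{k}$ and the queue update (16c)-(16d) can be sketched in a few lines of Python. This is a minimal illustration assuming dictionary-based bookkeeping (a hypothetical layout); the values passed for $A_{\max}$ and $V$ are placeholders, since the text only requires them to be sufficiently large.

```python
def virtual_rate(x_tilde_star, lam_AL):
    """R~_k = 1/alpha_k, with alpha_k = x~_{kC*(k)}^{(A)} / lambda_AL."""
    return lam_AL / x_tilde_star


def vq_update(Q, R_tilde, scheduled, A_max, V):
    """One max-min virtual-queue update, following (16c)-(16d).

    Q:         dict user -> Q_k(t)
    R_tilde:   dict user -> R~_k
    scheduled: set of users scheduled on RB t
    A_max, V:  arrival size and threshold (placeholder constants)
    """
    total = sum(Q.values())                # sum_k' Q_k'(t), taken before the update
    arrival = A_max if V > total else 0.0  # A_k(t) in (16d), identical for all users
    return {
        # max{0, Q_k(t) - R~_k(t)} + A_k(t), with R~_k(t) from (16c)
        k: max(0.0, q - (R_tilde[k] if k in scheduled else 0.0)) + arrival
        for k, q in Q.items()
    }


print(vq_update({1: 5.0, 2: 1.0}, {1: 2.0, 2: 4.0}, {1}, 10.0, 100.0))
# prints {1: 13.0, 2: 11.0}
```

Users whose queues drain quickly (large $\tilde{R}_{k}$, i.e., small desired fraction $\alpha_k$) are served less often, which is how the policy steers each user towards its target fraction.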

Scheduling via (16) is impractical, as it requires solving an integer linear program for each RB $t$. A number of heuristic algorithms can provide feasible (though generally suboptimal) solutions to (16); in this paper, we consider a simple greedy algorithm. Letting $K_{AL}=|\mathcal{U}^{(AL)}|$ be the total number of users to be served by clusters of size $L$, the greedy algorithm for size-$L$ clusters at time $t$ operates as follows:

  1. Determine a user order $\pi(\cdot)$ such that $Q_{\pi(k)}(t)\tilde{R}_{\pi(k)}\geq Q_{\pi(k+1)}(t)\tilde{R}_{\pi(k+1)}$ for $k=1,\dots,K_{AL}-1$.

  2. Initialization: set $k=1$ and $\tilde{\mathcal{U}}=\emptyset$.

  3. If the user set $\tilde{\mathcal{U}}\cup\{\pi(k)\}$ satisfies all constraints in (16b), set $\tilde{\mathcal{U}}=\tilde{\mathcal{U}}\cup\{\pi(k)\}$.

  4. If $k<K_{AL}$, set $k=k+1$ and go to Step 3.

  5. Output $\tilde{\mathcal{U}}$ as the set of users scheduled by size-$L$ clusters in Band-$A$ at time $t$.
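The five steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: users, clusters $\mathcal{C}^{*}(k)$ (frozensets of BS indices), and the limits $S_j(L)$ are held in hypothetical dictionaries, and constraint (16b) is checked incrementally with per-BS load counters.

```python
def greedy_schedule(Q, R_tilde, cluster_of, S):
    """Greedy heuristic for the WSRM (16) of size-L clusters on one RB.

    Q:          dict user -> VQ length Q_k(t)
    R_tilde:    dict user -> virtual rate R~_k
    cluster_of: dict user -> C*(k), a frozenset of BS indices
    S:          dict BS index -> S_j(L), max users BS j may serve on this RB
    """
    # Step 1: order users by non-increasing weight Q_k(t) * R~_k.
    order = sorted(Q, key=lambda k: Q[k] * R_tilde[k], reverse=True)
    # Step 2: start from an empty schedule and zero load at every BS.
    scheduled, load = set(), {j: 0 for j in S}
    # Steps 3-4: admit pi(k) if every BS of its cluster still satisfies (16b).
    for k in order:
        if all(load[j] + 1 <= S[j] for j in cluster_of[k]):
            scheduled.add(k)
            for j in cluster_of[k]:
                load[j] += 1
    # Step 5: output the scheduled user set.
    return scheduled


S = {0: 1, 1: 1, 2: 1}
clusters = {"a": frozenset({0, 1}), "b": frozenset({1, 2}), "c": frozenset({2})}
Q = {"a": 1.0, "b": 1.0, "c": 1.0}
R = {"a": 3.0, "b": 2.0, "c": 1.0}
print(sorted(greedy_schedule(Q, R, clusters, S)))  # prints ['a', 'c']
```

User "b" is skipped because BS 1 is already fully loaded by user "a"; user "c" still fits on BS 2. Sorting costs $O(K_{AL}\log K_{AL})$ and each admission check touches at most $L$ BSs.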

VII Performance Evaluation

In this section, we present a simulation-based evaluation using the "wrap-around" layout in Fig. 3. We also present simulation results for a network deployment with more hexagonally modeled macrocells and non-uniformly distributed users (based on the 3GPP layout in TR 36.872). We assume a full-buffer traffic model throughout; the study of more general traffic models is left for future work.

Unless otherwise specified, the parameters for the layout in Fig. 3 are as follows. There are 4 macro BSs with $M_{j}=100$ and $S_{j}(|\mathcal{C}|)=\max\{10\rho|\mathcal{C}|,10\}$, and 32 small cell BSs with $M_{j}=40$ and $S_{j}(|\mathcal{C}|)=\max\{4\rho|\mathcal{C}|,4\}$, where $\rho\in[0,1]$ is a tunable parameter. There is 1 small cell BS at the center of each white square, while 3 small cell BSs are dropped uniformly within each shaded square (hotspot). In addition, 15 and 90 single-antenna users are dropped uniformly in each white and shaded square, respectively. The macro and small cell BS transmit powers are 46 dBm and 35 dBm, respectively. The path losses of macro-user and small cell BS-user links are $128.1+37.6\log_{10}d$ dB and $140.7+36.7\log_{10}d$ dB, respectively, with the distance $d$ in km. The noise power spectral density is $-174$ dBm/Hz.
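As a worked example of these parameters, the Python sketch below evaluates the two path-loss models, the resulting received power, and the user limit $S_j(|\mathcal{C}|)$. It is a sketch under stated assumptions: the helper names are hypothetical, and shadowing and small-scale fading are ignored because the text does not specify them here.

```python
import math

TX_POWER_DBM = {"macro": 46.0, "small": 35.0}
S_BASE = {"macro": 10, "small": 4}  # floor of S_j(|C|) for macro / small cell BSs


def pathloss_db(d_km, tier):
    """Path loss in dB for the Fig. 3 layout; d_km is the BS-user distance in km."""
    if tier == "macro":
        return 128.1 + 37.6 * math.log10(d_km)
    return 140.7 + 36.7 * math.log10(d_km)


def rx_power_dbm(d_km, tier):
    """Received power in dBm; shadowing and fading are ignored (sketch assumption)."""
    return TX_POWER_DBM[tier] - pathloss_db(d_km, tier)


def max_users(tier, cluster_size, rho):
    """S_j(|C|) = max{S0 * rho * |C|, S0}, with S0 = 10 (macro) or 4 (small)."""
    s0 = S_BASE[tier]
    return max(s0 * rho * cluster_size, s0)


print(round(rx_power_dbm(0.1, "macro"), 1), round(rx_power_dbm(0.1, "small"), 1))
# prints -44.5 -69.0
```

At 100 m, a macro BS is received about 24.5 dB stronger than a small cell BS, which is why max-SINR association pulls most users to macros in the shared scenario.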

We consider three macro-small cell resource sharing scenarios: (i) the shared scenario, where macros and small cell BSs transmit on the same RBs (operations with $A=1$); (ii) the orthogonal scenario, where macros and small cell BSs transmit on different bands (operations with $A\in\{2,3\}$) and, as an illustrative example, the macros (Band-2) are given 20% of the RBs; and (iii) RB blanking, where macros are muted on certain RBs (operations with $A\in\{1,3\}$). Although the resource partition among bands (i.e., $\mu_{A}$) and the user activity fractions (i.e., $x_{k\mathcal{C}}^{(A)}$) could be jointly optimized using (5) in the orthogonal scenario, we fix $\mu_{A}$ for two reasons. First, the resource partition between macro and small cells is most likely static (or semi-static) in practice. Second, the macro and small cells may operate on different frequency bands (e.g., lower-frequency bands for macros and higher-frequency bands for small cells), in which case $\mu_{A}$ depends on the resources available in each band and is not a variable to optimize. As an illustrative example, we set $\mu_{2}=0.2$ and $\mu_{3}=0.8$; we let $\mu_{3}$ be larger than $\mu_{2}$ since small cells are deployed more densely than macro BSs. The simulations can readily be repeated with other values; for completeness, results for the orthogonal scenario with different $\mu$ are provided at the end of Subsection VII-A.

We compare the conventional approach (i.e., max-SINR association), the approach of [15], and the proposed UCS. Both max-SINR and [15] are cellular approaches; in fact, the formulation of [15] is equivalent to UCS in Scenario (i) with $L_{\max}^{(1)}=1$.

$L_{\max}^{(A)}$ depends on the band. In our simulations, we consider $L_{\max}^{(1)}\in\{1,4\}$ and $L_{\max}^{(3)}\in\{1,4\}$. For Band-2, only cellular transmission from macro BSs is allowed, and hence $L_{\max}^{(2)}=1$. The number of possible clusters of size greater than 4 is too large for any practical purpose. Moreover, not all subsets of BSs are good candidates for clusters. We therefore determine the candidate BSs from the users' perspective: each user picks the 8 BSs providing the largest signal strength to it (picking the strongest 9 BSs yields almost the same performance, whereas picking the strongest 7 yields lower utility than the 8-BS case), and the potential clusters that can serve the user include only BSs among these 8.
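The candidate-cluster rule can be made concrete with a short Python sketch. It assumes, for illustration, that all cluster sizes from 1 to $L_{\max}$ are enumerated among a user's 8 strongest BSs; the function name and the received-power input format are hypothetical.

```python
from itertools import combinations


def candidate_clusters(rx_power, n_strongest=8, l_max=4):
    """Candidate BS clusters for one user.

    rx_power: dict BS id -> received signal strength at this user (any monotone scale)
    Keeps the n_strongest BSs and enumerates every subset of size 1..l_max.
    """
    strongest = sorted(rx_power, key=rx_power.get, reverse=True)[:n_strongest]
    clusters = []
    for size in range(1, l_max + 1):
        clusters.extend(frozenset(c) for c in combinations(strongest, size))
    return clusters


rx = {j: -float(j) for j in range(36)}  # 36 BSs (4 macro + 32 small); BS 0 strongest
print(len(candidate_clusters(rx)))      # prints 162
```

Under this assumption each user has $\binom{8}{1}+\binom{8}{2}+\binom{8}{3}+\binom{8}{4}=8+28+56+70=162$ candidate clusters, instead of a count that grows combinatorially with the total number of BSs.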

Figure 3: Illustration of the network deployment. The white squares are regular areas, while the shaded squares are hotspots.

Since there is a one-to-one (monotonic) mapping between the log utility and the geometric mean of rates, $\left(\prod_{k=1}^{K}R_{k}\right)^{1/K}=\exp\left(\frac{1}{K}\sum_{k=1}^{K}\log R_{k}\right)$, we use the geometric mean of rates as the performance metric.
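A minimal Python sketch of this identity, with hypothetical helper names, computes the geometric mean through the sum-log utility:

```python
import math


def log_utility(rates):
    """Sum-log utility sum_k log R_k."""
    return sum(math.log(r) for r in rates)


def geometric_mean(rates):
    """(prod_k R_k)^(1/K), computed as exp((1/K) * sum_k log R_k)."""
    return math.exp(log_utility(rates) / len(rates))


print(round(geometric_mean([1.0, 4.0, 16.0]), 6))  # prints 4.0
```

Because of this mapping, a gain factor $g$ in the geometric mean corresponds to an increase of $K\log g$ in the sum-log utility; e.g., a 1.6× gain with $K$ users raises the log utility by $K\log 1.6$.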

VII-A Simulation of Layout 1 (Figure 3)

Fig. 4(a) shows the geometric mean of rates in Scenarios (i) and (ii). The optimal solution to (5), denoted UCS-NUM, is obtained with CVX. We compare UCS-NUM with the solution of the dual subgradient based algorithm; the latter achieves almost the same performance, which validates our analysis. Since we observe the same behavior in all later simulations, the results of the dual algorithm are omitted from the following figures for clarity. When the solution to (5) is approximated by the unique association (15), the utility loss is insignificant thanks to the very small number of fractional users (as shown in Prop. 2). Moreover, the proposed greedy VQ scheduling scheme performs close to the NUM solution, achieving at least 90% of the NUM utility in both Scenarios (i) and (ii). Note that in cellular transmission, the NUM solution can be realized by a feasible scheduler, so VQ based scheduling is unnecessary [15]. UCS significantly improves the geometric mean of rates over both the optimal cellular solution and max-SINR association (about 1.6× in the shared scenario and 1.35× in the orthogonal scenario relative to the optimal cellular result).

In Fig. 4(b), we compare the performance of scenarios (i) and (iii). We can observe that RB blanking further improves the network utility.

Figure 4: Geometric mean of rates using different approaches ($\rho=1$). (a) Scenarios (i) and (ii): UCS with the VQ based scheduling scheme provides a large performance gain (about 1.6× and 1.35× in the shared and orthogonal scenarios, respectively) over the optimal cellular result. (b) Scenarios (i) and (iii): RB blanking further improves network performance.

Fig. 5, which shows the rate cumulative distribution function (CDF) of the different approaches, leads to similar conclusions. We plot the shared and RB blanking scenarios in the same figure, since RB blanking is motivated by the shared scenario as a way to manage the interference from macros to small cell users. In Scenario (i), the rate of the bottom (10th percentile) users under UCS is about 2.2× that of the optimal cellular solution of [15]. The gain is even larger in Scenario (ii).

Figure 5: Long-term rate CDF using different approaches ($\rho=1$): (a) Scenarios (i) and (iii); (b) Scenario (ii). The rate of bottom (10th percentile) users using UCS is about 2.2× that of cellular transmission with optimal user association but without interference management.

Fig. 6 shows the number of users served by different clusters. In the shared scenario with max-SINR association, most users connect to macro BSs, since macro BSs have much larger transmit power than small cell BSs. With load balancing, many users are offloaded to small cell BSs in the optimal cellular solution. In our proposed framework, all users are served by multi-BS clusters, which indicates the potential gain of UCS. In the orthogonal scenario, there is no cross-tier interference and more users may obtain a larger SINR from small cell BSs than from macro BSs; hence, more users connect to small cell BSs under max-SINR association than in the shared scenario. Because of the limited resources (20% of the RBs) available at macro BSs, load balancing offloads even more users to small cell BSs in orthogonal cellular transmission. In Scenario (i), the percentage of fractional users is about 3.3% with UCS and 1.2% with optimal cellular association. In the RB blanking scenario, the percentage of fractional users is about 2.5% with LJT and RB blanking (Scenario (iii)), and less than 1% with cellular transmission and blanking. Thus, the number of fractional users is very small in all cases, which validates our analysis.

Figure 6: Number of users served by different clusters ($\rho=1$): (a) Scenario (i); (b) Scenario (ii). Most users have a unique association. "Cluster UEs" refers to the users served by clusters of size larger than 1.
Figure 7: (a) Geometric mean of rates using different approaches versus $\rho$. As $\rho$ decreases, the gain from JT decreases in both the shared and orthogonal scenarios. (b) Fraction of resources allocated to clusters of different sizes in the RB blanking scenario. As $\rho$ decreases, more resources are allocated to smaller clusters.

Fig. 7(a) shows the geometric mean of rates versus $\rho$. The performance gain of UCS decreases as $\rho$ decreases, since fewer users can be served by clusters. Hence, the gain from UCS grows as more UL pilot resources become available in the system; with limited UL pilot resources, the gain from UCS is small.

Fig. 7(b) illustrates the resource allocation to clusters of different sizes versus $\rho$ in the RB blanking scenario. With cellular transmission, the macro BSs are off on about 65% of the RBs. In Scenario (iii), as $\rho$ decreases, the clusters serve fewer users and more resources are allocated to smaller clusters. When $\rho=0.25$, all resources on normal RBs and most resources on blank RBs are allocated to single-BS clusters. This again suggests that when pilot resources are strictly constrained, the gain from LJT is limited.

Fig. 8 shows the simulation results in the orthogonal scenario (Scenario (ii)) for different values of $\mu$. As $\mu_{2}$ increases, the utility of max-SINR association increases: since most users are associated with macro BSs under max-SINR association, a larger $\mu_{2}$ makes more resources available to them. On the other hand, when the resources available to small cells are limited, there is less motivation for load balancing (i.e., pushing users from macro cells to small cells), since users offloaded to small cells still suffer from limited resources and thus limited rates. Therefore, as Fig. 8 shows, the gain from load balancing and JT is limited when a small fraction of resources is allocated to small cells (i.e., when $\mu_{2}$ is large).

Figure 8: Geometric mean of rates in Scenario (ii) (the orthogonal scenario) with different fractions of resources allocated to the macro and small cell layers.

VII-B Simulation of Layout 2 (Figure 9)

In this subsection, we provide similar simulation results for a network topology compliant with the 3GPP HetNet scenario [7], shown in Fig. 9. The layout has 7 macrocell BSs and 3 hotspots per macrocell, with 4 randomly dropped small cell BSs within each hotspot. In addition, 120 UEs are uniformly dropped in each hotspot, and 60 more UEs are dropped randomly in the whole coverage area of each macrocell. The macro and small cell transmit powers and the path-loss models are identical to those of the previous layout.

Fig. 10(a) compares the geometric mean of rates of the various methods in Scenarios (i) and (ii), and Fig. 10(b) does the same for Scenarios (i) and (iii). As with the layout in Fig. 3, LJT yields a significant gain in the geometric mean of rates in all considered scenarios. Specifically, UCS with the VQ based scheduling scheme provides a large performance gain (about 1.35×) over the optimal cellular result, as shown in both Figs. 10(a) and 10(b).

Figs. 11(a) and 11(b) show the user rate CDFs for the different approaches. The 10th percentile (cell-edge) rate increases by 83% compared with the cellular case with optimal load balancing but no interference management. Combining RB blanking with JT further improves network performance.

Figure 9: Illustration of the 3GPP layout.

Figure 10: Geometric mean of rates in the 3GPP layout with $\rho=1$: (a) Scenarios (i) and (ii); (b) Scenarios (i) and (iii).

Figure 11: Long-term rate CDF in the 3GPP layout with $\rho=1$: (a) Scenarios (i) and (iii); (b) Scenario (ii).

VIII Conclusion

In this paper, we investigate the joint optimization of user association and interference management in massive MIMO HetNets, considering both LJT and RB blanking for interference management. We first characterize the instantaneous rate from BS clusters by exploiting two massive MIMO properties, namely rate hardening and the independence of the peak rate from user scheduling. We then formulate a convex NUM problem to obtain the optimal user-specific BS clusters and the corresponding resource allocation. This unified formulation applies to both LJT and blanking, as well as to the case where macro and small cell BSs use orthogonal resources. We further propose an efficient dual subgradient based algorithm, which is shown to converge towards the NUM solution. We show that the NUM solution with LJT may not be implementable by a feasible scheduler, and thus provides an upper bound on performance. Having shown that most users connect to at most one cluster per RB in heavily loaded networks, we approximate the NUM solution by a unique association and, based on it, propose a VQ based scheduling scheme that yields approximate but implementable results. Simulations show that the proposed scheduling scheme closely matches the NUM solution. Investigating more dynamic settings (e.g., users with high mobility) and the impact of other factors (such as imperfect CSI acquisition and the number of users simultaneously served by BSs in clusters of different sizes) on overall system performance is left for future work. It is also of interest to theoretically bound the gap between the NUM solution and the results of the proposed VQ based scheduling scheme.

Appendix A Proof of Theorem 1

We adopt techniques similar to those in the proof of Theorem 1 in [15]. Once the statement is proven for one cluster size in one band, it extends easily to general ATSs with various cluster sizes and multiple bands. Thus, we focus on clusters of size $L$ in Band-$A$ (i.e., subband $L$ in Band-$A$) and drop the index $A$ for simplicity. All clusters considered below satisfy $\mathcal{C}\subset\mathcal{B}^{(A)}$ and $|\mathcal{C}|=L$.

The set of feasible scheduling instants is denoted by $\mathcal{F}$; it contains vectors $\mathbf{e}$ with elements $e_{k\mathcal{C}}\in\{0,1\}$, where $e_{k\mathcal{C}}=1$ if user $k$ connects to $\mathcal{C}$ and $e_{k\mathcal{C}}=0$ otherwise. According to Defn. 3, each $\mathbf{e}\in\mathcal{F}$ consists of $\{e_{k\mathcal{C}}\}$ such that each user $k$ connects to at most one cluster and each BS $j$ serves at most $S_{j}(L)$ distinct users. By time sharing among the feasible scheduling instants in $\mathcal{F}$, any fractional association in the convex hull of $\mathcal{F}$ can be achieved in the long term. We denote the convex hull of $\mathcal{F}$ by $X^{\prime}=\textrm{conv}(\mathcal{F})$, and the set of activity fractions for clusters in Band-$A$ satisfying the constraints in (5) by $X$, i.e.,

$$X=\left\{x_{k\mathcal{C}}:\sum_{\mathcal{C}:j\in\mathcal{C}}\sum_{k\in\mathcal{U}}\frac{x_{k\mathcal{C}}}{S_{j}(L)}\leq 1,\ \sum_{\mathcal{C}}x_{k\mathcal{C}}\leq 1,\ x_{k\mathcal{C}}\geq 0,\ \forall k\in\mathcal{U},\ \forall j\in\mathcal{B}^{(A)}\text{ and }\forall\mathcal{C}\subset\mathcal{B}^{(A)}\right\}.$$

It is easy to show that every feasible scheduling instant in $\mathcal{F}$ satisfies constraints (5b)-(5f), and thus $\mathcal{F}\subseteq X$. Since $X$ is convex and contains $\mathcal{F}$, it also contains the convex hull of $\mathcal{F}$, i.e., $X^{\prime}=\textrm{conv}(\mathcal{F})\subseteq X$.

For the opposite direction (i.e., to show $X\not\subseteq X^{\prime}$), recall that a matrix is totally unimodular (TU) if every square submatrix has determinant $+1$, $-1$ or $0$. The Hoffman and Kruskal theorem (1956) states that an integral matrix $\mathbf{B}$ is TU if and only if, for each integral vector $\mathbf{b}$, the extreme points of the polyhedron $\{\mathbf{z}:\mathbf{B}\mathbf{z}\leq\mathbf{b},\mathbf{z}\geq 0\}$ are integral [53]. Denoting by $N_{\mathcal{C}L}^{(A)}$ the number of size-$L$ clusters in Band-$A$, we let $\mathbf{x}=[\mathbf{x}_{1}^{T},\mathbf{x}_{2}^{T},\cdots,\mathbf{x}_{K}^{T}]^{T}$ with $\mathbf{x}_{k}=[x_{k\mathcal{C}_{1}},x_{k\mathcal{C}_{2}},\cdots,x_{k\mathcal{C}_{N_{\mathcal{C}L}^{(A)}}}]^{T}$, and $\mathbf{b}=[S_{1}(L),\cdots,S_{J}(L),1,\cdots,1]^{T}$ of size $(J+K)\times 1$. We let $\mathbf{B}=\left[\begin{smallmatrix}\mathbf{C}\\ \mathbf{D}\end{smallmatrix}\right]$, where $\mathbf{C}$ and $\mathbf{D}$ are of size $J\times(KN_{\mathcal{C}L}^{(A)})$ and $K\times(KN_{\mathcal{C}L}^{(A)})$, respectively. For all $k\in\mathcal{U}$, the element in the $j$th row and $\left((k-1)N_{\mathcal{C}L}^{(A)}+i\right)$th column of $\mathbf{C}$ is 1 if $j\in\mathcal{C}_{i}$ and 0 otherwise, while the $k$th row of $\mathbf{D}$ has ones in columns $(k-1)N_{\mathcal{C}L}^{(A)}+1$ through $kN_{\mathcal{C}L}^{(A)}$ and zeros elsewhere, so that $X=\{\mathbf{x}:\mathbf{B}\mathbf{x}\leq\mathbf{b},\mathbf{x}\geq 0\}$. Recall that we consider large networks that include cluster combinations of the form $\{j_{1},j_{2}\}$, $\{j_{1},j_{3}\}$ and $\{j_{2},j_{3}\}$ if $L_{\max}^{(A)}>1$, where $j_{1}$, $j_{2}$ and $j_{3}$ are BS indexes. Then, for $L_{\max}^{(A)}>1$, $\mathbf{B}$ always contains the submatrix $\left[\begin{smallmatrix}1&1&0\\ 1&0&1\\ 0&1&1\end{smallmatrix}\right]$, whose determinant is $-2$, and thus $\mathbf{B}$ is not TU.
By the Hoffman and Kruskal theorem, there are some non-integral extreme points $\mathbf{v}\in X$. Since $\mathcal{F}\subseteq X$ and $\mathbf{v}$ is an extreme point of $X$, $\mathbf{v}$ can be written as a convex combination of elements of $\mathcal{F}$ only if $\mathbf{v}\in\mathcal{F}$, which is impossible because $\mathbf{v}$ is non-integral. Thus, $\mathbf{v}\notin\textrm{conv}(\mathcal{F})=X^{\prime}$, and $X\not\subseteq X^{\prime}$.
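The determinant claim can be checked numerically on a toy instance. The Python sketch below is illustrative only: it builds $\mathbf{B}=[\mathbf{C};\mathbf{D}]$ for three BSs with the column ordering described above (0-based here), evaluates the $3\times 3$ submatrix from the size-2 clusters, and brute-forces the TU test. The helper names are hypothetical, and the brute-force test is exponential, so it is meant for tiny matrices only. As a contrast, the toy single-BS-cluster case ($L=1$) with two users passes the test.

```python
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def constraint_matrix(n_bs, clusters, n_users):
    """B = [C; D] for X; column k*N + i holds x_{k C_i} (0-based indices)."""
    n = len(clusters)
    C = [[1 if j in clusters[i] else 0 for _ in range(n_users) for i in range(n)]
         for j in range(n_bs)]
    D = [[1 if kk == k else 0 for kk in range(n_users) for _ in range(n)]
         for k in range(n_users)]
    return C + D


def is_totally_unimodular(B):
    """Brute-force TU test: every square submatrix has determinant in {-1, 0, 1}."""
    rows, cols = len(B), len(B[0])
    for size in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                if det([[B[i][j] for j in c] for i in r]) not in (-1, 0, 1):
                    return False
    return True


pairs = [{0, 1}, {0, 2}, {1, 2}]  # size-2 clusters {j1,j2}, {j1,j3}, {j2,j3}
singles = [{0}, {1}, {2}]         # cellular case, L = 1
print(det(constraint_matrix(3, pairs, 1)[:3]))                   # prints -2
print(is_totally_unimodular(constraint_matrix(3, pairs, 1)))    # prints False
print(is_totally_unimodular(constraint_matrix(3, singles, 2)))  # prints True
```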

Appendix B Proof of Proposition 2

We use techniques similar to those in the proof of Prop. 3 in [30], where a graph represents the association and the KKT conditions (13) restrict the graph structure. For a given cluster size $L$ in Band-$A$, we denote the graph by $G_{1}$: nodes represent users, and an edge between two nodes represents a BS cluster that serves both users. Each node has an ID indicating the user index, while each edge has a color identifying the BS cluster.

Recalling that constraints (5c) are inactive, we have $\theta_{kL}^{(A)}=0,\ \forall k\in\mathcal{U}$. If two users $k$ and $m$ are both served by size-$L$ clusters $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$ in Band-$A$ (i.e., $x_{k\mathcal{C}_{1}}^{(A)}>0$, $x_{k\mathcal{C}_{2}}^{(A)}>0$, $x_{m\mathcal{C}_{1}}^{(A)}>0$, $x_{m\mathcal{C}_{2}}^{(A)}>0$), the KKT conditions (13) give
$$R_{k}=\frac{r_{k\mathcal{C}_{1}}^{(A)}}{\sum_{j:j\in\mathcal{C}_{1}}\nu_{jL}^{(A)}/S_{j}(L)}=\frac{r_{k\mathcal{C}_{2}}^{(A)}}{\sum_{j:j\in\mathcal{C}_{2}}\nu_{jL}^{(A)}/S_{j}(L)}\quad\text{and}\quad R_{m}=\frac{r_{m\mathcal{C}_{1}}^{(A)}}{\sum_{j:j\in\mathcal{C}_{1}}\nu_{jL}^{(A)}/S_{j}(L)}=\frac{r_{m\mathcal{C}_{2}}^{(A)}}{\sum_{j:j\in\mathcal{C}_{2}}\nu_{jL}^{(A)}/S_{j}(L)},$$
where $R_{k}=\sum_{A^{\prime}=1}^{3}\sum_{\mathcal{C}^{\prime}\subset\mathcal{B}^{(A^{\prime})}}x_{k\mathcal{C}^{\prime}}^{(A^{\prime})}r_{k\mathcal{C}^{\prime}}^{(A^{\prime})}$. Taking the ratio of the two expressions for each user, we have

$$\frac{r_{k\mathcal{C}_{1}}^{(A)}}{r_{k\mathcal{C}_{2}}^{(A)}}=\frac{r_{m\mathcal{C}_{1}}^{(A)}}{r_{m\mathcal{C}_{2}}^{(A)}}, \tag{17}$$

which holds with probability 0. Therefore, almost surely, any two users share at most one cluster of size $L$ in Band-$A$. Similarly, consider three users $k,m,i$ and clusters $\mathcal{C}_{1},\mathcal{C}_{2},\mathcal{C}_{3}$, where user $k$ is associated with $\mathcal{C}_{1}$ and $\mathcal{C}_{2}$, user $m$ with $\mathcal{C}_{1}$ and $\mathcal{C}_{3}$, and user $i$ with $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$. We consider the following three cases:

1) Clusters $\mathcal{C}_{1}$, $\mathcal{C}_{2}$ and $\mathcal{C}_{3}$ are all different: we have
$$\frac{r_{k\mathcal{C}_{1}}^{(A)}}{r_{k\mathcal{C}_{2}}^{(A)}}=\frac{\sum_{j:j\in\mathcal{C}_{1}}\nu_{jL}^{(A)}/S_{j}(L)}{\sum_{j:j\in\mathcal{C}_{3}}\nu_{jL}^{(A)}/S_{j}(L)}\cdot\frac{\sum_{j:j\in\mathcal{C}_{3}}\nu_{jL}^{(A)}/S_{j}(L)}{\sum_{j:j\in\mathcal{C}_{2}}\nu_{jL}^{(A)}/S_{j}(L)}=\frac{r_{m\mathcal{C}_{1}}^{(A)}}{r_{m\mathcal{C}_{3}}^{(A)}}\cdot\frac{r_{i\mathcal{C}_{3}}^{(A)}}{r_{i\mathcal{C}_{2}}^{(A)}},$$
which holds with probability 0.

2) $\mathcal{C}_{1}=\mathcal{C}_{2}\neq\mathcal{C}_{3}$: users $m$ and $i$ are both served by $\mathcal{C}_{1}$ and $\mathcal{C}_{3}$, which by (17) occurs with probability 0.

3) $\mathcal{C}_{1}=\mathcal{C}_{2}=\mathcal{C}_{3}$: users $k$, $m$ and $i$ are served by the same cluster, which is possible. In this case, the three users form a complete graph whose edges all have the same color.

Therefore, almost surely, the subgraph of $G_{1}$ formed by any three users either contains a loop whose edges all have the same color or contains no loop. A similar result holds for $G_{1}$ with more than three users, where any subgraph formed by users served by the same BS cluster is a complete graph. We thus construct a new graph $G_{2}$ whose nodes represent clusters, so that $G_{2}$ has $N_{\mathcal{C}L}^{(A)}$ nodes. Two nodes of $G_{2}$ are connected by an edge if they share a common vertex in $G_{1}$ (i.e., at least one user is served by both clusters). Since any two clusters share at most one user almost surely, the number of users served by more than one cluster is bounded by the number of edges of $G_{2}$. Any loop in $G_{2}$ corresponds to a loop with more than one edge color in $G_{1}$. Since $G_{1}$ has no such multicolored loops, $G_{2}$ has no loop, i.e., $G_{2}$ is a forest (a tree if connected). Thus, the number of edges in $G_{2}$ (and hence the number of fractional users) is at most one less than the number of nodes, i.e., at most $N_{\mathcal{C}L}^{(A)}-1$.
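The counting step can be illustrated with a short Python sketch. It builds $G_{2}$ from a hypothetical association map in which each fractional user is split over two clusters, tests the graph for loops with union-find, and compares the number of fractional users with $N_{\mathcal{C}L}^{(A)}-1$. The association values are illustrative only and are not simulation output; the last test case mimics the probability-zero configuration of case 1).

```python
from itertools import combinations


def cluster_graph_edges(assoc):
    """Edges of G2: two clusters are joined if some user is served by both.

    assoc: dict user -> set of size-L clusters (one band) with x_{kC} > 0.
    """
    edges = set()
    for clusters in assoc.values():
        for a, b in combinations(sorted(clusters), 2):
            edges.add((a, b))
    return edges


def is_forest(nodes, edges):
    """Union-find loop check: True if the undirected graph has no loop."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # this edge closes a loop
        parent[ra] = rb
    return True


assoc = {"u1": {"C1", "C2"}, "u2": {"C2", "C3"}, "u3": {"C1"}, "u4": {"C3"}}
nodes = {"C1", "C2", "C3"}
edges = cluster_graph_edges(assoc)
fractional = sum(len(c) > 1 for c in assoc.values())
print(is_forest(nodes, edges), fractional, len(nodes) - 1)  # prints True 2 2
```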

Appendix C Implementation issues of the dual-subgradient algorithm

Let $L_{\max}=\max_{A}L_{\max}^{(A)}$, $N_{\mathcal{C}m}=\max_{L,A}N_{\mathcal{C}L}^{(A)}$, and let $N_{A}$ be the number of operations. Solving (5) directly with CVX [51] involves a problem of size $O(N_{\mathcal{C}m}N_{A}L_{\max}K)$, dominated by the number of variables $x_{k\mathcal{C}}^{(A)}$. As discussed below, the proposed algorithm has lower complexity. Let $L_{a}$ be the maximal number of active cluster sizes over all bands, i.e., $\max_{A}|\{L:\lambda_{AL}>0\}|$. The size of the LP (14) is $O(N_{\mathcal{C}m}N_{A}L_{a}\min\{N_{\mathcal{C}m}-1,K\}+N_{A}L_{a}\max\{0,K-J+1\})$, where the first term accounts for the positive $x_{k\mathcal{C}}^{(A)}$ of fractional users and the second for the positive $x_{k\mathcal{C}}^{(A)}$ of users with unique association. It is easy to check that (14) is smaller than (5). As shown in Sec. VII, fractional users are very few (less than 3.5% of $K$), so the size of (14) is much smaller than that of (5) (less than 3.5%). Moreover, the size of (14) is further reduced when $L_{a}/L_{\max}$ is small (e.g., only 2 of the 4 possible cluster sizes are active in Sec. VII). Together, the fast convergence of the first part of the algorithm (steps (8)-(11); fewer than 60 iterations in our simulations), the low per-iteration complexity, and the reduced size of (14) can make the proposed algorithm more efficient than CVX for larger networks.

Appendix D Detailed Dual Subgradient Algorithm with Redundant Constraints

In problem (5), constraints (5c)-(5e) imply $x_{k\mathcal{C}}^{(A)}\leq 1$. Thus, adding the constraint $x_{k\mathcal{C}}^{(A)}\leq 1$ to (5) does not change the problem; in other words, the following problem is equivalent to the original problem (5).

$$\max_{\lambda_{AL},\,x_{k\mathcal{C}}^{(A)},\,\mu_{A}}\ \sum_{k\in\mathcal{U}}U\Bigg(\sum_{A=1}^{3}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|\leq L_{\max}^{(A)}}}x_{k\mathcal{C}}^{(A)}r_{k\mathcal{C}}^{(A)}\Bigg) \tag{18a}$$
$$\text{s.t.}\ \sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ j\in\mathcal{C},|\mathcal{C}|=L}}\frac{\sum_{k\in\mathcal{U}}x_{k\mathcal{C}}^{(A)}}{S_{j}(L)}\leq\lambda_{AL},\ \forall j\in\mathcal{B}^{(A)},\forall L\leq L_{\max}^{(A)},\forall A, \tag{18b}$$
$$\sum_{\mathcal{C}:|\mathcal{C}|=L,\mathcal{C}\subset\mathcal{B}^{(A)}}x_{k\mathcal{C}}^{(A)}\leq\lambda_{AL},\ \forall k\in\mathcal{U},\forall L\leq L_{\max}^{(A)},\forall A, \tag{18c}$$
$$x_{k\mathcal{C}}^{(A)}\in[0,1],\ \forall k\in\mathcal{U},\forall\mathcal{C}:|\mathcal{C}|\leq L_{\max}^{(A)},\forall A, \tag{18d}$$
$$\sum_{L=1}^{L_{\max}^{(A)}}\lambda_{AL}\leq\mu_{A},\ \forall A, \tag{18e}$$
$$\sum_{A=1}^{3}\mu_{A}\leq 1, \tag{18f}$$
$$\lambda_{AL},\mu_{A}\geq 0,\ \forall k\in\mathcal{U},\forall\mathcal{C},\forall L\leq L_{\max}^{(A)},\forall A. \tag{18g}$$

Problem (18) is identical to (5) except for the redundant upper bound in (18d). The dual subgradient algorithm in the main text is derived from this equivalent problem (18). Specifically, let $\nu_{jL}^{(A)}$ and $\theta_{kL}^{(A)}$ be the Lagrange multipliers associated with (18b) and (18c), respectively. The dual problem of (18) is

$$\min_{\nu_{jL}^{(A)},\theta_{kL}^{(A)}\geq 0}\ \sum_{k\in\mathcal{U}}f_{k}\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)+g\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right),$$

where

$$\begin{aligned}f_{k}\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)=\max_{x_{k\mathcal{C}}^{(A)}\in[0,1]}\ &\log\Bigg(\sum_{A=1}^{3}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|\leq L_{\max}^{(A)}}}x_{k\mathcal{C}}^{(A)}r_{k\mathcal{C}}^{(A)}\Bigg)-\sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ |\mathcal{C}|=L}}\sum_{j:j\in\mathcal{C}}\frac{\nu_{jL}^{(A)}}{S_{j}(L)}x_{k\mathcal{C}}^{(A)}\\ &-\sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\theta_{kL}^{(A)}\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},|\mathcal{C}|=L}x_{k\mathcal{C}}^{(A)},\end{aligned} \tag{19}$$

and

$$g\left(\nu_{jL}^{(A)},\theta_{kL}^{(A)}\right)=\max_{\substack{\sum_{L=1}^{L_{\max}^{(A)}}\lambda_{AL}\leq\mu_{A},\\ \sum_{A=1}^{3}\mu_{A}\leq 1}}\ \sum_{A=1}^{3}\sum_{L=1}^{L_{\max}^{(A)}}\Bigg(\sum_{j:j\in\mathcal{B}^{(A)}}\nu_{jL}^{(A)}+\sum_{k\in\mathcal{U}}\theta_{kL}^{(A)}\Bigg)\lambda_{AL}. \tag{20}$$

Problem (19) is similar to (6) in the main text, except for the additional constraint $x_{k\mathcal{C}}^{(A)}\leq 1$.

The constraints of (18) satisfy Slater's condition, so strong duality holds: the dual problem and problem (18) have the same optimal value, and (18) has the same optimal solution as problem (5).

The optimization problem (19) has the closed-form optimal solution

$$x_{k\mathcal{C}}^{(A)}=\begin{cases}\left[\dfrac{1}{\sum_{L:L=|\mathcal{C}|}\left(\sum_{j:j\in\mathcal{C}}\nu_{jL}^{(A)}/S_{j}(L)+\theta_{kL}^{(A)}\right)}\right]_{0}^{1},&\text{if }\{\mathcal{C},A\}=\{\mathcal{C}^{*},A^{*}\},\\ 0,&\text{otherwise},\end{cases} \tag{21}$$

where $[x]_{0}^{1}=\min\{1,\max\{0,x\}\}$ and $\{\mathcal{C}^{*},A^{*}\}=\arg\max_{\mathcal{C},A}\frac{r_{k\mathcal{C}}^{(A)}}{\sum_{L:L=|\mathcal{C}|}\left(\sum_{j:j\in\mathcal{C}}\nu_{jL}^{(A)}/S_{j}(L)+\theta_{kL}^{(A)}\right)}$. If multiple pairs $\{\mathcal{C}^{*},A^{*}\}$ attain the maximum, we pick the pair with the largest $r_{k\mathcal{C}}^{(A)}$.
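The closed form (21) can be sketched as a per-user best response in Python. The sketch is illustrative: multipliers are stored in hypothetical dictionaries keyed by (BS, cluster size, band) and (cluster size, band), and every cluster price is assumed strictly positive so that the division is well defined.

```python
def cluster_price(cluster, band, nu, theta_k, S):
    """sum_{j in C} nu_{jL}^{(A)} / S_j(L) + theta_{kL}^{(A)}, with L = |C|."""
    L = len(cluster)
    return sum(nu[(j, L, band)] / S[(j, L)] for j in cluster) + theta_k[(L, band)]


def best_response(rates, nu, theta_k, S):
    """Closed-form maximizer (21) of f_k in (19) for a single user k.

    rates:   dict (cluster, band) -> r_{kC}^{(A)}; clusters are frozensets
    nu:      dict (j, L, band) -> nu_{jL}^{(A)}
    theta_k: dict (L, band) -> theta_{kL}^{(A)} of this user
    S:       dict (j, L) -> S_j(L)
    """
    price = {key: cluster_price(key[0], key[1], nu, theta_k, S) for key in rates}
    # {C*, A*}: largest rate-to-price ratio; ties go to the larger rate.
    best = max(rates, key=lambda key: (rates[key] / price[key], rates[key]))
    x = {key: 0.0 for key in rates}
    x[best] = min(1.0, max(0.0, 1.0 / price[best]))  # [1 / price]_0^1
    return x


c1, c2 = frozenset({0}), frozenset({0, 1})
nu = {(0, 1, 1): 2.0, (0, 2, 1): 1.0, (1, 2, 1): 1.0}
S = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
theta = {(1, 1): 0.0, (2, 1): 0.0}
rates = {(c1, 1): 4.0, (c2, 1): 6.0}
print(best_response(rates, nu, theta, S)[(c2, 1)])  # prints 0.5
```

Here both clusters have price 2, so the two-BS cluster wins on rate-to-price ratio (3 versus 2) and receives $x=1/2$; with smaller multipliers the value $1/\text{price}$ exceeds 1 and is clipped by the projection onto $[0,1]$.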

Problem (20) is a linear program (LP), and one optimal solution (if multiple $\{A,L\}$ pairs maximize $\sum_{j\in\mathcal{B}^{(A)}}\nu_{jL}^{(A)}+\sum_{k\in\mathcal{U}}\theta_{kL}^{(A)}$, we pick one of them at random) is

$$\lambda_{AL}=\begin{cases}1, & \text{if }\{A,L\}=\arg\max_{A',L'}\left(\sum_{j\in\mathcal{B}^{(A')}}\nu_{jL'}^{(A')}+\sum_{k\in\mathcal{U}}\theta_{kL'}^{(A')}\right),\\ 0, & \text{otherwise},\end{cases} \tag{22}$$

and

$$\mu_{A}=\begin{cases}1, & \text{if }\lambda_{AL}>0\text{ for some }L\text{ under (22)},\\ 0, & \text{otherwise}.\end{cases} \tag{23}$$
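Because the objective of (20) has one weight $\sum_{j\in\mathcal{B}^{(A)}}\nu_{jL}^{(A)}+\sum_{k\in\mathcal{U}}\theta_{kL}^{(A)}$ per pair $\{A,L\}$ and $\sum_{A}\sum_{L}\lambda_{AL}\leq\sum_{A}\mu_{A}\leq 1$, placing all mass on the largest weight is optimal whenever that weight is nonnegative. The Python sketch below follows (22) and (23) under that reading. The dictionary layout and the name `band_cluster_allocation` are hypothetical, missing multipliers are treated as zero, and the all-zero branch for negative weights is an addition of this sketch that relies on $\lambda_{AL}\geq 0$, which (20) does not state explicitly.

```python
import random


def band_cluster_allocation(nu, theta, bands, users, L_max, rng=random):
    """Sketch of the solution (22)-(23) of the LP (20).

    nu:    dict (A, j, L) -> nu_{jL}^{(A)}; missing keys treated as 0 (assumption)
    theta: dict (A, k, L) -> theta_{kL}^{(A)}; missing keys treated as 0 (assumption)
    bands: dict A -> list of BS indices in B^{(A)}
    L_max: dict A -> L_max^{(A)}
    Returns (lam, mu) with lam[(A, L)] in {0, 1} and mu[A] in {0, 1}.
    """
    # Weight multiplying lambda_{AL} in the objective of (20).
    weights = {}
    for A, bs_list in bands.items():
        for L in range(1, L_max[A] + 1):
            weights[(A, L)] = (sum(nu.get((A, j, L), 0.0) for j in bs_list)
                               + sum(theta.get((A, k, L), 0.0) for k in users))
    lam = {key: 0 for key in weights}
    mu = {A: 0 for A in bands}
    best_w = max(weights.values())
    if best_w < 0:
        # Added edge case (assumes lambda_{AL} >= 0): all-zero allocation is optimal.
        return lam, mu
    # Random tie-breaking among maximizing (A, L) pairs, as the text prescribes.
    A_star, L_star = rng.choice([key for key, w in weights.items() if w == best_w])
    lam[(A_star, L_star)] = 1  # (22)
    mu[A_star] = 1             # (23)
    return lam, mu
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible across runs.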

The $n$th iteration of the algorithm is as follows.

  1. Update the activity fractions by (21).

  2. Update resource allocation for different bands and clusters by (22) and (23).

  3. Update the Lagrangian multipliers by

    $$\nu_{jL}^{(A)}(n+1)=\left[\nu_{jL}^{(A)}(n)-\delta(n)\left(\lambda_{AL}(n)-\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ j\in\mathcal{C},|\mathcal{C}|=L}}\frac{\sum_{k\in\mathcal{U}}x_{k\mathcal{C}}^{(A)}(n)}{S_{j}(L)}\right)\right]^{+}, \tag{24}$$

    and

    $$\theta_{kL}^{(A)}(n+1)=\theta_{kL}^{(A)}(n)-\delta(n)\left(\lambda_{AL}(n)-\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},|\mathcal{C}|=L}x_{k\mathcal{C}}^{(A)}(n)\right), \tag{25}$$

    where $[z]^{+}=\max\{z,0\}$ and $\delta(n)$ is the step size at the $n$th iteration.
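Step 3 maps directly onto code. The following Python is a minimal sketch of (24) and (25) only, not the authors' implementation. The dictionary layout (multipliers keyed by `(A, j, L)` and `(A, k, L)`, activity fractions keyed by `(k, A, C)` with `C` a frozenset of BS indices) and the name `update_multipliers` are hypothetical, and the step size $\delta(n)$ is left to the caller.

```python
def update_multipliers(nu, theta, lam, x, clusters, S, users, step):
    """One step of the multiplier updates (24) and (25).

    nu:       dict (A, j, L) -> nu_{jL}^{(A)}(n)
    theta:    dict (A, k, L) -> theta_{kL}^{(A)}(n)
    lam:      dict (A, L) -> lambda_{AL}(n); missing keys count as 0
    x:        dict (k, A, C) -> x_{kC}^{(A)}(n); missing keys count as 0
    clusters: list of (A, C) pairs, C a frozenset of BS indices in B^{(A)}
    S:        dict (j, L) -> S_j(L)
    step:     step size delta(n)
    Returns the updated (nu, theta) as new dicts.
    """
    new_nu = {}
    for (A, j, L), value in nu.items():
        # Sum over clusters C in band A with |C| = L that contain BS j.
        load = sum(sum(x.get((k, A, C), 0.0) for k in users) / S[(j, L)]
                   for (A2, C) in clusters if A2 == A and len(C) == L and j in C)
        # (24): subgradient step followed by the projection [.]^+.
        new_nu[(A, j, L)] = max(value - step * (lam.get((A, L), 0) - load), 0.0)
    new_theta = {}
    for (A, k, L), value in theta.items():
        # Total activity fraction of user k over clusters of size L in band A.
        share = sum(x.get((k, A, C), 0.0)
                    for (A2, C) in clusters if A2 == A and len(C) == L)
        # (25): subgradient step; no projection, matching (25) as written.
        new_theta[(A, k, L)] = value - step * (lam.get((A, L), 0) - share)
    return new_nu, new_theta
```

Returning new dictionaries rather than updating in place keeps $\nu(n)$ and $\theta(n)$ intact while both updates are computed from iteration-$n$ quantities.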

The steps above show that adding the redundant constraint $x_{k\mathcal{C}}^{(A)}\leq 1$ changes the dual algorithm only in (21), where $x_{k\mathcal{C}}^{(A)}$ is projected onto $[0,1]$. With this constraint, the subgradients $\Delta\nu_{jL}^{(A)}$ and $\Delta\theta_{kL}^{(A)}$ are bounded, which can be used to show that the dual algorithm converges.
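To make the boundedness claim concrete, here is a short worked bound. It is a sketch rather than the paper's proof. Take $\Delta\nu_{jL}^{(A)}(n)$ and $\Delta\theta_{kL}^{(A)}(n)$ to be the terms multiplied by $\delta(n)$ in (24) and (25). Let $N_{jL}^{(A)}$ denote the number of clusters $\mathcal{C}\subset\mathcal{B}^{(A)}$ with $|\mathcal{C}|=L$ and $j\in\mathcal{C}$, and let $M_{L}^{(A)}$ denote the number of clusters $\mathcal{C}\subset\mathcal{B}^{(A)}$ with $|\mathcal{C}|=L$; both symbols are introduced here for illustration. Assume $S_{j}(L)>0$. Since (22) gives $\lambda_{AL}(n)\in\{0,1\}$ and (21) gives $x_{k\mathcal{C}}^{(A)}(n)\in[0,1]$,

$$\Delta\nu_{jL}^{(A)}(n)=\lambda_{AL}(n)-\sum_{\substack{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},\\ j\in\mathcal{C},|\mathcal{C}|=L}}\frac{\sum_{k\in\mathcal{U}}x_{k\mathcal{C}}^{(A)}(n)}{S_{j}(L)}\in\left[-\frac{|\mathcal{U}|\,N_{jL}^{(A)}}{S_{j}(L)},\;1\right],$$

$$\Delta\theta_{kL}^{(A)}(n)=\lambda_{AL}(n)-\sum_{\mathcal{C}:\mathcal{C}\subset\mathcal{B}^{(A)},|\mathcal{C}|=L}x_{k\mathcal{C}}^{(A)}(n)\in\left[-M_{L}^{(A)},\;1\right].$$

Both bounds are independent of $n$. Without the projection onto $[0,1]$, (21) would return the reciprocal of its denominator, which grows without bound as the denominator approaches zero, so no such uniform bound would be available.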

References

  • [1] J. G. Andrews, “Seven ways that HetNets are a cellular paradigm shift,” IEEE Comm. Mag., vol. 51, pp. 136–144, Mar. 2013.
  • [2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Sel. Areas in Communications, vol. 32, pp. 1065–1082, June 2014.
  • [3] F. Boccardi, R. W. Heath, A. Lozano, T. L. Marzetta, and P. Popovski, “Five disruptive technology directions for 5G,” IEEE Comm. Mag., vol. 52, pp. 74–80, Feb. 2014.
  • [4] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Trans. on Wireless Communications, vol. 9, pp. 3590–3600, Nov. 2010.
  • [5] J. Hoydis, S. Ten Brink, M. Debbah, et al., “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?,” IEEE Journal on Sel. Areas in Communications, vol. 31, pp. 160–171, Feb. 2013.
  • [6] E. Larsson, O. Edfors, F. Tufvesson, and T. Marzetta, “Massive MIMO for next generation wireless systems,” IEEE Comm. Mag., vol. 52, pp. 186–195, Feb. 2014.
  • [7] 3GPP, “Technical specification group radio access network; Small cell enhancements for E-UTRA and E-UTRAN,” TR 36.872, V12.1.0, Dec. 2013.
  • [8] J. G. Andrews, S. Singh, Q. Ye, X. Lin, and H. Dhillon, “An overview of load balancing in HetNets: Old myths and open problems,” IEEE Wireless Communications, vol. 21, pp. 18–25, Apr. 2014.
  • [9] S. Singh, H. S. Dhillon, and J. G. Andrews, “Offloading in heterogeneous networks: Modeling, analysis, and design insights,” IEEE Trans. on Wireless Communications, vol. 12, pp. 2484–2497, May 2013.
  • [10] H. S. Jo, Y. J. Sang, P. Xia, and J. G. Andrews, “Heterogeneous cellular networks with flexible cell association: a comprehensive downlink SINR analysis,” IEEE Trans. on Wireless Communications, vol. 11, pp. 3484–3495, Oct. 2012.
  • [11] E. Aryafar, A. Keshavarz-Haddad, M. Wang, and M. Chiang, “RAT selection games in HetNets,” in Proc., IEEE INFOCOM, pp. 998–1006, Apr. 2013.
  • [12] A. Damnjanovic, J. Montojo, Y. Wei, T. Ji, T. Luo, M. Vajapeyam, T. Yoo, O. Song, and D. Malladi, “A survey on 3GPP heterogeneous networks,” IEEE Wireless Communications, vol. 18, pp. 10–21, June 2011.
  • [13] A. Ghosh, N. Mangalvedhe, R. Ratasuk, B. Mondal, M. Cudak, E. Visotsky, T. A. Thomas, J. G. Andrews, et al., “Heterogeneous cellular networks: From theory to practice,” IEEE Comm. Mag., vol. 50, pp. 54–64, June 2012.
  • [14] Q. Ye, B. Rong, Y. Chen, M. Al-Shalash, C. Caramanis, and J. Andrews, “User association for load balancing in heterogeneous cellular networks,” IEEE Trans. on Wireless Communications, vol. 12, pp. 2706–2716, June 2013.
  • [15] D. Bethanabhotla, O. Y. Bursalioglu, H. C. Papadopoulos, and G. Caire, “Optimal user-cell association for massive MIMO wireless networks,” IEEE Trans. on Wireless Communications, vol. PP, pp. 1–1, Nov. 2015.
  • [16] D. Gesbert, S. Hanly, H. Huang, S. Shamai Shitz, O. Simeone, and W. Yu, “Multi-cell MIMO cooperative networks: A new look at interference,” IEEE Journal on Sel. Areas in Communications, vol. 28, pp. 1380–1408, Dec. 2010.
  • [17] M. Sawahashi, Y. Kishiyama, A. Morimoto, D. Nishikawa, and M. Tanno, “Coordinated multipoint transmission/reception techniques for LTE-Advanced [coordinated and distributed MIMO],” IEEE Wireless Communications, vol. 17, pp. 26–34, June 2010.
  • [18] D. Lee, H. Seo, B. Clerckx, E. Hardouin, D. Mazzarese, S. Nagata, and K. Sayana, “Coordinated multipoint transmission and reception in LTE-Advanced: deployment scenarios and operational challenges,” IEEE Comm. Mag., vol. 50, pp. 148–155, Feb. 2012.
  • [19] P. Marsch and G. Fettweis, “Static clustering for cooperative multi-point (CoMP) in mobile communications,” in Proc., IEEE Intl. Conf. on Communications, pp. 1–6, June 2011.
  • [20] J. Li, T. Svensson, C. Botella, T. Eriksson, X. Xu, and X. Chen, “Joint scheduling and power control in coordinated multi-point clusters,” in Proc., IEEE Veh. Technology Conf., pp. 1–5, Sep. 2011.
  • [21] J. Zhao, T. Q. S. Quek, and Z. Lei, “Coordinated multipoint transmission with limited backhaul data transfer,” IEEE Trans. on Wireless Communications, vol. 12, pp. 2762–2775, June 2013.
  • [22] Y. Du and G. de Veciana, “Wireless networks without edges: Dynamic radio resource clustering and user scheduling,” in Proc., IEEE INFOCOM, pp. 1321–1329, Apr. 2014.
  • [23] E. Björnson, M. Kountouris, and M. Debbah, “Massive MIMO and small cells: Improving energy efficiency by optimal soft-cell coordination,” in Proc., IEEE Intl. Conf. on Telecommunications, May 2013.
  • [24] M. Hong, R. Sun, H. Baligh, and Z. Q. Luo, “Joint base station clustering and beamformer design for partial coordinated transmission in heterogeneous networks,” IEEE Journal on Sel. Areas in Communications, vol. 31, pp. 226–240, Feb. 2013.
  • [25] M. Sanjabi, M. Razaviyayn, and Z. Q. Luo, “Optimal joint base station assignment and beamforming for heterogeneous networks,” IEEE Trans. on Signal Processing, vol. 62, pp. 1950–1961, Apr. 2014.
  • [26] S. Wagner, R. Couplet, M. Debbah, and D. T. M. Slock, “Joint precoding and load balancing optimization for energy-efficient heterogeneous networks,” IEEE Trans. on Wireless Communications, vol. 14, pp. 5810–5822, Oct. 2015.
  • [27] D. Lopez-Perez et al., “Enhanced intercell interference coordination challenges in heterogeneous networks,” IEEE Wireless Communications, vol. 18, pp. 22–30, June 2011.
  • [28] S. Vasudevan, R. Pupala, and K. Sivanesan, “Dynamic eICIC - a proactive strategy for improving spectral efficiencies of heterogeneous LTE cellular networks by leveraging user mobility and traffic dynamics,” IEEE Trans. on Wireless Communications, vol. 12, pp. 4956–4969, Oct. 2013.
  • [29] A. Liu, V. K. N. Lau, L. Ruan, J. Chen, and D. Xiao, “Hierarchical radio resource optimization for heterogeneous networks with enhanced inter-cell interference coordination (eICIC),” IEEE Trans. on Signal Processing, vol. 62, pp. 1684–1693, Apr. 2014.
  • [30] Q. Ye, M. Al-Shalash, C. Caramanis, and J. G. Andrews, “On/off macrocells and load balancing in heterogeneous cellular networks,” in Proc., IEEE Globecom, pp. 3814–3819, Dec. 2013.
  • [31] A. Bedekar and R. Agrawal, “Optimal muting and load balancing for eICIC,” in Intl. Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt), pp. 280–287, May 2013.
  • [32] J. Ghimire and C. Rosenberg, “Resource allocation, transmission coordination and user association in heterogeneous networks: A flow-based unified approach,” IEEE Trans. on Wireless Communications, vol. 12, pp. 1340–1351, Mar. 2013.
  • [33] S. Deb, P. Monogioudis, J. Miernik, and J. P. Seymour, “Algorithms for enhanced inter-cell interference coordination (eICIC) in LTE hetnets,” IEEE/ACM Trans. on Networking, vol. 22, pp. 137–150, Feb. 2014.
  • [34] S. Singh and J. G. Andrews, “Joint resource partitioning and offloading in heterogeneous cellular networks,” IEEE Trans. on Wireless Communications, vol. 13, pp. 888–901, Feb. 2014.
  • [35] E. Björnson, R. Zakhour, D. Gesbert, and B. Ottersten, “Cooperative multicell precoding: Rate region characterization and distributed strategies with instantaneous and statistical CSI,” IEEE Trans. on Signal Processing, vol. 58, pp. 4298–4310, Aug. 2010.
  • [36] E. Hossain, M. Rasti, H. Tabassum, and A. Abdelnasser, “Evolution toward 5G multi-tier cellular wireless networks: An interference management perspective,” IEEE Wireless Communications, vol. 21, pp. 118–127, June 2014.
  • [37] S. Shakkottai, T. S. Rappaport, and P. C. Karlsson, “Cross-layer design for wireless networks,” IEEE Comm. Mag., vol. 41, pp. 74–80, Oct. 2003.
  • [38] X. Lin, N. Shroff, and R. Srikant, “A tutorial on cross-layer optimization in wireless networks,” IEEE Journal on Sel. Areas in Communications, vol. 24, pp. 1452–1463, Aug. 2006.
  • [39] E. Björnson and E. Jorswieck, Optimal resource allocation in coordinated multi-cell systems, vol. 9 (2-3). Now Publishers, 2013.
  • [40] E. Björnson, N. Jaldén, M. Bengtsson, and B. Ottersten, “Optimality properties, distributed strategies, and measurement-based evaluation of coordinated multicell OFDMA transmission,” IEEE Trans. on Signal Processing, vol. 59, pp. 6086–6101, Dec. 2011.
  • [41] Q. Ye, O. Y. Bursalioglu, and H. Papadopoulos, “Harmonized cellular and distributed massive MIMO: Load balancing and scheduling,” in Proc., IEEE Globecom, Dec. 2015.
  • [42] H. Huh, G. Caire, H. C. Papadopoulos, and S. A. Ramprashad, “Achieving large spectral efficiency with TDD and not-so-many base-station antennas,” in IEEE APWC, pp. 1346–1349, Sep. 2011.
  • [43] G. Caire, N. Jindal, M. Kobayashi, and N. Ravindran, “Multiuser MIMO achievable rates with downlink training and channel state feedback,” IEEE Trans. on Info. Theory, vol. 56, pp. 2845–2866, June 2010.
  • [44] J. Zhang, R. Chen, J. G. Andrews, A. Ghosh, and R. W. Heath, “Networked MIMO with clustered linear precoding,” IEEE Trans. on Wireless Communications, vol. 8, pp. 1910–1921, Apr. 2009.
  • [45] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel estimation, and reduced-complexity scheduling,” IEEE Trans. on Info. Theory, vol. 58, pp. 2911–2934, May 2012.
  • [46] M. Chiang, P. Hande, T. Lan, and C. W. Tan, “Power control in wireless cellular networks,” Foundations and Trends® in Networking, vol. 2, pp. 381–533, Apr. 2008.
  • [47] Y.-G. Lim, C.-B. Chae, and G. Caire, “Performance analysis of massive MIMO for cell-boundary users,” IEEE Trans. on Wireless Communications, vol. 14, pp. 6827–6842, Dec. 2015.
  • [48] S. Stanczak, M. Wiczanowski, and H. Boche, Fundamentals of Resource Allocation in Wireless Networks: Theory and Algorithms, vol. 3. Springer Verlag, 2009.
  • [49] J. Mo and J. Walrand, “Fair end-to-end window-based congestion control,” IEEE/ACM Trans. on Networking, vol. 8, pp. 556–567, Oct. 2000.
  • [50] D. P. Bertsekas, Convex Optimization Theory. Athena Scientific, 2009.
  • [51] M. Grant, S. Boyd, and Y. Ye, “CVX: Matlab software for disciplined convex programming,” 2009. Available: http://cvxr.com/cvx/.
  • [52] H. Shirani-Mehr, G. Caire, and M. J. Neely, “MIMO downlink scheduling with non-perfect channel state knowledge,” IEEE Trans. on Communications, vol. 58, pp. 2055–2066, July 2010.
  • [53] A. Schrijver, Theory of Linear and Integer Programming. John Wiley & Sons, 1998.