
Joint Optimization of Preamble Selection and Access Barring for Random Access in MTC with General Device Activities

Wang Liu, Ying Cui, Feng Yang, Lianghui Ding, and Jun Sun. The authors are with Shanghai Jiao Tong University, China. This paper was presented in part at IEEE ICC Workshops 2021 [1].
Abstract

Most existing random access schemes for machine-type communications (MTC) simply adopt a uniform preamble selection distribution, irrespective of the underlying device activity distributions. Hence, they may yield unsatisfactory access efficiency. In this paper, we model device activities for MTC as multiple Bernoulli random variables following an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities. Then, we optimize preamble selection and access barring for random access in MTC according to the underlying joint device activity distribution. Specifically, we investigate three cases of the joint device activity distribution, i.e., the cases of perfect, imperfect, and unknown joint device activity distributions, and formulate the average, worst-case average, and sample average throughput maximization problems, respectively. The problems in the three cases are challenging nonconvex problems. In the case of perfect joint device activity distribution, we develop an iterative algorithm and a low-complexity iterative algorithm to obtain stationary points of the original problem and an approximate problem, respectively. In the case of imperfect joint device activity distribution, we develop an iterative algorithm and a low-complexity iterative algorithm to obtain a Karush-Kuhn-Tucker (KKT) point of an equivalent problem and a stationary point of an approximate problem, respectively. Finally, in the case of unknown joint device activity distribution, we develop an iterative algorithm to obtain a stationary point. The proposed solutions are widely applicable and outperform existing solutions for dependent and independent device activities.

Index Terms:
Machine-type communications (MTC), random access procedure, preamble selection, access barring, optimization.

I Introduction

The emerging Internet-of-Things (IoT), whose key idea is to connect everything and everyone through the Internet, has received increasing attention in recent years. Machine-type communications (MTC) are expected to support IoT services and applications, such as home automation, smart grids, smart healthcare, and environment monitoring [2, 3]. Long Term Evolution (LTE) cellular networks offer the most natural and appealing solution for MTC due to their ubiquitous coverage. Specifically, the Third-Generation Partnership Project (3GPP) has developed radio technologies, such as LTE-M [4] and Narrowband IoT (NB-IoT) [5], to enhance existing LTE networks and to provide new solutions inherited from LTE, respectively, for better serving IoT use cases. In addition, 3GPP has proposed to the International Telecommunication Union (ITU) that LTE-M and NB-IoT be integrated as part of the fifth generation (5G) specifications [6].

In LTE-M and NB-IoT, devices compete in a random access channel (RACH) to access the base station (BS) through the random access procedure [7]. Specifically, each active device randomly selects a preamble from a pool of available preambles according to a preamble selection distribution and transmits it during the RACH. The BS acknowledges the successful reception of a preamble if such a preamble is transmitted by only one device. In [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], the authors consider the random access procedure and study the effect of preamble selection under certain assumptions on the knowledge of device activities. Specifically, [8, 9, 10, 13, 16, 17, 18] assume that the number of active devices is known; [14, 15] assume that the distribution of the number of active devices is known; and [11, 12] assume that the statistics of the data queue of each device are known. In [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], preambles are selected according to a uniform distribution; the average throughput [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], average access efficiency [16], and resource consumption [10] are analyzed; and the number of allocated preambles is optimized to maximize the average throughput [10, 17] or access efficiency [16]. Notice that optimal solutions are obtained in [10, 16, 17]. In [19, 20, 21], the active devices are assumed to be known. Under this assumption, each active device is allocated a preamble if the number of preambles is greater than or equal to the number of active devices, and the preambles are allocated to a subset of active devices otherwise.

When many active devices attempt to access a BS simultaneously, a preamble is very likely to be selected by more than one device, leading to a significant decrease in the probability of access success. In this scenario, access control is necessary. One widely used access control method is the access barring scheme [22], which has been included in the LTE specification [7]. In [8, 9, 10, 11, 12, 13, 14, 15, 16], the authors consider access barring in addition to the random access procedure with preambles selected according to a uniform distribution. Specifically, the access barring factor is optimized to maximize the average throughput [8, 9, 10, 11, 12, 13, 14] or access efficiency [16], or to minimize the number of backlogged devices [15]. Note that optimal solutions are obtained in [8, 10, 13, 14, 15, 16], an approximate solution is obtained in [9], and asymptotically optimal solutions for a large number of devices are obtained in [11, 12].

There exist two main types of MTC traffic, i.e., event-driven (or event-triggered) traffic and periodic traffic [23, 24]. In event-driven MTC (e.g., smart healthcare and environment sensing such as rockfall detection and fire alarms), the triggering of one device may increase the probability that other devices in the vicinity trigger in quick succession, or many devices may be simultaneously triggered in response to a particular IoT event. Hence, device activities can be dependent. Besides, in periodic MTC, devices have their own periods, and hence device activities can be considered independent for tractability (if the periodic behaviors of devices are not tracked). Intuitively, effective preamble selection and access barring should adapt to the statistical behaviors of device activities. Specifically, if devices are more likely to activate simultaneously, letting them always select different preambles can avoid more collisions; if devices are less likely to activate simultaneously, letting them always select the same preamble can avoid more collisions. However, most existing random access schemes simply adopt a uniform preamble selection distribution, irrespective of the underlying device activity distributions [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Hence, they may yield far-from-optimal access efficiency. This motivates us to model device activities as multiple Bernoulli random variables following an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities [25, 26, 27, 28], and to optimize preamble selection and access barring according to the multivariate Bernoulli distribution [27]. To our knowledge, [27] is the first work that considers a general joint device activity distribution and attempts to optimize the preamble selection distributions and access barring factors of all devices. More specifically, in [27], the authors assume that a perfect joint device activity distribution is known, maximize an approximation of the average throughput, which captures the active probabilities of every single device and every two devices, and develop a heuristic algorithm to tackle the challenging nonconvex problem. The approximation error and the heuristic algorithm may yield a nonnegligible loss in access efficiency. Therefore, it is critical to explore more effective algorithms adaptive to the statistics of device activities. Furthermore, notice that most existing works assume that the number of active devices [8, 9, 10, 13, 16, 17], the distribution of the number of active devices [14, 15], the statistics of the data queue of each device [11, 12], the active devices [19, 20, 21], or the device activity distribution [27] is perfectly known. However, in practice, such assumptions are hardly satisfied. Hence, it is critical to consider the optimization of preamble selection and access barring under imperfect information of device activities or even only historical samples of device activities.

This paper investigates the joint optimization of preamble selection and access barring for IoT devices in MTC, whose activities are modeled by an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities. Specifically, we consider three cases of the joint device activity distribution, i.e., the case of perfect joint device activity distribution (where the joint device activity distribution has been perfectly estimated), the case of imperfect joint device activity distribution (where the estimation error of the joint device activity distribution lies within a known deterministic bound), and the case of unknown joint device activity distribution (where activity samples generated according to the joint device activity distribution are available). The optimization-based solutions are generally applicable to any dependent or independent device activities. Our main contributions are summarized below.

  • In the case of perfect joint device activity distribution, we formulate the average throughput maximization problem, which is nonconvex with a complicated objective function. Based on the block coordinate descent (BCD) method [29], we develop an iterative algorithm, where most block coordinate optimization problems are solved analytically, and a low-complexity iterative algorithm, where all block coordinate optimization problems have closed-form optimal points, to obtain stationary points of the original problem and an approximate problem, respectively. Furthermore, we characterize an optimality property of a globally optimal point of the original problem.

  • In the case of imperfect joint device activity distribution, we formulate the worst-case average throughput maximization problem as a robust optimization problem, which is a max-min problem. By duality theory and successive convex approximation (SCA) [30], we develop an iterative algorithm to obtain a Karush-Kuhn-Tucker (KKT) point of an equivalent problem of the max-min problem. Based on the BCD method, we also develop a low-complexity iterative algorithm, where all block coordinate optimization problems are solved analytically, to obtain a stationary point of an approximate problem of the max-min problem.

  • In the case of unknown joint device activity distribution, we formulate the sample average throughput maximization problem, which can be viewed as a nonconvex stochastic optimization problem. Based on mini-batch stochastic parallel SCA [31], we develop an efficient parallel iterative algorithm, where the approximate convex problems in each iteration are solved analytically, to obtain a stationary point.

  • Finally, numerical results show that, for both dependent and independent device activities, the proposed solutions achieve significant gains over existing schemes in all three cases. Besides, the performance of each low-complexity algorithm is close to that of its counterpart designed for solving the original problem.

The key notation used in this paper is listed in Table I.

TABLE I: Key Notation
$K$: the number of devices
$N$: the number of preambles
$x_k\in\{0,1\}$: the activity state of device $k$
$p_{\mathbf{x}}\in[0,1]$: the probability that the activity states are $\mathbf{x}$
$\underline{p}_{\mathbf{x}}\in[0,1]$: the lower bound of $p_{\mathbf{x}}$
$\overline{p}_{\mathbf{x}}\in[0,1]$: the upper bound of $p_{\mathbf{x}}$
$\epsilon\in[0,1]$: the access barring factor
$a_{k,n}\in[0,1]$: the probability that device $k$ selects preamble $n$
$T(\mathbf{A},\epsilon,\mathbf{x})$: average throughput conditional on $\mathbf{x}$
$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$: average throughput
$\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$: approximate average throughput
$\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$: worst-case average throughput
$\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$: approximate worst-case average throughput
$\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$: sample average throughput

II System Model

We consider the uplink of a single-cell wireless network consisting of one BS and $K$ devices. Let $\mathcal{K}\triangleq\{1,2,\ldots,K\}$ denote the set of $K$ devices. We consider a discrete-time system with time being slotted and assume that devices activate independently and identically over slots within a certain period. In each slot, a device can be either active or inactive. Thus, the activities of the $K$ devices in a slot can be modeled as $K$ Bernoulli random variables, following an arbitrary multivariate Bernoulli distribution [25, 26, 28]. Generally, the $K$ Bernoulli random variables (representing the activities of the $K$ devices) can be dependent or independent. Let $x_k\in\{0,1\}$ denote the activity state of device $k$, where $x_k=1$ if device $k$ is active and $x_k=0$ otherwise. Let the binary vector $\mathbf{x}\triangleq(x_k)_{k\in\mathcal{K}}\in\mathcal{X}$ denote the activity states of the $K$ devices, where $\mathcal{X}\triangleq\{0,1\}^K$. The joint device activity distribution (assumed invariant over the considered period) is denoted by $\mathbf{p}\triangleq(p_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}$, where $p_{\mathbf{x}}$ represents the probability of the activity states of the $K$ devices being $\mathbf{x}$. Note that whether the device activities are dependent or independent depends on the values or forms of $\mathbf{p}$'s components. According to the nonnegativity and normalization axioms, we have

$p_{\mathbf{x}}\geq 0,\ \mathbf{x}\in\mathcal{X},$ (1a)
$\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}=1.$ (1b)

In this paper, we consider $p_{\mathbf{0}}<1$ ($p_{\mathbf{0}}=1$ is not an interesting or practical scenario). We do not pose any additional assumption on $\mathbf{p}$. Thus, all analytical and optimization results, relying on $\mathbf{p}$ or its related quantities, are general.
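To make the model concrete, the following minimal sketch (assuming NumPy is available; the lexicographic state ordering and all variable names are our own illustrative choices, not from the paper) enumerates the $2^K$ activity states and draws slots of device activities from a given joint distribution $\mathbf{p}$:

```python
# A minimal sketch, assuming NumPy; the uniform example p is illustrative only.
import itertools
import numpy as np

K = 3
states = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K activity states x
p = np.full(2 ** K, 1.0 / 2 ** K)                             # example joint distribution p

rng = np.random.default_rng(0)
idx = rng.choice(2 ** K, size=5, p=p)  # one draw per slot
print(states[idx])                     # each row is one realization of x
```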

In a slot, each active device tries to access the BS. Congestion may occur when many active devices require access to the BS at the same time. We adopt an access barring scheme for access control [22]. In particular, at the beginning of each slot, all active devices independently attempt to access the BS with probability $\epsilon$, where

$\epsilon\geq 0,$ (2a)
$\epsilon\leq 1.$ (2b)

Here, $\epsilon$ is referred to as the access barring factor for all devices and will be optimized later. (The average throughput can be improved when devices have different access barring factors. To avoid resource starvation for each device and maintain fairness, we consider identical access barring factors for all devices in the optimizations for simplicity. The proposed solution framework can be extended to handle different access barring factors.) That is, the access barring scheme is parameterized by the access barring factor $\epsilon$.

We adopt the random access procedure, which consists of four stages, i.e., preamble transmission, random access response, scheduled transmission, and contention resolution [3]. We focus only on the first stage, which mainly determines the success of access [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 27]. Consider $N$ orthogonal preambles, the set of which is denoted by $\mathcal{N}\triangleq\{1,2,\ldots,N\}$. Specifically, at the first stage, each device that attempts to access the BS independently selects a preamble out of the $N$ preambles to transmit. The probability that device $k$ selects preamble $n$ is denoted by $a_{k,n}$, which satisfies

$a_{k,n}\geq 0,\ k\in\mathcal{K},\ n\in\mathcal{N},$ (3a)
$\sum_{n\in\mathcal{N}}a_{k,n}=1,\ k\in\mathcal{K}.$ (3b)

Let $\mathbf{a}_k\triangleq(a_{k,n})_{n\in\mathcal{N}}$ denote the preamble selection distribution of device $k$, and let $\mathbf{A}\triangleq(a_{k,n})_{k\in\mathcal{K},n\in\mathcal{N}}$ denote the distributions of the $K$ devices; the $k$-th column of $\mathbf{A}$ is $\mathbf{a}_k$. Note that for all $k\in\mathcal{K}$, the random preamble transmission parameterized by $\mathbf{a}_k$ reduces to the preamble transmission in the standard random access procedure [7] when $a_{k,n}=\frac{1}{N}$, $n\in\mathcal{N}$. Furthermore, note that for all $k\in\mathcal{K}$, the considered preamble selection is generally random and becomes deterministic when $a_{k,n}\in\{0,1\}$, $n\in\mathcal{N}$. We allow $\mathbf{a}_k$, $k\in\mathcal{K}$ to be arbitrary distributions so as to maximally avoid collisions.

If a preamble is selected by a single device, this device successfully accesses the BS [11]. Then, the average number of devices that successfully access the BS at activity states $\mathbf{x}$ in a slot is given by [27]

$T(\mathbf{A},\epsilon,\mathbf{x})\triangleq\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_{k}a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_{\ell}a_{\ell,n}\epsilon).$ (4)
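For reference, a direct (unoptimized) evaluation of (4) can be sketched as follows; the function name and the row-per-device layout of $\mathbf{A}$ (the paper stacks $\mathbf{a}_k$ as columns) are our own choices:

```python
import numpy as np

def throughput(A, eps, x):
    """Evaluate T(A, eps, x) in (4): the average number of devices that
    succeed at activity state x. Here A[k, n] = a_{k,n} (rows are devices)."""
    K, N = A.shape
    total = 0.0
    for n in range(N):
        for k in range(K):
            # probability that no other active contending device picks preamble n
            others = np.prod([1.0 - x[l] * A[l, n] * eps
                              for l in range(K) if l != k])
            total += x[k] * A[k, n] * eps * others
    return total
```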

In this paper, we consider the following three cases of joint device activity distribution and introduce the respective performance metrics.
Perfect joint device activity distribution: In this case, we assume that the joint device activity distribution has been estimated by some learning methods, and the estimation error is negligible. That is, the exact value of $\mathbf{p}$ is known. We adopt the average throughput [27]

$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})\triangleq\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}T(\mathbf{A},\epsilon,\mathbf{x})$ (5)
$=\sum_{m=1}^{K}m(-1)^{m-1}\epsilon^{m}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}\right)\prod_{k\in\mathcal{K}'}a_{k,n},$ (6)

as the performance metric, where $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). The proof of (6) can be found in Appendix A. Note that for all $\mathcal{K}'\subseteq\mathcal{K}$, $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}$ represents the probability that all devices in $\mathcal{K}'$ are active and can be computed in advance.
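Under the assumption that $K$ is small enough to enumerate $\mathcal{X}$, the average throughput (5) can be evaluated by direct expectation. A hedged sketch, reusing the hypothetical `throughput` helper defined after (4) and the lexicographic state ordering:

```python
import itertools
import numpy as np

def avg_throughput(A, eps, p):
    """Evaluate (5): the expectation of T(A, eps, x) over x ~ p, with p indexed
    by the lexicographic enumeration of {0,1}^K."""
    K = A.shape[0]
    return sum(px * throughput(A, eps, np.array(x))
               for px, x in zip(p, itertools.product([0, 1], repeat=K)))
```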
Imperfect joint device activity distribution: In this case, we assume that the joint device activity distribution has been estimated by some learning methods with certain estimation errors. For all $\mathbf{x}\in\mathcal{X}$, let $\hat{p}_{\mathbf{x}}$ denote the estimated probability of the device activity states being $\mathbf{x}$, and let $\Delta_{\mathbf{x}}\triangleq p_{\mathbf{x}}-\hat{p}_{\mathbf{x}}$ denote the corresponding estimation error. Note that $\Delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ satisfy $\sum_{\mathbf{x}\in\mathcal{X}}\Delta_{\mathbf{x}}=0$ and $|\Delta_{\mathbf{x}}|\leq\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ for some known estimation error bounds $\delta_{\mathbf{x}}\in[0,1)$, $\mathbf{x}\in\mathcal{X}$. Assume that $\hat{p}_{\mathbf{x}}$, $\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ are known to the BS, but neither $\mathbf{p}$ nor $\Delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ is known to the BS. Thus, the BS knows that the exact joint activity distribution satisfies $\mathbf{p}\in\mathcal{P}$, where

$\mathcal{P}\triangleq\left\{(y_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}\,\middle|\,\underline{p}_{\mathbf{x}}\leq y_{\mathbf{x}}\leq\overline{p}_{\mathbf{x}},\ \mathbf{x}\in\mathcal{X},\ \sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}=1\right\},$

with $\underline{p}_{\mathbf{x}}\triangleq\max\{\hat{p}_{\mathbf{x}}-\delta_{\mathbf{x}},0\}$ and $\overline{p}_{\mathbf{x}}\triangleq\min\{\hat{p}_{\mathbf{x}}+\delta_{\mathbf{x}},1\}$ for all $\mathbf{x}\in\mathcal{X}$. Note that $\mathcal{P}$ reflects $\mathbf{p}$ to a certain extent. It can be easily verified that $\mathcal{P}$ shrinks to $\{\mathbf{p}\}$ as the estimation error bounds $\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ decrease to 0. We adopt the worst-case average throughput

$\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ (7)

as the performance metric, where $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is given by (6). Note that $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\leq\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, with equality holding if and only if $\delta_{\mathbf{x}}=0$, $\mathbf{x}\in\mathcal{X}$.
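Since $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is linear in $\mathbf{p}$, the inner minimization in (7) is a linear program over the box-and-simplex set $\mathcal{P}$. A hedged sketch using SciPy's `linprog` (an off-the-shelf LP solver, not part of the paper; it reuses the `throughput` sketch after (4)):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_avg_throughput(A, eps, p_lo, p_hi):
    """Evaluate (7): minimize sum_x p_x T(A, eps, x) over p in P, where p_lo
    and p_hi hold the bounds on p_x (lexicographic state order)."""
    K = A.shape[0]
    # cost vector: T(A, eps, x) for each activity state x
    c = [throughput(A, eps, np.array(x))
         for x in itertools.product([0, 1], repeat=K)]
    res = linprog(c, A_eq=np.ones((1, 2 ** K)), b_eq=[1.0],
                  bounds=list(zip(p_lo, p_hi)))
    return res.fun
```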
Unknown joint device activity distribution: In this case, we assume no direct information on the joint device activity distribution, but $I$ samples of the device activity states, generated according to $\mathbf{p}$, are available. The $I$ samples reflect $\mathbf{p}$ to a certain extent via the empirical joint probability distribution, an unbiased estimator of $\mathbf{p}$. It can be easily verified that its mean squared error decreases with $I$. Denote $\mathcal{I}\triangleq\{1,2,\ldots,I\}$. For all $i\in\mathcal{I}$, the $i$-th sample is denoted by $\mathbf{x}_i$. We adopt the sample average throughput

$\bar{T}_{\text{st}}(\mathbf{A},\epsilon)\triangleq\frac{1}{I}\sum_{i\in\mathcal{I}}T(\mathbf{A},\epsilon,\mathbf{x}_{i})$ (8)

as the performance metric, where $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). Note that $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ is a random variable, with randomness induced by the $I$ samples, and $\mathbb{E}[\bar{T}_{\text{st}}(\mathbf{A},\epsilon)]=\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$.
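Given a matrix of $I$ activity samples, the sample average throughput (8) is a one-line estimate (again reusing the `throughput` sketch after (4); the array layout is our assumption):

```python
import numpy as np

def sample_avg_throughput(A, eps, samples):
    """Evaluate (8): the empirical mean of T(A, eps, x_i) over the I sample
    rows of `samples` (an I x K binary array)."""
    return float(np.mean([throughput(A, eps, x) for x in samples]))
```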

As illustrated in the introduction, overlooking the joint device activity distribution $\mathbf{p}$ (or related quantities, such as $\mathcal{P}$ and $\mathbf{x}_i$, $i\in\mathcal{I}$) may yield ineffective preamble selection and access barring designs. This can be further seen from the following example.

Example 1 (Motivation Example)

We consider $K=3$ devices and $N=2$ preambles. The marginal probability of each device being active is $p\in(0,\frac{1}{2}]$; the activities of device 1 and device 2 can be correlated (dependent) (note that correlation implies dependence), and their joint distribution $p_{x_1,x_2}$, $x_1,x_2\in\{0,1\}$ is given by $p_{0,0}=1+(\eta-2)p+(1-\eta)p^2$, $p_{0,1}=p_{1,0}=(1-\eta)(p-p^2)$, and $p_{1,1}=\eta p+(1-\eta)p^2$ [26], where $\eta\in[\frac{p}{p-1},1]$ represents the correlation coefficient (the range of $\eta$ guarantees that $p_{x_1,x_2}$, $x_1,x_2\in\{0,1\}$ satisfy (1a)-(1b)); and the activity of device 3 is independent of the activities of devices 1 and 2. Thus, $\mathbf{p}$ is given by $p_{x_1,x_2,x_3}=p_{x_1,x_2}p_{x_3}$, $x_1,x_2,x_3\in\{0,1\}$. Note that $\eta\neq 0$ corresponds to dependent device activities, and $\eta=0$ corresponds to independent device activities. We consider four feasible random access schemes parameterized by $(\mathbf{A}_i,\epsilon_i)$, $i=1,2,3,4$. Specifically, $\mathbf{A}_1=[\mathbf{e}_1,\mathbf{e}_1,\mathbf{e}_2]$, $\mathbf{A}_2=[\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_2]$,

$\mathbf{A}_3=\begin{cases}\mathbf{A}_1, & \eta\leq 0\\ \mathbf{A}_2, & \eta>0,\end{cases}$ (11)

and $\mathbf{A}_4=\frac{1}{2}[\mathbf{1},\mathbf{1},\mathbf{1}]$ (uniform distributions), where $\mathbf{e}_n\in\mathbb{R}^N$ denotes the column vector of all zeros except the $n$-th entry being 1, and $\mathbf{1}\in\mathbb{R}^N$ denotes the all-one column vector; and

$\epsilon_i=\arg\max_{\epsilon}\bar{T}(\mathbf{A}_i,\epsilon,\mathbf{p})=\begin{cases}\min\left(1,\frac{3}{4}\frac{1}{\eta+(1-\eta)p}\right), & i=1\\ 1, & i=2,3,4.\end{cases}$ (14)

Note that $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_4$ do not rely on $\mathbf{p}$, whereas $\mathbf{A}_3$ depends on the parameter of $\mathbf{p}$, i.e., $\eta$.

Figure 1: Average throughput versus correlation coefficient $\eta$ for Example 1 at $p=0.5$.

Fig. 1 plots the average throughput versus the correlation coefficient $\eta$. From Fig. 1, we can see that Scheme 3 outperforms Scheme 1 and Scheme 4 at $\eta>0$ and outperforms Scheme 2 and Scheme 4 at $\eta\leq 0$. This is because when device 1 and device 2 are more (or less) likely to activate simultaneously, letting them always select different preambles (or the identical preamble) can avoid more collisions. Besides, Scheme 3 outperforms Scheme 4 at $\eta=0$. This example indicates that adapting $(\mathbf{A},\epsilon)$ to $\mathbf{p}$ can improve the performance of the random access scheme.
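The example can be reproduced numerically. The sketch below is our own construction, not from the paper: it reuses the hypothetical `avg_throughput` helper defined after (6), assumes the lexicographic state ordering, and stores the per-device distributions $\mathbf{a}_k$ as matrix rows (the paper stacks them as columns).

```python
import itertools
import numpy as np

def example1_p(p, eta):
    """Joint distribution of Example 1: devices 1 and 2 correlated with
    coefficient eta, device 3 independent with marginal p."""
    p12 = {(0, 0): 1 + (eta - 2) * p + (1 - eta) * p ** 2,
           (0, 1): (1 - eta) * (p - p ** 2),
           (1, 0): (1 - eta) * (p - p ** 2),
           (1, 1): eta * p + (1 - eta) * p ** 2}
    return np.array([p12[(x1, x2)] * (p if x3 else 1 - p)
                     for x1, x2, x3 in itertools.product([0, 1], repeat=3)])

p, eta = 0.5, 0.4
pvec = example1_p(p, eta)
A1 = np.array([[1, 0], [1, 0], [0, 1]], float)  # rows are a_1, a_2, a_3
A2 = np.array([[1, 0], [0, 1], [0, 1]], float)
A4 = np.full((3, 2), 0.5)
eps1 = min(1.0, 0.75 / (eta + (1 - eta) * p))    # epsilon_1 from (14)
for name, A, eps in [("Scheme 1", A1, eps1), ("Scheme 2", A2, 1.0),
                     ("Scheme 4", A4, 1.0)]:
    print(name, avg_throughput(A, eps, pvec))
```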

Motivated by Example 1, in Section III, Section IV, and Section V, we shall optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the average, worst-case average, and sample average throughputs for given $\mathbf{p}$, $\mathcal{P}$, and $\mathbf{x}_i$, $i\in\mathcal{I}$ in the cases of perfect, imperfect, and unknown joint device activity distributions, respectively, as shown in Fig. 2. In what follows, we assume that the optimization problems are solved at the BS given knowledge of $\mathbf{p}$, $\mathcal{P}$, and $\mathbf{x}_i$, $i\in\mathcal{I}$, and the solutions are then sent to all devices for implementation. (In practice, $\mathbf{p}$ may change slowly over time. For practical implementation, we can re-optimize the preamble selection and access barring using the proposed methods when the change of $\mathbf{p}$ is sufficiently large, and the stationary point obtained for an outdated $\mathbf{p}$ generally serves as a good initial point for the latest $\mathbf{p}$.)

Figure 2: Solution framework.

III Performance Optimization for Perfect Joint Device Activity Distribution

In this section, we consider the average throughput maximization in the case of perfect joint device activity distribution. First, we formulate the average throughput maximization problem, which is a challenging nonconvex problem. Then, we develop an iterative algorithm to obtain a stationary point of the original problem. Finally, we develop a low-complexity iterative algorithm to obtain a stationary point of an approximate problem.

A Problem Formulation

In the case of perfect joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the average throughput $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ in (6) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 1 (Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$

The objective function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear, corresponding to a convex feasible set. Thus, Problem 1 is nonconvex with a convex feasible set. In general, a globally optimal point of a nonconvex problem cannot be obtained effectively and efficiently. Obtaining a stationary point is the classic goal for dealing with a nonconvex problem with a convex feasible set.

B Stationary Point

We propose an iterative algorithm, based on the BCD method [29], to obtain a stationary point of Problem 1. (One can also obtain a stationary point of Problem 1 using SCA [30]; as the approximate convex optimization problem in each iteration has no analytical solution, SCA is not as efficient as the proposed algorithm.) Specifically, we divide the variables $(\mathbf{A},\epsilon)$ into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$ and $\epsilon$. In each iteration of the proposed algorithm, all $K+1$ blocks are sequentially updated. At each step of one iteration, we maximize $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with respect to one of the $K+1$ blocks. For ease of illustration, in the following, we also write $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ as $\bar{T}(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon,\mathbf{p})$, where $\mathbf{a}_{-k}\triangleq(\mathbf{a}_{\ell})_{\ell\in\mathcal{K},\ell\neq k}$. Given $\mathbf{a}_{-k}$ and $\epsilon$ obtained in the previous step, the block coordinate optimization with respect to $\mathbf{a}_k$ is given by

$\max_{\mathbf{a}_{k}}\quad\bar{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p}),\quad k\in\mathcal{K}$ (15)
$\mathrm{s.t.}\quad\text{(3a), (3b)}.$

Given $\mathbf{A}$ obtained in the previous step, the block coordinate optimization with respect to $\epsilon$ is given by

$\max_{\epsilon}\quad\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ (16)
$\mathrm{s.t.}\quad\text{(2a), (2b)}.$

Each problem in (15) is a linear program (LP) with $N$ variables and $N+1$ constraints. The problem in (16) is a polynomial program with a single variable and two constraints.

Next, we obtain optimal points of the problems in (15) and (16). Define (noting that $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})=\frac{\partial\bar{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p})}{\partial a_{k,n}}$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, and $q(\mathbf{A},\epsilon,\mathbf{p})=\frac{\partial\bar{T}(\mathbf{A},\epsilon,\mathbf{p})}{\partial\epsilon}$)

$Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})\triangleq\sum_{m=1}^{K}m(-1)^{m-1}\epsilon^{m}\sum_{\mathcal{K}'\subseteq\mathcal{K}:k\in\mathcal{K}',|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{\ell\in\mathcal{K}'}x_{\ell}\right)\prod_{\ell\in\mathcal{K}':\ell\neq k}a_{\ell,n},$ (17)
$q(\mathbf{A},\epsilon,\mathbf{p})\triangleq\sum_{m=1}^{K}m^{2}(-1)^{m-1}\epsilon^{m-1}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}\right)\prod_{k\in\mathcal{K}'}a_{k,n}.$ (18)

Denote by $\mathcal{B}(\mathbf{A},\mathbf{p})\triangleq\{z\in[0,1]: q(\mathbf{A},z,\mathbf{p})=0\}$ the set of roots of the equation $q(\mathbf{A},z,\mathbf{p})=0$ that lie in the interval $[0,1]$. Based on the structural properties of the block coordinate optimization problems in (15) and (16), we can obtain their optimal points.

Theorem 1 (Optimal Points of Problems in (15) and (16))

A set of optimal points of each block coordinate optimization in (15) is given by

$\left\{\mathbf{e}_{m}: m\in\arg\max_{n\in\mathcal{N}}\ Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})\right\},\ k\in\mathcal{K},$ (19)

and a set of optimal points of the block coordinate optimization in (16) is given by

$\arg\max_{\epsilon\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p}).$ (20)
Proof:

Please refer to Appendix B. ∎

The optimal point in (19) indicates that each device $k$ selects the preamble corresponding to the maximum average throughput (increase rate of the average throughput) conditioned on being selected by device $k$ for given $(\mathbf{a}_{-k},\epsilon)$. The overall computational complexity for determining the sets in (19) is $\mathcal{O}(NK^{2}2^{K})$, and the overall computational complexity for determining the set in (20) is also $\mathcal{O}(NK^{2}2^{K})$. The detailed complexity analysis can be found in Appendix C. As $\mathbf{A}$ is usually sparse during the iterations, the actual computational complexities for obtaining (19) and (20) are much lower.
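To illustrate (20): $q(\mathbf{A},z,\mathbf{p})$ in (18) is a degree-$(K-1)$ polynomial in $z$, so the candidate set $\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$ can be enumerated with a polynomial root finder. A hedged sketch (assuming the subset probabilities $\sum_{\mathbf{x}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k$ are precomputed in a dict `p_active` keyed by frozensets of device indices; all names are ours):

```python
import itertools
import numpy as np

def eps_update(A, p_active):
    """Block update of epsilon per (20): evaluate Tbar at the real roots of
    q(A, z, p) = 0 in [0, 1] (cf. (18)) and at z = 1, and keep the best."""
    K, N = A.shape
    def subset_sum(m):  # sum over n and all K' with |K'| = m
        return sum(p_active[frozenset(Kp)] * np.prod([A[k, n] for k in Kp])
                   for n in range(N)
                   for Kp in itertools.combinations(range(K), m))
    S = [subset_sum(m) for m in range(1, K + 1)]
    # q's coefficient of z^{m-1} is m^2 (-1)^{m-1} S[m-1]; np.roots wants
    # the highest power first
    coef = [m ** 2 * (-1) ** (m - 1) * S[m - 1] for m in range(1, K + 1)]
    roots = np.roots(coef[::-1]) if any(coef) else np.array([])
    cand = [z.real for z in roots
            if abs(z.imag) < 1e-9 and 0.0 <= z.real <= 1.0] + [1.0]
    def Tbar(z):  # (6), reusing the same precomputed subset sums
        return sum(m * (-1) ** (m - 1) * z ** m * S[m - 1]
                   for m in range(1, K + 1))
    return max(cand, key=Tbar)
```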

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 1. Specifically, in Steps 4-7, $\mathbf{a}_k$, $k\in\mathcal{K}$ are updated one by one; in Steps 9-11, $\epsilon$ is updated. Steps 5 and 9 ensure the convergence of Algorithm 1 to a stationary point. Based on the proof of [29, Proposition 2.7.1], we can show the following result.

Theorem 2 (Convergence of Algorithm 1)

Algorithm 1 returns a stationary point of Problem 1 in a finite number of iterations.

Proof:

Please refer to Appendix D. ∎

In practice, we can run Algorithm 1 multiple times (with possibly different random initial points) to obtain multiple stationary points and choose the stationary point with the largest objective value as a suboptimal point of Problem 1. (Generally speaking, finding a good stationary point involves numerical experiments and is more art than technology [32].) However, we find from numerical experiments that $((\mathbf{e}_{n_k})_{k\in\mathcal{K}},1)$ is generally a good initial point that yields a faster convergence speed. The intuition is that when $(\mathbf{A},\epsilon)=((\mathbf{e}_{n_k})_{k\in\mathcal{K}},1)$, the contending devices for each preamble are statistically balanced irrespective of $\mathbf{p}$. The average throughput of the best obtained stationary point and the computational complexity increase with the number of times that Algorithm 1 is run. We can choose a suitable number to balance the increases of the average throughput and computational complexity.

Based on Algorithm 1 and Theorem 2, we can characterize an optimality property of a globally optimal point of Problem 1.

Theorem 3 (Optimality Property)

There exists at least one globally optimal point $(\mathbf{A}^{*},\epsilon^{*})$ of Problem 1 that satisfies $\mathbf{a}_k^{*}=\mathbf{e}_{n_k}$, $k\in\mathcal{K}$, for some $n_k\in\mathcal{N}$, $k\in\mathcal{K}$.

Proof:

Please refer to Appendix E. ∎

Algorithm 1 Obtaining A Stationary Point of Problem 1
1: initialization: for $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random, and set $\epsilon:=1$.
2: repeat
3:  $\mathbf{A}_{\text{last}}:=\mathbf{A}$.
4:  for $k\in\mathcal{K}$ do
5:   if $\mathbf{a}_k$ does not belong to the set in (19)
6:    $\mathbf{a}_k$ is randomly chosen from the set in (19).
7:   end if
8:  end for
9:  if $\epsilon$ does not belong to the set in (20)
10:   $\epsilon$ is randomly chosen from the set in (20).
11:  end if
12: until $\mathbf{A}_{\text{last}}=\mathbf{A}$.
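A compact rendering of Algorithm 1 in code, as a sketch under the assumption that `Q_fn(k, A, eps)` returns the vector $(Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p}))_{n\in\mathcal{N}}$ in (17) and `eps_fn(A)` returns an optimizer from (20) (e.g., the `eps_update` sketch above); both function names are ours, and $\mathbf{A}$ is kept deterministic throughout, consistent with Theorem 3:

```python
import numpy as np

def algorithm1(Q_fn, eps_fn, K, N, seed=0):
    """BCD iterations of Algorithm 1; returns a stationary point (A, eps)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((K, N))
    A[np.arange(K), rng.integers(N, size=K)] = 1.0  # Step 1: random e_{n_k} init
    eps = 1.0
    while True:
        A_last = A.copy()
        for k in range(K):                            # Steps 4-8
            Q = np.asarray(Q_fn(k, A, eps))
            best = np.flatnonzero(Q == Q.max())
            j = int(A[k].argmax())
            if not (A[k, j] == 1.0 and j in best):    # Step 5: keep a_k if optimal
                A[k] = 0.0
                A[k, rng.choice(best)] = 1.0          # Step 6
        eps = eps_fn(A)       # Steps 9-11 (this sketch simply re-solves (20))
        if np.array_equal(A, A_last):                 # Step 12
            return A, eps
```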

Theorem 3 indicates that there exists a deterministic preamble selection rule that can achieve the maximum average throughput. It is worth noting that the stationary point obtained by Algorithm 1 satisfies the optimality property in Theorem 3. In Section III-C, we shall see that the low-complexity solution also satisfies this optimality property.

C Low-complexity Solution

From the complexity analysis for obtaining a stationary point of Problem 1, we know that Algorithm 1 is computationally expensive when $K$ or $N$ is large. In this part, we develop a low-complexity algorithm, applicable for large $K$ or $N$, to obtain a stationary point of an approximate problem of Problem 1. Later, in Section VI, we shall show that the performance of the low-complexity algorithm is comparable with that of Algorithm 1.

Motivated by the approximations of $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ in [27], we approximate the complicated function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, which has $N2^{K}$ terms, with the simpler function

$\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})\triangleq\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}-\epsilon^{2}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell},$ (21)

which has $1+\frac{K(K-1)}{2}$ terms. The detailed reason for the approximation can be found in [33]. Note that $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}$ and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$ ($k<\ell$) represent the probability of device $k$ being active and the probability of devices $k$ and $\ell$ both being active, respectively. By comparing (21) with (6), we can see that $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ captures the active probabilities of every single device and every two devices. Accordingly, we consider the following approximate problem of Problem 1.

Problem 2 (Approximate Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$
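In code, the objective (21) only needs the single and pairwise activity probabilities. A sketch (with `P1[k]` $=\sum_{\mathbf{x}}p_{\mathbf{x}}x_k$ and `P2[k, l]` $=\sum_{\mathbf{x}}p_{\mathbf{x}}x_k x_\ell$ precomputed as NumPy arrays; the array names are ours):

```python
import numpy as np

def approx_avg_throughput(A, eps, P1, P2):
    """Evaluate (21) from single (P1) and pairwise (P2) activity probabilities."""
    K = len(P1)
    pair = sum(A[k] @ A[l] * P2[k, l]   # sum_n a_{k,n} a_{l,n} P(x_k = x_l = 1)
               for k in range(K) for l in range(k + 1, K))
    return eps * P1.sum() - eps ** 2 * pair
```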

Analogously, using the BCD method, we propose a computationally efficient iterative algorithm, with a stronger performance guarantee than the heuristic method in [27], to obtain a stationary point of Problem 2. Specifically, the variables $(\mathbf{A},\epsilon)$ are divided into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$ and $\epsilon$. For ease of illustration, we also write $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ as $\tilde{T}(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon,\mathbf{p})$ in the sequel. Given $\mathbf{a}_{-k}$ and $\epsilon$ obtained in the previous step, the block coordinate optimization with respect to $\mathbf{a}_k$ is given by

$\max_{\mathbf{a}_{k}}\quad\tilde{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p}),\quad k\in\mathcal{K}$ (22)
$\mathrm{s.t.}\quad\text{(3a), (3b)}.$

Given $\mathbf{A}$ obtained in the previous step, the block coordinate optimization with respect to $\epsilon$ is given by

$\max_{\epsilon}\quad\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ (23)
$\mathrm{s.t.}\quad\text{(2a), (2b)}.$

Each problem in (22) is an LP with $N$ variables and $N+1$ constraints, and the problem in (23) is a quadratic program (QP) with a single variable and two constraints. It is clear that the convex problems in (22) and (23) are much simpler than those in (15) and (16), respectively. Based on the structural properties of the block coordinate optimization problems in (22) and (23), we can obtain their optimal points.

Theorem 4 (Optimal Points of Problems in (22) and (23))

A set of optimal points of each block coordinate optimization in (22) is given by

$\left\{\mathbf{e}_{m}: m\in\arg\min_{n\in\mathcal{N}}\ \sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}\right\},\ k\in\mathcal{K},$ (24)

and the optimal point of the block coordinate optimization in (23) is given by

$\min\left(1,\ \frac{\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}}{2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}}\right).$ (25)
Proof:

Theorem 4 can be proved in a similar way to Theorem 1. ∎

The optimal point in (24) indicates that each device $k$ selects the preamble with the minimum number of contending devices conditioned on being selected by device $k$ for given $(\mathbf{a}_{-k},\epsilon)$. In the following, we analyze the computational complexity for solving the problems in (22) and (23) according to Theorem 4. As the constants $\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}$ and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{\ell}x_{k}$, $k,\ell\in\mathcal{K}$, $k<\ell$ are computed in advance, the corresponding computational complexities are not considered below. For all $k\in\mathcal{K}$ and $n\in\mathcal{N}$, the computational complexity for calculating $\sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$ is $\mathcal{O}(K)$. Furthermore, for all $k\in\mathcal{K}$, the computational complexity of finding the smallest one among $\sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$, $n\in\mathcal{N}$ is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (24) is $\mathcal{O}(NK^{2})+\mathcal{O}(NK)=\mathcal{O}(NK^{2})$. Analogously, the computational complexity for obtaining the closed-form optimal point in (25) is $\mathcal{O}(NK^{2})$. The computational complexities for obtaining the optimal points given by Theorem 4 are thus much lower than those for obtaining the optimal points given by Theorem 1. Furthermore, it is worth noting that the optimal points given by Theorem 4 do not rely on the active probabilities of more than two devices.

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 2. Specifically, in Steps 4-7, $\mathbf{a}_k$, $k\in\mathcal{K}$ are updated one by one; in Steps 9-11, $\epsilon$ is updated. Steps 5 and 9 ensure the convergence of Algorithm 2 to a stationary point. Similarly, we have the following result.

Theorem 5 (Convergence of Algorithm 2)

Algorithm 2 returns a stationary point of Problem 2 in a finite number of iterations.

Proof:

Theorem 5 can be proved in a similar way to Theorem 2. ∎

Algorithm 2 Obtaining A Stationary Point of Problem 2
1: initialization: for $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random, and set $\epsilon:=1$.
2: repeat
3:  $\mathbf{A}_{\text{last}}:=\mathbf{A}$.
4:  for $k\in\mathcal{K}$ do
5:   if $\mathbf{a}_k$ does not belong to the set in (24)
6:    $\mathbf{a}_k$ is randomly chosen from the set in (24).
7:   end if
8:  end for
9:  if $\epsilon$ is not equal to the optimal point in (25)
10:   set $\epsilon$ to the optimal point in (25).
11:  end if
12: until $\mathbf{A}_{\text{last}}=\mathbf{A}$.
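Algorithm 2 admits a particularly compact implementation, since both block updates are closed-form (Theorem 4). A sketch under the same `P1`/`P2` convention as the sketch after Problem 2 (the convention `P2[k, k] = 0` is ours):

```python
import numpy as np

def algorithm2(P1, P2, N, seed=0):
    """BCD iterations of Algorithm 2; P2 must satisfy P2[k, k] = 0 so that
    P2[k] @ A sums over l != k as in (24)."""
    rng = np.random.default_rng(seed)
    K = len(P1)
    A = np.zeros((K, N))
    A[np.arange(K), rng.integers(N, size=K)] = 1.0
    eps = 1.0
    while True:
        A_last = A.copy()
        for k in range(K):
            load = P2[k] @ A                        # contention per preamble, (24)
            best = np.flatnonzero(load == load.min())
            j = int(A[k].argmax())
            if not (A[k, j] == 1.0 and j in best):  # Step 5: keep a_k if optimal
                A[k] = 0.0
                A[k, rng.choice(best)] = 1.0
        pair = sum(A[k] @ A[l] * P2[k, l]
                   for k in range(K) for l in range(k + 1, K))
        eps = min(1.0, P1.sum() / (2 * pair)) if pair > 0 else 1.0  # (25)
        if np.array_equal(A, A_last):
            return A, eps
```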

Analogously, we can run Algorithm 2 multiple times (with possibly different random initial points) to obtain multiple stationary points of Problem 2 and choose the stationary point with the largest average throughput as a low-complexity suboptimal point of Problem 1.

IV Robust Optimization for Imperfect Joint Device Activity Distribution

In this section, we consider the worst-case average throughput maximization in the case of imperfect joint device activity distribution. First, we formulate the worst-case average throughput maximization problem, which is a challenging max-min problem. Then, we develop an iterative algorithm to obtain a KKT point of an equivalent problem. Finally, we develop a low-complexity iterative algorithm to obtain a stationary point of an approximate problem.

A Problem Formulation

In the case of imperfect joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the worst-case average throughput $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ in (7) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 3 (Worst-case Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$

Note that we explicitly consider the estimation error of the joint device activity distribution in the optimization. The objective function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear. Thus, Problem 3 is nonconvex. Moreover, note that the objective function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ does not have an analytical form.

B KKT Point

Problem 3 is a challenging max-min problem. In this part, we solve Problem 3 in two steps [34]. First, we transform the max-min problem in Problem 3 into an equivalent maximization problem. As the inner problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is an LP with respect to $\mathbf{p}$ and strong duality holds for LPs, the inner problem shares the same optimal value as its dual problem. Furthermore, to facilitate algorithm design (in the constraints in (26c), the variables $\mathbf{B}$, $\epsilon$, $\boldsymbol{\lambda}$, and $\nu$ are coupled, so we cannot apply the BCD method to solve Problem 4), we can transform Problem 3 into the following equivalent problem by the change of variables $\mathbf{B}=\epsilon\mathbf{A}$, where $\mathbf{B}\triangleq(b_{k,n})_{k\in\mathcal{K},n\in\mathcal{N}}$.

Problem 4 (Equivalent Problem of Problem 3)
$\max_{\mathbf{B},\boldsymbol{\lambda}\succeq 0,\epsilon,\nu}\quad\left(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\right)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\left((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\right)$
$\mathrm{s.t.}\quad\text{(2a), (2b)},$
$b_{k,n}\geq 0,\ k\in\mathcal{K},\ n\in\mathcal{N},$ (26a)
$\sum_{n\in\mathcal{N}}b_{k,n}=\epsilon,\ k\in\mathcal{K},$ (26b)
$\nu+\lambda_{\mathbf{x}}+T(\mathbf{B},1,\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X},$ (26c)

where $\boldsymbol{\lambda}\triangleq(\lambda_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}$, and $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). Let $(\mathbf{B}^{\star},\boldsymbol{\lambda}^{\star},\epsilon^{\star},\nu^{\star})$ denote an optimal point of Problem 4.

The equivalence between Problem 3 and Problem 4 is summarized below. (As $p_{\mathbf{0}}<1$, we can easily show $\epsilon^{\star}>0$.)

Lemma 1 (Equivalence between Problem 3 and Problem 4)

$(\frac{\mathbf{B}^{\star}}{\epsilon^{\star}},\epsilon^{\star})$ is an optimal point of Problem 3.

Proof:

Please refer to Appendix F. ∎

Next, based on Lemma 1, we can solve Problem 4 instead of Problem 3. Problem 4 is nonconvex with a nonconvex feasible set, as $T(\mathbf{B},1,\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$ in the objective function and the constraint functions in (26c) are nonconcave. Note that obtaining a KKT point is the classic goal for dealing with a nonconvex problem with a nonconvex feasible set. In what follows, we propose an iterative algorithm to obtain a KKT point of Problem 4 using SCA. Specifically, at iteration $s$, we update $(\mathbf{B}^{(s)},\boldsymbol{\lambda}^{(s)},\epsilon^{(s)},\nu^{(s)})$ by solving an approximate convex problem parameterized by $\mathbf{B}^{(s-1)}$, obtained at iteration $s-1$. For notational convenience, define (noting that $\frac{\partial T(\mathbf{A},\epsilon,\mathbf{x})}{\partial a_{k,n}}=x_{k}\epsilon g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, and $\frac{\partial T(\mathbf{A},\epsilon,\mathbf{x})}{\partial\epsilon}=\sum_{k\in\mathcal{K}}x_{k}\sum_{n\in\mathcal{N}}a_{k,n}g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$)

$g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})\triangleq\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_{\ell}a_{\ell,n}\epsilon)-\epsilon\sum_{\ell\in\mathcal{K}:\ell\neq k}x_{\ell}a_{\ell,n}\prod_{j\in\mathcal{K}:j\neq\ell,j\neq k}(1-x_{j}a_{j,n}\epsilon),\ k\in\mathcal{K},\ n\in\mathcal{N}.$ (27)

We choose

$\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)})\triangleq\left(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\right)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\left((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}T(\mathbf{B}^{(s-1)},1,\mathbf{x})\right)$
$+\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})\left(b_{k,n}-b_{k,n}^{(s-1)}\right)$
$-\frac{1}{2}\sqrt{N\sum_{k\in\mathcal{K}}\sum_{\ell\in\mathcal{K}:\ell\neq k}\Big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}x_{\ell}\|\mathbf{x}\|_{1}\Big)^{2}}\ \sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\left(b_{k,n}-b_{k,n}^{(s-1)}\right)^{2},$ (28)

where $\|\cdot\|_{1}$ denotes the $\ell_{1}$-norm, as an approximate function of the objective function in Problem 4 at iteration $s$. In addition, we choose

$\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)},\mathbf{x})\triangleq\nu+\lambda_{\mathbf{x}}+T(\mathbf{B}^{(s-1)},1,\mathbf{x})+\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})\left(b_{k,n}-b_{k,n}^{(s-1)}\right)$
$-\frac{1}{2}\sqrt{N}\|\mathbf{x}\|_{1}^{2}\sum_{k\in\mathcal{K}}x_{k}\sum_{n\in\mathcal{N}}\left(b_{k,n}-b_{k,n}^{(s-1)}\right)^{2}$ (29)

as an approximate function of the constraint function for $\mathbf{x}\in\mathcal{X}$ in (26c) at iteration $s$. Note that the concave components of the objective function and the constraint functions in Problem 4 are left unchanged, and the remaining nonconcave components, i.e., $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})$ in the objective function and $T(\mathbf{B},1,\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$ in (26c), are minorized at $(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)=(\mathbf{B}^{(s-1)},\boldsymbol{\lambda}^{(s-1)},\epsilon^{(s-1)},\nu^{(s-1)})$ using concave components based on their second-order Taylor expansions [35]. Then, at iteration $s$, we approximate Problem 4 with the following convex problem.

Problem 5 (Approximate Problem of Problem 4 at Iteration ss)
$(\hat{\mathbf{B}}^{(s)},\hat{\boldsymbol{\lambda}}^{(s)},\hat{\epsilon}^{(s)},\hat{\nu}^{(s)})\triangleq\arg\max_{\mathbf{B},\boldsymbol{\lambda}\succeq 0,\epsilon,\nu}\quad\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (26a), (26b)},$
$\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)},\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X}.$

First, we analyze the computational complexity for determining the updated objective function and constraint functions of Problem 5. The computational complexity for calculating $x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, $\mathbf{x}\in\mathcal{X}$ is $\mathcal{O}(NK^{3}2^{K})$. Given $x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$, the computational complexity for calculating $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$ is $\mathcal{O}(NK2^{K})$. Note that the constants $\|\mathbf{x}\|_{1}^{2}$, $\mathbf{x}\in\mathcal{X}$ and $\sum_{k\in\mathcal{K}}\sum_{\ell\in\mathcal{K}:\ell\neq k}\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}x_{\ell}\|\mathbf{x}\|_{1}\big)^{2}$ are computed in advance. Thus, at iteration $s$, the computational complexity for determining the updated objective function and constraint functions of Problem 5 is $\mathcal{O}(NK^{3}2^{K})+\mathcal{O}(NK2^{K})=\mathcal{O}(NK^{3}2^{K})$. Then, we solve Problem 5. Problem 5 has $2+NK+2^{K}$ variables and $2+N(K+1)+2^{K+1}$ constraints. Thus, solving Problem 5 using an interior-point method has computational complexity $\mathcal{O}((NK+2^{K})^{3})$ [32, pp. 8]. After solving Problem 5, we update

$\mathbf{B}^{(s)}=(1-\gamma)\mathbf{B}^{(s-1)}+\gamma\hat{\mathbf{B}}^{(s)},\quad\boldsymbol{\lambda}^{(s)}=(1-\gamma)\boldsymbol{\lambda}^{(s-1)}+\gamma\hat{\boldsymbol{\lambda}}^{(s)},$ (30a)
$\epsilon^{(s)}=(1-\gamma)\epsilon^{(s-1)}+\gamma\hat{\epsilon}^{(s)},\quad\nu^{(s)}=(1-\gamma)\nu^{(s-1)}+\gamma\hat{\nu}^{(s)},$ (30b)

where $\gamma\in(0,1]$ is a positive constant.

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 3. Based on [30, Theorem 1], we can show the following result.

Theorem 6 (Convergence of Algorithm 3)

Let $(\mathbf{B}^{\dagger},\boldsymbol{\lambda}^{\dagger},\epsilon^{\dagger},\nu^{\dagger})$ be a limit point of the iterates generated by Algorithm 3. If the interior of the set $\{(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)\,|\,\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{\dagger},\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X},\ \text{(2a), (2b), (26a), (26b)}\}$ is nonempty, then $(\mathbf{B}^{\dagger},\boldsymbol{\lambda}^{\dagger},\epsilon^{\dagger},\nu^{\dagger})$ is a KKT point of Problem 4.

Proof:

Please refer to Appendix G. ∎

Algorithm 3 Obtaining A KKT Point of Problem 4
1: initialization: Set $(\mathbf{B}^{(0)},\boldsymbol{\lambda}^{(0)},\epsilon^{(0)},\nu^{(0)}):=((\mathbf{e}_{n_k})_{k\in\mathcal{K}},\mathbf{1},1,0)$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random for all $k\in\mathcal{K}$, set $s:=0$, and choose $\mu>0$.
2: repeat
3:  Set $s:=s+1$.
4:  Calculate $(\hat{\mathbf{B}}^{(s)},\hat{\boldsymbol{\lambda}}^{(s)},\hat{\epsilon}^{(s)},\hat{\nu}^{(s)})$ by solving Problem 5.
5:  Update $(\mathbf{B}^{(s)},\boldsymbol{\lambda}^{(s)},\epsilon^{(s)},\nu^{(s)})$ according to (30).
6: until $\|\mathbf{B}^{(s)}-\mathbf{B}^{(s-1)}\|_{F}\leq\mu$.

Analogously, we can run Algorithm 3 multiple times (with possibly different random initial points) to obtain multiple KKT points of Problem 4 and choose the KKT point with the largest worst-case average throughput as a suboptimal point of Problem 4, which can also be regarded as a suboptimal point of Problem 3 according to Lemma 1.

C Low-complexity Solution

From the complexity analysis for solving Problem 5, we know that Algorithm 3 is computationally expensive when $K$ or $N$ is large. In this part, we develop a low-complexity iterative algorithm, applicable for large $K$ or $N$, to obtain a stationary point of an approximate problem of Problem 3. Later, in Section VI, we shall show that this low-complexity algorithm achieves comparable performance. First, we approximate the complicated function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, which has $N2^{K}$ terms, with a simpler function, which has $1+\frac{K(K-1)}{2}$ terms. Specifically, as in Section III-C, we approximate $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$. Recall that $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ captures the active probabilities of every single device and every two devices. Analogously, we approximate $\mathcal{P}$ with

$\tilde{\mathcal{P}}\triangleq\big\{(y_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}\,\big|\,\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}=1,\ \sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k\leq\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}x_k\leq\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k,\ k\in\mathcal{K},$
$\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k x_\ell\leq\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}x_k x_\ell\leq\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell,\ k,\ell\in\mathcal{K},\ k<\ell\big\}.$

Obviously, $\tilde{\mathcal{P}}\supseteq\mathcal{P}$. Note that, in contrast to $\mathcal{P}$, $\tilde{\mathcal{P}}$ retains only the upper and lower bounds on the active probabilities of every single device and every two devices. Thus, we approximate $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ with $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})\triangleq\min_{\mathbf{p}\in\tilde{\mathcal{P}}}\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$, whose analytical form is given in the following lemma.

Lemma 2 (Approximate Worst-case Average Throughput)

For all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b),

$\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})=\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k-\epsilon^2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell.$ (31)
Proof:

Please refer to Appendix H. ∎

Next, we consider the following approximate problem of Problem 3.

Problem 6 (Approximate Worst-case Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\ \tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)},$

where $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$ is given by (31).

The numbers of variables and constraints of Problem 6 are $KN+1$ and $2+K(N+1)$, respectively, which are much smaller than those of Problem 4. Note that $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k$ and $\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell$ ($k<\ell$) represent a lower bound on the active probability of every single device and an upper bound on the active probability of every two devices, respectively, and can be computed in advance. Obviously, Problem 6 shares the same form as Problem 2. Thus, we can use a low-complexity iterative algorithm, similar to Algorithm 2, to obtain a stationary point of Problem 6. The details are omitted due to the page limitation.
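As an illustration of this precomputation, the following minimal sketch evaluates $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$ in (31), assuming the bound vectors $\underline{p}_{\mathbf{x}}$ and $\overline{p}_{\mathbf{x}}$ are stored as numpy arrays indexed by the $2^K$ activity states. Enumerating all states is practical only for small $K$; the point of (31) is that, once the $K$ single-device and $\frac{K(K-1)}{2}$ pairwise bound terms are precomputed, each evaluation is cheap.

```python
import itertools
import numpy as np

def approx_worst_case_throughput(A, eps, p_lo, p_hi):
    """Evaluate the approximate worst-case average throughput in (31).
    A: K x N preamble selection matrix; eps: access barring factor;
    p_lo, p_hi: length-2^K arrays of lower/upper probability bounds,
    indexed consistently with the rows of `states` below."""
    K, N = A.shape
    states = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K activity states
    single_lo = states.T @ p_lo            # sum_x p_lo_x * x_k, for each device k
    pair_hi = (states.T * p_hi) @ states   # sum_x p_hi_x * x_k * x_l, for each pair (k, l)
    linear = eps * single_lo.sum()
    quad = 0.0
    for k in range(K):
        for l in range(k + 1, K):          # pairs with k < l, as in (31)
            quad += (A[k] * A[l]).sum() * pair_hi[k, l]   # sum_n a_{k,n} a_{l,n}
    return linear - eps**2 * quad
```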

V Stochastic Optimization for Unknown Joint Device Activity Distribution

In this section, we consider the sample average throughput maximization in the case of unknown joint device activity distribution. We first formulate the sample average throughput maximization problem, which is a challenging nonconvex problem. Then, we develop an iterative algorithm to obtain a stationary point.

A Problem Formulation

In the case of unknown joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and the access barring factor $\epsilon$ to maximize the sample average throughput $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ in (8) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 7 (Sample Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\ \bar{T}_{\text{st}}(\mathbf{A},\epsilon)$
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)}.$

The objective function $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear. Thus, Problem 7 is a nonconvex problem with a convex feasible set. Note that $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ has $NKI$ terms, and the number of samples $I$ is usually quite large in practice. Therefore, directly tackling Problem 7 is not computationally efficient. To reduce the computation time, we solve its equivalent stochastic version (given in Appendix I) using a stochastic iterative algorithm.

B Stationary Point

Based on mini-batch stochastic parallel SCA [31], we propose a stochastic algorithm to obtain a stationary point of Problem 7. The main idea is to solve, in parallel, a set of convex problems, each obtained by approximating $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ with a convex function based on its structure and the samples in a uniformly randomly selected mini-batch. We partition $\mathcal{I}$ into $M$ disjoint subsets $\mathcal{I}_m$, $m\in\mathcal{M}$, each of size $\frac{I}{M}$ (assuming $I$ is divisible by $M$), where $\mathcal{M}\triangleq\{1,2,\ldots,M\}$. We divide the variables $(\mathbf{A},\epsilon)$ into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$, and $\epsilon$. In each iteration, the algorithm updates all $K+1$ blocks separately in a parallel manner by maximizing $K+1$ approximate functions of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$.

Specifically, at iteration $t$, a mini-batch $\mathcal{I}_{\xi^{(t)}}$ is selected, where $\xi^{(t)}$ follows the uniform distribution over $\mathcal{M}$. Let $\mathbf{a}_k^{(t-1)}$ and $\epsilon^{(t-1)}$ denote the preamble selection distribution of device $k$ and the access barring factor obtained at iteration $t-1$, and denote $\mathbf{A}^{(t-1)}\triangleq(\mathbf{a}_k^{(t-1)})_{k\in\mathcal{K}}$. For ease of illustration, in the following, we also write $(\mathbf{A},\epsilon)$ as $(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon)$, where $\mathbf{a}_{-k}\triangleq(\mathbf{a}_\ell)_{\ell\in\mathcal{K},\ell\neq k}$. We choose

$\tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)}\big)\triangleq\rho^{(t)}\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}T\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big)+\sum_{n\in\mathcal{N}}c_{k,n}^{(t)}\big(a_{k,n}-a_{k,n}^{(t-1)}\big)$

as an approximate function of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ for updating $\mathbf{a}_k$ at iteration $t$. Here, $\rho^{(t)}$ is a positive diminishing stepsize satisfying

$\rho^{(t)}>0,\ \lim_{t\rightarrow\infty}\rho^{(t)}=0,\ \sum_{t=1}^{\infty}\rho^{(t)}=\infty,\ \sum_{t=1}^{\infty}\big(\rho^{(t)}\big)^2<\infty,$

and $c_{k,n}^{(t)}$ is given by

$c_{k,n}^{(t)}=\big(1-\rho^{(t)}\big)c_{k,n}^{(t-1)}+\rho^{(t)}\frac{M}{I}\epsilon^{(t-1)}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}x_{i,k}\,g_{k,n}\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big),$

where $c_{k,n}^{(0)}=0$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, $x_{i,k}$ denotes the $k$-th element of $\mathbf{x}_i$, and $g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$ is given by (27). We choose

$\tilde{T}_{\text{st},0}^{(t)}\big(\epsilon,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)\triangleq c_0^{(t)}\epsilon-\tau\big(\epsilon-\epsilon^{(t-1)}\big)^2$

as an approximate function of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ for updating $\epsilon$ at iteration $t$. Here, $\tau$ is a positive constant and $c_0^{(t)}$ is given by

$c_0^{(t)}=\big(1-\rho^{(t)}\big)c_0^{(t-1)}+\rho^{(t)}\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}\sum_{k\in\mathcal{K}}x_{i,k}\sum_{n\in\mathcal{N}}a_{k,n}^{(t-1)}g_{k,n}\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big),$

where $c_0^{(0)}=0$.
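For concreteness, these recursive coefficient updates can be sketched as follows; `g` is a hypothetical callable implementing $g_{k,n}$ in (27), and the mini-batch is passed as rows of a numpy array. Note that $\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}(\cdot)$ is simply the mini-batch average, since $|\mathcal{I}_{\xi^{(t)}}|=\frac{I}{M}$.

```python
import numpy as np

def update_coefficients(c, c0, A_prev, eps_prev, batch, rho, g):
    """One recursive update of the surrogate coefficients c_{k,n}^{(t)} and c_0^{(t)}.
    c: K x N array; c0: scalar; batch: mini-batch samples x_i as rows of a 0/1 array;
    g(A, eps, x, k, n) is a hypothetical callable implementing g_{k,n} in (27)."""
    K, N = A_prev.shape
    grad_a = np.zeros((K, N))
    grad_eps = 0.0
    for x in batch:
        for k in range(K):
            if x[k] == 0:
                continue                         # terms with x_{i,k} = 0 vanish
            for n in range(N):
                val = g(A_prev, eps_prev, x, k, n)
                grad_a[k, n] += eps_prev * val   # contributes to c_{k,n}
                grad_eps += A_prev[k, n] * val   # contributes to c_0
    grad_a /= len(batch)                         # (M/I) * sum over the mini-batch
    grad_eps /= len(batch)
    return (1 - rho) * c + rho * grad_a, (1 - rho) * c0 + rho * grad_eps
```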

We first solve the following problems for the $K+1$ blocks in a parallel manner. Given $(\mathbf{A}^{(t-1)},\epsilon^{(t-1)})$ and $\mathcal{I}_{\xi^{(t)}}$, the optimization problem with respect to $\mathbf{a}_k$ is given by

$\hat{\mathbf{a}}_k^{(t)}\triangleq\arg\max_{\mathbf{a}_k}\ \tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)}\big),\quad k\in\mathcal{K}$ (32)
$\mathrm{s.t.}\ \text{(3a)},\ \text{(3b)},$

and the optimization problem with respect to $\epsilon$ is given by

$\hat{\epsilon}^{(t)}\triangleq\arg\max_{\epsilon}\ \tilde{T}_{\text{st},0}^{(t)}\big(\epsilon,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)$ (33)
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)}.$

Each problem in (32) is an LP with $N$ variables and $N+1$ constraints. The problem in (33) is a QP with a single variable and two constraints. Based on the structural properties of the optimization problems in (32) and (33), we can obtain their optimal points.

Theorem 7 (Optimal Points of Problems in (32) and (33))

A set of optimal points of each optimization problem in (32) is given by

$\big\{\mathbf{e}_m : m\in\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}\big\},\ k\in\mathcal{K},$ (34)

and the optimal point of the optimization problem in (33) is given by

$\min\Big(\max\Big(\epsilon^{(t-1)}+\frac{c_0^{(t)}}{2\tau},0\Big),1\Big).$ (35)
Algorithm 4 Obtaining A Stationary Point of Problem 7
1:  initialization: For $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is chosen uniformly at random from $\mathcal{N}$; set $\epsilon:=1$, set $t:=0$, and choose $\mu>0$.
2:  repeat
3:   $t:=t+1$.
4:   for $k\in\mathcal{K}$ do
5:    $\hat{\mathbf{a}}_k^{(t)}$ is randomly chosen from the set in (34).
6:    Update $\mathbf{a}_k^{(t)}$ according to (36a).
7:   end for
8:   Calculate $\hat{\epsilon}^{(t)}$ according to (35).
9:   Update $\epsilon^{(t)}$ according to (36b).
10:  until $\|\mathbf{A}^{(t)}-\mathbf{A}^{(t-1)}\|_F\leq\mu$.
Proof:

Theorem 7 can be proved in a similar way to Theorem 1. ∎

The optimal point in (34) indicates that each device $k$ selects the preamble corresponding to the maximum approximate average throughput (i.e., the increase rate of the average throughput) conditioned on the preamble being selected by device $k$, for given $(\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)})$. For all $k\in\mathcal{K}$ and $n\in\mathcal{N}$, the computational complexity for calculating $c_{k,n}^{(t)}$ is $\mathcal{O}(K^2)$. For all $k\in\mathcal{K}$, the computational complexity of finding the largest one among $c_{k,n}^{(t)}$, $n\in\mathcal{N}$, is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (34) is $\mathcal{O}(NK^3)+\mathcal{O}(NK)=\mathcal{O}(NK^3)$. The computational complexity for calculating $c_0^{(t)}$ is $\mathcal{O}(NK^3)$, which is also the computational complexity for obtaining the optimal point in (35).
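A minimal sketch of the closed-form block solutions in (34) and (35), assuming the surrogate coefficients are maintained as numpy arrays (ties in (34) are broken by the first index here, whereas Algorithm 4 breaks them randomly):

```python
import numpy as np

def block_updates(c, c0, eps_prev, tau):
    """Closed-form block solutions per Theorem 7.
    c: K x N array of coefficients c_{k,n}^{(t)}; c0: scalar c_0^{(t)}."""
    K, N = c.shape
    A_hat = np.zeros((K, N))
    for k in range(K):
        # (34): pick a vertex e_m with m maximizing c_{k,n}^{(t)}
        A_hat[k, np.argmax(c[k])] = 1.0
    # (35): unconstrained maximizer eps_prev + c0/(2*tau), clipped to [0, 1]
    eps_hat = min(max(eps_prev + c0 / (2.0 * tau), 0.0), 1.0)
    return A_hat, eps_hat
```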

Then, we update the preamble selection distributions $\mathbf{A}$ and the access barring factor $\epsilon$ by

$\mathbf{a}_k^{(t)}=(1-\omega^{(t)})\mathbf{a}_k^{(t-1)}+\omega^{(t)}\hat{\mathbf{a}}_k^{(t)},\ k\in\mathcal{K},$ (36a)
$\epsilon^{(t)}=(1-\omega^{(t)})\epsilon^{(t-1)}+\omega^{(t)}\hat{\epsilon}^{(t)},$ (36b)

where $\omega^{(t)}$ is a positive diminishing stepsize satisfying

$\omega^{(t)}>0,\ \lim_{t\rightarrow\infty}\omega^{(t)}=0,\ \sum_{t=1}^{\infty}\omega^{(t)}=\infty,\ \sum_{t=1}^{\infty}\big(\omega^{(t)}\big)^2<\infty,\ \lim_{t\rightarrow\infty}\frac{\omega^{(t)}}{\rho^{(t)}}=0.$
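For example, $\rho^{(t)}=t^{-0.6}$ and $\omega^{(t)}=t^{-0.9}$ are admissible choices: both are non-summable but square summable, and $\frac{\omega^{(t)}}{\rho^{(t)}}=t^{-0.3}\rightarrow 0$ as $t\rightarrow\infty$.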

Finally, the details of the proposed stochastic parallel iterative algorithm are summarized in Algorithm 4. Based on [36], we can show the following result.

Theorem 8 (Convergence of Algorithm 4)

If $\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$ is a singleton for all $k\in\mathcal{K}$ and all $t\geq 1$, then every limit point of $\{(\mathbf{A}^{(t)},\epsilon^{(t)})\}$ generated by Algorithm 4 is a stationary point of Problem 7 almost surely.

Proof:

Please refer to Appendix I. ∎

VI Numerical Results

In this section, we evaluate the performance of the proposed solutions for dependent and independent device activities via numerical results. (Algorithm 2 and Algorithm 4 are computationally efficient and can be easily implemented in practical systems with large $K$; Algorithm 1 and Algorithm 3 are computationally expensive but provide essential benchmarks for designing effective low-complexity methods.) In the simulation, we adopt the group device activity model in [26]. Specifically, $K$ devices are divided into $G$ groups, each of size $\frac{K}{G}$ (assuming $K$ is divisible by $G$); the activity states of devices in different groups are independent, and the activity states of devices in one group are the same. Note that when $\frac{K}{G}=1$, the device activities are independent; when $\frac{K}{G}>1$, the device activities are dependent. Denote $\mathcal{G}\triangleq\{1,2,\ldots,G\}$. Let $\mathcal{K}_g$ denote the set of devices in group $g\in\mathcal{G}$. Let $y_g\in\{0,1\}$ denote the activity state of group $g$, where $y_g=1$ if group $g$ is active and $y_g=0$ otherwise. Let $\mathbf{y}\triangleq(y_g)_{g\in\mathcal{G}}$ denote the activity states of all groups. The probability that a group is active is $p_a$. Then, in the case of perfect joint device activity distribution, the joint device activity distribution is given by

$p_{\mathbf{x}}=\begin{cases}p_a^{\sum_{g\in\mathcal{G}}y_g}(1-p_a)^{G-\sum_{g\in\mathcal{G}}y_g}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X}.$ (39)
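For reference, device activity samples under this group model can be generated as in the following minimal sketch (numpy assumed; devices of the same group are placed consecutively, which is one concrete realization of the index sets $\mathcal{K}_g$):

```python
import numpy as np

def sample_group_activities(K, G, p_a, num_samples, seed=0):
    """Draw activity vectors x in {0,1}^K under the group model of (39):
    each of the G groups is active i.i.d. with probability p_a, and all
    K/G devices in a group share the group's activity state."""
    rng = np.random.default_rng(seed)
    y = rng.random((num_samples, G)) < p_a   # group states, shape (num_samples, G)
    x = np.repeat(y, K // G, axis=1)         # expand to devices, shape (num_samples, K)
    return x.astype(int)

# e.g., 10^5 samples with K = 60, G = 6, p_a = 0.25 (dependent activities, K/G = 10)
samples = sample_group_activities(K=60, G=6, p_a=0.25, num_samples=100000)
```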

In the case of imperfect joint device activity distribution, the estimated joint device activity distribution is given by

$\hat{p}_{\mathbf{x}}=\begin{cases}p_a^{\sum_{g\in\mathcal{G}}y_g}(1-p_a)^{G-\sum_{g\in\mathcal{G}}y_g}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X},$ (42)

for $K\leq 100$, we set $\delta_{\mathbf{x}}=\bar{\delta}\hat{p}_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, and for $K>100$, we set

$\delta_{\mathbf{x}}=\begin{cases}\bar{\delta}\hat{p}_{\mathbf{x}}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \sum_{g\in\mathcal{G}}y_g\leq 3,\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X},$ (45)

for some $\bar{\delta}\in[0,1)$. In the case of unknown joint device activity distribution, we consider $10^5$ samples generated according to the joint device activity distribution $p_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, given by (39). Note that in the three cases, we consider the same underlying joint device activity distribution $\mathbf{p}$ for a fair comparison. For ease of presentation, in the following, $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$, and the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ in the cases of perfect, imperfect, and unknown joint device activity distributions (referred to as case-p, case-ip, and case-up, respectively) are all referred to as throughput, unless otherwise specified. We evaluate the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ by averaging over $100$ realizations of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$, each obtained from a set of $10^5$ samples. In case-$t$, the proposed stationary point or KKT point is called Prop-$t$, where $t=$ p, ip, up. Furthermore, in case-$t$, the proposed low-complexity solution is called Prop-LowComp-$t$, where $t=$ p and ip.

We consider three baseline schemes, namely BL-MMPC, BL-MSPC, and BL-LTE. In BL-MMPC and BL-MSPC, $\mathbf{A}$ is obtained by the MMPC and MSPC allocation algorithms in [27], respectively, together with the access barring scheme proposed in [27]. In BL-LTE, we set $a_{k,n}=\frac{1}{N}$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, according to the standard random access procedure of LTE networks [7], and set $\epsilon=\min\big(1,\frac{N}{\bar{K}}\big)$ according to the optimal access control in [15], where $\bar{K}$ denotes the average number of active devices. Note that BL-MMPC and BL-MSPC make use of the dependence of the activities of every two devices, whereas BL-LTE does not utilize any information on the dependence of device activities. In case-$t$, the three baseline schemes are referred to as BL-MMPC-$t$, BL-MSPC-$t$, and BL-LTE-$t$, where $t=$ p, ip, up. In case-p, $(\mathbf{A},\epsilon)$ in BL-MMPC-p and BL-MSPC-p and $\epsilon$ in BL-LTE-p rely on $p_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$. In case-ip, $(\mathbf{A},\epsilon)$ in BL-MMPC-ip and BL-MSPC-ip and $\epsilon$ in BL-LTE-ip rely on $\hat{p}_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, without considering potential estimation errors. In case-up, $(\mathbf{A},\epsilon)$ in BL-MMPC-up and BL-MSPC-up and $\epsilon$ in BL-LTE-up rely on the empirical joint device activity distribution obtained from the samples. Considering the tradeoff between throughput and computational complexity, we run each proposed algorithm five times to obtain each point for all considered parameters.

Figure 3: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=10$ (dependent device activities). (a) Throughput versus $K$ at $p_a=0.25$ and $N=15$. (b) Throughput versus $N$ at $p_a=0.25$ and $K=60$. (c) Throughput versus $p_a$ at $N=15$ and $K=60$.
Figure 4: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=1$ (independent device activities). (a) Throughput versus $K$ at $p_a=0.25$ and $N=4$. (b) Throughput versus $N$ at $p_a=0.25$ and $K=10$. (c) Throughput versus $p_a$ at $N=4$ and $K=10$.

A Small K and N

This part compares the throughputs of all proposed solutions and the three baseline schemes at small numbers of devices and preambles. Fig. 3 and Fig. 4 illustrate the throughput versus the number of devices $K$, the number of preambles $N$, and the group active probability $p_a$, for dependent and independent device activities, respectively. From Fig. 3 and Fig. 4, we make the following observations. For each scheme, the curve in case-up is close to that in case-p, as we evaluate the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ over a large number of samples; the curve in case-p is above that in case-ip, as illustrated in Section II. For $t=$ p, ip, and up, Prop-$t$ outperforms all the baseline schemes in case-$t$, as Prop-$t$ utilizes more statistical information on device activities. For $t=$ p and ip, Prop-LowComp-$t$ outperforms all the baseline schemes in case-$t$, as Prop-LowComp-$t$ relies on a more accurate approximation of the throughput. The small gap between the throughputs of Prop-$t$ and Prop-LowComp-$t$, for $t=$ p and ip, shows that rigorously exploiting statistical information on the activities of every two devices already achieves a significant gain. Furthermore, from Fig. 3 (a), Fig. 4 (a) and Fig. 3 (c), Fig. 4 (c), we can see that the throughput of each proposed scheme increases with $K$ and $p_a$, respectively, due to the increase in traffic load. From Fig. 3 (b) and Fig. 4 (b), we can see that the throughput of each scheme increases with $N$ due to the increase in communication resources.

Fig. 5 and Fig. 6 illustrate the worst-case average throughput versus the estimation error bound $\bar{\delta}$ for dependent and independent device activities, respectively. From Fig. 5 and Fig. 6, we can see that the worst-case average throughputs of Prop-ip and Prop-LowComp-ip are greater than those of Prop-p, Prop-LowComp-p, BL-MMPC-p, BL-MSPC-p, and BL-LTE-p, which reveals the importance of explicitly considering the imperfectness of the estimated joint device activity distribution in this case. Furthermore, the gain of Prop-ip over Prop-p and the gain of Prop-LowComp-ip over Prop-LowComp-p increase with $\bar{\delta}$, as it is more important to take possible device activity estimation errors into account when $\bar{\delta}$ is larger.

Figure 5: Worst-case average throughput versus $\bar{\delta}$ at $K=60$, $N=15$, $p_a=0.25$, and $\frac{K}{G}=10$ (dependent device activities).
Figure 6: Worst-case average throughput versus $\bar{\delta}$ at $K=10$, $N=4$, $p_a=0.4$, and $\frac{K}{G}=1$ (independent device activities).
Figure 7: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=20$ (dependent device activities). (a) Throughput versus $K$ at $p_a=0.03$ and $N=50$. (b) Throughput versus $N$ at $p_a=0.03$ and $K=1000$. (c) Throughput versus $p_a$ at $N=50$ and $K=1000$.
Figure 8: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=1$ (independent device activities). (a) Throughput versus $K$ at $p_a=0.03$ and $N=50$. (b) Throughput versus $N$ at $p_a=0.03$ and $K=1000$. (c) Throughput versus $p_a$ at $N=50$ and $K=1000$.

B Large K and N

This part compares the throughputs of Prop-LowComp-p, Prop-LowComp-ip, Prop-up, and the three baseline schemes at large numbers of devices and preambles. Fig. 7 and Fig. 8 illustrate the throughput versus the number of devices $K$, the number of preambles $N$, and the group active probability $p_a$, for dependent and independent device activities, respectively. From Fig. 7 and Fig. 8, we again observe that Prop-LowComp-$t$ significantly outperforms all the baseline schemes in case-$t$ for $t=$ p and ip, and that Prop-up outperforms all the baseline schemes in case-up. The results for large $K$ and $N$ shown in Fig. 7 and Fig. 8 are similar to those for small $K$ and $N$ shown in Fig. 3 and Fig. 4.

VII Conclusion

This paper considered the joint optimization of preamble selection and access barring for random access in MTC under an arbitrary joint device activity distribution that can reflect dependent and independent device activities. We considered the cases of perfect, imperfect, and unknown general joint device activity distributions and formulated the average, worst-case average, and sample average throughput maximization problems, respectively. All three problems are challenging nonconvex problems. We proposed iterative algorithms with convergence guarantees to tackle these problems with various optimization techniques. Numerical results showed that the proposed solutions achieve significant gains over existing schemes in all three cases for both dependent and independent device activities, and the proposed low-complexity algorithms for the first two cases already achieve competitive performance.

Appendix A: Proof for (6)

Substituting (4) into (5), we have [33] (we omit some details due to the page limitation; the complete proofs can be found in [33]):

$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})=\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{m\in\mathcal{K}:m\neq k}(1-x_m a_{m,n}\epsilon)$
$=\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\sum_{\mathcal{K}'\subseteq\mathcal{K}\setminus\{k\}}\prod_{m\in\mathcal{K}'}(-x_m a_{m,n}\epsilon)$
$=-\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}_1\cup\mathcal{K}_2\subseteq\mathcal{K}:\mathcal{K}_1\cap\mathcal{K}_2=\emptyset,|\mathcal{K}_1|=1}\prod_{m\in\mathcal{K}_1}(-x_m a_{m,n}\epsilon)\prod_{\ell\in\mathcal{K}_2}(-x_\ell a_{\ell,n}\epsilon)$
$=-\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}}|\mathcal{K}'|\prod_{k\in\mathcal{K}'}(-x_k a_{k,n}\epsilon)=\sum_{m\in\mathcal{K}}m(-1)^{m-1}\epsilon^m\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\Big(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k\Big)\prod_{k\in\mathcal{K}'}a_{k,n}.$

Appendix B: Proof of Theorem 1

First, it is clear that each problem in (15) has the same form as the problem in [32, Exercise 4.8]. According to the analytical solution of the problem in [32, Exercise 4.8], a set of optimal points of each problem in (15) is given by (19). Next, since $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is a polynomial in $\epsilon$, by checking all roots of $\frac{\partial\bar{T}(\mathbf{A},\epsilon,\mathbf{p})}{\partial\epsilon}=0$ and the endpoints of the interval, we can obtain the set of optimal points of the problem in (16), which is given by (20). Therefore, we complete the proof of Theorem 1.

Appendix C: Complexity analysis for applying Theorem 1

As the constants $m^2(-1)^{m-1}$, $m(-1)^{m-1}$, $m\in\mathcal{K}$, and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k$, $\mathcal{K}'\subseteq\mathcal{K}$, are computed in advance, the corresponding computational complexities are not considered below. First, we analyze the computational complexity for determining the set in (19). In each iteration, we first compute $m(-1)^{m-1}\epsilon^m$, $m\in\mathcal{K}$, in $2K$ flops. Then, we compute $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$. In (17), the summation with respect to $m$ contains $K$ summands, the summation with respect to $\mathcal{K}'$ contains $\binom{K-1}{m-1}$ summands, and $\prod_{\ell\in\mathcal{K}':\ell\neq k}a_{\ell,n}$ for $\mathcal{K}'\subseteq\mathcal{K}$ with $k\in\mathcal{K}'$, $|\mathcal{K}'|=m$ involves $m-2$ multiplications. Thus, calculating $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, costs $\big(K-1+\sum_{m\in\mathcal{K}}\big((m-2+1)\binom{K-1}{m-1}+1+\binom{K-1}{m-1}-1\big)\big)NK+2K=NK(K+1)(2^{K-2}+1)-2K(N-1)$ flops, i.e., has computational complexity $\mathcal{O}(NK^2 2^K)$. For all $k\in\mathcal{K}$, the computational complexity for finding the largest one among $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $n\in\mathcal{N}$, is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (19) is $\mathcal{O}(NK^2 2^K)+\mathcal{O}(KN)=\mathcal{O}(NK^2 2^K)$.

Then, we analyze the computational complexity for determining the set in (20). Note that $q(\mathbf{A},\epsilon,\mathbf{p})$ in (18) is a univariate polynomial of degree $K-1$ in $\epsilon$ for any given $\mathbf{A}$ and $\mathbf{p}$. We first calculate the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$. In (18), the summation with respect to $m$ contains $K$ summands, the summation with respect to $n$ and $\mathcal{K}'$ contains $N\binom{K}{m}$ summands, and $\prod_{\ell\in\mathcal{K}'}a_{\ell,n}$ for $n\in\mathcal{N}$ and $\mathcal{K}'\subseteq\mathcal{K}$ with $|\mathcal{K}'|=m$ involves $m-1$ multiplications. Thus, calculating the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$ costs $\sum_{m\in\mathcal{K}}\big((m-1+1)N\binom{K}{m}+1+N\binom{K}{m}-1\big)=N(K+2)2^{K-1}-N$ flops, i.e., has computational complexity $\mathcal{O}(NK2^K)$. Next, we obtain the $K-1$ roots of $q(\mathbf{A},\epsilon,\mathbf{p})$ from its coefficients by using the QR algorithm, with computational complexity $\mathcal{O}(K^3)$. Furthermore, finding the real roots in $[0,1]$ among these $K-1$ roots has computational complexity $\mathcal{O}(K)$. Hence, the computational complexity for determining $\mathcal{B}(\mathbf{A},\mathbf{p})$ is $\mathcal{O}(NK2^K)+\mathcal{O}(K^3)+\mathcal{O}(K)=\mathcal{O}(NK2^K)$. Analogously, the computational complexity for computing $\bar{T}(\mathbf{A},z,\mathbf{p})$, $z\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$, is $\mathcal{O}(NK^2 2^K)$. The computational complexity for finding the largest one among $\bar{T}(\mathbf{A},z,\mathbf{p})$, $z\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$, is $\mathcal{O}(K)$. Therefore, the overall computational complexity for determining the set in (20) is $\mathcal{O}(NK2^K)+\mathcal{O}(NK^2 2^K)+\mathcal{O}(K)=\mathcal{O}(NK^2 2^K)$.
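For illustration, the root-finding step can be sketched as follows, assuming the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$ (highest degree first) have already been computed; numpy's `roots` uses the companion-matrix eigenvalue (QR) approach referred to above.

```python
import numpy as np

def candidate_barring_factors(coeffs):
    """Find the real roots of q(A, eps, p) in [0, 1] from its polynomial
    coefficients (highest degree first), via the companion-matrix QR method
    that numpy.roots implements (O(K^3) for a degree-(K-1) polynomial)."""
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real   # keep (numerically) real roots
    return [r for r in real if 0.0 <= r <= 1.0]    # restrict to the interval [0, 1]
```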

Appendix D: Proof of Theorem 2

First, we show that Algorithm 1 stops in a finite number of iterations. Let $(\mathbf{A}^{(i)},\epsilon^{(i)})$ denote the preamble selection distributions and the access barring factor obtained at iteration $i$. $\mathcal{A}\triangleq\{\mathbf{A}:\mathbf{a}_k=\mathbf{e}_{n_k},n_k\in\mathcal{N},k\in\mathcal{K}\}$ contains $N^K$ elements, and $\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$ contains no more than $K$ elements. Thus, $\mathcal{C}\triangleq\{(\mathbf{A},\epsilon):\mathbf{A}\in\mathcal{A},\epsilon\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}\}$ and $\mathcal{T}\triangleq\{\bar{T}(\mathbf{A},\epsilon,\mathbf{p}):(\mathbf{A},\epsilon)\in\mathcal{C}\}$ both contain no more than $KN^K$ elements. In addition, as $\bar{T}(\mathbf{A}^{(i)},\epsilon^{(i)},\mathbf{p})$ is nondecreasing in $i$ (due to the block coordinate optimizations in each iteration) and $\bar{T}(\mathbf{A}^{(i)},\epsilon^{(i)},\mathbf{p})\in\mathcal{T}$, $i\geq 1$, by contradiction, we can show that there exists an integer $d\leq KN^K$ such that $\bar{T}(\mathbf{A}^{(d)},\epsilon^{(d)},\mathbf{p})=\bar{T}(\mathbf{A}^{(d+1)},\epsilon^{(d+1)},\mathbf{p})$. According to Steps 4--11 of Algorithm 1, we have $(\mathbf{A}^{(d+1)},\epsilon^{(d+1)})=(\mathbf{A}^{(d)},\epsilon^{(d)})$, which satisfies the stopping criterion of Algorithm 1. Thus, Algorithm 1 stops at iteration $d+1$, i.e., after no more than $KN^K+1$ iterations. Next, we show that Algorithm 1 returns a stationary point of Problem 1. By $(\mathbf{A}^{(d+1)},\epsilon^{(d+1)})=(\mathbf{A}^{(d)},\epsilon^{(d)})$ and Theorem 1, $\mathbf{a}_k^{(d)}$ is the optimal point of the problem in (15) with $\mathbf{a}_{-k}=\mathbf{a}_{-k}^{(d)}$ and $\epsilon=\epsilon^{(d)}$, for $k\in\mathcal{K}$, and $\epsilon^{(d)}$ is the optimal point of the problem in (16) with $\mathbf{A}=\mathbf{A}^{(d)}$. In addition, note that the objective function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is continuously differentiable and the feasible set of Problem 1 is convex. Thus, according to the proof of [29, Proposition 2.7.1], we know that $(\mathbf{A}^{(d)},\epsilon^{(d)})$ satisfies the first-order optimality condition of Problem 1, i.e., it is a stationary point. Therefore, we complete the proof of Theorem 2.

Appendix E: Proof of Theorem 3

Let $\mathbf{f}:\mathbb{R}^{KN+1}\rightarrow\mathbb{R}^{KN+1}$ denote the vector mapping corresponding to one iteration of Algorithm 1. Let $(\mathbf{A}^\diamond,\epsilon^\diamond)$ denote an optimal point of Problem 1. We construct a feasible point $(\mathbf{A}^\star,\epsilon^\star)\triangleq\mathbf{f}(\mathbf{A}^\diamond,\epsilon^\diamond)$, which satisfies the optimality property in Theorem 3. In the following, we show that $(\mathbf{A}^\star,\epsilon^\star)$ is also an optimal point of Problem 1. According to Theorem 1, we have $\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\diamond,\ldots,\mathbf{a}_K^\diamond,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\star,\ldots,\mathbf{a}_K^\diamond,\epsilon^\diamond,\mathbf{p})\leq\ldots\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\star,\ldots,\mathbf{a}_K^\star,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})$. In addition, as $(\mathbf{A}^\diamond,\epsilon^\diamond)$ is an optimal point and $(\mathbf{A}^\star,\epsilon^\star)$ is a feasible point, we have $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\leq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$. Combining $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\geq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$ and $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\leq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$, we have $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})=\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$, which implies that $(\mathbf{A}^\star,\epsilon^\star)$ is also optimal. Therefore, we complete the proof of Theorem 3.

Appendix F: Proof of Lemma 1

The inner problem of Problem 3, $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, is a feasible LP with respect to $\mathbf{p}$ for all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b). As strong duality holds for LPs, the primal problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ and its dual problem share the same optimal value. Furthermore, by eliminating the equality constraints in the dual problem, we can equivalently convert the dual problem to the following problem [33]:

$\max_{\boldsymbol{\lambda}\succeq 0,\nu}\ \Big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\Big)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\Big((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_\ell a_{\ell,n}\epsilon)\Big)$ (47)
$\mathrm{s.t.}\ \nu+\lambda_{\mathbf{x}}+\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_\ell a_{\ell,n}\epsilon)\geq 0,\ \mathbf{x}\in\mathcal{X}.$

Thus, the problem in (47) and $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ share the same optimal value, which can be viewed as a function of $(\mathbf{A},\epsilon)$. Thus, the max-min problem in Problem 3 can be equivalently converted to the maximization problem in Problem 4 by replacing the inner problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with the problem in (47) and replacing $\epsilon a_{k,n}$ with the new variables $b_{k,n}$ in both the objective function and the constraint functions. Therefore, we complete the proof of Lemma 1.

Appendix G: Proof of Theorem 6

We show that the assumptions in [30, Theorem 1] are satisfied. It is clear that $\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s)})$ and $\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu,\mathbf{B}^{(s)},\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$, are continuously differentiable with respect to $(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)$, and each of them is the sum of linear functions and strongly concave functions. Thus, $\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s)})$ and $\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu,\mathbf{B}^{(s)},\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$, satisfy the first, second, and third assumptions. From (28) and (29), it is clear that they also satisfy the fourth and fifth assumptions. Let $\nabla_{\mathbf{B}}^2 T(\mathbf{B},1,\mathbf{x})\triangleq\big(\frac{\partial^2 T(\mathbf{B},1,\mathbf{x})}{\partial b_{k,n}\partial b_{k',n'}}\big)_{(k,n),(k',n')\in\mathcal{K}\times\mathcal{N}}$ and $\nabla_{\mathbf{B}}^2\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\triangleq\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}\frac{\partial^2 T(\mathbf{B},1,\mathbf{x})}{\partial b_{k,n}\partial b_{k',n'}}\big)_{(k,n),(k',n')\in\mathcal{K}\times\mathcal{N}}$ denote the Hessians of $T(\mathbf{B},1,\mathbf{x})$ and $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})$, respectively. We can show $\nabla_{\mathbf{B}}^2 T(\mathbf{B},1,\mathbf{x})\preceq\sqrt{N}\|\mathbf{x}\|_1^2\mathbf{E}$, $\mathbf{x}\in\mathcal{X}$, and $\nabla_{\mathbf{B}}^2\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\preceq\sqrt{N\sum_{k\in\mathcal{K}}\sum_{k'\in\mathcal{K}:k'\neq k}\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k x_{k'}\|\mathbf{x}\|_1\big)^2}\,\mathbf{E}$ [33], where $\mathbf{E}$ denotes the identity matrix. Thus, using the inequality implied by Taylor's theorem [35, Eq. (25)], we know that $\tilde{h}$ and $\tilde{h}_{\text{c}}$, $\mathbf{x}\in\mathcal{X}$, satisfy the sixth assumption. Therefore, Theorem 6 readily follows from [30, Theorem 1].

Appendix H: Proof of Lemma 2

For all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b), as $\mathbf{p}\in\tilde{\mathcal{P}}$, we have $\min_{\mathbf{p}\in\tilde{\mathcal{P}}}\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})\geq\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k-\epsilon^2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell$, where the equality holds when

$\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}=1,\ \sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_k=\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k,\ k\in\mathcal{K},\ \sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_k x_\ell=\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell,\ k,\ell\in\mathcal{K},\ k<\ell$ (49)

are satisfied. Thus, to show Lemma 2, it is equivalent to show that the system of linear equations in (49) with variable $\mathbf{p}$ has a solution. Let $g(\mathbf{x})\triangleq 1+\sum_{k\in\mathcal{K}}x_k 2^{k-1}$ [37]. Obviously, $g:\mathcal{X}\rightarrow\{1,2,3,\ldots,2^K\}$ is a bijection. Thus, for all $\mathbf{x}\in\mathcal{X}$, we can also write $p_{\mathbf{x}}$ as $p_{g(\mathbf{x})}$. As $g(\mathbf{0})$, $\min\{g(\mathbf{x})\,|\,x_k=1,\mathbf{x}\in\mathcal{X}\}=g(\mathbf{e}_k)$, $k\in\mathcal{K}$, and $\min\{g(\mathbf{x})\,|\,x_k=1,x_\ell=1,\mathbf{x}\in\mathcal{X}\}=g(\mathbf{e}_k+\mathbf{e}_\ell)$, $k,\ell\in\mathcal{K}$, $k<\ell$, are all different, the matrix of the system of linear equations in (49) has full row rank. In addition, note that the number of equations, $\frac{K^2+K+2}{2}$, is no greater than the number of variables, $2^K$. Thus, the system of linear equations in (49) has a solution. We complete the proof of Lemma 2.
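For illustration, a minimal sketch of the bijection $g$ used in this proof:

```python
def g(x):
    """Bijection from an activity vector x in {0,1}^K to an index in {1, ..., 2^K},
    per g(x) = 1 + sum_k x_k 2^{k-1}."""
    return 1 + sum(xk << k for k, xk in enumerate(x))

# e.g., K = 3: g((0,0,0)) = 1, g((1,0,0)) = 2 (= g(e_1)), g((1,1,0)) = 4 (= g(e_1 + e_2))
assert g((0, 0, 0)) == 1 and g((1, 0, 0)) == 2 and g((1, 1, 0)) == 4
```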

Appendix I: Proof of Theorem 8

For all $k\in\mathcal{K}$ and all $t\geq 1$, let $n^{(t)}\triangleq\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$, which is a singleton, and let $n'^{(t)}$ denote an index in $\arg\max_{n\in\mathcal{N}:n\neq n^{(t)}}c_{k,n}^{(t)}$. First, we show that for any $0<\omega<\frac{1}{4}\big(c_{k,n^{(t)}}^{(t)}-c_{k,n'^{(t)}}^{(t)}\big)$, the optimization problem in (32) and the following QP (which is strongly convex),

$\max_{\mathbf{a}_k}\ Q_\omega^{(t)}(\mathbf{a}_k)\triangleq\tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)-\omega\sum_{n\in\mathcal{N}}\big(a_{k,n}-a_{k,n}^{(t-1)}\big)^2,\quad k\in\mathcal{K}$ (50)
$\mathrm{s.t.}\ \text{(3a)},\ \text{(3b)},$

share the same optimal point. According to Theorem 7, the optimal point of the problem in (32) is $\mathbf{e}_{n^{(t)}}$. In addition, for any $\mathbf{a}_k$ satisfying the constraints in (3a) and (3b), we can show $Q_\omega^{(t)}(\mathbf{a}_k)\leq Q_\omega^{(t)}(\mathbf{e}_{n^{(t)}})$ [33]. By noting that the QP in (50) is strongly convex (as $\omega>0$), $\mathbf{e}_{n^{(t)}}$ is also the unique optimal point of the strongly convex QP. Thus, if $\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$ is a singleton for all $k\in\mathcal{K}$ and all $t\geq 1$, Algorithm 4 can be viewed as the stochastic parallel SCA in [36] for solving the following stochastic problem:

$\max_{\mathbf{A},\epsilon}\ \mathbb{E}\Big[\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)\Big]$ (51)
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)},$

where $\xi$ (the mini-batch index) follows the uniform distribution over $\mathcal{M}$, and the expectation is taken over $\xi$. By noting that $\mathbb{E}\big[\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)\big]=\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$, the stochastic problem in (51) is equivalent to Problem 7. Next, we show the convergence of Algorithm 4 by verifying that the assumptions in [36, Theorem 1] are satisfied. Obviously, the constraint set of Problem 7 is compact and convex. It is clear that for any $\xi\in\mathcal{M}$, $\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)$ is smooth on the constraint set of Problem 7, and hence it is continuously differentiable with a Lipschitz continuous derivative. The random variables $\xi^{(0)},\xi^{(1)},\ldots$ are bounded and identically distributed. Therefore, Theorem 8 readily follows from [36, Theorem 1].

References

  • [1] W. Liu, Y. Cui, L. Ding, J. Sun, Y. Liu, Y. Li, and L. Zhang, “Joint optimization of preamble selection and access barring for MTC with correlated device activities,” in Proc. IEEE ICC Wkshps., Jun. 2021, pp. 1–6.
  • [2] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, “Toward massive machine type cellular communications,” IEEE Wireless Commun., vol. 24, no. 1, pp. 120–128, Feb. 2017.
  • [3] M. Hasan, E. Hossain, and D. Niyato, “Random access for machine-to-machine communication in LTE-advanced networks: Issues and approaches,” IEEE Commun. Mag., vol. 51, no. 6, pp. 86–93, Jun. 2013.
  • [4] R. Ratasuk, N. Mangalvedhe, D. Bhatoolaul, and A. Ghosh, “LTE-M evolution towards 5G massive MTC,” in Proc. IEEE GC Wkshps., Dec. 2017, pp. 1–6.
  • [5] Y. Wang, X. Lin, A. Adhikary, A. Grovlen, Y. Sui, Y. Blankenship, J. Bergman, and H. S. Razaghi, “A primer on 3GPP narrowband Internet of Things,” IEEE Commun. Mag., vol. 55, no. 3, pp. 117–123, Mar. 2017.
  • [6] E. Dahlman, P. Stefan, and S. Johan, 5G NR: The Next Generation Wireless Access Technology.   Academic Press, 2020.
  • [7] 3GPP, “Evolved universal terrestrial radio access (E-UTRA); medium access control (MAC) protocol specification,” TS 36.321, Apr. 2015.
  • [8] S. Duan, V. Shah-Mansouri, Z. Wang, and V. W. S. Wong, “D-ACB: Adaptive congestion control algorithm for bursty M2M traffic in LTE networks,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9847–9861, Dec. 2016.
  • [9] Z. Wang and V. W. S. Wong, “Optimal access class barring for stationary machine type communication devices with timing advance information,” IEEE Trans. Wireless Commun., vol. 14, no. 10, pp. 5374–5387, Oct. 2015.
  • [10] M. Vilgelm, S. Rueda Liñares, and W. Kellerer, “On the resource consumption of M2M random access: Efficiency and pareto optimality,” IEEE Wireless Commun. Lett., vol. 8, no. 3, pp. 709–712, Jun. 2019.
  • [11] W. Zhan and L. Dai, “Massive random access of machine-to-machine communications in LTE networks: Modeling and throughput optimization,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2771–2785, Apr. 2018.
  • [12] ——, “Massive random access of machine-to-machine communications in LTE networks: Throughput optimization with a finite data transmission rate,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5749–5763, Apr. 2019.
  • [13] C. Di, B. Zhang, Q. Liang, S. Li, and Y. Guo, “Learning automata-based access class barring scheme for massive random access in machine-to-machine communications,” IEEE Internet Things J., vol. 6, no. 4, pp. 6007–6017, Aug. 2019.
  • [14] H. Jin, W. T. Toor, B. C. Jung, and J. Seo, “Recursive pseudo-bayesian access class barring for M2M communications in LTE systems,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8595–8599, Sep. 2017.
  • [15] O. Galinina, A. Turlikov, S. Andreev, and Y. Koucheryavy, “Stabilizing multi-channel slotted ALOHA for machine-type communications,” in Proc. IEEE ISIT, Jul. 2013, pp. 2119–2123.
  • [16] C. Oh, D. Hwang, and T. Lee, “Joint access control and resource allocation for concurrent and massive access of M2M devices,” IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4182–4192, Aug. 2015.
  • [17] J. Choi, “On the adaptive determination of the number of preambles in RACH for MTC,” IEEE Wireless Commun. Lett., vol. 20, no. 7, pp. 1385–1388, Jul. 2016.
  • [18] ——, “Multichannel ALOHA with exploration phase,” in Proc. IEEE WCNC, May 2020, pp. 1–6.
  • [19] M. Shehab, A. K. Hagelskjar, A. E. Kalor, P. Popovski, and H. Alves, “Traffic prediction based fast uplink grant for massive IoT,” in Proc. IEEE PIMRC, Sep. 2020, pp. 1–6.
  • [20] S. Ali, A. Ferdowsi, W. Saad, N. Rajatheva, and J. Haapola, “Sleeping multi-armed bandit learning for fast uplink grant allocation in machine type communications,” IEEE Trans. Commun., vol. 68, no. 8, pp. 5072–5086, Apr. 2020.
  • [21] S. Ali, W. Saad, and N. Rajatheva, “A directed information learning framework for event-driven M2M traffic prediction,” IEEE Commun. Lett., vol. 22, no. 11, pp. 2378–2381, Aug. 2018.
  • [22] 3GPP, “Study on RAN improvements for machine-type communications,” TR 37.868, Sep. 2011.
  • [23] E. Soltanmohammadi, K. Ghavami, and M. Naraghi-Pour, “A survey of traffic issues in machine-to-machine communications over LTE,” IEEE Internet Things J., vol. 3, no. 6, pp. 865–884, Dec. 2016.
  • [24] J. Navarro-Ortiz, P. Romero-Diaz, S. Sendra, P. Ameigeiras, J. J. Ramos-Munoz, and J. M. Lopez-Soler, “A survey on 5G usage scenarios and traffic models,” IEEE Commun. Surv. Tutor., vol. 22, no. 2, pp. 905–929, May 2020.
  • [25] Y. Cui, S. Li, and W. Zhang, “Jointly sparse signal recovery and support recovery via deep learning with applications in MIMO-based grant-free random access,” IEEE J. Select. Areas Commun., vol. 39, no. 3, pp. 788–803, Mar. 2021.
  • [26] D. Jiang and Y. Cui, “ML and MAP device activity detections for grant-free massive access in multi-cell networks,” IEEE Trans. Wireless Commun., to appear.
  • [27] A. E. Kalor, O. A. Hanna, and P. Popovski, “Random access schemes in wireless systems with correlated user activity,” in Proc. IEEE SPAWC, Jun. 2018, pp. 1–5.
  • [28] L. Liu and W. Yu, “Massive connectivity with massive MIMO–part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, Jun. 2018.
  • [29] D. P. Bertsekas, Nonlinear Programming.   Athena Scientific, Belmont, MA, 1998.
  • [30] M. Razaviyayn, “Successive convex approximation: Analysis and applications,” Ph.D. dissertation, Univ. of Minnesota, Minneapolis, MN, USA, May 2014.
  • [31] A. Koppel, A. Mokhtari, and A. Ribeiro, “Parallel stochastic successive convex approximation method for large-scale dictionary learning,” in Proc. IEEE ICASSP, Apr. 2018, pp. 2771–2775.
  • [32] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [33] W. Liu, Y. Cui, F. Yang, L. Ding, and J. Sun, “Joint optimization of preamble selection and access barring for random access in MTC with general device activities,” arXiv preprint arXiv:2107.10461, Jul. 2021.
  • [34] C. Ye, Y. Cui, Y. Yang, and R. Wang, “Optimal caching designs for perfect, imperfect, and unknown file popularity distributions in large-scale multi-tier wireless networks,” IEEE Trans. Commun., vol. 67, no. 9, pp. 6612–6625, May 2019.
  • [35] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
  • [36] Y. Yang, G. Scutari, D. P. Palomar, and M. Pesavento, “A parallel decomposition method for nonconvex stochastic multi-agent optimization problems,” IEEE Trans. Signal Process., vol. 64, no. 11, pp. 2949–2964, Feb. 2016.
  • [37] J. L. Teugels, “Some representations of the multivariate Bernoulli and binomial distributions,” Journal of Multivariate Analysis, vol. 32, no. 2, pp. 256–268, Feb. 1990.