
Joint Optimization of Preamble Selection and Access Barring for Random Access in MTC with General Device Activities

Wang Liu, Ying Cui, Feng Yang, Lianghui Ding, and Jun Sun. The authors are with Shanghai Jiao Tong University, China. This paper was presented in part at IEEE ICC Workshops 2021 [1].
Abstract

Most existing random access schemes for machine-type communications (MTC) simply adopt a uniform preamble selection distribution, irrespective of the underlying device activity distributions. Hence, they may yield unsatisfactory access efficiency. In this paper, we model device activities for MTC as multiple Bernoulli random variables following an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities. Then, we optimize preamble selection and access barring for random access in MTC according to the underlying joint device activity distribution. Specifically, we investigate three cases of the joint device activity distribution, i.e., the cases of perfect, imperfect, and unknown joint device activity distributions, and formulate the average, worst-case average, and sample average throughput maximization problems, respectively. The problems in the three cases are challenging nonconvex problems. In the case of perfect joint device activity distribution, we develop an iterative algorithm and a low-complexity iterative algorithm to obtain stationary points of the original problem and an approximate problem, respectively. In the case of imperfect joint device activity distribution, we develop an iterative algorithm and a low-complexity iterative algorithm to obtain a Karush-Kuhn-Tucker (KKT) point of an equivalent problem and a stationary point of an approximate problem, respectively. Finally, in the case of unknown joint device activity distribution, we develop an iterative algorithm to obtain a stationary point. The proposed solutions are widely applicable and outperform existing solutions for dependent and independent device activities.

Index Terms:
Machine-type communications (MTC), random access procedure, preamble selection, access barring, optimization.

I Introduction

The emerging Internet-of-Things (IoT), whose key idea is to connect everything and everyone through the Internet, has received increasing attention in recent years. Machine-type communications (MTC) are expected to support IoT services and applications, such as home automation, smart grids, smart healthcare, and environment monitoring [2, 3]. Long Term Evolution (LTE) cellular networks offer the most natural and appealing solution for MTC due to their ubiquitous coverage. Specifically, the Third-Generation Partnership Project (3GPP) has developed radio technologies, such as LTE-M [4] and Narrowband IoT (NB-IoT) [5], to enhance existing LTE networks and to provide new solutions inherited from LTE, respectively, for better serving IoT use cases. In addition, 3GPP has proposed to the International Telecommunication Union (ITU) that LTE-M and NB-IoT be integrated as part of the fifth generation (5G) specifications [6].

In LTE-M and NB-IoT, devices compete in a random access channel (RACH) to access the base station (BS) through the random access procedure [7]. Specifically, each active device randomly selects a preamble from a pool of available preambles according to a preamble selection distribution and transmits it during the RACH. The BS acknowledges the successful reception of a preamble if such a preamble is transmitted by only one device. In [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], the authors consider the random access procedure and study the effect of preamble selection under certain assumptions on the knowledge of device activities. Specifically, [8, 9, 10, 13, 16, 17, 18] assume that the number of active devices is known; [14, 15] assume that the distribution of the number of active devices is known; and [11, 12] assume that the statistics of the data queue of each device are known. In [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], preambles are selected according to a uniform distribution; the average throughput [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18], average access efficiency [16], and resource consumption [10] are analyzed; and the number of allocated preambles is optimized to maximize the average throughput [10, 17] or access efficiency [16]. Notice that optimal solutions are obtained in [10, 16, 17]. In [19, 20, 21], the active devices are assumed to be known. Under this assumption, each active device is allocated a preamble if the number of preambles is greater than or equal to the number of active devices, and the preambles are allocated to a subset of active devices otherwise.

When many active devices attempt to access a BS simultaneously, a preamble is very likely to be selected by more than one device, leading to a significant decrease in the probability of access success. In this scenario, access control is necessary. One widely used access control method is the access barring scheme [22], which has been included in the LTE specification [7]. In [8, 9, 10, 11, 12, 13, 14, 15, 16], the authors consider access barring in addition to the random access procedure with preambles selected according to a uniform distribution. Specifically, the access barring factor is optimized to maximize the average throughput [8, 9, 10, 11, 12, 13, 14] or access efficiency [16], or to minimize the number of backlogged devices [15]. Note that optimal solutions are obtained in [8, 10, 13, 14, 15, 16], an approximate solution is obtained in [9], and asymptotically optimal solutions for a large number of devices are obtained in [11, 12].

There exist two main types of MTC traffic, i.e., event-driven (or event-triggered) traffic and periodic traffic [23, 24]. In event-driven MTC (e.g., smart healthcare and environment sensing such as rockfall detection and fire alarms), the triggering of one device may increase the probability that other devices in the vicinity trigger in quick succession, or many devices may be simultaneously triggered in response to a particular IoT event. Hence, device activities can be dependent. Besides, in periodic MTC, devices have their own periods, and hence device activities can be considered independent for tractability (if the periodic behaviors of devices are not tracked). Intuitively, effective preamble selection and access barring should adapt to the statistical behaviors of device activities. Specifically, if devices are more likely to activate simultaneously, letting them always select different preambles can avoid more collisions; if devices are less likely to activate simultaneously, letting them always select the same preamble can avoid more collisions. However, most existing random access schemes simply adopt a uniform preamble selection distribution, irrespective of the underlying device activity distributions [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Hence, they may yield far-from-optimal access efficiency. This motivates us to model device activities as multiple Bernoulli random variables following an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities [25, 26, 27, 28], and to optimize preamble selection and access barring according to the multivariate Bernoulli distribution [27]. To our knowledge, [27] is the first work that considers a general joint device activity distribution and attempts to optimize the preamble selection distributions and access barring factors of all devices. More specifically, in [27], the authors assume that a perfect joint device activity distribution is known, maximize an approximation of the average throughput, which captures the active probabilities of every single device and every two devices, and develop a heuristic algorithm to tackle the challenging nonconvex problem. The approximation error and the heuristic algorithm may yield a nonnegligible loss in access efficiency. Therefore, it is critical to explore more effective algorithms adaptive to the statistics of device activities. Furthermore, notice that most existing works assume that the number of active devices [8, 9, 10, 13, 16, 17], the distribution of the number of active devices [14, 15], the statistics of the data queue of each device [11, 12], the active devices [19, 20, 21], or the device activity distribution [27] is perfectly known. However, in practice, such assumptions are hardly satisfied. Hence, it is critical to consider the optimization of preamble selection and access barring under imperfect information of device activities or even only historical samples of device activities.

This paper investigates the joint optimization of preamble selection and access barring for IoT devices in MTC, whose activities are modeled by an arbitrary multivariate Bernoulli distribution, which can reflect both dependent and independent device activities. Specifically, we consider three cases of the joint device activity distribution, i.e., the case of perfect joint device activity distribution (where the joint device activity distribution has been perfectly estimated), the case of imperfect joint device activity distribution (where the estimation error of the joint device activity distribution lies within a known deterministic bound), and the case of unknown joint device activity distribution (where activity samples generated according to the joint device activity distribution are available). The optimization-based solutions are generally applicable to any dependent or independent device activities. Our main contributions are summarized below.

  • In the case of perfect joint device activity distribution, we formulate the average throughput maximization problem, which is nonconvex with a complicated objective function. Based on the block coordinate descent (BCD) method [29], we develop an iterative algorithm, where most block coordinate optimization problems are solved analytically, and a low-complexity iterative algorithm, where all block coordinate optimization problems have closed-form optimal points, to obtain stationary points of the original problem and an approximate problem, respectively. Furthermore, we characterize an optimality property of a globally optimal point of the original problem.

  • In the case of imperfect joint device activity distribution, we formulate the worst-case average throughput maximization problem as a robust optimization problem, which is a max-min problem. By duality theory and successive convex approximation (SCA) [30], we develop an iterative algorithm to obtain a Karush-Kuhn-Tucker (KKT) point of an equivalent problem of the max-min problem. Based on the BCD method, we also develop a low-complexity iterative algorithm, where all block coordinate optimization problems are solved analytically, to obtain a stationary point of an approximate problem of the max-min problem.

  • In the case of unknown joint device activity distribution, we formulate the sample average throughput maximization problem, which can be viewed as a nonconvex stochastic optimization problem. Based on mini-batch stochastic parallel SCA [31], we develop an efficient parallel iterative algorithm, where the approximate convex problems in each iteration are solved analytically, to obtain a stationary point.

  • Finally, numerical results show that, for both dependent and independent device activities, the proposed solutions achieve significant gains over existing schemes in all three cases. Besides, the performance of each low-complexity algorithm is close to that of its counterpart designed for solving the original problem.

The key notation used in this paper is listed in Table I.

TABLE I: Key Notation
$K$: the number of devices
$N$: the number of preambles
$x_k\in\{0,1\}$: the activity state of device $k$
$p_{\mathbf{x}}\in[0,1]$: the probability that the activity states are $\mathbf{x}$
$\underline{p}_{\mathbf{x}}\in[0,1]$: the lower bound of $p_{\mathbf{x}}$
$\overline{p}_{\mathbf{x}}\in[0,1]$: the upper bound of $p_{\mathbf{x}}$
$\epsilon\in[0,1]$: the access barring factor
$a_{k,n}\in[0,1]$: the probability that device $k$ selects preamble $n$
$T(\mathbf{A},\epsilon,\mathbf{x})$: average throughput conditional on $\mathbf{x}$
$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$: average throughput
$\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$: approximate average throughput
$\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$: worst-case average throughput
$\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$: approximate worst-case average throughput
$\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$: sample average throughput

II System Model

We consider the uplink of a single-cell wireless network consisting of one BS and $K$ devices. Let $\mathcal{K}\triangleq\{1,2,\ldots,K\}$ denote the set of $K$ devices. We consider a discrete-time system with time being slotted and assume that devices activate independently and identically over slots within a certain period. In each slot, a device can be either active or inactive. Thus, the activities of the $K$ devices in a slot can be modeled as $K$ Bernoulli random variables, following an arbitrary multivariate Bernoulli distribution [25, 26, 28]. Generally, the $K$ Bernoulli random variables (representing the activities of the $K$ devices) can be dependent or independent. Let $x_k\in\{0,1\}$ denote the activity state of device $k$, where $x_k=1$ if device $k$ is active and $x_k=0$ otherwise. Let the binary vector $\mathbf{x}\triangleq(x_k)_{k\in\mathcal{K}}\in\mathcal{X}$ denote the activity states of the $K$ devices, where $\mathcal{X}\triangleq\{0,1\}^K$. The joint device activity distribution (assumed invariant over the considered period) is denoted by $\mathbf{p}\triangleq(p_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}$, where $p_{\mathbf{x}}$ represents the probability of the activity states of the $K$ devices being $\mathbf{x}$. Note that whether the device activities are dependent or independent depends on the values or forms of $\mathbf{p}$'s components. According to the nonnegativity and normalization axioms, we have

$p_{\mathbf{x}}\geq 0,\ \mathbf{x}\in\mathcal{X},$ (1a)
$\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}=1.$ (1b)

In this paper, we consider $p_{\mathbf{0}}<1$ ($p_{\mathbf{0}}=1$ is not an interesting or practical scenario). We do not pose any additional assumption on $\mathbf{p}$. Thus, all analytical and optimization results, relying on $\mathbf{p}$ or its related quantities, are general.
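To make the model concrete, the following minimal sketch (assuming NumPy is available; the lexicographic state ordering and all variable names are our own illustrative choices, not from the paper) enumerates the $2^K$ activity states and draws slots of device activities from a given joint distribution $\mathbf{p}$:

```python
# A minimal sketch, assuming NumPy; the uniform example p is illustrative only.
import itertools
import numpy as np

K = 3
states = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K activity states x
p = np.full(2 ** K, 1.0 / 2 ** K)                             # example joint distribution p

rng = np.random.default_rng(0)
idx = rng.choice(2 ** K, size=5, p=p)  # one draw per slot
print(states[idx])                     # each row is one realization of x
```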

In a slot, each active device tries to access the BS. Congestion may occur when many active devices require access to the BS at the same time. We adopt an access barring scheme for access control [22]. In particular, at the beginning of each slot, all active devices independently attempt to access the BS with probability $\epsilon$, where

$\epsilon\geq 0,$ (2a)
$\epsilon\leq 1.$ (2b)

Here, $\epsilon$ is referred to as the access barring factor for all devices and will be optimized later. (The average throughput can be improved when devices have different access barring factors. To avoid resource starvation for each device and maintain fairness, we consider identical access barring factors for all devices in the optimizations for simplicity. The proposed solution framework can be extended to handle different access barring factors.) That is, the access barring scheme is parameterized by the access barring factor $\epsilon$.

We adopt the random access procedure, which consists of four stages, i.e., preamble transmission, random access response, scheduled transmission, and contention resolution [3]. We focus only on the first stage, which mainly determines the success of access [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 27]. Consider $N$ orthogonal preambles, the set of which is denoted by $\mathcal{N}\triangleq\{1,2,\ldots,N\}$. Specifically, at the first stage, each device that attempts to access the BS independently selects a preamble out of the $N$ preambles to transmit. The probability that device $k$ selects preamble $n$ is denoted by $a_{k,n}$, which satisfies

$a_{k,n}\geq 0,\ k\in\mathcal{K},\ n\in\mathcal{N},$ (3a)
$\sum_{n\in\mathcal{N}}a_{k,n}=1,\ k\in\mathcal{K}.$ (3b)

Let $\mathbf{a}_k\triangleq(a_{k,n})_{n\in\mathcal{N}}$ denote the preamble selection distribution of device $k$, and let $\mathbf{A}\triangleq(a_{k,n})_{k\in\mathcal{K},n\in\mathcal{N}}$ denote the distributions of the $K$ devices; the $k$-th column of $\mathbf{A}$ is $\mathbf{a}_k$. Note that for all $k\in\mathcal{K}$, the random preamble transmission parameterized by $\mathbf{a}_k$ reduces to the preamble transmission in the standard random access procedure [7] when $a_{k,n}=\frac{1}{N}$, $n\in\mathcal{N}$. Furthermore, note that for all $k\in\mathcal{K}$, the considered preamble selection is generally random and becomes deterministic when $a_{k,n}\in\{0,1\}$, $n\in\mathcal{N}$. We allow $\mathbf{a}_k$, $k\in\mathcal{K}$ to be arbitrary distributions so as to maximally avoid collisions.

If a preamble is selected by a single device, this device successfully accesses the BS [11]. Then, the average number of devices that successfully access the BS at activity states $\mathbf{x}$ in a slot is given by [27]

$T(\mathbf{A},\epsilon,\mathbf{x})\triangleq\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_{k}a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_{\ell}a_{\ell,n}\epsilon).$ (4)
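For reference, a direct (unoptimized) evaluation of (4) can be sketched as follows; the function name and the row-per-device layout of $\mathbf{A}$ (the paper stacks $\mathbf{a}_k$ as columns) are our own choices:

```python
import numpy as np

def throughput(A, eps, x):
    """Evaluate T(A, eps, x) in (4): the average number of devices that
    succeed at activity state x. Here A[k, n] = a_{k,n} (rows are devices)."""
    K, N = A.shape
    total = 0.0
    for n in range(N):
        for k in range(K):
            # probability that no other active contending device picks preamble n
            others = np.prod([1.0 - x[l] * A[l, n] * eps
                              for l in range(K) if l != k])
            total += x[k] * A[k, n] * eps * others
    return total
```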

In this paper, we consider the following three cases of joint device activity distribution and introduce the respective performance metrics.
Perfect joint device activity distribution: In this case, we assume that the joint device activity distribution has been estimated by some learning methods, and the estimation error is negligible. That is, the exact value of $\mathbf{p}$ is known. We adopt the average throughput [27]

$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})\triangleq\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}T(\mathbf{A},\epsilon,\mathbf{x})$ (5)
$=\sum_{m=1}^{K}m(-1)^{m-1}\epsilon^{m}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}\right)\prod_{k\in\mathcal{K}'}a_{k,n},$ (6)

as the performance metric, where $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). The proof of (6) can be found in Appendix A. Note that for all $\mathcal{K}'\subseteq\mathcal{K}$, $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}$ represents the probability that all devices in $\mathcal{K}'$ are active and can be computed in advance.
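Under the assumption that $K$ is small enough to enumerate $\mathcal{X}$, the average throughput (5) can be evaluated by direct expectation. A hedged sketch, reusing the hypothetical `throughput` helper defined after (4) and the lexicographic state ordering:

```python
import itertools
import numpy as np

def avg_throughput(A, eps, p):
    """Evaluate (5): the expectation of T(A, eps, x) over x ~ p, with p indexed
    by the lexicographic enumeration of {0,1}^K."""
    K = A.shape[0]
    return sum(px * throughput(A, eps, np.array(x))
               for px, x in zip(p, itertools.product([0, 1], repeat=K)))
```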
Imperfect joint device activity distribution: In this case, we assume that the joint device activity distribution has been estimated by some learning methods with certain estimation errors. For all $\mathbf{x}\in\mathcal{X}$, let $\hat{p}_{\mathbf{x}}$ denote the estimated probability of the device activity states being $\mathbf{x}$, and let $\Delta_{\mathbf{x}}\triangleq p_{\mathbf{x}}-\hat{p}_{\mathbf{x}}$ denote the corresponding estimation error. Note that $\Delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ satisfy $\sum_{\mathbf{x}\in\mathcal{X}}\Delta_{\mathbf{x}}=0$ and $|\Delta_{\mathbf{x}}|\leq\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ for some known estimation error bounds $\delta_{\mathbf{x}}\in[0,1)$, $\mathbf{x}\in\mathcal{X}$. Assume that $\hat{p}_{\mathbf{x}}$, $\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ are known to the BS, but neither $\mathbf{p}$ nor $\Delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ is known to the BS. Thus, the BS knows that the exact joint activity distribution satisfies $\mathbf{p}\in\mathcal{P}$, where

$\mathcal{P}\triangleq\left\{(y_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}\,\middle|\,\underline{p}_{\mathbf{x}}\leq y_{\mathbf{x}}\leq\overline{p}_{\mathbf{x}},\ \mathbf{x}\in\mathcal{X},\ \sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}=1\right\},$

with $\underline{p}_{\mathbf{x}}\triangleq\max\{\hat{p}_{\mathbf{x}}-\delta_{\mathbf{x}},0\}$ and $\overline{p}_{\mathbf{x}}\triangleq\min\{\hat{p}_{\mathbf{x}}+\delta_{\mathbf{x}},1\}$ for all $\mathbf{x}\in\mathcal{X}$. Note that $\mathcal{P}$ reflects $\mathbf{p}$ to a certain extent. It can be easily verified that $\mathcal{P}$ shrinks to $\{\mathbf{p}\}$ as the estimation error bounds $\delta_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$ decrease to 0. We adopt the worst-case average throughput

$\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ (7)

as the performance metric, where $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is given by (6). Note that $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\leq\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, with equality holding if and only if $\delta_{\mathbf{x}}=0$, $\mathbf{x}\in\mathcal{X}$.
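Since $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is linear in $\mathbf{p}$, the inner minimization in (7) is a linear program over the box-and-simplex set $\mathcal{P}$. A hedged sketch using SciPy's `linprog` (an off-the-shelf LP solver, not part of the paper; it reuses the `throughput` sketch after (4)):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_avg_throughput(A, eps, p_lo, p_hi):
    """Evaluate (7): minimize sum_x p_x T(A, eps, x) over p in P, where p_lo
    and p_hi hold the bounds on p_x (lexicographic state order)."""
    K = A.shape[0]
    # cost vector: T(A, eps, x) for each activity state x
    c = [throughput(A, eps, np.array(x))
         for x in itertools.product([0, 1], repeat=K)]
    res = linprog(c, A_eq=np.ones((1, 2 ** K)), b_eq=[1.0],
                  bounds=list(zip(p_lo, p_hi)))
    return res.fun
```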
Unknown joint device activity distribution: In this case, we assume no direct information on the joint device activity distribution, but $I$ samples of the device activity states, generated according to $\mathbf{p}$, are available. The $I$ samples reflect $\mathbf{p}$ to a certain extent via the empirical joint probability distribution, an unbiased estimator of $\mathbf{p}$. It can be easily verified that its mean squared error decreases with $I$. Denote $\mathcal{I}\triangleq\{1,2,\ldots,I\}$. For all $i\in\mathcal{I}$, the $i$-th sample is denoted by $\mathbf{x}_i$. We adopt the sample average throughput

$\bar{T}_{\text{st}}(\mathbf{A},\epsilon)\triangleq\frac{1}{I}\sum_{i\in\mathcal{I}}T(\mathbf{A},\epsilon,\mathbf{x}_{i})$ (8)

as the performance metric, where $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). Note that $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ is a random variable, with randomness induced by the $I$ samples, and $\mathbb{E}[\bar{T}_{\text{st}}(\mathbf{A},\epsilon)]=\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$.
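Given a matrix of $I$ activity samples, the sample average throughput (8) is a one-line estimate (again reusing the `throughput` sketch after (4); the array layout is our assumption):

```python
import numpy as np

def sample_avg_throughput(A, eps, samples):
    """Evaluate (8): the empirical mean of T(A, eps, x_i) over the I sample
    rows of `samples` (an I x K binary array)."""
    return float(np.mean([throughput(A, eps, x) for x in samples]))
```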

As illustrated in the introduction, overlooking the joint device activity distribution $\mathbf{p}$ (or related quantities, such as $\mathcal{P}$ and $\mathbf{x}_i$, $i\in\mathcal{I}$) may yield ineffective preamble selection and access barring designs. This can be further seen from the following example.

Example 1 (Motivation Example)

We consider $K=3$ devices and $N=2$ preambles. The marginal probability of each device being active is $p\in(0,\frac{1}{2}]$; the activities of device 1 and device 2 can be correlated (dependent) (note that correlation implies dependence), and their joint distribution $p_{x_1,x_2}$, $x_1,x_2\in\{0,1\}$ is given by $p_{0,0}=1+(\eta-2)p+(1-\eta)p^2$, $p_{0,1}=p_{1,0}=(1-\eta)(p-p^2)$, and $p_{1,1}=\eta p+(1-\eta)p^2$ [26], where $\eta\in[\frac{p}{p-1},1]$ represents the correlation coefficient (the range of $\eta$ guarantees that $p_{x_1,x_2}$, $x_1,x_2\in\{0,1\}$ satisfy (1a)-(1b)); and the activity of device 3 is independent of the activities of devices 1 and 2. Thus, $\mathbf{p}$ is given by $p_{x_1,x_2,x_3}=p_{x_1,x_2}p_{x_3}$, $x_1,x_2,x_3\in\{0,1\}$. Note that $\eta\neq 0$ corresponds to dependent device activities, and $\eta=0$ corresponds to independent device activities. We consider four feasible random access schemes parameterized by $(\mathbf{A}_i,\epsilon_i)$, $i=1,2,3,4$. Specifically, $\mathbf{A}_1=[\mathbf{e}_1,\mathbf{e}_1,\mathbf{e}_2]$, $\mathbf{A}_2=[\mathbf{e}_1,\mathbf{e}_2,\mathbf{e}_2]$,

$\mathbf{A}_3=\begin{cases}\mathbf{A}_1, & \eta\leq 0\\ \mathbf{A}_2, & \eta>0,\end{cases}$ (11)

and $\mathbf{A}_4=\frac{1}{2}[\mathbf{1},\mathbf{1},\mathbf{1}]$ (uniform distributions), where $\mathbf{e}_n\in\mathbb{R}^N$ denotes the column vector of all zeros except the $n$-th entry being 1, and $\mathbf{1}\in\mathbb{R}^N$ denotes the all-one column vector; and

$\epsilon_i=\arg\max_{\epsilon}\bar{T}(\mathbf{A}_i,\epsilon,\mathbf{p})=\begin{cases}\min\left(1,\frac{3}{4}\frac{1}{\eta+(1-\eta)p}\right), & i=1\\ 1, & i=2,3,4.\end{cases}$ (14)

Note that $\mathbf{A}_1$, $\mathbf{A}_2$, and $\mathbf{A}_4$ do not rely on $\mathbf{p}$, whereas $\mathbf{A}_3$ depends on the parameter of $\mathbf{p}$, i.e., $\eta$.

Figure 1: Average throughput versus correlation coefficient $\eta$ for Example 1 at $p=0.5$.

Fig. 1 plots the average throughput versus the correlation coefficient $\eta$. From Fig. 1, we can see that Scheme 3 outperforms Scheme 1 and Scheme 4 at $\eta>0$ and outperforms Scheme 2 and Scheme 4 at $\eta\leq 0$. This is because when device 1 and device 2 are more (or less) likely to activate simultaneously, letting them always select different preambles (or the identical preamble) can avoid more collisions. Besides, Scheme 3 outperforms Scheme 4 at $\eta=0$. This example indicates that adapting $(\mathbf{A},\epsilon)$ to $\mathbf{p}$ can improve the performance of the random access scheme.
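The example can be reproduced numerically. The sketch below is our own construction, not from the paper: it reuses the hypothetical `avg_throughput` helper defined after (6), assumes the lexicographic state ordering, and stores the per-device distributions $\mathbf{a}_k$ as matrix rows (the paper stacks them as columns).

```python
import itertools
import numpy as np

def example1_p(p, eta):
    """Joint distribution of Example 1: devices 1 and 2 correlated with
    coefficient eta, device 3 independent with marginal p."""
    p12 = {(0, 0): 1 + (eta - 2) * p + (1 - eta) * p ** 2,
           (0, 1): (1 - eta) * (p - p ** 2),
           (1, 0): (1 - eta) * (p - p ** 2),
           (1, 1): eta * p + (1 - eta) * p ** 2}
    return np.array([p12[(x1, x2)] * (p if x3 else 1 - p)
                     for x1, x2, x3 in itertools.product([0, 1], repeat=3)])

p, eta = 0.5, 0.4
pvec = example1_p(p, eta)
A1 = np.array([[1, 0], [1, 0], [0, 1]], float)  # rows are a_1, a_2, a_3
A2 = np.array([[1, 0], [0, 1], [0, 1]], float)
A4 = np.full((3, 2), 0.5)
eps1 = min(1.0, 0.75 / (eta + (1 - eta) * p))    # epsilon_1 from (14)
for name, A, eps in [("Scheme 1", A1, eps1), ("Scheme 2", A2, 1.0),
                     ("Scheme 4", A4, 1.0)]:
    print(name, avg_throughput(A, eps, pvec))
```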

Motivated by Example 1, in Section III, Section IV, and Section V, we shall optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the average, worst-case average, and sample average throughputs for given $\mathbf{p}$, $\mathcal{P}$, and $\mathbf{x}_i$, $i\in\mathcal{I}$ in the cases of perfect, imperfect, and unknown joint device activity distributions, respectively, as shown in Fig. 2. In what follows, we assume that the optimization problems are solved at the BS given knowledge of $\mathbf{p}$, $\mathcal{P}$, and $\mathbf{x}_i$, $i\in\mathcal{I}$, and the solutions are then sent to all devices for implementation. (In practice, $\mathbf{p}$ may change slowly over time. For practical implementation, we can re-optimize the preamble selection and access barring using the proposed methods when the change of $\mathbf{p}$ is sufficiently large, and the stationary point obtained for an outdated $\mathbf{p}$ generally serves as a good initial point for the latest $\mathbf{p}$.)

Figure 2: Solution framework.

III Performance Optimization for Perfect Joint Device Activity Distribution

In this section, we consider the average throughput maximization in the case of perfect joint device activity distribution. First, we formulate the average throughput maximization problem, which is a challenging nonconvex problem. Then, we develop an iterative algorithm to obtain a stationary point of the original problem. Finally, we develop a low-complexity iterative algorithm to obtain a stationary point of an approximate problem.

A Problem Formulation

In the case of perfect joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the average throughput $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ in (6) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 1 (Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$

The objective function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear, corresponding to a convex feasible set. Thus, Problem 1 is nonconvex with a convex feasible set. In general, a globally optimal point of a nonconvex problem cannot be obtained effectively and efficiently. Obtaining a stationary point is the classic goal for dealing with a nonconvex problem with a convex feasible set.

B Stationary Point

We propose an iterative algorithm, based on the BCD method [29], to obtain a stationary point of Problem 1. (One can also obtain a stationary point of Problem 1 using SCA [30]; as the approximate convex optimization problem in each iteration has no analytical solution, SCA is not as efficient as the proposed algorithm.) Specifically, we divide the variables $(\mathbf{A},\epsilon)$ into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$ and $\epsilon$. In each iteration of the proposed algorithm, all $K+1$ blocks are sequentially updated. At each step of one iteration, we maximize $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with respect to one of the $K+1$ blocks. For ease of illustration, in the following, we also write $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ as $\bar{T}(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon,\mathbf{p})$, where $\mathbf{a}_{-k}\triangleq(\mathbf{a}_{\ell})_{\ell\in\mathcal{K},\ell\neq k}$. Given $\mathbf{a}_{-k}$ and $\epsilon$ obtained in the previous step, the block coordinate optimization with respect to $\mathbf{a}_k$ is given by

$\max_{\mathbf{a}_{k}}\quad\bar{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p}),\quad k\in\mathcal{K}$ (15)
$\mathrm{s.t.}\quad\text{(3a), (3b)}.$

Given $\mathbf{A}$ obtained in the previous step, the block coordinate optimization with respect to $\epsilon$ is given by

$\max_{\epsilon}\quad\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ (16)
$\mathrm{s.t.}\quad\text{(2a), (2b)}.$

Each problem in (15) is a linear program (LP) with $N$ variables and $N+1$ constraints. The problem in (16) is a polynomial program with a single variable and two constraints.

Next, we obtain optimal points of the problems in (15) and (16). Define (noting that $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})=\frac{\partial\bar{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p})}{\partial a_{k,n}}$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, and $q(\mathbf{A},\epsilon,\mathbf{p})=\frac{\partial\bar{T}(\mathbf{A},\epsilon,\mathbf{p})}{\partial\epsilon}$)

$Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})\triangleq\sum_{m=1}^{K}m(-1)^{m-1}\epsilon^{m}\sum_{\mathcal{K}'\subseteq\mathcal{K}:k\in\mathcal{K}',|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{\ell\in\mathcal{K}'}x_{\ell}\right)\prod_{\ell\in\mathcal{K}':\ell\neq k}a_{\ell,n},$ (17)
$q(\mathbf{A},\epsilon,\mathbf{p})\triangleq\sum_{m=1}^{K}m^{2}(-1)^{m-1}\epsilon^{m-1}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\left(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_{k}\right)\prod_{k\in\mathcal{K}'}a_{k,n}.$ (18)

Denote by $\mathcal{B}(\mathbf{A},\mathbf{p})\triangleq\{z\in[0,1]: q(\mathbf{A},z,\mathbf{p})=0\}$ the set of roots of the equation $q(\mathbf{A},z,\mathbf{p})=0$ that lie in the interval $[0,1]$. Based on the structural properties of the block coordinate optimization problems in (15) and (16), we can obtain their optimal points.

Theorem 1 (Optimal Points of Problems in (15) and (16))

A set of optimal points of each block coordinate optimization in (15) is given by

$\left\{\mathbf{e}_{m}: m\in\arg\max_{n\in\mathcal{N}}\ Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})\right\},\ k\in\mathcal{K},$ (19)

and a set of optimal points of the block coordinate optimization in (16) is given by

$\arg\max_{\epsilon\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p}).$ (20)
Proof:

Please refer to Appendix B. ∎

The optimal point in (19) indicates that each device $k$ selects the preamble corresponding to the maximum average throughput (increase rate of the average throughput) conditioned on being selected by device $k$ for given $(\mathbf{a}_{-k},\epsilon)$. The overall computational complexity for determining the sets in (19) is $\mathcal{O}(NK^{2}2^{K})$, and the overall computational complexity for determining the set in (20) is also $\mathcal{O}(NK^{2}2^{K})$. The detailed complexity analysis can be found in Appendix C. As $\mathbf{A}$ is usually sparse during the iterations, the actual computational complexities for obtaining (19) and (20) are much lower.
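To illustrate (20): $q(\mathbf{A},z,\mathbf{p})$ in (18) is a degree-$(K-1)$ polynomial in $z$, so the candidate set $\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$ can be enumerated with a polynomial root finder. A hedged sketch (assuming the subset probabilities $\sum_{\mathbf{x}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k$ are precomputed in a dict `p_active` keyed by frozensets of device indices; all names are ours):

```python
import itertools
import numpy as np

def eps_update(A, p_active):
    """Block update of epsilon per (20): evaluate Tbar at the real roots of
    q(A, z, p) = 0 in [0, 1] (cf. (18)) and at z = 1, and keep the best."""
    K, N = A.shape
    def subset_sum(m):  # sum over n and all K' with |K'| = m
        return sum(p_active[frozenset(Kp)] * np.prod([A[k, n] for k in Kp])
                   for n in range(N)
                   for Kp in itertools.combinations(range(K), m))
    S = [subset_sum(m) for m in range(1, K + 1)]
    # q's coefficient of z^{m-1} is m^2 (-1)^{m-1} S[m-1]; np.roots wants
    # the highest power first
    coef = [m ** 2 * (-1) ** (m - 1) * S[m - 1] for m in range(1, K + 1)]
    roots = np.roots(coef[::-1]) if any(coef) else np.array([])
    cand = [z.real for z in roots
            if abs(z.imag) < 1e-9 and 0.0 <= z.real <= 1.0] + [1.0]
    def Tbar(z):  # (6), reusing the same precomputed subset sums
        return sum(m * (-1) ** (m - 1) * z ** m * S[m - 1]
                   for m in range(1, K + 1))
    return max(cand, key=Tbar)
```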

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 1. Specifically, in Steps 4-7, $\mathbf{a}_k$, $k\in\mathcal{K}$ are updated one by one; in Steps 9-11, $\epsilon$ is updated. Steps 5 and 9 ensure the convergence of Algorithm 1 to a stationary point. Based on the proof of [29, Proposition 2.7.1], we can show the following result.

Theorem 2 (Convergence of Algorithm 1)

Algorithm 1 returns a stationary point of Problem 1 in a finite number of iterations.

Proof:

Please refer to Appendix D. ∎

In practice, we can run Algorithm 1 multiple times (with possibly different random initial points) to obtain multiple stationary points and choose the stationary point with the largest objective value as a suboptimal point of Problem 1. (Generally speaking, finding a good stationary point involves numerical experiments and is more art than technology [32].) However, we find from numerical experiments that $((\mathbf{e}_{n_k})_{k\in\mathcal{K}},1)$ is generally a good initial point that yields a faster convergence speed. The intuition is that when $(\mathbf{A},\epsilon)=((\mathbf{e}_{n_k})_{k\in\mathcal{K}},1)$, the contending devices for each preamble are statistically balanced irrespective of $\mathbf{p}$. The average throughput of the best obtained stationary point and the computational complexity increase with the number of times that Algorithm 1 is run. We can choose a suitable number to balance the increases of the average throughput and computational complexity.

Based on Algorithm 1 and Theorem 2, we can characterize an optimality property of a globally optimal point of Problem 1.

Theorem 3 (Optimality Property)

There exists at least one globally optimal point $(\mathbf{A}^{*},\epsilon^{*})$ of Problem 1 that satisfies $\mathbf{a}_k^{*}=\mathbf{e}_{n_k}$, $k\in\mathcal{K}$, for some $n_k\in\mathcal{N}$, $k\in\mathcal{K}$.

Proof:

Please refer to Appendix E. ∎

Algorithm 1 Obtaining A Stationary Point of Problem 1
1: initialization: for $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random, and set $\epsilon:=1$.
2: repeat
3:  $\mathbf{A}_{\text{last}}:=\mathbf{A}$.
4:  for $k\in\mathcal{K}$ do
5:   if $\mathbf{a}_k$ does not belong to the set in (19)
6:    $\mathbf{a}_k$ is randomly chosen from the set in (19).
7:   end if
8:  end for
9:  if $\epsilon$ does not belong to the set in (20)
10:   $\epsilon$ is randomly chosen from the set in (20).
11:  end if
12: until $\mathbf{A}_{\text{last}}=\mathbf{A}$.
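A compact rendering of Algorithm 1 in code, as a sketch under the assumption that `Q_fn(k, A, eps)` returns the vector $(Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p}))_{n\in\mathcal{N}}$ in (17) and `eps_fn(A)` returns an optimizer from (20) (e.g., the `eps_update` sketch above); both function names are ours, and $\mathbf{A}$ is kept deterministic throughout, consistent with Theorem 3:

```python
import numpy as np

def algorithm1(Q_fn, eps_fn, K, N, seed=0):
    """BCD iterations of Algorithm 1; returns a stationary point (A, eps)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((K, N))
    A[np.arange(K), rng.integers(N, size=K)] = 1.0  # Step 1: random e_{n_k} init
    eps = 1.0
    while True:
        A_last = A.copy()
        for k in range(K):                            # Steps 4-8
            Q = np.asarray(Q_fn(k, A, eps))
            best = np.flatnonzero(Q == Q.max())
            j = int(A[k].argmax())
            if not (A[k, j] == 1.0 and j in best):    # Step 5: keep a_k if optimal
                A[k] = 0.0
                A[k, rng.choice(best)] = 1.0          # Step 6
        eps = eps_fn(A)       # Steps 9-11 (this sketch simply re-solves (20))
        if np.array_equal(A, A_last):                 # Step 12
            return A, eps
```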

Theorem 3 indicates that there exists a deterministic preamble selection rule that can achieve the maximum average throughput. It is worth noting that the stationary point obtained by Algorithm 1 satisfies the optimality property in Theorem 3. In Section III-C, we shall see that the low-complexity solution also satisfies this optimality property.

C Low-complexity Solution

From the complexity analysis for obtaining a stationary point of Problem 1, we know that Algorithm 1 is computationally expensive when $K$ or $N$ is large. In this part, we develop a low-complexity algorithm, applicable for large $K$ or $N$, to obtain a stationary point of an approximate problem of Problem 1. Later, in Section VI, we shall show that the performance of the low-complexity algorithm is comparable with that of Algorithm 1.

Motivated by the approximations of $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ in [27], we approximate the complicated function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, which has $N2^{K}$ terms, with the simpler function

$\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})\triangleq\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}-\epsilon^{2}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell},$ (21)

which has $1+\frac{K(K-1)}{2}$ terms. The detailed reason for the approximation can be found in [33]. Note that $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}$ and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$ ($k<\ell$) represent the probability of device $k$ being active and the probability of devices $k$ and $\ell$ both being active, respectively. By comparing (21) with (6), we can see that $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ captures the active probabilities of every single device and every two devices. Accordingly, we consider the following approximate problem of Problem 1.

Problem 2 (Approximate Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$
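In code, the objective (21) only needs the single and pairwise activity probabilities. A sketch (with `P1[k]` $=\sum_{\mathbf{x}}p_{\mathbf{x}}x_k$ and `P2[k, l]` $=\sum_{\mathbf{x}}p_{\mathbf{x}}x_k x_\ell$ precomputed as NumPy arrays; the array names are ours):

```python
import numpy as np

def approx_avg_throughput(A, eps, P1, P2):
    """Evaluate (21) from single (P1) and pairwise (P2) activity probabilities."""
    K = len(P1)
    pair = sum(A[k] @ A[l] * P2[k, l]   # sum_n a_{k,n} a_{l,n} P(x_k = x_l = 1)
               for k in range(K) for l in range(k + 1, K))
    return eps * P1.sum() - eps ** 2 * pair
```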

Analogously, using the BCD method, we propose a computationally efficient iterative algorithm, with a stronger performance guarantee than the heuristic method in [27], to obtain a stationary point of Problem 2. Specifically, the variables $(\mathbf{A},\epsilon)$ are divided into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$ and $\epsilon$. For ease of illustration, we also write $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ as $\tilde{T}(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon,\mathbf{p})$ in the sequel. Given $\mathbf{a}_{-k}$ and $\epsilon$ obtained in the previous step, the block coordinate optimization with respect to $\mathbf{a}_k$ is given by

$\max_{\mathbf{a}_{k}}\quad\tilde{T}(\mathbf{a}_{k},\mathbf{a}_{-k},\epsilon,\mathbf{p}),\quad k\in\mathcal{K}$ (22)
$\mathrm{s.t.}\quad\text{(3a), (3b)}.$

Given $\mathbf{A}$ obtained in the previous step, the block coordinate optimization with respect to $\epsilon$ is given by

$\max_{\epsilon}\quad\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ (23)
$\mathrm{s.t.}\quad\text{(2a), (2b)}.$

Each problem in (22) is an LP with $N$ variables and $N+1$ constraints, and the problem in (23) is a quadratic program (QP) with a single variable and two constraints. It is clear that the convex problems in (22) and (23) are much simpler than those in (15) and (16), respectively. Based on the structural properties of the block coordinate optimization problems in (22) and (23), we can obtain their optimal points.

Theorem 4 (Optimal Points of Problems in (22) and (23))

A set of optimal points of each block coordinate optimization in (22) is given by

$\left\{\mathbf{e}_{m}: m\in\arg\min_{n\in\mathcal{N}}\ \sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}\right\},\ k\in\mathcal{K},$ (24)

and the optimal point of the block coordinate optimization in (23) is given by

$\min\left(1,\ \frac{\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}}{2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}}\right).$ (25)
Proof:

Theorem 4 can be proved in a similar way to Theorem 1. ∎

The optimal point in (24) indicates that each device $k$ selects the preamble with the minimum number of contending devices conditioned on being selected by device $k$ for given $(\mathbf{a}_{-k},\epsilon)$. In the following, we analyze the computational complexity for solving the problems in (22) and (23) according to Theorem 4. As the constants $\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}$ and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{\ell}x_{k}$, $k,\ell\in\mathcal{K}$, $k<\ell$ are computed in advance, the corresponding computational complexities are not considered below. For all $k\in\mathcal{K}$ and $n\in\mathcal{N}$, the computational complexity for calculating $\sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$ is $\mathcal{O}(K)$. Furthermore, for all $k\in\mathcal{K}$, the computational complexity of finding the smallest one among $\sum_{\ell\in\mathcal{K}:\ell\neq k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_{k}x_{\ell}$, $n\in\mathcal{N}$ is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (24) is $\mathcal{O}(NK^{2})+\mathcal{O}(NK)=\mathcal{O}(NK^{2})$. Analogously, the computational complexity for obtaining the closed-form optimal point in (25) is $\mathcal{O}(NK^{2})$. The computational complexities for obtaining the optimal points given by Theorem 4 are thus much lower than those for obtaining the optimal points given by Theorem 1. Furthermore, it is worth noting that the optimal points given by Theorem 4 do not rely on the active probabilities of more than two devices.

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 2. Specifically, in Steps 4-7, $\mathbf{a}_k$, $k\in\mathcal{K}$ are updated one by one; in Steps 9-11, $\epsilon$ is updated. Steps 5 and 9 ensure the convergence of Algorithm 2 to a stationary point. Similarly, we have the following result.

Theorem 5 (Convergence of Algorithm 2)

Algorithm 2 returns a stationary point of Problem 2 in a finite number of iterations.

Proof:

Theorem 5 can be proved in a similar way to Theorem 2. ∎

Algorithm 2 Obtaining A Stationary Point of Problem 2
1: initialization: for $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random, and set $\epsilon:=1$.
2: repeat
3:  $\mathbf{A}_{\text{last}}:=\mathbf{A}$.
4:  for $k\in\mathcal{K}$ do
5:   if $\mathbf{a}_k$ does not belong to the set in (24)
6:    $\mathbf{a}_k$ is randomly chosen from the set in (24).
7:   end if
8:  end for
9:  if $\epsilon$ is not equal to the optimal point in (25)
10:   set $\epsilon$ to the optimal point in (25).
11:  end if
12: until $\mathbf{A}_{\text{last}}=\mathbf{A}$.
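Algorithm 2 admits a particularly compact implementation, since both block updates are closed-form (Theorem 4). A sketch under the same `P1`/`P2` convention as the sketch after Problem 2 (the convention `P2[k, k] = 0` is ours):

```python
import numpy as np

def algorithm2(P1, P2, N, seed=0):
    """BCD iterations of Algorithm 2; P2 must satisfy P2[k, k] = 0 so that
    P2[k] @ A sums over l != k as in (24)."""
    rng = np.random.default_rng(seed)
    K = len(P1)
    A = np.zeros((K, N))
    A[np.arange(K), rng.integers(N, size=K)] = 1.0
    eps = 1.0
    while True:
        A_last = A.copy()
        for k in range(K):
            load = P2[k] @ A                        # contention per preamble, (24)
            best = np.flatnonzero(load == load.min())
            j = int(A[k].argmax())
            if not (A[k, j] == 1.0 and j in best):  # Step 5: keep a_k if optimal
                A[k] = 0.0
                A[k, rng.choice(best)] = 1.0
        pair = sum(A[k] @ A[l] * P2[k, l]
                   for k in range(K) for l in range(k + 1, K))
        eps = min(1.0, P1.sum() / (2 * pair)) if pair > 0 else 1.0  # (25)
        if np.array_equal(A, A_last):
            return A, eps
```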

Analogously, we can run Algorithm 2 multiple times (with possibly different random initial points) to obtain multiple stationary points of Problem 2 and choose the stationary point with the largest average throughput as a low-complexity suboptimal point of Problem 1.

IV Robust Optimization for Imperfect Joint Device Activity Distribution

In this section, we consider the worst-case average throughput maximization in the case of imperfect joint device activity distribution. First, we formulate the worst-case average throughput maximization problem, which is a challenging max-min problem. Then, we develop an iterative algorithm to obtain a KKT point of an equivalent problem. Finally, we develop a low-complexity iterative algorithm to obtain a stationary point of an approximate problem.

A Problem Formulation

In the case of imperfect joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and access barring factor $\epsilon$ to maximize the worst-case average throughput $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ in (7) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 3 (Worst-case Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\quad\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (3a), (3b)}.$

Note that we explicitly consider the estimation error of the joint device activity distribution in the optimization. The objective function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear. Thus, Problem 3 is nonconvex. Moreover, note that the objective function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ does not have an analytical form.

B KKT Point

Problem 3 is a challenging max-min problem. In this part, we solve Problem 3 in two steps [34]. First, we transform the max-min problem in Problem 3 into an equivalent maximization problem. As the inner problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is an LP with respect to $\mathbf{p}$ and strong duality holds for LPs, the inner problem shares the same optimal value as its dual problem. Furthermore, to facilitate algorithm design (in the constraints in (26c), the variables $\mathbf{B}$, $\epsilon$, $\boldsymbol{\lambda}$, and $\nu$ are coupled, so we cannot apply the BCD method to solve Problem 4), we can transform Problem 3 into the following equivalent problem by the change of variables $\mathbf{B}=\epsilon\mathbf{A}$, where $\mathbf{B}\triangleq(b_{k,n})_{k\in\mathcal{K},n\in\mathcal{N}}$.

Problem 4 (Equivalent Problem of Problem 3)
$\max_{\mathbf{B},\boldsymbol{\lambda}\succeq 0,\epsilon,\nu}\quad\left(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\right)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\left((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\right)$
$\mathrm{s.t.}\quad\text{(2a), (2b)},$
$b_{k,n}\geq 0,\ k\in\mathcal{K},\ n\in\mathcal{N},$ (26a)
$\sum_{n\in\mathcal{N}}b_{k,n}=\epsilon,\ k\in\mathcal{K},$ (26b)
$\nu+\lambda_{\mathbf{x}}+T(\mathbf{B},1,\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X},$ (26c)

where $\boldsymbol{\lambda}\triangleq(\lambda_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}$, and $T(\mathbf{A},\epsilon,\mathbf{x})$ is given by (4). Let $(\mathbf{B}^{\star},\boldsymbol{\lambda}^{\star},\epsilon^{\star},\nu^{\star})$ denote an optimal point of Problem 4.

The equivalence between Problem 3 and Problem 4 is summarized below. (As $p_{\mathbf{0}}<1$, we can easily show $\epsilon^{\star}>0$.)

Lemma 1 (Equivalence between Problem 3 and Problem 4)

$(\frac{\mathbf{B}^{\star}}{\epsilon^{\star}},\epsilon^{\star})$ is an optimal point of Problem 3.

Proof:

Please refer to Appendix F. ∎

Next, based on Lemma 1, we can solve Problem 4 instead of Problem 3. Problem 4 is nonconvex with a nonconvex feasible set, as $T(\mathbf{B},1,\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$ in the objective function and the constraint functions in (26c) are nonconcave. Note that obtaining a KKT point is the classic goal for dealing with a nonconvex problem with a nonconvex feasible set. In what follows, we propose an iterative algorithm to obtain a KKT point of Problem 4 using SCA. Specifically, at iteration $s$, we update $(\mathbf{B}^{(s)},\boldsymbol{\lambda}^{(s)},\epsilon^{(s)},\nu^{(s)})$ by solving an approximate convex problem parameterized by $\mathbf{B}^{(s-1)}$, obtained at iteration $s-1$. For notational convenience, define (noting that $\frac{\partial T(\mathbf{A},\epsilon,\mathbf{x})}{\partial a_{k,n}}=x_{k}\epsilon g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, and $\frac{\partial T(\mathbf{A},\epsilon,\mathbf{x})}{\partial\epsilon}=\sum_{k\in\mathcal{K}}x_{k}\sum_{n\in\mathcal{N}}a_{k,n}g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$)

$g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})\triangleq\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_{\ell}a_{\ell,n}\epsilon)-\epsilon\sum_{\ell\in\mathcal{K}:\ell\neq k}x_{\ell}a_{\ell,n}\prod_{j\in\mathcal{K}:j\neq\ell,j\neq k}(1-x_{j}a_{j,n}\epsilon),\ k\in\mathcal{K},\ n\in\mathcal{N}.$ (27)

We choose

$\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)})\triangleq\left(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\right)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\left((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}T(\mathbf{B}^{(s-1)},1,\mathbf{x})\right)$
$+\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})\left(b_{k,n}-b_{k,n}^{(s-1)}\right)$
$-\frac{1}{2}\sqrt{N\sum_{k\in\mathcal{K}}\sum_{\ell\in\mathcal{K}:\ell\neq k}\Big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}x_{\ell}\|\mathbf{x}\|_{1}\Big)^{2}}\ \sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}\left(b_{k,n}-b_{k,n}^{(s-1)}\right)^{2},$ (28)

where $\|\cdot\|_{1}$ denotes the $\ell_{1}$-norm, as an approximate function of the objective function in Problem 4 at iteration $s$. In addition, we choose

$\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)},\mathbf{x})\triangleq\nu+\lambda_{\mathbf{x}}+T(\mathbf{B}^{(s-1)},1,\mathbf{x})+\sum_{k\in\mathcal{K}}\sum_{n\in\mathcal{N}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})\left(b_{k,n}-b_{k,n}^{(s-1)}\right)$
$-\frac{1}{2}\sqrt{N}\|\mathbf{x}\|_{1}^{2}\sum_{k\in\mathcal{K}}x_{k}\sum_{n\in\mathcal{N}}\left(b_{k,n}-b_{k,n}^{(s-1)}\right)^{2}$ (29)

as an approximate function of the constraint function for $\mathbf{x}\in\mathcal{X}$ in (26c) at iteration $s$. Note that the concave components of the objective function and the constraint functions in Problem 4 are left unchanged, and the remaining nonconcave components, i.e., $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})$ in the objective function and $T(\mathbf{B},1,\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$ in (26c), are minorized at $(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)=(\mathbf{B}^{(s-1)},\boldsymbol{\lambda}^{(s-1)},\epsilon^{(s-1)},\nu^{(s-1)})$ using concave components based on their second-order Taylor expansions [35]. Then, at iteration $s$, we approximate Problem 4 with the following convex problem.

Problem 5 (Approximate Problem of Problem 4 at Iteration ss)
$(\hat{\mathbf{B}}^{(s)},\hat{\boldsymbol{\lambda}}^{(s)},\hat{\epsilon}^{(s)},\hat{\nu}^{(s)})\triangleq\arg\max_{\mathbf{B},\boldsymbol{\lambda}\succeq 0,\epsilon,\nu}\quad\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)})$
$\mathrm{s.t.}\quad\text{(2a), (2b), (26a), (26b)},$
$\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s-1)},\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X}.$

First, we analyze the computational complexity for determining the updated objective function and constraint functions of Problem 5. The computational complexity for calculating $x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, $\mathbf{x}\in\mathcal{X}$ is $\mathcal{O}(NK^{3}2^{K})$. Given $x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$, the computational complexity for calculating $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}g_{k,n}(\mathbf{B}^{(s-1)},1,\mathbf{x})$ is $\mathcal{O}(NK2^{K})$. Note that the constants $\|\mathbf{x}\|_{1}^{2}$, $\mathbf{x}\in\mathcal{X}$ and $\sum_{k\in\mathcal{K}}\sum_{\ell\in\mathcal{K}:\ell\neq k}\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_{k}x_{\ell}\|\mathbf{x}\|_{1}\big)^{2}$ are computed in advance. Thus, at iteration $s$, the computational complexity for determining the updated objective function and constraint functions of Problem 5 is $\mathcal{O}(NK^{3}2^{K})+\mathcal{O}(NK2^{K})=\mathcal{O}(NK^{3}2^{K})$. Then, we solve Problem 5. Problem 5 has $2+NK+2^{K}$ variables and $2+N(K+1)+2^{K+1}$ constraints. Thus, solving Problem 5 using an interior-point method has computational complexity $\mathcal{O}((NK+2^{K})^{3})$ [32, pp. 8]. After solving Problem 5, we update

$\mathbf{B}^{(s)}=(1-\gamma)\mathbf{B}^{(s-1)}+\gamma\hat{\mathbf{B}}^{(s)},\quad\boldsymbol{\lambda}^{(s)}=(1-\gamma)\boldsymbol{\lambda}^{(s-1)}+\gamma\hat{\boldsymbol{\lambda}}^{(s)},$ (30a)
$\epsilon^{(s)}=(1-\gamma)\epsilon^{(s-1)}+\gamma\hat{\epsilon}^{(s)},\quad\nu^{(s)}=(1-\gamma)\nu^{(s-1)}+\gamma\hat{\nu}^{(s)},$ (30b)

where $\gamma\in(0,1]$ is a positive constant.

Finally, the details of the proposed iterative algorithm are summarized in Algorithm 3. Based on [30, Theorem 1], we can show the following result.

Theorem 6 (Convergence of Algorithm 3)

Let $(\mathbf{B}^{\dagger},\boldsymbol{\lambda}^{\dagger},\epsilon^{\dagger},\nu^{\dagger})$ be a limit point of the iterates generated by Algorithm 3. If the interior of the set $\{(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)\,|\,\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{\dagger},\mathbf{x})\geq 0,\ \mathbf{x}\in\mathcal{X},\ \text{(2a), (2b), (26a), (26b)}\}$ is nonempty, then $(\mathbf{B}^{\dagger},\boldsymbol{\lambda}^{\dagger},\epsilon^{\dagger},\nu^{\dagger})$ is a KKT point of Problem 4.

Proof:

Please refer to Appendix G. ∎

Algorithm 3 Obtaining A KKT Point of Problem 4
1: initialization: Set $(\mathbf{B}^{(0)},\boldsymbol{\lambda}^{(0)},\epsilon^{(0)},\nu^{(0)}):=((\mathbf{e}_{n_k})_{k\in\mathcal{K}},\mathbf{1},1,0)$, where $n_k$ is uniformly chosen from $\mathcal{N}$ at random for all $k\in\mathcal{K}$, set $s:=0$, and choose $\mu>0$.
2: repeat
3:  Set $s:=s+1$.
4:  Calculate $(\hat{\mathbf{B}}^{(s)},\hat{\boldsymbol{\lambda}}^{(s)},\hat{\epsilon}^{(s)},\hat{\nu}^{(s)})$ by solving Problem 5.
5:  Update $(\mathbf{B}^{(s)},\boldsymbol{\lambda}^{(s)},\epsilon^{(s)},\nu^{(s)})$ according to (30).
6: until $\|\mathbf{B}^{(s)}-\mathbf{B}^{(s-1)}\|_{F}\leq\mu$.

Analogously, we can run Algorithm 3 multiple times (with possibly different random initial points) to obtain multiple KKT points of Problem 4 and choose the KKT point with the largest worst-case average throughput as a suboptimal point of Problem 4, which can also be regarded as a suboptimal point of Problem 3 according to Lemma 1.

C Low-complexity Solution

From the complexity analysis for solving Problem 5, we know that Algorithm 3 is computationally expensive when $K$ or $N$ is large. In this part, we develop a low-complexity iterative algorithm, applicable for large $K$ or $N$, to obtain a stationary point of an approximate problem of Problem 3. Later, in Section VI, we shall show that this low-complexity algorithm achieves comparable performance. First, we approximate the complicated function $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})\triangleq\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, which has $N2^{K}$ terms, with a simpler function, which has $1+\frac{K(K-1)}{2}$ terms. Specifically, as in Section III-C, we approximate $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$. Recall that $\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$ captures the active probabilities of every single device and every two devices. Analogously, we approximate $\mathcal{P}$ with

$\tilde{\mathcal{P}}\triangleq\big\{(y_{\mathbf{x}})_{\mathbf{x}\in\mathcal{X}}\,\big|\,\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}=1,\ \sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k\leq\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}x_k\leq\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k,\ k\in\mathcal{K},$
$\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k x_\ell\leq\sum_{\mathbf{x}\in\mathcal{X}}y_{\mathbf{x}}x_k x_\ell\leq\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell,\ k,\ell\in\mathcal{K},\ k<\ell\big\}.$

Obviously, $\tilde{\mathcal{P}}\supseteq\mathcal{P}$. Note that, in contrast to $\mathcal{P}$, $\tilde{\mathcal{P}}$ retains only the upper and lower bounds on the active probabilities of every single device and every two devices. Thus, we approximate $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$ with $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})\triangleq\min_{\mathbf{p}\in\tilde{\mathcal{P}}}\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})$, whose analytical form is given in the following lemma.

Lemma 2 (Approximate Worst-case Average Throughput)

For all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b),

$\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})=\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k-\epsilon^2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell.$ (31)
Proof:

Please refer to Appendix H. ∎

Next, we consider the following approximate problem of Problem 3.

Problem 6 (Approximate Worst-case Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\ \tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)},$

where $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$ is given by (31).

The numbers of variables and constraints of Problem 6 are $KN+1$ and $2+K(N+1)$, respectively, which are much smaller than those of Problem 4. Note that $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k$ and $\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell$ ($k<\ell$) represent a lower bound on the active probability of every single device and an upper bound on the active probability of every two devices, respectively, and can be computed in advance. Obviously, Problem 6 shares the same form as Problem 2. Thus, we can use a low-complexity iterative algorithm, similar to Algorithm 2, to obtain a stationary point of Problem 6. The details are omitted due to the page limitation.
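As an illustration of this precomputation, the following minimal sketch evaluates $\tilde{T}_{\text{wt}}(\mathbf{A},\epsilon,\tilde{\mathcal{P}})$ in (31), assuming the bound vectors $\underline{p}_{\mathbf{x}}$ and $\overline{p}_{\mathbf{x}}$ are stored as numpy arrays indexed by the $2^K$ activity states. Enumerating all states is practical only for small $K$; the point of (31) is that, once the $K$ single-device and $\frac{K(K-1)}{2}$ pairwise bound terms are precomputed, each evaluation is cheap.

```python
import itertools
import numpy as np

def approx_worst_case_throughput(A, eps, p_lo, p_hi):
    """Evaluate the approximate worst-case average throughput in (31).
    A: K x N preamble selection matrix; eps: access barring factor;
    p_lo, p_hi: length-2^K arrays of lower/upper probability bounds,
    indexed consistently with the rows of `states` below."""
    K, N = A.shape
    states = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K activity states
    single_lo = states.T @ p_lo            # sum_x p_lo_x * x_k, for each device k
    pair_hi = (states.T * p_hi) @ states   # sum_x p_hi_x * x_k * x_l, for each pair (k, l)
    linear = eps * single_lo.sum()
    quad = 0.0
    for k in range(K):
        for l in range(k + 1, K):          # pairs with k < l, as in (31)
            quad += (A[k] * A[l]).sum() * pair_hi[k, l]   # sum_n a_{k,n} a_{l,n}
    return linear - eps**2 * quad
```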

V Stochastic Optimization for Unknown Joint Device Activity Distribution

In this section, we consider the sample average throughput maximization in the case of unknown joint device activity distribution. We first formulate the sample average throughput maximization problem, which is a challenging nonconvex problem. Then, we develop an iterative algorithm to obtain a stationary point.

A Problem Formulation

In the case of unknown joint device activity distribution, we optimize the preamble selection distributions $\mathbf{A}$ and the access barring factor $\epsilon$ to maximize the sample average throughput $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ in (8) subject to the constraints on $(\mathbf{A},\epsilon)$ in (2a), (2b), (3a), and (3b).

Problem 7 (Sample Average Throughput Maximization)
$\max_{\mathbf{A},\epsilon}\ \bar{T}_{\text{st}}(\mathbf{A},\epsilon)$
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)}.$

The objective function $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ is nonconcave in $(\mathbf{A},\epsilon)$, and the constraints in (2a), (2b), (3a), and (3b) are linear. Thus, Problem 7 is a nonconvex problem with a convex feasible set. Note that $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ has $NKI$ terms, and the number of samples $I$ is usually quite large in practice. Therefore, directly tackling Problem 7 is not computationally efficient. To reduce the computation time, we solve its equivalent stochastic version (given in Appendix I) using a stochastic iterative algorithm.

B Stationary Point

Based on mini-batch stochastic parallel SCA [31], we propose a stochastic algorithm to obtain a stationary point of Problem 7. The main idea is to solve, in parallel, a set of convex problems, each obtained by approximating $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ with a convex function based on its structure and the samples in a uniformly randomly selected mini-batch. We partition $\mathcal{I}$ into $M$ disjoint subsets $\mathcal{I}_m$, $m\in\mathcal{M}$, each of size $\frac{I}{M}$ (assuming $I$ is divisible by $M$), where $\mathcal{M}\triangleq\{1,2,\ldots,M\}$. We divide the variables $(\mathbf{A},\epsilon)$ into $K+1$ blocks, i.e., $\mathbf{a}_k$, $k\in\mathcal{K}$, and $\epsilon$. In each iteration, the algorithm updates all $K+1$ blocks separately in a parallel manner by maximizing $K+1$ approximate functions of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$.

Specifically, at iteration $t$, a mini-batch $\mathcal{I}_{\xi^{(t)}}$ is selected, where $\xi^{(t)}$ follows the uniform distribution over $\mathcal{M}$. Let $\mathbf{a}_k^{(t-1)}$ and $\epsilon^{(t-1)}$ denote the preamble selection distribution of device $k$ and the access barring factor obtained at iteration $t-1$, and denote $\mathbf{A}^{(t-1)}\triangleq(\mathbf{a}_k^{(t-1)})_{k\in\mathcal{K}}$. For ease of illustration, in the following, we also write $(\mathbf{A},\epsilon)$ as $(\mathbf{a}_k,\mathbf{a}_{-k},\epsilon)$, where $\mathbf{a}_{-k}\triangleq(\mathbf{a}_\ell)_{\ell\in\mathcal{K},\ell\neq k}$. We choose

$\tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)}\big)\triangleq\rho^{(t)}\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}T\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big)+\sum_{n\in\mathcal{N}}c_{k,n}^{(t)}\big(a_{k,n}-a_{k,n}^{(t-1)}\big)$

as an approximate function of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ for updating $\mathbf{a}_k$ at iteration $t$. Here, $\rho^{(t)}$ is a positive diminishing stepsize satisfying

$\rho^{(t)}>0,\ \lim_{t\rightarrow\infty}\rho^{(t)}=0,\ \sum_{t=1}^{\infty}\rho^{(t)}=\infty,\ \sum_{t=1}^{\infty}\big(\rho^{(t)}\big)^2<\infty,$

and $c_{k,n}^{(t)}$ is given by

$c_{k,n}^{(t)}=\big(1-\rho^{(t)}\big)c_{k,n}^{(t-1)}+\rho^{(t)}\frac{M}{I}\epsilon^{(t-1)}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}x_{i,k}\,g_{k,n}\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big),$

where $c_{k,n}^{(0)}=0$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, $x_{i,k}$ denotes the $k$-th element of $\mathbf{x}_i$, and $g_{k,n}(\mathbf{A},\epsilon,\mathbf{x})$ is given by (27). We choose

$\tilde{T}_{\text{st},0}^{(t)}\big(\epsilon,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)\triangleq c_0^{(t)}\epsilon-\tau\big(\epsilon-\epsilon^{(t-1)}\big)^2$

as an approximate function of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ for updating $\epsilon$ at iteration $t$. Here, $\tau$ is a positive constant and $c_0^{(t)}$ is given by

$c_0^{(t)}=\big(1-\rho^{(t)}\big)c_0^{(t-1)}+\rho^{(t)}\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}\sum_{k\in\mathcal{K}}x_{i,k}\sum_{n\in\mathcal{N}}a_{k,n}^{(t-1)}g_{k,n}\big(\mathbf{A}^{(t-1)},\epsilon^{(t-1)},\mathbf{x}_i\big),$

where $c_0^{(0)}=0$.
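For concreteness, these recursive coefficient updates can be sketched as follows; `g` is a hypothetical callable implementing $g_{k,n}$ in (27), and the mini-batch is passed as rows of a numpy array. Note that $\frac{M}{I}\sum_{i\in\mathcal{I}_{\xi^{(t)}}}(\cdot)$ is simply the mini-batch average, since $|\mathcal{I}_{\xi^{(t)}}|=\frac{I}{M}$.

```python
import numpy as np

def update_coefficients(c, c0, A_prev, eps_prev, batch, rho, g):
    """One recursive update of the surrogate coefficients c_{k,n}^{(t)} and c_0^{(t)}.
    c: K x N array; c0: scalar; batch: mini-batch samples x_i as rows of a 0/1 array;
    g(A, eps, x, k, n) is a hypothetical callable implementing g_{k,n} in (27)."""
    K, N = A_prev.shape
    grad_a = np.zeros((K, N))
    grad_eps = 0.0
    for x in batch:
        for k in range(K):
            if x[k] == 0:
                continue                         # terms with x_{i,k} = 0 vanish
            for n in range(N):
                val = g(A_prev, eps_prev, x, k, n)
                grad_a[k, n] += eps_prev * val   # contributes to c_{k,n}
                grad_eps += A_prev[k, n] * val   # contributes to c_0
    grad_a /= len(batch)                         # (M/I) * sum over the mini-batch
    grad_eps /= len(batch)
    return (1 - rho) * c + rho * grad_a, (1 - rho) * c0 + rho * grad_eps
```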

We first solve the following problems for the $K+1$ blocks in a parallel manner. Given $(\mathbf{A}^{(t-1)},\epsilon^{(t-1)})$ and $\mathcal{I}_{\xi^{(t)}}$, the optimization problem with respect to $\mathbf{a}_k$ is given by

$\hat{\mathbf{a}}_k^{(t)}\triangleq\arg\max_{\mathbf{a}_k}\ \tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)}\big),\quad k\in\mathcal{K}$ (32)
$\mathrm{s.t.}\ \text{(3a)},\ \text{(3b)},$

and the optimization problem with respect to $\epsilon$ is given by

$\hat{\epsilon}^{(t)}\triangleq\arg\max_{\epsilon}\ \tilde{T}_{\text{st},0}^{(t)}\big(\epsilon,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)$ (33)
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)}.$

Each problem in (32) is an LP with $N$ variables and $N+1$ constraints. The problem in (33) is a QP with a single variable and two constraints. Based on the structural properties of the optimization problems in (32) and (33), we can obtain their optimal points.

Theorem 7 (Optimal Points of Problems in (32) and (33))

A set of optimal points of each optimization problem in (32) is given by

$\big\{\mathbf{e}_m : m\in\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}\big\},\ k\in\mathcal{K},$ (34)

and the optimal point of the optimization problem in (33) is given by

$\min\Big(\max\Big(\epsilon^{(t-1)}+\frac{c_0^{(t)}}{2\tau},0\Big),1\Big).$ (35)
Algorithm 4 Obtaining A Stationary Point of Problem 7
1:  initialization: For $k\in\mathcal{K}$, set $\mathbf{a}_k:=\mathbf{e}_{n_k}$, where $n_k$ is chosen uniformly at random from $\mathcal{N}$; set $\epsilon:=1$, set $t:=0$, and choose $\mu>0$.
2:  repeat
3:   $t:=t+1$.
4:   for $k\in\mathcal{K}$ do
5:    $\hat{\mathbf{a}}_k^{(t)}$ is randomly chosen from the set in (34).
6:    Update $\mathbf{a}_k^{(t)}$ according to (36a).
7:   end for
8:   Calculate $\hat{\epsilon}^{(t)}$ according to (35).
9:   Update $\epsilon^{(t)}$ according to (36b).
10:  until $\|\mathbf{A}^{(t)}-\mathbf{A}^{(t-1)}\|_F\leq\mu$.
Proof:

Theorem 7 can be proved in a similar way to Theorem 1. ∎

The optimal point in (34) indicates that each device $k$ selects the preamble corresponding to the maximum approximate average throughput (i.e., the increase rate of the average throughput) conditioned on the preamble being selected by device $k$, for given $(\mathbf{a}_{-k}^{(t-1)},\epsilon^{(t-1)})$. For all $k\in\mathcal{K}$ and $n\in\mathcal{N}$, the computational complexity for calculating $c_{k,n}^{(t)}$ is $\mathcal{O}(K^2)$. For all $k\in\mathcal{K}$, the computational complexity of finding the largest one among $c_{k,n}^{(t)}$, $n\in\mathcal{N}$, is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (34) is $\mathcal{O}(NK^3)+\mathcal{O}(NK)=\mathcal{O}(NK^3)$. The computational complexity for calculating $c_0^{(t)}$ is $\mathcal{O}(NK^3)$, which is also the computational complexity for obtaining the optimal point in (35).
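A minimal sketch of the closed-form block solutions in (34) and (35), assuming the surrogate coefficients are maintained as numpy arrays (ties in (34) are broken by the first index here, whereas Algorithm 4 breaks them randomly):

```python
import numpy as np

def block_updates(c, c0, eps_prev, tau):
    """Closed-form block solutions per Theorem 7.
    c: K x N array of coefficients c_{k,n}^{(t)}; c0: scalar c_0^{(t)}."""
    K, N = c.shape
    A_hat = np.zeros((K, N))
    for k in range(K):
        # (34): pick a vertex e_m with m maximizing c_{k,n}^{(t)}
        A_hat[k, np.argmax(c[k])] = 1.0
    # (35): unconstrained maximizer eps_prev + c0/(2*tau), clipped to [0, 1]
    eps_hat = min(max(eps_prev + c0 / (2.0 * tau), 0.0), 1.0)
    return A_hat, eps_hat
```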

Then, we update the preamble selection distributions $\mathbf{A}$ and the access barring factor $\epsilon$ by

$\mathbf{a}_k^{(t)}=(1-\omega^{(t)})\mathbf{a}_k^{(t-1)}+\omega^{(t)}\hat{\mathbf{a}}_k^{(t)},\ k\in\mathcal{K},$ (36a)
$\epsilon^{(t)}=(1-\omega^{(t)})\epsilon^{(t-1)}+\omega^{(t)}\hat{\epsilon}^{(t)},$ (36b)

where $\omega^{(t)}$ is a positive diminishing stepsize satisfying

$\omega^{(t)}>0,\ \lim_{t\rightarrow\infty}\omega^{(t)}=0,\ \sum_{t=1}^{\infty}\omega^{(t)}=\infty,\ \sum_{t=1}^{\infty}\big(\omega^{(t)}\big)^2<\infty,\ \lim_{t\rightarrow\infty}\frac{\omega^{(t)}}{\rho^{(t)}}=0.$
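For example, $\rho^{(t)}=t^{-0.6}$ and $\omega^{(t)}=t^{-0.9}$ are admissible choices: both are non-summable but square summable, and $\frac{\omega^{(t)}}{\rho^{(t)}}=t^{-0.3}\rightarrow 0$ as $t\rightarrow\infty$.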

Finally, the details of the proposed stochastic parallel iterative algorithm are summarized in Algorithm 4. Based on [36], we can show the following result.

Theorem 8 (Convergence of Algorithm 4)

If $\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$ is a singleton for all $k\in\mathcal{K}$ and all $t\geq 1$, then every limit point of $\{(\mathbf{A}^{(t)},\epsilon^{(t)})\}$ generated by Algorithm 4 is a stationary point of Problem 7 almost surely.

Proof:

Please refer to Appendix I. ∎

VI Numerical Results

In this section, we evaluate the performance of the proposed solutions for dependent and independent device activities via numerical results. (Algorithm 2 and Algorithm 4 are computationally efficient and can be easily implemented in practical systems with large $K$; Algorithm 1 and Algorithm 3 are computationally expensive but provide essential benchmarks for designing effective low-complexity methods.) In the simulation, we adopt the group device activity model in [26]. Specifically, $K$ devices are divided into $G$ groups, each of size $\frac{K}{G}$ (assuming $K$ is divisible by $G$); the activity states of devices in different groups are independent, and the activity states of devices in one group are the same. Note that when $\frac{K}{G}=1$, the device activities are independent; when $\frac{K}{G}>1$, the device activities are dependent. Denote $\mathcal{G}\triangleq\{1,2,\ldots,G\}$. Let $\mathcal{K}_g$ denote the set of devices in group $g\in\mathcal{G}$. Let $y_g\in\{0,1\}$ denote the activity state of group $g$, where $y_g=1$ if group $g$ is active and $y_g=0$ otherwise. Let $\mathbf{y}\triangleq(y_g)_{g\in\mathcal{G}}$ denote the activity states of all groups. The probability that a group is active is $p_a$. Then, in the case of perfect joint device activity distribution, the joint device activity distribution is given by

$p_{\mathbf{x}}=\begin{cases}p_a^{\sum_{g\in\mathcal{G}}y_g}(1-p_a)^{G-\sum_{g\in\mathcal{G}}y_g}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X}.$ (39)
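For reference, device activity samples under this group model can be generated as in the following minimal sketch (numpy assumed; devices of the same group are placed consecutively, which is one concrete realization of the index sets $\mathcal{K}_g$):

```python
import numpy as np

def sample_group_activities(K, G, p_a, num_samples, seed=0):
    """Draw activity vectors x in {0,1}^K under the group model of (39):
    each of the G groups is active i.i.d. with probability p_a, and all
    K/G devices in a group share the group's activity state."""
    rng = np.random.default_rng(seed)
    y = rng.random((num_samples, G)) < p_a   # group states, shape (num_samples, G)
    x = np.repeat(y, K // G, axis=1)         # expand to devices, shape (num_samples, K)
    return x.astype(int)

# e.g., 10^5 samples with K = 60, G = 6, p_a = 0.25 (dependent activities, K/G = 10)
samples = sample_group_activities(K=60, G=6, p_a=0.25, num_samples=100000)
```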

In the case of imperfect joint device activity distribution, the estimated joint device activity distribution is given by

$\hat{p}_{\mathbf{x}}=\begin{cases}p_a^{\sum_{g\in\mathcal{G}}y_g}(1-p_a)^{G-\sum_{g\in\mathcal{G}}y_g}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X},$ (42)

for $K\leq 100$, we set $\delta_{\mathbf{x}}=\bar{\delta}\hat{p}_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, and for $K>100$, we set

$\delta_{\mathbf{x}}=\begin{cases}\bar{\delta}\hat{p}_{\mathbf{x}}, & x_k=y_g,\ k\in\mathcal{K}_g,\ g\in\mathcal{G},\ \sum_{g\in\mathcal{G}}y_g\leq 3,\ \mathbf{y}\in\{0,1\}^G\\ 0, & \text{otherwise}\end{cases},\ \mathbf{x}\in\mathcal{X},$ (45)

for some $\bar{\delta}\in[0,1)$. In the case of unknown joint device activity distribution, we consider $10^5$ samples generated according to the joint device activity distribution $p_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, given by (39). Note that in the three cases, we consider the same underlying joint device activity distribution $\mathbf{p}$ for a fair comparison. For ease of presentation, in the following, $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, $\bar{T}_{\text{wt}}(\mathbf{A},\epsilon,\mathcal{P})$, and the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ in the cases of perfect, imperfect, and unknown joint device activity distributions (referred to as case-p, case-ip, and case-up, respectively) are all referred to as throughput, unless otherwise specified. We evaluate the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ by averaging over $100$ realizations of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$, each obtained from a set of $10^5$ samples. In case-$t$, the proposed stationary point or KKT point is called Prop-$t$, where $t=$ p, ip, up. Furthermore, in case-$t$, the proposed low-complexity solution is called Prop-LowComp-$t$, where $t=$ p and ip.

We consider three baseline schemes, namely BL-MMPC, BL-MSPC, and BL-LTE. In BL-MMPC and BL-MSPC, $\mathbf{A}$ is obtained by the MMPC and MSPC allocation algorithms in [27], respectively, together with the access barring scheme proposed in [27]. In BL-LTE, we set $a_{k,n}=\frac{1}{N}$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, according to the standard random access procedure of LTE networks [7], and set $\epsilon=\min\big(1,\frac{N}{\bar{K}}\big)$ according to the optimal access control in [15], where $\bar{K}$ denotes the average number of active devices. Note that BL-MMPC and BL-MSPC make use of the dependence of the activities of every two devices, whereas BL-LTE does not utilize any information on the dependence of device activities. In case-$t$, the three baseline schemes are referred to as BL-MMPC-$t$, BL-MSPC-$t$, and BL-LTE-$t$, where $t=$ p, ip, up. In case-p, $(\mathbf{A},\epsilon)$ in BL-MMPC-p and BL-MSPC-p and $\epsilon$ in BL-LTE-p rely on $p_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$. In case-ip, $(\mathbf{A},\epsilon)$ in BL-MMPC-ip and BL-MSPC-ip and $\epsilon$ in BL-LTE-ip rely on $\hat{p}_{\mathbf{x}}$, $\mathbf{x}\in\mathcal{X}$, without considering potential estimation errors. In case-up, $(\mathbf{A},\epsilon)$ in BL-MMPC-up and BL-MSPC-up and $\epsilon$ in BL-LTE-up rely on the empirical joint device activity distribution obtained from the samples. Considering the tradeoff between throughput and computational complexity, we run each proposed algorithm five times to obtain each point for all considered parameters.

Figure 3: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=10$ (dependent device activities). (a) Throughput versus $K$ at $p_a=0.25$ and $N=15$. (b) Throughput versus $N$ at $p_a=0.25$ and $K=60$. (c) Throughput versus $p_a$ at $N=15$ and $K=60$.
Figure 4: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=1$ (independent device activities). (a) Throughput versus $K$ at $p_a=0.25$ and $N=4$. (b) Throughput versus $N$ at $p_a=0.25$ and $K=10$. (c) Throughput versus $p_a$ at $N=4$ and $K=10$.

A Small K and N

This part compares the throughputs of all proposed solutions and the three baseline schemes at small numbers of devices and preambles. Fig. 3 and Fig. 4 illustrate the throughput versus the number of devices $K$, the number of preambles $N$, and the group active probability $p_a$, for dependent and independent device activities, respectively. From Fig. 3 and Fig. 4, we make the following observations. For each scheme, the curve in case-up is close to that in case-p, as we evaluate the numerical mean of $\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$ over a large number of samples; the curve in case-p is above that in case-ip, as illustrated in Section II. For $t=$ p, ip, and up, Prop-$t$ outperforms all the baseline schemes in case-$t$, as Prop-$t$ utilizes more statistical information on device activities. For $t=$ p and ip, Prop-LowComp-$t$ outperforms all the baseline schemes in case-$t$, as Prop-LowComp-$t$ relies on a more accurate approximation of the throughput. The small gap between the throughputs of Prop-$t$ and Prop-LowComp-$t$, for $t=$ p and ip, shows that rigorously exploiting statistical information on the activities of every two devices already achieves a significant gain. Furthermore, from Fig. 3 (a), Fig. 4 (a) and Fig. 3 (c), Fig. 4 (c), we can see that the throughput of each proposed scheme increases with $K$ and $p_a$, respectively, due to the increase in traffic load. From Fig. 3 (b) and Fig. 4 (b), we can see that the throughput of each scheme increases with $N$ due to the increase in communication resources.

Fig. 5 and Fig. 6 illustrate the worst-case average throughput versus the estimation error bound $\bar{\delta}$ for dependent and independent device activities, respectively. From Fig. 5 and Fig. 6, we can see that the worst-case average throughputs of Prop-ip and Prop-LowComp-ip are greater than those of Prop-p, Prop-LowComp-p, BL-MMPC-p, BL-MSPC-p, and BL-LTE-p, which reveals the importance of explicitly considering the imperfectness of the estimated joint device activity distribution in this case. Furthermore, the gain of Prop-ip over Prop-p and the gain of Prop-LowComp-ip over Prop-LowComp-p increase with $\bar{\delta}$, as it is more important to take possible device activity estimation errors into account when $\bar{\delta}$ is larger.

Figure 5: Worst-case average throughput versus $\bar{\delta}$ at $K=60$, $N=15$, $p_a=0.25$, and $\frac{K}{G}=10$ (dependent device activities).
Figure 6: Worst-case average throughput versus $\bar{\delta}$ at $K=10$, $N=4$, $p_a=0.4$, and $\frac{K}{G}=1$ (independent device activities).
Figure 7: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=20$ (dependent device activities). (a) Throughput versus $K$ at $p_a=0.03$ and $N=50$. (b) Throughput versus $N$ at $p_a=0.03$ and $K=1000$. (c) Throughput versus $p_a$ at $N=50$ and $K=1000$.
Figure 8: Throughput versus $K$, $N$, and $p_a$ at $\bar{\delta}=0.3$ and $\frac{K}{G}=1$ (independent device activities). (a) Throughput versus $K$ at $p_a=0.03$ and $N=50$. (b) Throughput versus $N$ at $p_a=0.03$ and $K=1000$. (c) Throughput versus $p_a$ at $N=50$ and $K=1000$.

B Large K and N

This part compares the throughputs of Prop-LowComp-p, Prop-LowComp-ip, Prop-up, and the three baseline schemes at large numbers of devices and preambles. Fig. 7 and Fig. 8 illustrate the throughput versus the number of devices $K$, the number of preambles $N$, and the group active probability $p_a$, for dependent and independent device activities, respectively. From Fig. 7 and Fig. 8, we again observe that Prop-LowComp-$t$ significantly outperforms all the baseline schemes in case-$t$ for $t=$ p and ip, and that Prop-up outperforms all the baseline schemes in case-up. The results for large $K$ and $N$ shown in Fig. 7 and Fig. 8 are similar to those for small $K$ and $N$ shown in Fig. 3 and Fig. 4.

VII Conclusion

This paper considered the joint optimization of preamble selection and access barring for random access in MTC under an arbitrary joint device activity distribution that can reflect dependent and independent device activities. We considered the cases of perfect, imperfect, and unknown general joint device activity distributions and formulated the average, worst-case average, and sample average throughput maximization problems, respectively. All three problems are challenging nonconvex problems. We proposed iterative algorithms with convergence guarantees to tackle these problems with various optimization techniques. Numerical results showed that the proposed solutions achieve significant gains over existing schemes in all three cases for both dependent and independent device activities, and the proposed low-complexity algorithms for the first two cases already achieve competitive performance.

Appendix A: Proof for (6)

Substituting (4) into (5), we have [33] (we omit some details due to the page limitation; the complete proofs can be found in [33]):

$\bar{T}(\mathbf{A},\epsilon,\mathbf{p})=\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{m\in\mathcal{K}:m\neq k}(1-x_m a_{m,n}\epsilon)$
$=\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\sum_{\mathcal{K}'\subseteq\mathcal{K}\setminus\{k\}}\prod_{m\in\mathcal{K}'}(-x_m a_{m,n}\epsilon)$
$=-\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}_1\cup\mathcal{K}_2\subseteq\mathcal{K}:\mathcal{K}_1\cap\mathcal{K}_2=\emptyset,|\mathcal{K}_1|=1}\prod_{m\in\mathcal{K}_1}(-x_m a_{m,n}\epsilon)\prod_{\ell\in\mathcal{K}_2}(-x_\ell a_{\ell,n}\epsilon)$
$=-\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}}|\mathcal{K}'|\prod_{k\in\mathcal{K}'}(-x_k a_{k,n}\epsilon)=\sum_{m\in\mathcal{K}}m(-1)^{m-1}\epsilon^m\sum_{n\in\mathcal{N}}\sum_{\mathcal{K}'\subseteq\mathcal{K}:|\mathcal{K}'|=m}\Big(\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k\Big)\prod_{k\in\mathcal{K}'}a_{k,n}.$

Appendix B: Proof of Theorem 1

First, it is clear that each problem in (15) has the same form as the problem in [32, Exercise 4.8]. According to the analytical solution of the problem in [32, Exercise 4.8], a set of optimal points of each problem in (15) is given by (19). Next, since $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is a polynomial in $\epsilon$, by checking all roots of $\frac{\partial\bar{T}(\mathbf{A},\epsilon,\mathbf{p})}{\partial\epsilon}=0$ and the endpoints of the interval, we can obtain the set of optimal points of the problem in (16), which is given by (20). Therefore, we complete the proof of Theorem 1.

Appendix C: Complexity analysis for applying Theorem 1

As the constants $m^2(-1)^{m-1}$, $m(-1)^{m-1}$, $m\in\mathcal{K}$, and $\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}\prod_{k\in\mathcal{K}'}x_k$, $\mathcal{K}'\subseteq\mathcal{K}$, are computed in advance, the corresponding computational complexities are not considered below. First, we analyze the computational complexity for determining the set in (19). In each iteration, we first compute $m(-1)^{m-1}\epsilon^m$, $m\in\mathcal{K}$, in $2K$ flops. Then, we compute $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$. In (17), the summation with respect to $m$ contains $K$ summands, the summation with respect to $\mathcal{K}'$ contains $\binom{K-1}{m-1}$ summands, and $\prod_{\ell\in\mathcal{K}':\ell\neq k}a_{\ell,n}$ for $\mathcal{K}'\subseteq\mathcal{K}$ with $k\in\mathcal{K}'$, $|\mathcal{K}'|=m$ involves $m-2$ multiplications. Thus, calculating $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $k\in\mathcal{K}$, $n\in\mathcal{N}$, costs $\big(K-1+\sum_{m\in\mathcal{K}}\big((m-2+1)\binom{K-1}{m-1}+1+\binom{K-1}{m-1}-1\big)\big)NK+2K=NK(K+1)(2^{K-2}+1)-2K(N-1)$ flops, i.e., has computational complexity $\mathcal{O}(NK^2 2^K)$. For all $k\in\mathcal{K}$, the computational complexity for finding the largest one among $Q_{k,n}(\mathbf{a}_{-k},\epsilon,\mathbf{p})$, $n\in\mathcal{N}$, is $\mathcal{O}(N)$. Thus, the overall computational complexity for determining the sets in (19) is $\mathcal{O}(NK^2 2^K)+\mathcal{O}(KN)=\mathcal{O}(NK^2 2^K)$.

Then, we analyze the computational complexity for determining the set in (20). Note that $q(\mathbf{A},\epsilon,\mathbf{p})$ in (18) is a univariate polynomial of degree $K-1$ in $\epsilon$ for any given $\mathbf{A}$ and $\mathbf{p}$. We first calculate the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$. In (18), the summation with respect to $m$ contains $K$ summands, the summation with respect to $n$ and $\mathcal{K}'$ contains $N\binom{K}{m}$ summands, and $\prod_{\ell\in\mathcal{K}'}a_{\ell,n}$ for $n\in\mathcal{N}$ and $\mathcal{K}'\subseteq\mathcal{K}$ with $|\mathcal{K}'|=m$ involves $m-1$ multiplications. Thus, calculating the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$ costs $\sum_{m\in\mathcal{K}}\big((m-1+1)N\binom{K}{m}+1+N\binom{K}{m}-1\big)=N(K+2)2^{K-1}-N$ flops, i.e., has computational complexity $\mathcal{O}(NK2^K)$. Next, we obtain the $K-1$ roots of $q(\mathbf{A},\epsilon,\mathbf{p})$ from its coefficients by using the QR algorithm, with computational complexity $\mathcal{O}(K^3)$. Furthermore, finding the real roots in $[0,1]$ among these $K-1$ roots has computational complexity $\mathcal{O}(K)$. Hence, the computational complexity for determining $\mathcal{B}(\mathbf{A},\mathbf{p})$ is $\mathcal{O}(NK2^K)+\mathcal{O}(K^3)+\mathcal{O}(K)=\mathcal{O}(NK2^K)$. Analogously, the computational complexity for computing $\bar{T}(\mathbf{A},z,\mathbf{p})$, $z\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$, is $\mathcal{O}(NK^2 2^K)$. The computational complexity for finding the largest one among $\bar{T}(\mathbf{A},z,\mathbf{p})$, $z\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$, is $\mathcal{O}(K)$. Therefore, the overall computational complexity for determining the set in (20) is $\mathcal{O}(NK2^K)+\mathcal{O}(NK^2 2^K)+\mathcal{O}(K)=\mathcal{O}(NK^2 2^K)$.
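For illustration, the root-finding step can be sketched as follows, assuming the coefficients of $q(\mathbf{A},\epsilon,\mathbf{p})$ (highest degree first) have already been computed; numpy's `roots` uses the companion-matrix eigenvalue (QR) approach referred to above.

```python
import numpy as np

def candidate_barring_factors(coeffs):
    """Find the real roots of q(A, eps, p) in [0, 1] from its polynomial
    coefficients (highest degree first), via the companion-matrix QR method
    that numpy.roots implements (O(K^3) for a degree-(K-1) polynomial)."""
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real   # keep (numerically) real roots
    return [r for r in real if 0.0 <= r <= 1.0]    # restrict to the interval [0, 1]
```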

Appendix D: Proof of Theorem 2

First, we show that Algorithm 1 stops in a finite number of iterations. Let $(\mathbf{A}^{(i)},\epsilon^{(i)})$ denote the preamble selection distributions and the access barring factor obtained at iteration $i$. $\mathcal{A}\triangleq\{\mathbf{A}:\mathbf{a}_k=\mathbf{e}_{n_k},n_k\in\mathcal{N},k\in\mathcal{K}\}$ contains $N^K$ elements, and $\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}$ contains no more than $K$ elements. Thus, $\mathcal{C}\triangleq\{(\mathbf{A},\epsilon):\mathbf{A}\in\mathcal{A},\epsilon\in\mathcal{B}(\mathbf{A},\mathbf{p})\cup\{1\}\}$ and $\mathcal{T}\triangleq\{\bar{T}(\mathbf{A},\epsilon,\mathbf{p}):(\mathbf{A},\epsilon)\in\mathcal{C}\}$ both contain no more than $KN^K$ elements. In addition, as $\bar{T}(\mathbf{A}^{(i)},\epsilon^{(i)},\mathbf{p})$ is nondecreasing in $i$ (due to the block coordinate optimizations in each iteration) and $\bar{T}(\mathbf{A}^{(i)},\epsilon^{(i)},\mathbf{p})\in\mathcal{T}$, $i\geq 1$, by contradiction, we can show that there exists an integer $d\leq KN^K$ such that $\bar{T}(\mathbf{A}^{(d)},\epsilon^{(d)},\mathbf{p})=\bar{T}(\mathbf{A}^{(d+1)},\epsilon^{(d+1)},\mathbf{p})$. According to Steps 4--11 of Algorithm 1, we have $(\mathbf{A}^{(d+1)},\epsilon^{(d+1)})=(\mathbf{A}^{(d)},\epsilon^{(d)})$, which satisfies the stopping criterion of Algorithm 1. Thus, Algorithm 1 stops at iteration $d+1$, i.e., after no more than $KN^K+1$ iterations. Next, we show that Algorithm 1 returns a stationary point of Problem 1. By $(\mathbf{A}^{(d+1)},\epsilon^{(d+1)})=(\mathbf{A}^{(d)},\epsilon^{(d)})$ and Theorem 1, $\mathbf{a}_k^{(d)}$ is the optimal point of the problem in (15) with $\mathbf{a}_{-k}=\mathbf{a}_{-k}^{(d)}$ and $\epsilon=\epsilon^{(d)}$, for $k\in\mathcal{K}$, and $\epsilon^{(d)}$ is the optimal point of the problem in (16) with $\mathbf{A}=\mathbf{A}^{(d)}$. In addition, note that the objective function $\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ is continuously differentiable and the feasible set of Problem 1 is convex. Thus, according to the proof of [29, Proposition 2.7.1], we know that $(\mathbf{A}^{(d)},\epsilon^{(d)})$ satisfies the first-order optimality condition of Problem 1, i.e., it is a stationary point. Therefore, we complete the proof of Theorem 2.

Appendix E: Proof of Theorem 3

Let $\mathbf{f}:\mathbb{R}^{KN+1}\rightarrow\mathbb{R}^{KN+1}$ denote the vector mapping corresponding to one iteration of Algorithm 1. Let $(\mathbf{A}^\diamond,\epsilon^\diamond)$ denote an optimal point of Problem 1. We construct a feasible point $(\mathbf{A}^\star,\epsilon^\star)\triangleq\mathbf{f}(\mathbf{A}^\diamond,\epsilon^\diamond)$, which satisfies the optimality property in Theorem 3. In the following, we show that $(\mathbf{A}^\star,\epsilon^\star)$ is also an optimal point of Problem 1. According to Theorem 1, we have $\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\diamond,\ldots,\mathbf{a}_K^\diamond,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\star,\ldots,\mathbf{a}_K^\diamond,\epsilon^\diamond,\mathbf{p})\leq\ldots\leq\bar{T}(\mathbf{a}_1^\star,\mathbf{a}_2^\star,\ldots,\mathbf{a}_K^\star,\epsilon^\diamond,\mathbf{p})\leq\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})$. In addition, as $(\mathbf{A}^\diamond,\epsilon^\diamond)$ is an optimal point and $(\mathbf{A}^\star,\epsilon^\star)$ is a feasible point, we have $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\leq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$. Combining $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\geq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$ and $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})\leq\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$, we have $\bar{T}(\mathbf{A}^\star,\epsilon^\star,\mathbf{p})=\bar{T}(\mathbf{A}^\diamond,\epsilon^\diamond,\mathbf{p})$, which implies that $(\mathbf{A}^\star,\epsilon^\star)$ is also optimal. Therefore, we complete the proof of Theorem 3.

Appendix F: Proof of Lemma 1

The inner problem of Problem 3, $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$, is a feasible LP with respect to $\mathbf{p}$ for all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b). As strong duality holds for LPs, the primal problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ and its dual problem share the same optimal value. Furthermore, by eliminating the equality constraints in the dual problem, we can equivalently convert the dual problem to the following problem [33]:

$\max_{\boldsymbol{\lambda}\succeq 0,\nu}\ \Big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}-1\Big)\nu+\sum_{\mathbf{x}\in\mathcal{X}}\Big((\underline{p}_{\mathbf{x}}-\overline{p}_{\mathbf{x}})\lambda_{\mathbf{x}}+\underline{p}_{\mathbf{x}}\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_\ell a_{\ell,n}\epsilon)\Big)$ (47)
$\mathrm{s.t.}\ \nu+\lambda_{\mathbf{x}}+\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}x_k a_{k,n}\epsilon\prod_{\ell\in\mathcal{K}:\ell\neq k}(1-x_\ell a_{\ell,n}\epsilon)\geq 0,\ \mathbf{x}\in\mathcal{X}.$

Thus, the problem in (47) and $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ share the same optimal value, which can be viewed as a function of $(\mathbf{A},\epsilon)$. Thus, the max-min problem in Problem 3 can be equivalently converted to the maximization problem in Problem 4 by replacing the inner problem $\min_{\mathbf{p}\in\mathcal{P}}\bar{T}(\mathbf{A},\epsilon,\mathbf{p})$ with the problem in (47) and replacing $\epsilon a_{k,n}$ with the new variables $b_{k,n}$ in both the objective function and the constraint functions. Therefore, we complete the proof of Lemma 1.

Appendix G: Proof of Theorem 6

We show that the assumptions in [30, Theorem 1] are satisfied. It is clear that $\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s)})$ and $\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu,\mathbf{B}^{(s)},\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$, are continuously differentiable with respect to $(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu)$, and each of them is the sum of linear functions and strongly concave functions. Thus, $\tilde{h}(\mathbf{B},\boldsymbol{\lambda},\nu,\mathbf{B}^{(s)})$ and $\tilde{h}_{\text{c}}(\mathbf{B},\boldsymbol{\lambda},\epsilon,\nu,\mathbf{B}^{(s)},\mathbf{x})$, $\mathbf{x}\in\mathcal{X}$, satisfy the first, second, and third assumptions. From (28) and (29), it is clear that they also satisfy the fourth and fifth assumptions. Let $\nabla_{\mathbf{B}}^2 T(\mathbf{B},1,\mathbf{x})\triangleq\big(\frac{\partial^2 T(\mathbf{B},1,\mathbf{x})}{\partial b_{k,n}\partial b_{k',n'}}\big)_{(k,n),(k',n')\in\mathcal{K}\times\mathcal{N}}$ and $\nabla_{\mathbf{B}}^2\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\triangleq\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}\frac{\partial^2 T(\mathbf{B},1,\mathbf{x})}{\partial b_{k,n}\partial b_{k',n'}}\big)_{(k,n),(k',n')\in\mathcal{K}\times\mathcal{N}}$ denote the Hessians of $T(\mathbf{B},1,\mathbf{x})$ and $\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})$, respectively. We can show $\nabla_{\mathbf{B}}^2 T(\mathbf{B},1,\mathbf{x})\preceq\sqrt{N}\|\mathbf{x}\|_1^2\mathbf{E}$, $\mathbf{x}\in\mathcal{X}$, and $\nabla_{\mathbf{B}}^2\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}T(\mathbf{B},1,\mathbf{x})\preceq\sqrt{N\sum_{k\in\mathcal{K}}\sum_{k'\in\mathcal{K}:k'\neq k}\big(\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k x_{k'}\|\mathbf{x}\|_1\big)^2}\,\mathbf{E}$ [33], where $\mathbf{E}$ denotes the identity matrix. Thus, using the inequality implied by Taylor's theorem [35, Eq. (25)], we know that $\tilde{h}$ and $\tilde{h}_{\text{c}}$, $\mathbf{x}\in\mathcal{X}$, satisfy the sixth assumption. Therefore, Theorem 6 readily follows from [30, Theorem 1].

Appendix H: Proof of Lemma 2

For all $(\mathbf{A},\epsilon)$ satisfying (2a), (2b), (3a), and (3b), as $\mathbf{p}\in\tilde{\mathcal{P}}$, we have $\min_{\mathbf{p}\in\tilde{\mathcal{P}}}\tilde{T}(\mathbf{A},\epsilon,\mathbf{p})\geq\epsilon\sum_{k\in\mathcal{K}}\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k-\epsilon^2\sum_{n\in\mathcal{N}}\sum_{k\in\mathcal{K}}a_{k,n}\sum_{\ell\in\mathcal{K}:\ell>k}a_{\ell,n}\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell$, where the equality holds when

$\sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}=1,\ \sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_k=\sum_{\mathbf{x}\in\mathcal{X}}\underline{p}_{\mathbf{x}}x_k,\ k\in\mathcal{K},\ \sum_{\mathbf{x}\in\mathcal{X}}p_{\mathbf{x}}x_k x_\ell=\sum_{\mathbf{x}\in\mathcal{X}}\overline{p}_{\mathbf{x}}x_k x_\ell,\ k,\ell\in\mathcal{K},\ k<\ell$ (49)

are satisfied. Thus, to show Lemma 2, it is equivalent to show that the system of linear equations in (49) with variable $\mathbf{p}$ has a solution. Let $g(\mathbf{x})\triangleq 1+\sum_{k\in\mathcal{K}}x_k 2^{k-1}$ [37]. Obviously, $g:\mathcal{X}\rightarrow\{1,2,3,\ldots,2^K\}$ is a bijection. Thus, for all $\mathbf{x}\in\mathcal{X}$, we can also write $p_{\mathbf{x}}$ as $p_{g(\mathbf{x})}$. As $g(\mathbf{0})$, $\min\{g(\mathbf{x})\,|\,x_k=1,\mathbf{x}\in\mathcal{X}\}=g(\mathbf{e}_k)$, $k\in\mathcal{K}$, and $\min\{g(\mathbf{x})\,|\,x_k=1,x_\ell=1,\mathbf{x}\in\mathcal{X}\}=g(\mathbf{e}_k+\mathbf{e}_\ell)$, $k,\ell\in\mathcal{K}$, $k<\ell$, are all different, the matrix of the system of linear equations in (49) has full row rank. In addition, note that the number of equations, $\frac{K^2+K+2}{2}$, is no greater than the number of variables, $2^K$. Thus, the system of linear equations in (49) has a solution. We complete the proof of Lemma 2.
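For illustration, a minimal sketch of the bijection $g$ used in this proof:

```python
def g(x):
    """Bijection from an activity vector x in {0,1}^K to an index in {1, ..., 2^K},
    per g(x) = 1 + sum_k x_k 2^{k-1}."""
    return 1 + sum(xk << k for k, xk in enumerate(x))

# e.g., K = 3: g((0,0,0)) = 1, g((1,0,0)) = 2 (= g(e_1)), g((1,1,0)) = 4 (= g(e_1 + e_2))
assert g((0, 0, 0)) == 1 and g((1, 0, 0)) == 2 and g((1, 1, 0)) == 4
```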

Appendix I: Proof of Theorem 8

For all $k\in\mathcal{K}$ and all $t\geq 1$, let $n^{(t)}\triangleq\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$, which is a singleton, and let $n'^{(t)}$ denote an index in $\arg\max_{n\in\mathcal{N}:n\neq n^{(t)}}c_{k,n}^{(t)}$. First, we show that for any $0<\omega<\frac{1}{4}\big(c_{k,n^{(t)}}^{(t)}-c_{k,n'^{(t)}}^{(t)}\big)$, the optimization problem in (32) and the following QP (which is strongly convex),

$\max_{\mathbf{a}_k}\ Q_\omega^{(t)}(\mathbf{a}_k)\triangleq\tilde{T}_{\text{st},k}^{(t)}\big(\mathbf{a}_k,\mathbf{A}^{(t-1)},\epsilon^{(t-1)}\big)-\omega\sum_{n\in\mathcal{N}}\big(a_{k,n}-a_{k,n}^{(t-1)}\big)^2,\quad k\in\mathcal{K}$ (50)
$\mathrm{s.t.}\ \text{(3a)},\ \text{(3b)},$

share the same optimal point. According to Theorem 7, the optimal point of the problem in (32) is $\mathbf{e}_{n^{(t)}}$. In addition, for any $\mathbf{a}_k$ satisfying the constraints in (3a) and (3b), we can show $Q_\omega^{(t)}(\mathbf{a}_k)\leq Q_\omega^{(t)}(\mathbf{e}_{n^{(t)}})$ [33]. By noting that the QP in (50) is strongly convex (as $\omega>0$), $\mathbf{e}_{n^{(t)}}$ is also the unique optimal point of the strongly convex QP. Thus, if $\arg\max_{n\in\mathcal{N}}c_{k,n}^{(t)}$ is a singleton for all $k\in\mathcal{K}$ and all $t\geq 1$, Algorithm 4 can be viewed as the stochastic parallel SCA in [36] for solving the following stochastic problem:

$\max_{\mathbf{A},\epsilon}\ \mathbb{E}\Big[\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)\Big]$ (51)
$\mathrm{s.t.}\ \text{(2a)},\ \text{(2b)},\ \text{(3a)},\ \text{(3b)},$

where $\xi$ (the mini-batch index) follows the uniform distribution over $\mathcal{M}$, and the expectation is taken over $\xi$. By noting that $\mathbb{E}\big[\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)\big]=\bar{T}_{\text{st}}(\mathbf{A},\epsilon)$, the stochastic problem in (51) is equivalent to Problem 7. Next, we show the convergence of Algorithm 4 by verifying that the assumptions in [36, Theorem 1] are satisfied. Obviously, the constraint set of Problem 7 is compact and convex. It is clear that for any $\xi\in\mathcal{M}$, $\frac{M}{I}\sum_{m\in\mathcal{M}}\mathrm{I}[m=\xi]\sum_{i\in\mathcal{I}_m}T(\mathbf{A},\epsilon,\mathbf{x}_i)$ is smooth on the constraint set of Problem 7, and hence it is continuously differentiable with a Lipschitz continuous derivative. The random variables $\xi^{(0)},\xi^{(1)},\ldots$ are bounded and identically distributed. Therefore, Theorem 8 readily follows from [36, Theorem 1].

References

  • [1] W. Liu, Y. Cui, L. Ding, J. Sun, Y. Liu, Y. Li, and L. Zhang, “Joint optimization of preamble selection and access barring for MTC with correlated device activities,” in Proc. IEEE ICC Wkshps., Jun. 2021, pp. 1–6.
  • [2] Z. Dawy, W. Saad, A. Ghosh, J. G. Andrews, and E. Yaacoub, “Toward massive machine type cellular communications,” IEEE Wireless Commun., vol. 24, no. 1, pp. 120–128, Feb. 2017.
  • [3] M. Hasan, E. Hossain, and D. Niyato, “Random access for machine-to-machine communication in LTE-advanced networks: Issues and approaches,” IEEE Commun. Mag., vol. 51, no. 6, pp. 86–93, Jun. 2013.
  • [4] R. Ratasuk, N. Mangalvedhe, D. Bhatoolaul, and A. Ghosh, “LTE-M evolution towards 5G massive MTC,” in Proc. IEEE GC Wkshps., Dec. 2017, pp. 1–6.
  • [5] Y. Wang, X. Lin, A. Adhikary, A. Grovlen, Y. Sui, Y. Blankenship, J. Bergman, and H. S. Razaghi, “A primer on 3GPP narrowband Internet of Things,” IEEE Commun. Mag., vol. 55, no. 3, pp. 117–123, Mar. 2017.
  • [6] E. Dahlman, P. Stefan, and S. Johan, 5G NR: The Next Generation Wireless Access Technology.   Academic Press, 2020.
  • [7] 3GPP, “Evolved universal terrestrial radio access (E-UTRA); medium access control (MAC) protocol specification,” TS 36.321, Apr. 2015.
  • [8] S. Duan, V. Shah-Mansouri, Z. Wang, and V. W. S. Wong, “D-ACB: Adaptive congestion control algorithm for bursty M2M traffic in LTE networks,” IEEE Trans. Veh. Technol., vol. 65, no. 12, pp. 9847–9861, Dec. 2016.
  • [9] Z. Wang and V. W. S. Wong, “Optimal access class barring for stationary machine type communication devices with timing advance information,” IEEE Trans. Wireless Commun., vol. 14, no. 10, pp. 5374–5387, Oct. 2015.
  • [10] M. Vilgelm, S. Rueda Liñares, and W. Kellerer, “On the resource consumption of M2M random access: Efficiency and pareto optimality,” IEEE Wireless Commun. Lett., vol. 8, no. 3, pp. 709–712, Jun. 2019.
  • [11] W. Zhan and L. Dai, “Massive random access of machine-to-machine communications in LTE networks: Modeling and throughput optimization,” IEEE Trans. Wireless Commun., vol. 17, no. 4, pp. 2771–2785, Apr. 2018.
  • [12] ——, “Massive random access of machine-to-machine communications in LTE networks: Throughput optimization with a finite data transmission rate,” IEEE Trans. Wireless Commun., vol. 18, no. 12, pp. 5749–5763, Apr. 2019.
  • [13] C. Di, B. Zhang, Q. Liang, S. Li, and Y. Guo, “Learning automata-based access class barring scheme for massive random access in machine-to-machine communications,” IEEE Internet Things J., vol. 6, no. 4, pp. 6007–6017, Aug. 2019.
  • [14] H. Jin, W. T. Toor, B. C. Jung, and J. Seo, “Recursive pseudo-bayesian access class barring for M2M communications in LTE systems,” IEEE Trans. Veh. Technol., vol. 66, no. 9, pp. 8595–8599, Sep. 2017.
  • [15] O. Galinina, A. Turlikov, S. Andreev, and Y. Koucheryavy, “Stabilizing multi-channel slotted ALOHA for machine-type communications,” in Proc. IEEE ISIT, Jul. 2013, pp. 2119–2123.
  • [16] C. Oh, D. Hwang, and T. Lee, “Joint access control and resource allocation for concurrent and massive access of M2M devices,” IEEE Trans. Wireless Commun., vol. 14, no. 8, pp. 4182–4192, Aug. 2015.
  • [17] J. Choi, “On the adaptive determination of the number of preambles in RACH for MTC,” IEEE Wireless Commun. Lett., vol. 20, no. 7, pp. 1385–1388, Jul. 2016.
  • [18] ——, “Multichannel ALOHA with exploration phase,” in Proc. IEEE WCNC, May 2020, pp. 1–6.
  • [19] M. Shehab, A. K. Hagelskjar, A. E. Kalor, P. Popovski, and H. Alves, “Traffic prediction based fast uplink grant for massive IoT,” in Proc. IEEE PIMRC, Sep. 2020, pp. 1–6.
  • [20] S. Ali, A. Ferdowsi, W. Saad, N. Rajatheva, and J. Haapola, “Sleeping multi-armed bandit learning for fast uplink grant allocation in machine type communications,” IEEE Trans. Commun., vol. 68, no. 8, pp. 5072–5086, Apr. 2020.
  • [21] S. Ali, W. Saad, and N. Rajatheva, “A directed information learning framework for event-driven M2M traffic prediction,” IEEE Commun. Lett., vol. 22, no. 11, pp. 2378–2381, Aug. 2018.
  • [22] 3GPP, “Study on RAN improvements for machine-type communications,” TR 37.868, Sep. 2011.
  • [23] E. Soltanmohammadi, K. Ghavami, and M. Naraghi-Pour, “A survey of traffic issues in machine-to-machine communications over LTE,” IEEE Internet Things J., vol. 3, no. 6, pp. 865–884, Dec. 2016.
  • [24] J. Navarro-Ortiz, P. Romero-Diaz, S. Sendra, P. Ameigeiras, J. J. Ramos-Munoz, and J. M. Lopez-Soler, “A survey on 5G usage scenarios and traffic models,” IEEE Commun. Surv. Tutor., vol. 22, no. 2, pp. 905–929, May 2020.
  • [25] Y. Cui, S. Li, and W. Zhang, “Jointly sparse signal recovery and support recovery via deep learning with applications in MIMO-based grant-free random access,” IEEE J. Select. Areas Commun., vol. 39, no. 3, pp. 788–803, Mar. 2021.
  • [26] D. Jiang and Y. Cui, “ML and MAP device activity detections for grant-free massive access in multi-cell networks,” IEEE Trans. Wireless Commun., to appear.
  • [27] A. E. Kalor, O. A. Hanna, and P. Popovski, “Random access schemes in wireless systems with correlated user activity,” in Proc. IEEE SPAWC, Jun. 2018, pp. 1–5.
  • [28] L. Liu and W. Yu, “Massive connectivity with massive MIMO–part I: Device activity detection and channel estimation,” IEEE Trans. Signal Process., vol. 66, no. 11, pp. 2933–2946, Jun. 2018.
  • [29] D. P. Bertsekas, Nonlinear Programming.   Athena Scientific, Belmont, MA, 1998.
  • [30] M. Razaviyayn, “Successive convex approximation: Analysis and applications,” Ph.D. dissertation, Univ. of Minnesota, Minneapolis, MN, USA, May 2014.
  • [31] A. Koppel, A. Mokhtari, and A. Ribeiro, “Parallel stochastic successive convex approximation method for large-scale dictionary learning,” in Proc. IEEE ICASSP, Apr. 2018, pp. 2771–2775.
  • [32] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [33] W. Liu, Y. Cui, F. Yang, L. Ding, and J. Sun, “Joint optimization of preamble selection and access barring for random access in MTC with general device activities,” arXiv preprint arXiv:2107.10461, Jul. 2021.
  • [34] C. Ye, Y. Cui, Y. Yang, and R. Wang, “Optimal caching designs for perfect, imperfect, and unknown file popularity distributions in large-scale multi-tier wireless networks,” IEEE Trans. Commun., vol. 67, no. 9, pp. 6612–6625, May 2019.
  • [35] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
  • [36] Y. Yang, G. Scutari, D. P. Palomar, and M. Pesavento, “A parallel decomposition method for nonconvex stochastic multi-agent optimization problems,” IEEE Trans. Signal Process., vol. 64, no. 11, pp. 2949–2964, Feb. 2016.
  • [37] J. L. Teugels, “Some representations of the multivariate Bernoulli and binomial distributions,” Journal of Multivariate Analysis, vol. 32, no. 2, pp. 256–268, Feb. 1990.