
An Exploration of the
Heterogeneous Unsourced MAC

†A. Hao, †S. Rini, §V. K. Amalladinne, §A. K. Pradhan, §J.-F. Chamberland
†Electrical and Computer Engineering, National Chiao Tung University
§Electrical and Computer Engineering, Texas A&M University
This material is based upon work supported, in part, by the National Science Foundation (NSF) under Grant CCF-1619085 and by Qualcomm Technologies, Inc., through their University Relations Program.
Abstract

The unsourced MAC model was originally introduced to study the communication scenario in which a number of low-complexity, low-energy devices wish to upload their respective messages to a base station. In the original problem formulation, all devices communicate using the same information rate. This can be very inefficient in wireless situations where channel conditions, power budgets, and payload requirements vary across devices. This paper extends the original problem setting to allow for such variability. More specifically, we consider the scenario in which devices are clustered into two classes, possibly with different SNR levels or distinct payload requirements. In the cluster with higher power, devices transmit using a two-layer superposition modulation. In the cluster with lower energy, users transmit with the same base constellation as the high-power cluster. Within each layer, devices employ the same codebook. At the receiver, signal groupings are recovered using Approximate Message Passing (AMP), proceeding from the high to the low power levels using successive interference cancellation (SIC). This layered architecture is implemented using Coded Compressed Sensing (CCS) within every grouping. An outer tree code is employed to stitch fragments together across time and layers, as needed. This pragmatic approach to heterogeneous CCS is validated numerically, and design guidelines are identified.

Index Terms:
Unsourced random access, Coded compressed sensing, Approximate message passing, Superposition constellation.

I Introduction

The IoT paradigm of myriad unattended devices connected wirelessly to the Internet may pose a significant disruption to existing communication networks. The predicted number of such devices, orders of magnitude greater than human subscribers, and the usage profile of these devices, sporadic and fleeting, invalidate the type of connection-based architectures that form a foundation for existing deployments. Thus, new means of Internet access must be explored to reflect this change, with provisions for random access. Along these lines, one model attuned to this reality that has gained attention in recent years is unsourced random access (URA). The URA formulation, originally proposed by Polyanskiy [polyanskiy2017perspective], centers on concurrent up-link data transfers. There is a strong connection between URA and Compressed Sensing (CS), with the former problem being an instance of a noisy support recovery task [choi2017compressed, reeves2012sampling, gilbert2017all]. More precisely, in URA, the receiver seeks to identify the set of messages being transmitted by active devices, without regard for the identities of their sources. The identity of a source can be embedded in the message payload, if needed. The value of this approach lies in the fact that the access point does not need to determine the set of active devices at the onset of a frame, a step that can rapidly become overwhelming for connection-less settings with a very large population of candidate transmitters. URA raises both theoretical and practical challenges. Achievable bounds rooted in finite-block-length analysis for such systems can be found in [polyanskiy2017perspective]. These bounds are obtained without regard for complexity constraints, as they rely on joint maximum likelihood decoding.

Several pragmatic, low-complexity approaches for this problem have been proposed [ordentlich2017low, vem2019user, amalladinne2019coded, calderbank2018chirrup, fengler2019sparcs, pradhan2019sparseidma, marshakov2019polar, AKPolar]. Conceptually, each of these contributions offers a means to circumvent the difficulty associated with the dimensionality of the problem. Indeed, when viewed as a support recovery task, unsourced random access features a $K$-sparse state vector of length $2^{100}$ or longer. This reality prevents the straightforward application of standard CS solvers. To address this issue, many algorithms leverage lessons from random access and coding theory to design structured sensing matrices suitable for the efficient recovery of the sent messages. A line of research that has attracted attention in this context is the framework of coded compressed sensing (CCS), originally proposed by Amalladinne et al. [amalladinne2018couple, amalladinne2019coded]. This scheme is a divide-and-conquer approach in which a large CS problem is partitioned into smaller components, each of which can be solved using standard CS algorithms. The output of this step produces lists of message fragments, one list for every CS instance. The transmitted messages are recovered by stitching fragments together using an outer code. Overall, the approach can be abstracted as a concatenated coding scheme where an inner code is tasked with fragment recovery and the outer code is responsible for message disambiguation.

CCS has been ported, enhanced, and extended by multiple authors. It appears as a component of the ultra-low complexity CHIRRUP algorithm [calderbank2018chirrup], and it can be incorporated into activity detection in multi-antenna systems [fengler2019massive]. CCS can be employed to build neighbor discovery schemes and to handle signal asynchrony [zhang2013neighbor, thompson2018compressed, amalladinne2019asynchronous]. An enhanced version of the algorithm takes advantage of the fact that output from the early stages of CCS can be integrated into later stages as side information to improve execution [amalladinne2020enhanced]. This variant has inspired significant extensions related to sparse regression codes and Approximate Message Passing (AMP) [fengler2019sparcs, amalladinne2020unsourced].

Along similar lines of research, the main contributions of our article can be summarized as follows.

  • In Sec. II, we introduce a novel system model for unsourced random access. This new model captures the fact that, in practice, wireless IoT devices may have distinct payload requirements. Heterogeneity is addressed by introducing the notion of clustering, whereby users within a cluster share the same power budget and transmit at the same rate. We refer to this model as HetURA.

  • A pragmatic communication scheme for this setting is developed in Sec. III. The proposed algorithm borrows ideas from CCS [amalladinne2018couple, amalladinne2019coded], but also introduces a phased decoding approach akin to successive interference cancellation across layers. Portions of clusters with greater energy budgets are decoded first. This structure is enabled by a two-level superposition constellation.

The value of the proposed framework is examined in Sec. IV, where performance results showcase the validity of the approach. Finally, Sec. V concludes the paper.

II Heterogeneous URA

Figure 1: This notional diagram offers a synopsis of the proposed communication scheme, as described in Sec. III.

Our goal is to introduce and study a heterogeneous version of URA with groupings, where distinct groups have different power levels and data requirements. We refer to this model as heterogeneous URA (HetURA). HetURA is formally defined as the up-link scenario where the user population is divided into $K$ clusters, with cluster $k$ containing the set of devices $\mathcal{S}_{k}$. Of these devices, only a subset $\mathcal{A}_{k}\subset\mathcal{S}_{k}$ of size $|\mathcal{A}_{k}|=M_{k}$ is active, with users therein wishing to communicate with the base station. The output at the receiver is then equal to

\mathbf{y}=\sum_{k\in[K]}\sum_{i\in\mathcal{A}_{k}}\mathbf{x}_{ki}+\mathbf{z}, \qquad (1)

where $\mathbf{x}_{ki}\in\mathbb{R}^{N}$ is the channel input of the $i^{\mathrm{th}}$ user in the $k^{\mathrm{th}}$ cluster, and $N$ is the block-length. Note that $[K]\triangleq\{1,\ldots,K\}$ in (1). Each input sequence is subject to an expected power constraint $\|\mathbf{x}_{ki}\|_{2}^{2}\leq NP_{k}$, with clusters ordered so that $P_{k}\leq P_{k+1}$. The components of the additive noise $\mathbf{z}$ are independent, each with a standard normal distribution.
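As a minimal numerical sketch of the channel model in (1), the snippet below superimposes the signals of active users from two power-ordered clusters in additive white Gaussian noise. All parameter values are illustrative toy choices, not the ones used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64                 # block-length (toy value)
M = [3, 2]             # number of active users per cluster, M_1 and M_2
P = [1.0, 4.0]         # cluster power budgets, ordered so P_1 <= P_2

# Each active user sends an N-sample +/- sqrt(P_k) signal, which meets the
# power constraint ||x||^2 <= N * P_k with equality.
signals = []
for m_k, p_k in zip(M, P):
    for _ in range(m_k):
        signals.append(rng.choice([-1.0, 1.0], size=N) * np.sqrt(p_k))

z = rng.standard_normal(N)        # unit-variance AWGN components
y = np.sum(signals, axis=0) + z   # superimposed received word, as in (1)
```

The receiver observes only the sum `y`; nothing in the waveform identifies which user contributed which term, which is the defining feature of the unsourced setting.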

A suitable transmission scheme for HetURA is defined as follows. Active user $i$ in cluster $k$ wishes to transmit message $\mathbf{w}_{ki}\in\left[\lfloor 2^{NR_{k}}\rfloor\right]$, where $R_{k}$ denotes the rate of cluster $k$. All the users within a cluster employ the same code and, hence, they share the same rate. This gives

\mathbf{x}_{ki}=f_{{\rm enc}\text{-}k}\left(\mathbf{w}_{ki}\right),\quad\forall\, i\in\mathcal{A}_{k}. \qquad (2)

Having observed $\mathbf{y}$, the receiver is tasked with decoding the list of messages transmitted by each cluster; that is,

{\cal W}_{k}=f_{{\rm dec}\text{-}k}(\mathbf{y}),\quad\forall\, k\in[K], \qquad (3)

with $|{\cal W}_{k}|=M_{k}$. Every entry on this list should take value in the set $\left[\lfloor 2^{NR_{k}}\rfloor\right]$. System performance is evaluated according to the per-user probability of error, defined as [polyanskiy2017perspective]

P_{\rm UE}=\max_{k\in[K]}\ \frac{1}{M_{k}}\sum_{i\in\mathcal{A}_{k}}\mathbb{P}\left[\mathbf{w}_{ki}\not\in{\cal W}_{k}\,|\,\mathbf{y}\right]. \qquad (4)

In words, this captures the (maximum) probability that a message sent by one of the devices is not recovered at the receiver. Note that, since all the users in a cluster use the same encoding function, as in (2), the receiver does not discover which user transmitted which message.

III A Coded Compressed Sensing Scheme

In this section, we describe an extension of the work found in [amalladinne2019coded] adapted to the HetURA scenario discussed in Sec. II. To put our contribution in context, we begin with a brief review of key CS notions. The original URA formulation can be viewed as sparse support recovery from the observation

\mathbf{y}=\boldsymbol{\Phi}\mathbf{m}+\mathbf{z}, \qquad (5)

where $\boldsymbol{\Phi}\in\mathbb{R}^{N\times 2^{\lfloor NR\rfloor}}$ is a dictionary of possible signals, $\mathbf{m}\in\{0,1\}^{2^{\lfloor NR\rfloor}}$ is a binary vector that indicates the indices of the transmitted codewords, and $\mathbf{z}$ is additive noise as in (1). We stress that $\mathbf{m}$ is a sparse vector, with $\|\mathbf{m}\|_{0}$ equal to the number of active URA devices.
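The support-recovery view in (5) can be illustrated at benign toy dimensions, where even plain correlation against the dictionary isolates the support. The sizes below are stand-ins chosen so that this works; the real URA dictionary width ($2^{100}$ columns or more) is precisely what rules out such direct approaches.

```python
import numpy as np

rng = np.random.default_rng(1)

N, W = 256, 512          # measurements and dictionary width (toy stand-ins)
K_active = 3             # number of active devices, i.e. ||m||_0

Phi = rng.choice([-1.0, 1.0], size=(N, W)) / np.sqrt(N)  # random dictionary
support = [17, 100, 411]                                 # sent codeword indices
m = np.zeros(W)
m[support] = 1.0

y = Phi @ m + 0.01 * rng.standard_normal(N)   # observation as in (5)

# With near-orthogonal random columns, matched filtering already separates
# the K_active true columns from the rest at these dimensions.
scores = Phi.T @ y
recovered = set(np.argsort(scores)[-K_active:].tolist())
```

A practical solver replaces the correlation step with NNLS, LASSO, or AMP, but the object being estimated, the sparse indicator vector `m`, is the same.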

As mentioned above, this article explores the extended scenario where the device population is partitioned into groups, and users from distinct clusters employ different codebooks. For ease of exposition, we restrict our treatment to the case where K=2K=2. When two groupings are present, the CS interpretation of URA becomes

\mathbf{y}=\boldsymbol{\Phi}_{1}\mathbf{m}_{1}+\boldsymbol{\Phi}_{2}\mathbf{m}_{2}+\mathbf{z}. \qquad (6)

In a manner analogous to the basic URA formulation, $\mathbf{m}_{1}$ denotes the collection of indices from the first cluster; and $\mathbf{m}_{2}$, the indices from the second cluster. As in Sec. II, we assume that the clusters are ordered by increasing transmit power, so that we may refer to the first/second cluster as the low/high-energy cluster, interchangeably. Other orderings are possible but, for simplicity, we do not discuss such alternatives in this paper. Recall that CCS was introduced as a means to tackle the dimensionality issue posed by the width of $\boldsymbol{\Phi}$. Quite obviously, when expanding the sensing matrix $\big[\boldsymbol{\Phi}_{1}\ \boldsymbol{\Phi}_{2}\big]$ to accommodate multiple groupings, a number of complexity issues arise. In particular, as with CCS, decoding through non-negative least squares (NNLS) or LASSO remains tractable only for limited problem dimensions in (6). To support longer transmission block-lengths, the transmitted bits are divided into fragments and sent in separate slots. Since the identity of the transmitter is not conveyed in the choice of encoding function, the individual fragments of the original messages must be pieced together through a low-complexity tree-based algorithm, as in [amalladinne2019coded].

To further reduce decoding complexity, the users in the high-energy cluster transmit their message bits using a superposition constellation with two layers: a top and a bottom layer. The symbols in the bottom layer are transmitted using the same codebook as the low-energy cluster, whereas the remaining bits are carried "on top" of the bottom bits through the superposition constellation. This coding choice allows the receiver, for each transmission slot, to first decode the top layer of the superposition constellation in the high-energy cluster; and then, after applying Successive Interference Cancellation (SIC), decode the low-energy users together with the bottom layer of the high-energy cluster. Upon decoding all the message fragments in all the slots and from all the users, the receiver can then employ the tree decoder to reconstruct the set of transmitted messages. We further detail the proposed transmission scheme below.

III-A Fragmenting

In the low-energy cluster, every message $\mathbf{w}_{1i}$ is converted into a binary vector and partitioned into $J$ sub-blocks, where the $j^{\mathrm{th}}$ sub-block consists of $B_{1j}$ bits, so that $\sum_{j\in[J]}B_{1j}=B_{1}=\lfloor NR_{1}\rfloor$. This results in a collection of information fragments $\{\mathbf{w}_{1ij}\}_{j\in[J]}$ for message $\mathbf{w}_{1i}$. On the other hand, users in the high-energy cluster first split their bits into two groups: one for the top layer and one for the bottom layer. Denote these two sets of bits by $\overline{\overline{\mathbf{w}}}_{2i}$ and $\overline{\mathbf{w}}_{2i}$ for the top and bottom portions, respectively. Likewise, let $\overline{\overline{B}}_{2}$, $\overline{B}_{2}$ be the total numbers of bits assigned to these two layers; and $\overline{\overline{R}}_{2}$, $\overline{R}_{2}$ be the corresponding rates. Subsequently, the bottom bits are fragmented exactly as in the low-energy cluster to form the set $\{\overline{\mathbf{w}}_{2ij}\}$. The top bits $\overline{\overline{\mathbf{w}}}_{2i}$ are also partitioned into $J$ fragments, but this time $\overline{\overline{B}}_{2j}$ is the size of the $j^{\mathrm{th}}$ fragment (not necessarily the same partitioning as in the bottom layer). Again, we must have $\sum_{j\in[J]}\overline{\overline{B}}_{2j}=\overline{\overline{B}}_{2}$ and, additionally, $\overline{B}_{2}+\overline{\overline{B}}_{2}=\lfloor NR_{2}\rfloor$.
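The fragmentation step above is a plain partition of a bit-string into consecutive sub-blocks of prescribed sizes; the sketch below shows it with toy sizes standing in for the $B_{1j}$ (the payload and sizes are illustrative, not from the paper).

```python
def fragment(bits, sizes):
    """Split a bit-string into consecutive sub-blocks of the given sizes;
    the sizes must sum to the total number of bits."""
    assert sum(sizes) == len(bits)
    out, pos = [], 0
    for b in sizes:
        out.append(bits[pos:pos + b])
        pos += b
    return out

# Toy payload: B = 12 information bits split into J = 3 fragments whose
# sizes play the role of B_{1j}.
frags = fragment("110010011101", [4, 4, 4])
```

The same routine applies to the top-layer bits of the high-energy cluster with a possibly different size vector, as long as the sizes sum to the layer's bit budget.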

III-B Tree Encoding

The role of the tree code is to enable the stitching of message fragments at the decoder. In CCS, this is accomplished by appending parity bits to the $j^{\mathrm{th}}$ fragment based on the preceding information bits. Every device in the low-energy cluster takes the message fragment $\mathbf{w}_{1i(j+1)}$ and encodes it into a vector $\mathbf{v}_{1i(j+1)}$ using a systematic random linear code,

\mathbf{v}_{1i(j+1)}=\left[\mathbf{w}_{1i(j+1)}\ ;\ \mathbf{G}_{1j}\otimes\left[\mathbf{w}_{1i1};\ldots;\mathbf{w}_{1ij}\right]\right], \qquad (7)

where $\mathbf{G}_{1j}$ produces $T-B_{1(j+1)}$ random parity bits from all the previous segments, and $\otimes$ indicates modulo-2 matrix multiplication. Thus, $\mathbf{v}_{1i(j+1)}$ is viewed as a binary vector. In this scheme, $B_{11}=T$; that is, the first fragment does not contain any parity bits. In (7), $T\geq B_{1j}$, so that effectively we have $T-B_{1j}$ random linear parity constraints embedded in this block to help stitch together fragments of information bits belonging to $\mathbf{w}_{1ij}$ when decoding codeword $\mathbf{v}_{1i}$.
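A minimal sketch of the systematic tree encoding in (7) follows: each fragment after the first is padded to length $T$ with random parity bits obtained as mod-2 combinations of all preceding information bits. The random matrices play the role of the $\mathbf{G}_{1j}$; fragment sizes are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

def tree_encode(fragments, T):
    """Pad each fragment after the first to length T with random parity
    bits, computed as mod-2 combinations (via a random G matrix) of all
    preceding information bits, mirroring the structure of eq. (7)."""
    blocks = [fragments[0]]              # first block carries info bits only
    for j in range(1, len(fragments)):
        prev = np.concatenate(fragments[:j])
        G = rng.integers(0, 2, size=(T - len(fragments[j]), len(prev)))
        parity = (G @ prev) % 2
        blocks.append(np.concatenate([fragments[j], parity]))
    return blocks

# Toy fragments with B_{11} = T = 4, as the scheme requires.
frags = [np.array([1, 0, 1, 1]), np.array([0, 1]), np.array([1, 1, 0])]
coded = tree_encode(frags, T=4)
```

The decoder can regenerate the same parity checks (the $\mathbf{G}_{1j}$ are shared) and use them to rule out invalid fragment combinations when stitching lists together.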

A user in the high-energy cluster performs a similar encoding process. Redundancy for the bottom bits is added paralleling the encoding in the low-energy cluster, yielding

\mathbf{\overline{v}}_{2i(j+1)}=\left[\mathbf{\overline{w}}_{2i(j+1)}\ ;\ \mathbf{G}_{1j}\otimes\left[\mathbf{\overline{w}}_{2i1};\ldots;\mathbf{\overline{w}}_{2ij}\right]\right], \qquad (8)

where 𝐆1j\mathbf{G}_{1j} is the same binary matrix that appears in (7). The top bits are encoded according to the information bits in each cluster, i.e.,

\mathbf{\overline{\overline{v}}}_{2i(j+1)}=\left[\mathbf{\overline{\overline{w}}}_{2i(j+1)}\ ;\ \mathbf{G}_{2j}\otimes\left[\mathbf{\overline{\overline{w}}}_{2i1};\ldots;\mathbf{\overline{\overline{w}}}_{2ij}\right]\right], \qquad (9)

where 𝐆2j\mathbf{G}_{2j} is, again, a random parity generating matrix.

III-C Superposition Coding & CS Encoding

After tree encoding is complete, each encoded block has size $T$. These blocks are then encoded using two sets of inner CS codes: (i) one for the segments of the low-energy cluster and the bottom segments of the high-energy cluster; and (ii) one for the top segments of the high-energy users. To apply the inner encoding, we convert the binary string $\mathbf{v}_{1ij}$ into the vector $\mathbf{m}_{1ij}\in\{0,1\}^{2^{T}}$ in which a single one is placed at the location corresponding to the integer value of $\mathbf{v}_{1ij}$. This is the emblematic index representation used in CCS. Blocks $\mathbf{\overline{v}}_{2ij}$ and $\mathbf{\overline{\overline{v}}}_{2ij}$ are converted to their index representations in a similar manner.
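The index representation amounts to a one-hot encoding of a length-$T$ bit-string into a vector of length $2^{T}$; a minimal sketch (with a toy $T$):

```python
import numpy as np

def index_representation(bits):
    """Map a length-T bit-string to the {0,1}^{2^T} vector containing a
    single one at the position given by the integer value of the bits."""
    m = np.zeros(2 ** len(bits))
    m[int(bits, 2)] = 1.0
    return m

m = index_representation("0101")   # T = 4; "0101" encodes the integer 5
```

When several users occupy the same slot, the receiver effectively observes the sum of their one-hot vectors, which is exactly the sparse vector the inner CS decoder must recover.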

The two CS codes differ as follows: the former has entries from $\{+\sqrt{P_{1}},-\sqrt{P_{1}}\}$, while the latter features entries from $\{+\sqrt{P_{2}-P_{1}},-\sqrt{P_{2}-P_{1}}\}$. This difference in support results in a superposition constellation; the top bits are effectively transmitted at a higher power than the bottom bits, based on our assumption $P_{2}>2P_{1}$. Accordingly, the CCS signal corresponding to section $\mathbf{v}_{1ij}$ is

\mathbf{x}_{1ij}=\mathbf{A}_{1}\mathbf{m}_{1ij}, \qquad (10)

where $\mathbf{A}_{1}\in\{+\sqrt{P_{1}},-\sqrt{P_{1}}\}^{Q\times 2^{T}}$ is a matrix formed by picking $Q=\lfloor NR/J\rfloor$ rows uniformly at random (excluding the row of all ones) from a Hadamard matrix of dimension $2^{T}\times 2^{T}$ and re-scaling them to meet the power constraint. Similarly, after tree encoding, the bottom bits are CS encoded,

\mathbf{\overline{x}}_{2ij}=\mathbf{A}_{1}\mathbf{\overline{m}}_{2ij} \qquad (11)

using the same sensing matrix. The top bits are processed using a different signal dictionary,

\mathbf{\overline{\overline{x}}}_{2ij}=\mathbf{A}_{2}\mathbf{\overline{\overline{m}}}_{2ij}. \qquad (12)

The rows of matrix $\mathbf{A}_{2}\in\{+\sqrt{P_{2}-P_{1}},-\sqrt{P_{2}-P_{1}}\}^{Q\times 2^{T}}$ are also scaled versions of randomly selected rows from a Hadamard matrix (excluding the row of all ones).
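The construction of $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ can be sketched with a Sylvester-type Hadamard matrix: draw $Q$ distinct rows (skipping the all-ones row) and scale the $\pm 1$ entries to the desired power level. Dimensions and power values below are toy choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def hadamard(t):
    """Sylvester construction of a 2^t x 2^t Hadamard matrix."""
    H = np.array([[1.0]])
    for _ in range(t):
        H = np.block([[H, H], [H, -H]])
    return H

def sensing_matrix(T, Q, power, rng):
    """Draw Q distinct rows (excluding the all-ones row 0) from a 2^T
    Hadamard matrix and scale entries to +/- sqrt(power)."""
    H = hadamard(T)
    rows = rng.choice(np.arange(1, 2 ** T), size=Q, replace=False)
    return np.sqrt(power) * H[rows, :]

A1 = sensing_matrix(T=6, Q=16, power=1.0, rng=rng)   # toy A_1 at power P_1
A2 = sensing_matrix(T=6, Q=16, power=3.0, rng=rng)   # toy A_2 at P_2 - P_1
```

Because every non-trivial Hadamard row is a $\pm 1$ sequence, each column of the resulting matrix has squared norm $Q\cdot\text{power}$, which is what makes the per-slot power accounting exact.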

Finally, the channel inputs are obtained by concatenating the partial signals. For the low-energy users, we get

\mathbf{x}_{1i}=\left[\mathbf{x}_{1i1}\ \mathbf{x}_{1i2}\ \ldots\ \mathbf{x}_{1iJ}\right], \qquad (13)

and, for the high energy users, we have

\mathbf{\overline{x}}_{2i}=\left[\mathbf{\overline{x}}_{2i1}\ \mathbf{\overline{x}}_{2i2}\ \ldots\ \mathbf{\overline{x}}_{2iJ}\right] \qquad (14a)
\mathbf{\overline{\overline{x}}}_{2i}=\left[\mathbf{\overline{\overline{x}}}_{2i1}\ \mathbf{\overline{\overline{x}}}_{2i2}\ \ldots\ \mathbf{\overline{\overline{x}}}_{2iJ}\right] \qquad (14b)
\mathbf{x}_{2i}=\mathbf{\overline{x}}_{2i}+\mathbf{\overline{\overline{x}}}_{2i}. \qquad (14c)
Over the ensemble of random coding parameters, we obtain expected transmit power

\left\|\mathbf{x}_{2i}\right\|^{2}=\left\|\mathbf{\overline{x}}_{2i}\right\|^{2}+\left\|\mathbf{\overline{\overline{x}}}_{2i}\right\|^{2}=NP_{1}+N(P_{2}-P_{1})=NP_{2},

as mandated by the constraint associated with (1). The parameters of the scheme are summarized in Table I.

III-D Channel Transmission/Reception

As the transmitted channel inputs are composed of coded segments of the same length, we observe that the CCS construction naturally maps to the CS interpretation in (6). To see this, let $\mathbf{m}_{1j}$ be the sum of coded fragments $\mathbf{m}_{1ij}$ for $i\in\mathcal{A}_{1}$; then let $\mathbf{m}_{1}$ be the concatenation of the fragments $\mathbf{m}_{1j}$, as in (13). Define $\mathbf{\overline{m}}_{2j}$, $\mathbf{\overline{m}}_{2}$ and $\mathbf{\overline{\overline{m}}}_{2j}$, $\mathbf{\overline{\overline{m}}}_{2}$ in a similar manner. Let $\boldsymbol{\Phi}_{1}$ be the tensor product between $\mathbf{I}_{J}$ and $\mathbf{A}_{1}$; likewise, let $\boldsymbol{\Phi}_{2}$ be that between $\mathbf{I}_{J}$ and $\mathbf{A}_{2}$. Then, we can express the received signal as

\mathbf{y}=\boldsymbol{\Phi}_{1}\left(\mathbf{m}_{1}+\mathbf{\overline{m}}_{2}\right)+\boldsymbol{\Phi}_{2}\mathbf{\overline{\overline{m}}}_{2}+\mathbf{z}. \qquad (15)

The signal aggregate obtained by adding $\mathbf{m}_{1}$ and $\mathbf{\overline{m}}_{2}$ has a sparsity of $J(M_{1}+M_{2})$, whereas $\mathbf{\overline{\overline{m}}}_{2}$ is $JM_{2}$-sparse.
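The tensor-product structure of the sensing matrices can be made concrete with a Kronecker product: $\boldsymbol{\Phi}_{1}=\mathbf{I}_{J}\otimes\mathbf{A}_{1}$ is block-diagonal, applying the same per-slot dictionary to each of the $J$ concatenated sections. Dimensions are toy values.

```python
import numpy as np

rng = np.random.default_rng(6)

J, Q, W = 3, 8, 16
A1 = rng.choice([-1.0, 1.0], size=(Q, W))   # toy per-slot dictionary

# Phi_1 = I_J (x) A_1: the block-diagonal matrix that senses each of the
# J concatenated slot vectors with the same matrix A_1.
Phi1 = np.kron(np.eye(J), A1)
```

This structure is what lets the receiver process each slot independently with $\mathbf{A}_{1}$ rather than inverting one enormous matrix.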

III-E Two-Phase Decoding

Upon getting observation $\mathbf{y}$, the receiver separates it into sections $\{\mathbf{y}_{j}\}_{j\in[J]}$, each block corresponding to the summation of the sections $\mathbf{A}_{1}\mathbf{m}_{1j}$, $\mathbf{A}_{1}\mathbf{\overline{m}}_{2j}$, and $\mathbf{A}_{2}\mathbf{\overline{\overline{m}}}_{2j}$, as in (15). The receiver begins by decoding $\mathbf{\overline{\overline{m}}}_{2j}$ using the CCS algorithm with coding matrix $\mathbf{A}_{2}$. During this phase of the decoding process, it treats the remaining terms in $\mathbf{y}_{j}$ as additional noise. Parameters are selected to make sure that this portion of the decoding process is successful with high probability. Furthermore, for ease of exposition, we assume that $\mathbf{\overline{\overline{m}}}_{2j}$ is correctly recovered, although in reality an error at this stage will produce some interference at the subsequent stage.

Once $\mathbf{\overline{\overline{m}}}_{2j}$ is recovered, the decoder computes a residual, or effective observation, for every section,

\tilde{\mathbf{y}}_{j}=\mathbf{y}_{j}-\mathbf{A}_{2}\mathbf{\overline{\overline{m}}}_{2j}.

In the spirit of successive interference cancellation, these sections $\{\tilde{\mathbf{y}}_{j}\}_{j\in[J]}$ are then passed to the CCS algorithm with coding matrix $\mathbf{A}_{1}$ and sparsity level $M_{1}+M_{2}$.
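The two-phase structure reduces, per section, to one subtraction. The noiseless toy example below (dictionaries, indices, and the power ratio are all illustrative) shows that once the top-layer vector is known, cancelling its contribution leaves exactly the low-power component for the second decoding phase.

```python
import numpy as np

rng = np.random.default_rng(4)

Q, W = 16, 64
A1 = rng.choice([-1.0, 1.0], size=(Q, W))         # low-power dictionary
A2 = 2.0 * rng.choice([-1.0, 1.0], size=(Q, W))   # higher-power dictionary

m1 = np.zeros(W); m1[[3, 40]] = 1.0   # low-energy plus bottom-layer indices
m2 = np.zeros(W); m2[10] = 1.0        # top-layer index

y = A1 @ m1 + A2 @ m2                 # one noiseless section, for clarity

# Phase 1 decodes m2; phase 2 subtracts its contribution and decodes the rest.
y_tilde = y - A2 @ m2
```

With noise, the residual also carries the channel noise and any cancellation error, which is why the paper assumes the first phase succeeds with high probability.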

The recovery algorithm we adopt for individual sections is an AMP-based CS solver [maleki2010optimally]. Such composite algorithms iterate through two equations:

\mathbf{z}^{(t)}=\mathbf{y}-\mathbf{A}\mathbf{m}^{(t)}+\frac{\mathbf{z}^{(t-1)}}{Q}\operatorname{div}\boldsymbol{\eta}_{t-1}\left(\mathbf{A}^{\mathrm{T}}\mathbf{z}^{(t-1)}+\mathbf{m}^{(t-1)}\right) \qquad (16)
\mathbf{m}^{(t+1)}=\boldsymbol{\eta}_{t}\left(\mathbf{A}^{\mathrm{T}}\mathbf{z}^{(t)}+\mathbf{m}^{(t)}\right), \qquad (17)

with initial conditions $\mathbf{m}^{(0)}=\mathbf{0}$ and $\mathbf{z}^{(0)}=\mathbf{y}$. The function $\boldsymbol{\eta}_{t}(\cdot)$ in (17) is the denoiser, which can take the form of a posterior mean estimate [Giuseppe] or a standard soft-thresholding operator [daubechies2004iterative, beck2009fast]. Equation (16) can be interpreted as the computation of a residual enhanced with an Onsager correction term [bayati2011dynamics, donoho2013information].
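A minimal AMP sketch in the spirit of (16)-(17), using the soft-thresholding denoiser: for soft thresholding, the divergence term reduces to the number of entries surviving the threshold. The problem sizes, threshold, and iteration count are toy choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(5)

def soft(x, tau):
    """Soft-thresholding denoiser eta(x) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def amp(y, A, n_iter, tau):
    """AMP iterations following eqs. (16)-(17) with a fixed threshold; the
    Onsager term uses div(eta) = number of entries above the threshold."""
    Q, W = A.shape
    m, z = np.zeros(W), y.copy()
    for _ in range(n_iter):
        m_new = soft(A.T @ z + m, tau)                     # eq. (17)
        z = y - A @ m_new + (z / Q) * np.count_nonzero(m_new)  # eq. (16)
        m = m_new
    return m

Q, W = 64, 128
A = rng.standard_normal((Q, W)) / np.sqrt(Q)
m_true = np.zeros(W); m_true[[5, 77]] = 1.0
y = A @ m_true                        # noiseless toy observation
m_hat = amp(y, A, n_iter=20, tau=0.2)
```

In practice the threshold is tuned across iterations (e.g., via state evolution), and in CCS the denoiser can exploit the one-hot section structure; a fixed scalar threshold is used here only to keep the sketch short.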

After converting the terms $\mathbf{m}$ into $\mathbf{v}$, the tree decoder uses the random parity bits to stitch together the information bits from each of the users, thus reconstructing the transmitted messages $\mathbf{w}_{1}$ and $\mathbf{w}_{2}$.¹ (¹In actuality, one also needs some random parity bits to stitch together the top and the bottom bits from the high-energy users. For simplicity, this coding step is not presented here, as it is analogous to the step in Sec. III-B.)

The encoding and decoding process is also conceptually represented in Fig. 1. Successive steps in the scheme are represented in vertical sections, proceeding from left to right. Separate horizontal sections represent the processing of three sets of bits: the bits from cluster 1, the bottom bits from cluster 2, and the top bits from cluster 2. In the figure, "be2i" indicates the conversion from a binary string of length $l$ to the index representation in $\{0,1\}^{2^{l}}$, while "concat" indicates the concatenation of the segments, as in (13).

Quantity | Quantity
block-length $N$ | # fragments $J$
# clusters $K$ | len. low-energy fragments $B_{1j}$
# active users $M_{k}$ | len. high-energy bottom fragments $\overline{B}_{2j}$
power constraint $P_{k}$ | len. high-energy top fragments $\overline{\overline{B}}_{2j}$
transmission rate $R_{k}$ | len. tree-coded fragments $T$
# transmitted bits $B_{k}$ | len. CCS+tree-coded fragments $Q$
TABLE I: Summary of the quantities in Sec. II and Sec. III.

IV Numerical Evaluations

We now turn to the numerical simulation of the scheme introduced in Sec. III. Generally speaking, we are interested in showing that the scheme in Sec. III allows high-energy users to transmit at high rates while preserving the baseline performance of the scenario in which all $M_{1}+M_{2}$ nodes transmit at the low-energy level $P_{1}$, as in [amalladinne2019coded]. Indeed, this is the design reasoning behind the coding choice of treating the bottom bits of the high-energy cluster in the same manner as the bits of the low-energy cluster. Accordingly, in the simulations we fix the scheme parameters as in Table II and study the rate performance as a function of $P_{1}$ while the ratio $P_{2}/P_{1}=\alpha$ is kept fixed.

Parameter | Value | Parameter | Value
block-length | $N=200$ | # of fragments | $J=11$
# low-energy users | $M_{1}=10$ | section length | $L=5$
# high-energy users | $M_{2}=5$ | # of AMP iterations | $10$
low-energy payload | $B_{1}=100$ | target $P_{\rm UE}$ | $5\%$
TABLE II: Summary of the simulation parameters in Sec. IV.

IV-A Baseline Performance

In this section, we elaborate on the baseline settings for our simulation campaign. The performance associated with the original CCS code in each fragment is plotted in Fig. 2 for the settings in Table II.

Figure 2: The $P_{\rm UE}$ for a fragment in the baseline URA of Sec. IV-A, as a function of $P_{1}$, for various rates $R_{1}\in[0.03,0.075]$.

Let us consider the performance of the classic (coordinated) random-access scheme in which each active user transmits, using time-sharing, for a fraction $1/(M_{1}+M_{2})$ of the time. In this case, the largest attainable rate (ignoring block-length effects) is $R_{1}=0.067$; note that the CCS code here attains reliable decoding up to $R_{1}=0.07$ but shows a $P_{\rm UE}$ saturation at $R_{1}=0.08$.

To attain the desired block-length, the various fragments are stitched together using a tree code expressed as $\mathbf{B}_{1}=[0,5,5,5,5,5,5,5,5,5,9]$, where the $j^{\mathrm{th}}$ element of $\mathbf{B}_{1}$ is $B_{1j}$ in (7). With this choice of tree-coding parameters, we achieve an error rate below $1\%$ at the tree decoder.

IV-B Comparison with TDMA

Next, we wish to compare the performance of the scheme in Sec. III with a simpler scheme relying on time-division multiple access (TDMA), as follows. Given a block-length $N$, transmission takes place in a low-rate phase of duration $\lambda N$ and a high-rate phase of duration $(1-\lambda)N$. In the low-rate phase, all users transmit at the baseline rate of Sec. IV-A with power $P_{1}^{\prime}=P_{1}/\lambda$ while, in the high-rate phase, the high-energy users transmit at the baseline rate for $M_{2}$ users with power $P_{2}^{\prime}$ obtained as

P_{2}^{\prime}(1-\lambda)+\lambda P_{1}=P_{2}. \qquad (18)
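The power balance in (18) pins down the high-rate-phase power once $\lambda$ is chosen; solving for $P_{2}^{\prime}$ is a one-liner. The numerical values below are illustrative, with $\alpha=P_{2}/P_{1}=6$ matching the ratio used in the comparison.

```python
def tdma_high_power(P1, P2, lam):
    """Solve eq. (18) for the high-rate-phase power:
    P2' * (1 - lam) + lam * P1 = P2, hence P2' = (P2 - lam*P1)/(1 - lam)."""
    return (P2 - lam * P1) / (1.0 - lam)

# Example: P1 = 1, P2 = 6 (alpha = 6), and an even time split lam = 1/2.
P2_prime = tdma_high_power(P1=1.0, P2=6.0, lam=0.5)
```

Note that $P_{2}^{\prime}$ grows without bound as $\lambda\to 1$, reflecting the usual TDMA trade-off between phase duration and per-phase power.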

Denote the rates attained with this scheme in the two clusters by $(R_{1}^{\prime},R_{2}^{\prime})$. We can compare the performance of this TDMA approach with the approach in Sec. III by setting the rate of the low-energy users to $R_{1}^{\prime}$ and determining what rate is attainable for the high-energy users. We can then interpret $\overline{\overline{R}}_{2}$ and $R_{2}^{\prime}$ as the rates that the two approaches afford the high-energy users for a given degradation of the baseline performance.

Figure 3: Simulation results for the setting in Sec. IV-B for $\alpha=P_{2}/P_{1}=6$.

IV-C Improvement over IAN

Through numerical experiments, we have noted that the decoding procedure in Sec. III-E greatly outperforms the performance predicted by SIC under the assumption that the effective noise $\boldsymbol{\Phi}_{1}(\mathbf{m}_{1}+\mathbf{\overline{m}}_{2})+\mathbf{z}$ is normally distributed, i.e., interference as noise (IAN). One inherent feature of the CCS codes used in Sec. III is that the randomly generated codewords in $\mathbf{A}_{1}$/$\mathbf{A}_{2}$ are uniformly distributed in the code space. This means that, when $R_{1}$ and $R_{2}$ are sufficiently low, the codewords in $\mathbf{A}_{1}$ and $\mathbf{A}_{2}$ are perpendicular with high probability. We indeed observe that the decoding procedure in Sec. III is inherently able to exploit this codeword perpendicularity in the projection step. We are currently unable to precisely characterize this effect; nonetheless, we can investigate this phenomenon numerically, as in Fig. 4. There, we plot the largest attainable $\overline{\overline{R}}_{2}=R_{2}-R_{1}$ for different numbers of users, together with the IAN prediction.

Figure 4: Decoding improvement with respect to the interference-as-noise (IAN) prediction described in Sec. IV-C; the largest attainable $R_{2}-R_{1}$ is plotted as a function of $P_{1}$ for $M_{1}=10$ and $M_{1}=15$.