
Deep-Learning-Aided Successive-Cancellation Decoding of Polar Codes

Seyyed Ali Hashemi¹, Nghia Doan², Thibaud Tonnellier², Warren J. Gross²
¹Department of Electrical Engineering, Stanford University, USA
²Department of Electrical and Computer Engineering, McGill University, Canada
ahashemi@stanford.edu, nghia.doan@mail.mcgill.ca, thibaud.tonnellier@mcgill.ca, warren.gross@mcgill.ca
Abstract

A deep-learning-aided successive-cancellation list (DL-SCL) decoding algorithm for polar codes is introduced, with deep-learning-aided successive-cancellation (DL-SC) decoding being a specific case of it. The DL-SCL decoder works by allowing additional rounds of SCL decoding when the first SCL decoding attempt fails, using a novel bit-flipping metric. The proposed bit-flipping metric exploits the inherent relations between the information bits in polar codes, which are represented by a correlation matrix. The correlation matrix is then optimized using emerging deep-learning techniques. Performance results on a polar code of length $128$ with $64$ information bits concatenated with a $24$-bit cyclic redundancy check show that the proposed bit-flipping metric in the proposed DL-SCL decoder requires up to $66\%$ fewer multiplications and up to $36\%$ fewer additions than the state of the art, without any need to perform transcendental functions, while providing almost the same error-correction performance.

Index Terms:
5G, polar codes, deep learning, SC, SCL, SC-Flip, SCL-Flip.

I Introduction

Polar codes represent a class of error-correcting codes that provably achieve the capacity of any symmetric binary-input memoryless channel under low-complexity successive-cancellation (SC) decoding [1]. Polar codes were recently selected for use in the enhanced mobile broadband (eMBB) control channel of the fifth generation of cellular technology (5G standard), where codes with short block lengths are used [2]. However, the error-correction performance of short polar codes under SC decoding does not satisfy the requirements of the 5G standard. SC list (SCL) decoding was introduced in [3] to improve the error-correction performance of SC decoding by keeping a list of candidate message words at each decoding step. In addition, it was observed that under SCL decoding, the error-correction performance is significantly improved when the polar code is concatenated with a cyclic redundancy check (CRC) code [3]. However, the decoding complexity of SCL grows as the list size increases.

Unlike SCL decoding, SC flip (SCF) decoding [4] performs multiple SC decoding attempts in series, where in each attempt, the information bit estimated to be the first-order error of the initial SC decoding attempt is flipped. Similar to SCL decoding, SCF decoding uses a CRC code to determine whether a decoding attempt is successful, and a bit-flipping metric is used to identify the erroneous information bit. Several methods have been proposed to improve the error-correction performance of SCF [5, 6, 7]. However, their bit-flipping metric for a given information bit is oversimplified, as only the log-likelihood ratio (LLR) corresponding to that bit is considered. To overcome this problem, dynamic SCF (DSCF) decoding [8] defines a more accurate bit-flipping metric, which utilizes the LLR values of all the previously decoded information bits. It was shown in [8] that at practical signal-to-noise ratio (SNR) values, DSCF decoding can achieve an error-correction performance comparable to that of SCL decoding, while maintaining an average decoding complexity close to that of SC decoding. However, the bit-flipping metric in DSCF decoding requires costly exponential and logarithmic computations, which hinders efficient hardware implementation of the algorithm.

In this paper, the likelihood of the correct decoding of each information bit under SC or SCL decoding is estimated by exploiting the inherent correlations among all the information bits. These correlations are expressed in the form of a trainable correlation matrix. Consequently, a bit-flipping metric based on the proposed correlation matrix is introduced. It only requires multiplication and addition operations in the LLR domain, completely avoiding the costly transcendental functions required by DSCF decoding. Motivated by recent developments that exploit deep learning (DL) to decode polar codes [9, 10, 11, 12, 13], DL techniques are applied to optimize the correlation matrix. The proposed decoding algorithm is thus called deep-learning-aided SCL (DL-SCL) decoding, with DL-SC decoding being its special case when the list size is one. Performance results on a polar code of length $128$ with $64$ information bits concatenated with a $24$-bit CRC show that the proposed bit-flipping metric in the proposed DL-SCL decoder requires up to $66\%$ fewer multiplications and up to $36\%$ fewer additions in comparison with the decoder that uses the bit-flipping metric in [8]. Moreover, the proposed decoder does not need to perform any transcendental functions and provides almost the same error-correction performance as the decoder that uses the bit-flipping metric in [8].

II Preliminaries

II-A Polar Codes, SC Decoding, and SCL Decoding

A polar code $\mathcal{P}(N,K)$ of block length $N$ with $K$ information bits is derived as $\bm{x}=\bm{u}\bm{G}^{\otimes n}$, where $\bm{x}=\{x_{0},x_{1},\ldots,x_{N-1}\}$ is the polar codeword, $\bm{u}=\{u_{0},u_{1},\ldots,u_{N-1}\}$ is the message word, $\bm{G}^{\otimes n}$ is the $n$-th Kronecker power of the polarizing matrix $\bm{G}=\bigl[\begin{smallmatrix}1&0\\ 1&1\end{smallmatrix}\bigr]$, and $n=\log_{2}N$. The vector $\bm{u}$ consists of a set $\mathcal{A}$ of the indices of the $K$ information bits and a set $\mathcal{A}^{c}$ of the indices of the $N-K$ frozen bits. The positions of the frozen bits are known to both the encoder and the decoder, and their values are set to $0$. In this paper, binary phase-shift keying (BPSK) modulation is considered. Therefore, the received signal corresponding to the transmitted codeword is represented as $\bm{y}=(\mathbf{1}-2\bm{x})+\bm{z}$, where $\mathbf{1}$ is an all-one vector of size $N$, and $\bm{z}\in\mathbb{R}^{N}$ is the additive white Gaussian noise (AWGN) vector with zero mean and variance $\sigma^{2}$. The LLR vector of the received signal is then given as $\bm{L}_{n}=\frac{2\bm{y}}{\sigma^{2}}$.
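As a concrete illustration of this setup, the following minimal Python sketch encodes a message word with the iterative polar transform and computes the channel LLRs for BPSK over AWGN. It is an illustrative sketch, not the authors' implementation; all names and toy parameters are assumptions.

import numpy as np

def polar_encode(u):
    """Compute x = u * (n-th Kronecker power of G) for u of length N = 2^n."""
    x = u.copy()
    n = int(np.log2(len(u)))
    for s in range(n):
        step = 1 << s
        for i in range(0, len(u), 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]  # butterfly: upper branch absorbs the lower one
    return x

rng = np.random.default_rng(0)
N, sigma = 8, 0.5
u = rng.integers(0, 2, N)                     # message word (frozen positions would be 0)
x = polar_encode(u)                           # polar codeword
y = (1 - 2 * x) + sigma * rng.normal(size=N)  # BPSK mapping plus AWGN
L_n = 2 * y / sigma**2                        # channel LLRs initializing stage n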

Figure 1: (a) SC decoding on the factor graph of $\mathcal{P}(8,5)$ with $\mathcal{A}^{c}=\{0,1,2\}$, (b) a PE.

SC decoding can be illustrated on a polar code factor graph representation. Fig. 1(a) shows an example of a factor graph for $\mathcal{P}(8,5)$. To obtain the estimated message word, the LLR values and the hard bit estimations are propagated through all the processing elements (PEs) in the factor graph, one of which is depicted in Fig. 1(b). A PE performs LLR computations as

L_{s,i} = \min(|L_{s+1,i}|, |L_{s+1,i+2^{s}}|)\operatorname{sgn}(L_{s+1,i})\operatorname{sgn}(L_{s+1,i+2^{s}}),
L_{s,i+2^{s}} = (1-2\hat{v}_{s,i})L_{s+1,i} + L_{s+1,i+2^{s}}, (1)

where $L_{s,i}$ and $\hat{v}_{s,i}$ are the LLR value and the hard bit estimation at the $s$-th stage, $0\leq s\leq n$, and the $i$-th bit, $0\leq i\leq N-1$, respectively. The hard bit values of the PE are computed as

\hat{v}_{s+1,i} = \hat{v}_{s,i} \oplus \hat{v}_{s,i+2^{s}}, (2)
\hat{v}_{s+1,i+2^{s}} = \hat{v}_{s,i+2^{s}},

where $\oplus$ denotes the logical XOR operation.

The LLR values at the $n$-th stage are initialized to $\bm{L}_{n}$. In SC decoding, the hard bit estimations at the $0$-th stage are calculated as

\hat{u}_{i} = \hat{v}_{0,i} = \begin{cases} 0 & \text{if } i \in \mathcal{A}^{c}, \\ \frac{1-\operatorname{sgn}(L_{0,i})}{2} & \text{otherwise.} \end{cases} (3)
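The PE updates in (1), the partial-sum combination in (2), and the hard decision in (3) reduce to a handful of elementary operations. A minimal Python sketch, with illustrative names, is:

import numpy as np

def f(L_a, L_b):
    """Upper-branch update in (1): min-sum magnitude with sign recombination."""
    return np.sign(L_a) * np.sign(L_b) * np.minimum(np.abs(L_a), np.abs(L_b))

def g(L_a, L_b, v_hat):
    """Lower-branch update in (1), given the partial sum v_hat."""
    return (1 - 2 * v_hat) * L_a + L_b

def combine(v_a, v_b):
    """Partial-sum propagation (2): (v_a XOR v_b, v_b)."""
    return v_a ^ v_b, v_b

def hard_decision(L, is_frozen):
    """Bit estimate (3): frozen bits are 0; otherwise (1 - sgn(L)) / 2."""
    return 0 if is_frozen else int(L < 0)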

In SCL decoding, at the $0$-th stage, each information bit is estimated as either $0$ or $1$, and at each decoding step, only the $M$ most likely candidate paths are allowed to survive. After the last bit is estimated in SCL decoding, the path with the highest reliability metric is selected as the decoding result. If a CRC of length $c$ is used to help SCL decoding, after the last bit is estimated, the path that passes the CRC verification is selected as the decoding result.

II-B SCF and DSCF Decoding

SCF decoding is used to decode a polar code that is concatenated with a CRC of length $c$ for verification. It starts by performing SC decoding, and if the CRC verification fails after the initial SC decoding, it flips the bit estimation of the information bit with the smallest absolute LLR value [4]. However, this simple bit-flipping metric prevents SCF decoding from obtaining a satisfactory error-correction performance [8].

To determine the bit-flipping position, DSCF decoding estimates the probability $P^{*}_{i_{\omega}}$ of the $i_{\omega}$-th bit ($i_{\omega}\in\mathcal{A}$) being the first-order error bit after the initial SC decoding attempt as

P^{*}_{i_{\omega}} = (1-p^{*}_{i_{\omega}}) \times \prod_{\substack{\forall i \in \mathcal{A} \setminus i_{\omega} \\ i < i_{\omega}}} p^{*}_{i}, (4)

where $p^{*}_{i}$ is defined as

p^{*}_{i} = \text{Pr}(\hat{u}_{i} = u_{i} \,|\, \bm{y}, \bm{\hat{u}}_{0}^{i-1} = \bm{u}_{0}^{i-1}), (5)

with $\bm{\hat{u}}_{0}^{i-1}=\{\hat{u}_{0},\hat{u}_{1},\dots,\hat{u}_{i-1}\}$ and $\bm{u}_{0}^{i-1}=\{u_{0},u_{1},\dots,u_{i-1}\}$. Therefore, the bit-flipping position $i^{*}_{\omega}$ that maximizes the probability of $\bm{\hat{u}}$ being correctly decoded after the second SC decoding attempt can be calculated as

i^{*}_{\omega} = \operatorname*{arg\,max}_{\forall i_{\omega} \in \mathcal{A}} P^{*}_{i_{\omega}}. (6)

Note that $p^{*}_{i}$ cannot be obtained during the course of decoding, since the message word $\bm{u}$ is unknown to the decoder [8]. Therefore, DSCF approximates $p^{*}_{i}$ as

p^{*}_{i} \approx \max\left(\text{Pr}(\hat{u}_{i}=0|\bm{y},\bm{\hat{u}}_{0}^{i-1}), \text{Pr}(\hat{u}_{i}=1|\bm{y},\bm{\hat{u}}_{0}^{i-1})\right) = \frac{1}{1+\exp\left(-|L_{0,i}|\right)}. (7)

It was observed in [8] that the approximation in (7) does not result in a desirable error-correction performance. Therefore, a perturbation parameter $\alpha\in\mathbb{R}^{+}$ is introduced to obtain a better estimation of $p^{*}_{i}$ as

p^{*}_{i} \approx \frac{1}{1+\exp\left(-\alpha|L_{0,i}|\right)}. (8)

To enable numerically stable computations for a hardware implementation, the bit-flipping metric is defined as [8]

Q_{\text{DSCF}}(i_{\omega}) = -\frac{1}{\alpha}\ln(P^{*}_{i_{\omega}}) = |L_{0,i_{\omega}}| + \sum_{\substack{\forall i \in \mathcal{A} \\ i \leq i_{\omega}}} \frac{1}{\alpha}\ln\left(1+\exp\left(-\alpha|L_{0,i}|\right)\right). (9)

Consequently, the most probable bit-flipping position $i^{*}_{\omega}$ under DSCF decoding can be found as

i^{*}_{\omega} = \operatorname*{arg\,min}_{\forall i_{\omega} \in \mathcal{A}} Q_{\text{DSCF}}(i_{\omega}). (10)
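As a concrete illustration, the metric in (9) can be computed with a single running sum over the information-bit LLRs, followed by the selection in (10). The Python sketch below is illustrative; the value of alpha is a placeholder, not the tuned value from [8].

import numpy as np

def q_dscf(abs_llrs, alpha):
    """Q_DSCF in (9); abs_llrs[i] holds |L_{0,i}| of the i-th information bit,
    in decoding order."""
    penalties = np.log1p(np.exp(-alpha * abs_llrs)) / alpha  # (1/alpha) ln(1 + e^{-alpha|L|})
    return abs_llrs + np.cumsum(penalties)                   # cumulative sum over i <= i_omega

abs_llrs = np.array([2.1, 0.4, 3.7, 1.2])             # toy |L_0| values
i_flip = int(np.argmin(q_dscf(abs_llrs, alpha=0.3)))  # selection rule (10)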

In this paper, all the presented decoders only target the first-order error bit. However, the presented bit-flipping selection schemes can be directly extended to cover higher-order error bits [8, 13].

III Deep-Learning-Aided Successive-Cancellation Decoding

In this section, a general bit-flipping algorithm for SCL decoding of polar codes is proposed, with the bit-flipping algorithm for SC decoding arising as the special case where the list size is $1$. Moreover, a new bit-flipping metric is derived that directly utilizes the correlations of the information bits in terms of the likelihood that an information bit is correctly decoded. A training framework is then introduced as the optimization scheme for the decoder's parameters, followed by an evaluation of the proposed scheme.

III-A A Bit-Flipping Algorithm for SCL decoding

Consider a failure in SCL decoding with list size $M$ to be an SCL decoding attempt in which all $M$ decoding paths fail the CRC verification. Let $\bm{\hat{u}}[m]$, $0\leq m<M$, be the $m$-th candidate path after the first SCL decoding attempt, let $\bm{\hat{u}}[0]$ be the best path after the first SCL decoding attempt, i.e., the path with the smallest path metric [14], and let $i^{*}_{\omega}$ be the estimated first erroneous bit of $\bm{\hat{u}}[0]$. In the proposed scheme, a secondary SCL decoding attempt is performed by keeping only $\bm{\hat{u}}[0]$ and fixing all the information bits up to the $i^{*}_{\omega}$-th bit: all the estimated information bits before the $i^{*}_{\omega}$-th bit are believed to be correct, and the $i^{*}_{\omega}$-th bit is flipped to correct the first error bit of $\bm{\hat{u}}[0]$.

The information bits for the second SCL decoding attempt up to the $i^{*}_{\omega}$-th bit are fixed as

\hat{u}[m]_{i} = \begin{cases} \hat{u}[0]_{i} & \text{if } i \in \mathcal{A}, i < i^{*}_{\omega}, \\ 1-\hat{u}[0]_{i} & \text{if } i \in \mathcal{A}, i = i^{*}_{\omega}, \end{cases} (11)

for $0\leq m<M$. After the $i^{*}_{\omega}$-th information bit, the conventional SCL decoding procedure is performed by estimating each information bit $i>i^{*}_{\omega}$, $i\in\mathcal{A}$, as both $0$ and $1$ and by keeping the best $M$ paths at each decoding step. The path metrics of all the decoding paths are then given as [14]

\operatorname{PM}[m]_{i} = \operatorname{PM}[m]_{i-1} + \Delta, (12)

where $0\leq i<N$, $\operatorname{PM}[m]_{-1}=0$, and $\Delta\geq 0$ is the path metric penalty at the $i$-th bit, calculated as

\Delta = \begin{cases} \frac{|L[m]_{0,i}|\left(1-\operatorname{sgn}(L[m]_{0,i})\right)}{2} & \text{if } i \in \mathcal{A}^{c}, \\ \frac{|L[m]_{0,i}|\left(1-(1-2\hat{u}[m]_{i})\operatorname{sgn}(L[m]_{0,i})\right)}{2} & \text{otherwise,} \end{cases} (13)

where $L[m]_{0,i}$ is the LLR value of the $i$-th bit at the $m$-th path.
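A minimal Python sketch of the bit forcing in (11) and the path-metric recursion in (12)-(13) follows; the function names and toy values are illustrative assumptions.

import numpy as np

def pm_penalty(llr, u_hat, is_frozen):
    """Delta in (13) for the i-th bit of one path (frozen bits decode to 0)."""
    if is_frozen:
        return abs(llr) * (1 - np.sign(llr)) / 2
    return abs(llr) * (1 - (1 - 2 * u_hat) * np.sign(llr)) / 2

def force_prefix(u_best, i_flip):
    """(11): reuse the best path's decisions before i_flip and flip bit i_flip;
    the same forced prefix seeds all M paths of the second attempt."""
    u = np.array(u_best[: i_flip + 1])
    u[i_flip] ^= 1
    return u

# Path-metric recursion (12): PM[m]_i = PM[m]_{i-1} + Delta, with PM[m]_{-1} = 0.
pm = 0.0
for llr, u_hat, frozen in [(1.5, 0, False), (-0.7, 0, False), (2.0, 0, True)]:  # toy bits
    pm += pm_penalty(llr, u_hat, frozen)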

Note that the bit-flipping metric of DSCF can be used to estimate $i^{*}_{\omega}$. However, this approach requires costly logarithmic and exponential functions, hence it is not attractive for an efficient hardware implementation. In the next subsection, a novel bit-flipping metric is proposed that only requires multiplication and addition operations.

III-B The Proposed Bit-Flipping Metric

Unlike the DSCF decoder, which relies on the estimation of the probability $p^{*}_{i}$, $\forall i\in\mathcal{A}$, for the bit-flipping metric computation, a method is proposed to directly estimate the following likelihood ratio:

l^{*}_{i_{\omega}} = \max\left\{\frac{\text{Pr}(\hat{u}[0]_{i_{\omega}}=0|\bm{y},\bm{u})}{\text{Pr}(\hat{u}[0]_{i_{\omega}}=1|\bm{y},\bm{u})}, \frac{\text{Pr}(\hat{u}[0]_{i_{\omega}}=1|\bm{y},\bm{u})}{\text{Pr}(\hat{u}[0]_{i_{\omega}}=0|\bm{y},\bm{u})}\right\}. (14)

The value $l^{*}_{i_{\omega}}$ indicates how likely it is that the estimated message bit $\hat{u}[0]_{i_{\omega}}$ is correctly decoded, given the received signal $\bm{y}$ and the message word $\bm{u}$. The bit index $i^{*}_{\omega}$ that is most likely to be the first-order erroneous bit is then obtained as

i^{*}_{\omega} = \operatorname*{arg\,min}_{\forall i_{\omega} \in \mathcal{A}} l^{*}_{i_{\omega}}. (15)

Similar to $p^{*}_{i}$, the value of $l^{*}_{i_{\omega}}$ is not available during the decoding process, as $\bm{u}$ remains unknown to the decoder. Therefore, the following hypothesis is proposed for the estimation of $l^{*}_{i_{\omega}}$:

l^{*}_{i_{\omega}} \approx \prod_{\forall i \in \mathcal{A}} l_{i}^{\beta_{i_{\omega},i}}, (16)

where

l_{i} = \max\left\{\frac{\text{Pr}(\hat{u}_{i}=0|\bm{y},\bm{\hat{u}}^{i-1}_{0})}{\text{Pr}(\hat{u}_{i}=1|\bm{y},\bm{\hat{u}}^{i-1}_{0})}, \frac{\text{Pr}(\hat{u}_{i}=1|\bm{y},\bm{\hat{u}}^{i-1}_{0})}{\text{Pr}(\hat{u}_{i}=0|\bm{y},\bm{\hat{u}}^{i-1}_{0})}\right\} = \exp{|L_{0,i}|}, (17)

and $\beta_{i_{\omega},i}\in\mathbb{R}$ are perturbation parameters such that $\beta_{i_{\omega},i}=\beta_{i,i_{\omega}}$ and $\beta_{i_{\omega},i_{\omega}}=1$, for $i\in\mathcal{A}$ and $i_{\omega}\in\mathcal{A}$.

To enable numerically stable computations, the bit-flipping metric of the proposed decoder can be obtained by transforming the likelihood ratio $l^{*}_{i_{\omega}}$ to the LLR domain as

Q_{\text{DL-SCL}}(i_{\omega}) = \ln(l^{*}_{i_{\omega}}) \approx \ln\left(\prod_{\forall i \in \mathcal{A}} \exp\left(\beta_{i_{\omega},i}|L_{0,i}|\right)\right) = \sum_{\forall i \in \mathcal{A}} \beta_{i_{\omega},i}|L_{0,i}|. (18)

The most probable bit-flipping index $i^{*}_{\omega}$ can then be selected as

i^{*}_{\omega} = \operatorname*{arg\,min}_{\forall i_{\omega} \in \mathcal{A}} Q_{\text{DL-SCL}}(i_{\omega}). (19)

Note that the bit-flipping metric computation in (18) can be represented in the matrix form as

\bm{Q}_{\text{DL-SCL}} = |\bm{L}_{0}| \cdot \bm{\beta}, (20)

where $\bm{Q}_{\text{DL-SCL}}$ and $\bm{L}_{0}$ are row vectors of size $1\times(K+c)$, and $\bm{\beta}$ is a matrix of size $(K+c)\times(K+c)$. Equivalently, $i^{*}_{\omega}$ is the index of the element in $\bm{Q}_{\text{DL-SCL}}$ that has the smallest value.

With $\beta_{i_{\omega},i_{\omega}}=1$ and $\beta_{i_{\omega},i}=\beta_{i,i_{\omega}}$, $0\leq i,i_{\omega}<K+c$, $\bm{\beta}$ can be seen as a correlation matrix that captures the inherent relations among the absolute LLR values of all the information bits under SCL decoding. For simplicity, since only the LLR values of the information bits are considered, all bit indices used in the rest of this paper refer to information bit indices and therefore take values in the range $[0,K+c-1]$.
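In implementation terms, (19)-(20) amount to one vector-matrix product followed by an argmin. A toy Python sketch follows; the $\bm{\beta}$ values below are placeholders, not a trained matrix.

import numpy as np

K_c = 4                                    # K + c (toy size; 88 for the code evaluated later)
beta = 0.05 * np.ones((K_c, K_c))          # placeholder symmetric correlations
np.fill_diagonal(beta, 1.0)                # beta_{i,i} = 1 by construction
abs_L0 = np.array([2.1, 0.4, 3.7, 1.2])    # |L_0| of the information bits
Q = abs_L0 @ beta                          # (20): multiplications and additions only
i_flip = int(np.argmin(Q))                 # (19): most likely first-error position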

III-C Parameter Optimization

Note that $\beta_{i_{\omega},i_{\omega}}$ is fixed to $1$ and is not trainable for all $0\leq i_{\omega}<K+c$. The other elements of $\bm{\beta}$ are trainable, subject to the condition $\beta_{i_{\omega},i}=\beta_{i,i_{\omega}}$, $0\leq i,i_{\omega}<K+c$. In the proposed DL-SCL decoding, the number of trainable parameters of $\bm{\beta}$ is $\frac{(K+c)(K+c-1)}{2}$, which is too large to efficiently apply heuristic methods such as Monte Carlo simulation for parameter optimization. Therefore, the optimization of $\bm{\beta}$ is considered as a learning problem, and DL techniques are exploited to optimize $\bm{\beta}$. The bit-flipping metric $\bm{Q}_{\text{DL-SCL}}$ of the proposed decoder does not depend on the values of the message word $\bm{u}$; thus, all-zero codewords can be used during the training phase. This symmetry property is particularly useful for DL-based decoders of linear block codes, as it simplifies the training process [13, 12, 15].

Let $\bm{\hat{T}}$ be the estimated bit-flipping vector of the information bits, with $-1$ indicating a bit-flip and $+1$ indicating no bit-flip. From (19), the elements of the vector $\bm{\hat{T}}$ are defined as

\hat{T}_{i} = \begin{cases} -1 & \text{if } i = i^{*}_{\omega}, \\ +1 & \text{if } i \neq i^{*}_{\omega}, \end{cases} (21)

for $0\leq i<K+c$. In this paper, stochastic-gradient-descent (SGD) based techniques are used to update the values of $\bm{\beta}$ during training; thus, the computation of $\bm{\hat{T}}$ is modified to enable back-propagation during training [13]. Otherwise, learning is not feasible, as the derivative of (21) with respect to $\bm{Q}_{\text{DL-SCL}}[i]$, i.e., the $i$-th element of $\bm{Q}_{\text{DL-SCL}}$, is always $0$.

Let the soft estimation of $\hat{T}_{i}$ be

\tilde{T}_{i} = \tanh(\bm{Q}_{\text{DL-SCL}}[i] - \tau), (22)

where $\tau=\frac{\tau_{0}+\tau_{1}}{2}$, and $\tau_{0}$ and $\tau_{1}$ are the smallest and the second-smallest values of $\bm{Q}_{\text{DL-SCL}}$, respectively. The objective loss function is then defined as

\text{Loss} = \frac{1}{K+c}\sum_{i=0}^{K+c-1}\mathcal{L}\left(\frac{1-\tilde{T}_{i}}{2},\frac{1-T_{i}}{2}\right) + \lambda\sum_{i_{\omega}=0}^{K+c-1}\;\sum_{i=i_{\omega}+1}^{K+c-1}(\beta_{i_{\omega},i})^{2}, (23)

where $T_{i}$ is the ground-truth counterpart of $\hat{T}_{i}$, known during training, $\mathcal{L}(a,b)=-b\log(a)-(1-b)\log(1-a)$ is the binary cross-entropy function, and $\lambda$ is the scaling factor of the L2 regularization [16].

In this paper, PyTorch [17] is used as the DL framework. Training is done using the RMSprop optimizer [18] with a mini-batch size of $128$ and a learning rate of $10^{-4}$. The training set consists of $2^{18}$ samples of received channel signals that are not correctly decoded after the first SCL decoding attempt; the data is collected at $E_{b}/N_{0}=5$ dB. The L2 regularization hyperparameter $\lambda$ is set to $0.25$. The initial values of the non-diagonal elements of $\bm{\beta}$ are drawn i.i.d. from the range $[-0.2,0.2]$ before training takes place. The matrix $\bm{\beta}$ is trained for list sizes $M\in\{1,2,4,8\}$.¹

¹Optimized $\bm{\beta}$ matrices are available at https://github.com/nghiadt05/DLSCL-CorMatrices.
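A possible PyTorch sketch of one training step, combining (20)-(23) with the configuration above, is given below. It assumes the strict upper triangle of $\bm{\beta}$ holds the trainable parameters, mirrored for symmetry with a fixed unit diagonal; the toy mini-batch stands in for collected failure samples, and the clamping is only a numerical guard.

import torch

K_c = 88                                          # K + c = 64 + 24 for P(128, 64)
iu = torch.triu_indices(K_c, K_c, offset=1)       # strict upper triangle: trainable part
theta = torch.nn.Parameter(0.4 * torch.rand(iu.shape[1]) - 0.2)  # init in [-0.2, 0.2)
opt = torch.optim.RMSprop([theta], lr=1e-4)

def build_beta(theta):
    """Symmetric beta with unit diagonal, parametrized by its strict upper triangle."""
    beta = torch.zeros(K_c, K_c)
    beta[iu[0], iu[1]] = theta
    return beta + beta.T + torch.eye(K_c)

# One step on a toy mini-batch (real data: |L_0| of failed first SCL attempts).
abs_L0 = 6 * torch.rand(128, K_c)                 # toy |L_0| values, mini-batch size 128
T = torch.ones(128, K_c)                          # ground-truth flip labels in {-1, +1}
T[torch.arange(128), torch.randint(0, K_c, (128,))] = -1.0  # one flipped bit per sample

Q = abs_L0 @ build_beta(theta)                    # (20)
tau = Q.topk(2, dim=1, largest=False).values.mean(dim=1, keepdim=True)  # (tau0 + tau1) / 2
T_soft = torch.tanh(Q - tau)                      # (22)
p = ((1 - T_soft) / 2).clamp(1e-6, 1 - 1e-6)      # numerical guard for the cross-entropy
loss = torch.nn.functional.binary_cross_entropy(p, (1 - T) / 2) \
       + 0.25 * theta.pow(2).sum()                # (23) with lambda = 0.25
opt.zero_grad(); loss.backward(); opt.step()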

III-D Evaluation

Figure 2: FER comparison of various decoders for $\mathcal{P}(128,64)$ concatenated with a $24$-bit CRC (FER versus $E_{b}/N_{0}$ in dB).

In this section, the performance of the proposed DL-SCL decoder in terms of frame-error rate (FER) and computational complexity is examined. The polar code $\mathcal{P}(128,64)$ concatenated with a $24$-bit CRC is considered. The selected polar code and CRC polynomial are those used in the eMBB control channel of the 5G standard [2].

Fig. 2 compares the FER performance of various decoders for $\mathcal{P}(128,64)$. In this figure, DL-SCL$M$ denotes the proposed DL-SCL decoding algorithm with list size $M\in\{1,2,4,8\}$, and the bit-flipping SCL decoder with the bit-flipping metric proposed in [8] is denoted as SCLF$M$. In addition, the original SCL decoding in [14] is also considered for comparison. For all the bit-flipping SCL decoders, $8$ additional decoding attempts are allowed for the secondary SCL decoding. As observed from Fig. 2, the proposed bit-flipping metric in the proposed DL-SCL decoders results in almost no FER performance loss compared to that of the SCLF decoders.

Fig. 3 visualizes the values of the elements of the matrix $\bm{\beta}-\bm{I}$ in the form of a heat map², where $\bm{I}$ is the identity matrix of the same size as $\bm{\beta}$. It can be seen that $\bm{\beta}-\bm{I}$ (and thus $\bm{\beta}$) is a sparse matrix, with many of its elements having values close to $0$. This observation is exploited to reduce the computational complexity of computing the proposed bit-flipping metric, which in turn reduces the computational complexity of the proposed DL-SCL decoding algorithm.

²The matrix $\bm{\beta}-\bm{I}$ is shown to exclude the diagonal elements of $\bm{\beta}$, which all have a value of $1$.

Figure 3: Visualization of $\bm{\beta}-\bm{I}$ for the DL-SCL8 decoder.

Table I reports the computational complexity of the proposed bit-flipping metric of the DL-SCL decoder in comparison with that of the SCLF decoder, in terms of the number of different operations performed. Other than the bit-flipping metric, the proposed DL-SCL decoder and the SCLF decoder are identical. In this table, all the elements of $\bm{\beta}$ whose values fall in the range $[-10^{-4},10^{-4}]$ are set to $0$, removing the need to perform additions or multiplications over those elements without tangibly degrading the error-correction performance (a small sketch of this pruning step follows Table I). It can be seen that the bit-flipping metric computation in the proposed DL-SCL decoders requires up to $66\%$ fewer multiplications and up to $36\%$ fewer additions in comparison with that of the SCLF decoder. Moreover, unlike the SCLF decoder, the proposed bit-flipping metric in the DL-SCL decoders does not require the computation of any transcendental functions.

TABLE I: Computational complexity of the bit-flipping metric for $\mathcal{P}(128,64)$ in terms of the number of operations performed

Decoder     $\times$    $+$     $\ln$/$\exp$
SCLF$M$       7832      4004       7832
DL-SCL1       2652      2564          0
DL-SCL2       3116      3028          0
DL-SCL4       3238      3150          0
DL-SCL8       3176      3088          0
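The pruning used to obtain Table I can be sketched as follows; the random matrix stands in for a trained $\bm{\beta}$, so the resulting counts are illustrative rather than the reported ones.

import numpy as np

rng = np.random.default_rng(1)
beta = rng.normal(scale=0.05, size=(88, 88))  # stand-in for a trained beta (K + c = 88)
beta = (beta + beta.T) / 2                    # enforce symmetry
np.fill_diagonal(beta, 1.0)                   # unit diagonal
beta[np.abs(beta) <= 1e-4] = 0.0              # zero out entries in [-1e-4, 1e-4]

mults = int(np.count_nonzero(beta))           # one multiplication per surviving entry
adds = sum(max(int(np.count_nonzero(beta[:, j])) - 1, 0) for j in range(beta.shape[1]))
# 'adds' counts the additions that accumulate each column of |L_0| . beta in (20)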

IV Conclusion

In this paper, a deep-learning-aided successive-cancellation list (DL-SCL) decoding algorithm for polar codes is introduced. The proposed decoder improves the performance of successive-cancellation list (SCL) decoding by running additional SCL decoding attempts using a novel bit-flipping scheme. The bit-flipping metric of the proposed decoder is obtained by exploiting the inherent relations between the information bits. These relations are expressed in the form of a trainable correlation matrix, which is optimized using deep-learning (DL) techniques. Performance results on a polar code of length $128$ and rate $1/2$ show that the proposed bit-flipping metric in the proposed DL-SCL decoder requires up to $66\%$ fewer multiplications and up to $36\%$ fewer additions in comparison with the state of the art, while providing almost the same error-correction performance.

Acknowledgment

S. A. Hashemi is supported by a Postdoctoral Fellowship from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

  • [1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
  • [2] 3GPP, “Multiplexing and channel coding (Release 10) 3GPP TS 21.101 v10.4.0.” Oct. 2018. [Online]. Available: http://www.3gpp.org/ftp/Specs/2018-09/Rel-10/21_series/21101-a40.zip
  • [3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, March 2015.
  • [4] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity improved successive cancellation decoder for polar codes,” in 48th Asilomar Conf. on Sig., Sys. and Comp., Nov 2014, pp. 2116–2120.
  • [5] C. Condo, F. Ercan, and W. J. Gross, “Improved successive cancellation flip decoding of polar codes based on error distribution,” in IEEE Wireless Commun. and Net. Conf. Workshops, April 2018, pp. 19–24.
  • [6] F. Ercan, C. Condo, S. A. Hashemi, and W. J. Gross, “Partitioned successive-cancellation flip decoding of polar codes,” arXiv e-prints, p. arXiv:1711.11093v4, Nov 2017. [Online]. Available: https://arxiv.org/abs/1711.11093
  • [7] F. Ercan, C. Condo, and W. J. Gross, “Improved bit-flipping algorithm for successive cancellation decoding of polar codes,” IEEE Trans. on Commun., vol. 67, no. 1, pp. 61–72, Jan 2019.
  • [8] L. Chandesris, V. Savin, and D. Declercq, “Dynamic-SCFlip decoding of polar codes,” IEEE Trans. Commun., vol. 66, no. 6, pp. 2333–2345, June 2018.
  • [9] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, “Scaling deep learning-based decoding of polar codes via partitioning,” in IEEE Global Commun. Conf., December 2017, pp. 1–6.
  • [10] W. Xu, Z. Wu, Y.-L. Ueng, X. You, and C. Zhang, “Improved polar decoder based on deep learning,” in IEEE Int. Workshop on Signal Process. Syst., November 2017, pp. 1–6.
  • [11] N. Doan, S. A. Hashemi, and W. J. Gross, “Neural successive cancellation decoding of polar codes,” in 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), June 2018, pp. 1–5.
  • [12] N. Doan, S. A. Hashemi, E. N. Mambou, T. Tonnellier, and W. J. Gross, “Neural belief propagation decoding of CRC-polar concatenated codes,” in IEEE Int. Conf. on Commun., May 2019, pp. 1–6.
  • [13] N. Doan, S. A. Hashemi, F. Ercan, T. Tonnellier, and W. J. Gross, “Neural dynamic successive cancellation flip decoding of polar codes,” ArXiv, vol. abs/1907.11563, 2019. [Online]. Available: https://arxiv.org/abs/1907.11563
  • [14] A. Balatsoukas-Stimming, M. B. Parizi, and A. Burg, “LLR-based successive cancellation list decoding of polar codes,” IEEE Trans. Signal Process., vol. 63, no. 19, pp. 5165–5179, Oct. 2015.
  • [15] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE J. of Sel. Topics in Signal Process., vol. 12, no. 1, pp. 119–131, February 2018.
  • [16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, May 2015.
  • [17] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  • [18] G. Hinton, N. Srivastava, and K. Swersky, “Neural networks for machine learning lecture 6a overview of mini-batch gradient descent.” [Online]. Available: https://cs.toronto.edu/csc321/slides/lecture_slides_lec6.pdf