
Noise Learning Based Denoising Autoencoder

Woong-Hee Lee, Mustafa Ozger, Ursula Challita, and Ki Won Sung. This work was supported by a Korea University Grant. This research was supported by the BK21 FOUR (Fostering Outstanding Universities for Research) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF). This work was partly funded by the European Union Horizon 2020 Research and Innovation Programme under the EU/KR PriMO-5G project with grant agreement No. 815191. (Corresponding author: Ki Won Sung.) W.-H. Lee is with the Department of Control and Instrumentation Engineering, Korea University, Republic of Korea (e-mail: woongheelee@korea.ac.kr). U. Challita is with Ericsson Research, Stockholm, Sweden (e-mail: ursula.challita@ericsson.com). M. Ozger and K. W. Sung are with the School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden (e-mail: {ozger, sungkw}@kth.se).
Abstract

This letter introduces a new denoiser that modifies the structure of the denoising autoencoder (DAE), namely the noise learning based DAE (nlDAE). The proposed nlDAE learns the noise of the input data. Denoising is then performed by subtracting the regenerated noise from the noisy input. Hence, nlDAE is more effective than DAE when the noise is simpler to regenerate than the original data. To validate the performance of nlDAE, we provide three case studies: signal restoration, symbol demodulation, and precise localization. Numerical results suggest that nlDAE requires a smaller latent space dimension and a smaller training dataset than DAE.

Index Terms:
machine learning, noise learning based denoising autoencoder, signal restoration, symbol demodulation, precise localization.

I Introduction

Machine learning (ML) has recently received much attention as a key enabler for future wireless communications [1, 2, 3]. While the major research effort has been put into deep neural networks, there is an enormous number of Internet of Things (IoT) devices that are severely constrained in computational power and memory. The implementation of efficient ML algorithms is therefore an important challenge for these energy- and memory-limited devices. The denoising autoencoder (DAE) is a promising technique to improve the performance of IoT applications by denoising observed data that consist of the original data and the noise [4]. DAE is a neural network model for the construction of learned representations robust to the addition of noise to the input samples [5, 8]. The representative feature of DAE is that the dimension of the latent space is smaller than the size of the input vector. This means that the neural network model is capable of encoding and decoding through a smaller dimension in which the data can be represented.

The main contribution of this letter is to improve the efficiency and performance of DAE with a modification of its structure. Consider a noisy observation $Y$ which consists of the original data $X$ and the noise $N$, i.e., $Y = X + N$. From the information theoretical perspective, DAE attempts to minimize the expected reconstruction error by maximizing a lower bound on the mutual information $I(X;Y)$. In other words, $Y$ should capture the information of $X$ as much as possible although $Y$ is a function of the noisy input. Additionally, from the manifold learning perspective, DAE can be seen as a way to find a manifold where $Y$ represents the data in a low dimensional latent space corresponding to $X$. However, we often face the problem that the stochastic feature of $X$ to be restored is too complex to regenerate or represent. This is called the curse of dimensionality, i.e., the dimension of the latent space for $X$ is still too high in many cases.

What can we do if $N$ is simpler to regenerate than $X$? It will be more effective to learn $N$ and subtract it from $Y$ instead of learning $X$ directly. In this light, we propose a new denoising framework, named noise learning based DAE (nlDAE). The main advantage of nlDAE is that it can maximize the efficiency of the ML approach (e.g., the required dimension of the latent space or size of the training dataset) for capability-constrained devices, e.g., IoT, where $N$ is typically easier to regenerate than $X$ owing to their stochastic characteristics. To verify the advantage of nlDAE over the conventional DAE, we provide three practical applications as case studies: signal restoration, symbol demodulation, and precise localization.

The following notations will be used throughout this letter.

  • $\text{Ber}, \text{Exp}, \mathcal{U}, \mathcal{N}, \mathcal{CN}$: the Bernoulli, exponential, uniform, normal, and complex normal distributions, respectively.

  • $\mathbf{x}, \mathbf{n}, \mathbf{y} \in \mathbb{R}^{P}$: the realization vectors of the random variables $X, N, Y$, respectively, whose dimension is $P$.

  • $P^{\prime}\,(<P)$: the dimension of the latent space.

  • $\mathbf{W} \in \mathbb{R}^{P^{\prime} \times P}, \mathbf{W}^{\prime} \in \mathbb{R}^{P \times P^{\prime}}$: the weight matrices for encoding and decoding, respectively.

  • $\mathbf{b} \in \mathbb{R}^{P^{\prime}}, \mathbf{b}^{\prime} \in \mathbb{R}^{P}$: the bias vectors for encoding and decoding, respectively.

  • $\mathcal{S}$: the sigmoid function, acting as an activation function for the neural networks, i.e., $\mathcal{S}(a) = \frac{1}{1+e^{-a}}$ and $\mathcal{S}(\mathbf{a}) = (\mathcal{S}(\mathbf{a}[1]), \cdots, \mathcal{S}(\mathbf{a}[P]))^{T}$, where $\mathbf{a} \in \mathbb{R}^{P}$ is an arbitrary input vector.

  • $f_{\theta}$: the encoding function with parameter $\theta = \{\mathbf{W}, \mathbf{b}\}$, i.e., $f_{\theta}(\mathbf{y}) = \mathcal{S}(\mathbf{W}\mathbf{y} + \mathbf{b})$.

  • $g_{\theta^{\prime}}$: the decoding function with parameter $\theta^{\prime} = \{\mathbf{W}^{\prime}, \mathbf{b}^{\prime}\}$, i.e., $g_{\theta^{\prime}}(f_{\theta}(\mathbf{y})) = \mathcal{S}(\mathbf{W}^{\prime} f_{\theta}(\mathbf{y}) + \mathbf{b}^{\prime})$.

  • $M$: the size of the training dataset.

  • $L$: the size of the test dataset.

II Method of nlDAE

Figure 1: An illustration of the concept of nlDAE: (a) training phase; (b) test phase.
Figure 2: A simple example of comparison between DAE and nlDAE: reconstruction error according to $\sigma_{N}$.

In the traditional estimation problem of signal processing, $N$ is treated as an obstacle to the reconstruction of $X$. Therefore, most studies have focused on restoring $X$ as much as possible from the observation $Y$, which is a function of $X$ and $N$. Along with this philosophy, ML-based denoising techniques, e.g., DAE, have also been developed in various signal processing fields with the aim of maximizing the ability to restore $X$ from $Y$. Unlike the conventional approaches, we hypothesize that, if $N$ has a simpler statistical characteristic than $X$, it is better to restore $N$ and then subtract it from $Y$.

We first look into the mechanism of DAE to build the neural networks. Recall that DAE attempts to regenerate the original data $\mathbf{x}$ from the noisy observation $\mathbf{y}$ via training the neural network. Thus, the parameters of a DAE model can be optimized by minimizing the average reconstruction error in the training phase as follows:

$\theta^{*}, \theta^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(\mathbf{x}^{(i)}, g_{\theta^{\prime}}(f_{\theta}(\mathbf{y}^{(i)}))\big),$ (1)

where $\mathcal{L}$ is a loss function such as the squared error between its two inputs. Then, the $j$-th regenerated data $\tilde{\mathbf{x}}^{(j)}$ from $\mathbf{y}^{(j)}$ in the test phase can be obtained as follows for all $j \in \{1, \cdots, L\}$:

$\tilde{\mathbf{x}}^{(j)} = g_{\theta^{\prime *}}(f_{\theta^{*}}(\mathbf{y}^{(j)})).$ (2)

It is noteworthy that, if there are two different neural networks which attempt to regenerate the original data and the noise from the noisy input, respectively, the linear summation of the two regenerated outputs would differ from the input. This means that either $\mathbf{x}$ or $\mathbf{n}$ is more effectively regenerated from $\mathbf{y}$. Therefore, we can hypothesize that learning $N$, instead of $X$, from $Y$ can be beneficial in some cases even if the objective is still to reconstruct $X$. This constitutes the fundamental idea of nlDAE.

The training and test phases of nlDAE are depicted in Fig. 1. The parameters of the nlDAE model can be optimized as follows for all $i \in \{1, \cdots, M\}$:

$\theta_{nl}^{*}, \theta_{nl}^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(\mathbf{n}^{(i)}, g_{\theta^{\prime}}(f_{\theta}(\mathbf{y}^{(i)}))\big).$ (3)

Notice that the only difference from (1) is that $\mathbf{x}^{(i)}$ is replaced by $\mathbf{n}^{(i)}$. Let $\tilde{\mathbf{x}}_{nl}^{(j)}$ denote the $j$-th regenerated data based on nlDAE, which can be represented as follows for all $j \in \{1, \cdots, L\}$:

$\tilde{\mathbf{x}}_{nl}^{(j)} = \mathbf{y}^{(j)} - g_{\theta_{nl}^{\prime *}}(f_{\theta_{nl}^{*}}(\mathbf{y}^{(j)})).$ (4)
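To make the procedure concrete, the following is a minimal Python/PyTorch sketch of (1)-(4). It assumes plain SGD (the letter uses the scaled conjugate gradient optimizer) and keeps the single-hidden-layer, sigmoid-activated structure from the notation of Sec. I; all names are illustrative, and the sigmoid output layer limits the decoder range to (0,1), so a linear output may suit unbounded targets in practice.

```python
# A minimal sketch of (1)-(4), assuming PyTorch and plain SGD; the letter itself
# uses the squared error loss with the scaled conjugate gradient optimizer.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Single-hidden-layer autoencoder: f_theta (encoder) and g_theta' (decoder),
    both sigmoid-activated as in the notation of Sec. I."""
    def __init__(self, P=12, P_latent=9):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(P, P_latent), nn.Sigmoid())
        self.g = nn.Sequential(nn.Linear(P_latent, P), nn.Sigmoid())

    def forward(self, y):
        return self.g(self.f(y))

def train(model, y, target, epochs=200, lr=0.1):
    # DAE trains with target = x, cf. (1); nlDAE trains with target = n, cf. (3).
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(y), target).backward()
        opt.step()
    return model

# Test phase: DAE regenerates x directly, cf. (2); nlDAE subtracts the
# regenerated noise from the noisy input, cf. (4).
def denoise_dae(dae, y):
    return dae(y)

def denoise_nldae(nldae, y):
    return y - nldae(y)
```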

To provide the readers with insights into nlDAE, we examine two simple examples where the standard deviation of $X$ is fixed to 1, i.e., $\sigma_{X} = 1$, and that of $N$ varies. $Y = X + N$ is composed as follows (a data-generation sketch is given after the list):

  • Example 1: $X \sim \mathcal{U}(0, 2\sqrt{3})$ and $N \sim \mathcal{N}(0, \sigma_{N})$.

  • Example 2: $X \sim \text{Exp}(1)$ and $N \sim \mathcal{N}(0, \sigma_{N})$.
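As a reference, a minimal NumPy sketch of the data generation for these examples, with the sizes of the Fig. 2 experiment:

```python
# A sketch of the data generation for Example 1; Example 2 swaps the uniform
# draw for numpy.random.exponential(1.0). Sizes follow the Fig. 2 experiment.
import numpy as np

P, M, sigma_N = 12, 10000, 0.5                         # sigma_N is swept in Fig. 2
x = np.random.uniform(0, 2 * np.sqrt(3), size=(M, P))  # U(0, 2*sqrt(3)): sigma_X = 1
n = np.random.normal(0, sigma_N, size=(M, P))
y = x + n   # training inputs; targets are x for DAE and n for nlDAE
```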

Fig. 2 describes the performance comparison between DAE and nlDAE in terms of mean squared error (MSE) for the two examples. (Throughout this letter, the squared error and the scaled conjugate gradient are applied as the loss function and the optimization method, respectively.) Here, we set $P = 12$, $P^{\prime} = 9$, $M = 10000$, and $L = 5000$. It is observed in Fig. 2 that nlDAE is superior to DAE when $\sigma_{N}$ is smaller than $\sigma_{X}$, and the gap between nlDAE and DAE widens as $\sigma_{N}$ decreases. This implies that the standard deviation is an important factor when we select the denoiser between DAE and nlDAE.

These examples show that the choice depends on whether $X$ or $N$ is easier to regenerate, which is highly related to the differential entropy of each random variable, $H(X)$ and $H(N)$ [9]. The differential entropy is normally an increasing function of the standard deviation of the corresponding random variable, e.g., $H(N) = \log(\sigma_{N}\sqrt{2\pi e})$. Naturally, it is efficient to reconstruct a random variable with a small amount of information, and the standard deviation can be a good indicator.
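As a hedged numeric check of this argument for Example 1 (not from the letter): $H(X) = \log(2\sqrt{3}) \approx 1.24$ nats for the uniform case, so $H(N) < H(X)$ precisely when $\sigma_{N} \lesssim 0.84$, roughly the regime where nlDAE wins in Fig. 2.

```python
# Differential entropies (nats) of X ~ U(0, 2*sqrt(3)) and N ~ N(0, sigma_N).
import numpy as np

H_X = np.log(2 * np.sqrt(3))                           # uniform: log(b - a), ~1.24
H_N = lambda s: np.log(s * np.sqrt(2 * np.pi * np.e))  # normal: log(sigma*sqrt(2*pi*e))
print(H_X, H_N(0.3), H_N(1.0))                         # ~1.24, ~0.22, ~1.42
```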

III Case Studies

To validate the advantage of nlDAE over the conventional DAE in practical problems, we provide three applications for IoT devices in the following subsections. In the first two cases, we assume that the noise follows the Bernoulli and normal distributions, respectively, which are the most common noise models. The third case deals with noise that follows a distribution expressed as a mixture of several random variables. For all the studied use cases, we select the conventional DAE as the baseline for performance comparison. We present the case studies in the first three subsections and discuss the experimental results in Sec. III-D.

III-A Case Study I: Signal Restoration

In this use case, the objective is to recover the original signal from a noisy signal modeled by corruption of individual samples.

III-A1 Model

The sampled signal of randomly superposed sinusoids, e.g., a recorded acoustic wave, is the summation of samples of $k$ damped sinusoidal waves, which can be represented as follows:

$\mathbf{x} = \Big\{\sum_{l=1}^{k} V_{l} e^{-\gamma_{l} n \Delta t} \cos(2\pi f_{l} n \Delta t)\Big\}_{n=0}^{P-1},$ (5)

where $V_{l}$, $\gamma_{l}$, and $f_{l}$ are the peak amplitude, the damping factor, and the frequency of the $l$-th signal, respectively. Here, the time interval for sampling, $\Delta t$, is set to satisfy the Nyquist theorem, i.e., $\frac{1}{2\Delta t} > \max\{f_{1}, \cdots, f_{k}\}$. To consider the corruption of $\mathbf{x}$, let us assume that the probability of corruption for each sample follows the Bernoulli distribution $\text{Ber}(p_{cor})$, which indicates corruption with probability $p_{cor}$. In addition, let $\mathbf{b} \in \{0,1\}^{P}$ denote the realization of $\text{Ber}(p_{cor})$ over $P$ samples. Naturally, the corrupted signal, $\mathbf{y} \in \mathbb{R}^{P}$, can be represented as follows:

$\mathbf{y} = \mathbf{x} + C\mathbf{b},$ (6)

where $C$ is a constant representing the sample corruption.
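A minimal sketch of the corruption model (5)-(6), assuming NumPy and the parameter distributions of Sec. III-A3; the number of superposed sinusoids $k$ is not specified in the letter, so $k = 3$ below is an assumption:

```python
# One realization of the damped-sinusoid signal (5) and its corruption (6).
import numpy as np

P, k, dt, C, p_cor = 12, 3, 0.5e-4, 1.0, 0.9  # k = 3 is an illustrative assumption
V = np.random.normal(0, 1, k)                 # peak amplitudes, N(0, 1)
gamma = np.random.uniform(0, 1e3, k)          # damping factors, U(0, 10^3)
f = np.random.uniform(0, 10e3, k)             # frequencies, U(0, 10 kHz)
t = np.arange(P) * dt                         # sampling instants n * dt
x = sum(V[l] * np.exp(-gamma[l] * t) * np.cos(2 * np.pi * f[l] * t) for l in range(k))
b = np.random.binomial(1, p_cor, P)           # Ber(p_cor) corruption mask
y = x + C * b                                 # corrupted observation, cf. (6)
```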

III-A2 Application of nlDAE

Based on (6), the denoised signal $\tilde{\mathbf{x}}_{nl}^{(j)}$ can be represented by

$\tilde{\mathbf{x}}_{nl}^{(j)} = \mathbf{x}^{(j)} + C\mathbf{b}^{(j)} - g_{\theta_{nl}^{\prime *}}(f_{\theta_{nl}^{*}}(\mathbf{x}^{(j)} + C\mathbf{b}^{(j)})),$ (7)

where

$\theta_{nl}^{*}, \theta_{nl}^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(C\mathbf{b}^{(i)}, g_{\theta^{\prime}}(f_{\theta}(\mathbf{x}^{(i)} + C\mathbf{b}^{(i)}))\big).$

III-A3 Experimental Parameters

We evaluate the performance of the proposed nlDAE in terms of the MSE of restoration. For the experiment, the magnitude of the corruption $C$ is set to 1 for simplicity. In addition, $V_{l}$, $\gamma_{l}$, and $f_{l}$ follow $\mathcal{N}(0,1)$, $\mathcal{U}(0,10^{3})$, and $\mathcal{U}(0, 10\text{ kHz})$, respectively, for all $l$. The sampling time interval $\Delta t$ is set to $0.5 \times 10^{-4}$ seconds, and the number of samples $P$ is 12. We set $P^{\prime} = 9$, $p_{cor} = 0.9$, and $M = 10000$ unless otherwise specified.

III-B Case Study II: Symbol Demodulation

Here, the objective is to improve the symbol demodulation quality by denoising the received signal, which consists of the channel, the symbols, and additive noise.

III-B1 Model

Consider an orthogonal frequency-division multiplexing (OFDM) system with $P$ subcarriers, where the subcarrier spacing is denoted by $\Delta f$. Let $\mathbf{d} \in \mathbb{C}^{P}$ be a sequence in the frequency domain. $\mathbf{d}[n]$ is the $n$-th element of $\mathbf{d}$ and denotes the symbol transmitted over the $n$-th subcarrier. In addition, let $K$ denote the pilot spacing for channel estimation. Furthermore, the channel impulse response (CIR) can be modeled by a sum of Dirac-delta functions as follows:

$h(t, \tau) = \sum_{l=0}^{L_{p}-1} \alpha_{l} \delta(t - \tau_{l}),$ (8)

where $\alpha_{l}$, $\tau_{l}$, and $L_{p}$ are the complex channel gain, the excess delay of the $l$-th path, and the number of multipaths, respectively. Let $\mathbf{x} \in \mathbb{C}^{P}$ denote the discrete signal obtained by the $P$-point fast Fourier transform (FFT) after sampling the signal experiencing the channel at the receiver, which can be represented as follows:

$\mathbf{x} = \mathbf{d} \odot \mathbf{h} = \Big\{\mathbf{d}[n] \sum_{l=0}^{L_{p}-1} \alpha_{l} e^{-j2\pi n \Delta f \tau_{l}}\Big\}_{n=0}^{P-1},$ (9)

where $\odot$ denotes the Hadamard product. Here, $\mathbf{h} \in \mathbb{C}^{P}$ is the channel frequency response (CFR), which is the $P$-point FFT of $h(t, \tau)$. In addition, let $\mathbf{n} \in \mathbb{C}^{P}$ denote the realization of the random variable $N \sim \mathcal{CN}(0, \sigma_{N})$. Finally, $\mathbf{y}\,(= \mathbf{d} \odot \mathbf{h} + \mathbf{n})$ is the noisy observed signal.
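A minimal sketch of the signal model (9), assuming NumPy, 4-QAM symbols, and the parameters of Sec. III-B3; the value of sigma_N below is illustrative, since the experiments sweep the SNR instead:

```python
# One realization of the received-signal model (9).
import numpy as np

P, Lp, df, sigma_N = 12, 4, 15e3, 0.3
d = (np.random.choice([-1, 1], P) + 1j * np.random.choice([-1, 1], P)) / np.sqrt(2)  # 4-QAM
alpha = (np.random.randn(Lp) + 1j * np.random.randn(Lp)) / np.sqrt(2)  # CN(0, 1) gains
tau = np.random.uniform(0, 1e-6, Lp)                                   # excess delays
h = np.array([(alpha * np.exp(-2j * np.pi * n * df * tau)).sum() for n in range(P)])  # CFR
noise = sigma_N * (np.random.randn(P) + 1j * np.random.randn(P)) / np.sqrt(2)
y = d * h + noise   # noisy observation fed to the denoiser
```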

Our goal is to minimize the symbol error rate (SER) over $\mathbf{d}$ by maximizing the quality of denoising $\mathbf{y}$. We assume the method of channel estimation is fixed as cubic interpolation [6] to focus on the performance of denoising the received signal.
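A hedged sketch of this estimation step, continuing from the model sketch above (it reuses y, d, and P); in the full pipeline the denoised $\tilde{\mathbf{x}}_{nl}$ would take the place of y:

```python
# Least-squares CFR estimates at the pilot subcarriers d[nK+1], cubically
# interpolated to all subcarriers [6].
import numpy as np
from scipy.interpolate import interp1d

K = 3
pilot_idx = np.arange(1, P, K)                    # pilot subcarriers {1, 4, 7, 10}
h_ls = y[pilot_idx] / d[pilot_idx]                # LS estimates at the pilots
cubic = lambda v: interp1d(pilot_idx, v, kind="cubic",
                           fill_value="extrapolate")(np.arange(P))
h_est = cubic(h_ls.real) + 1j * cubic(h_ls.imag)  # interpolate Re and Im separately
d_est = y / h_est                                 # equalize; slice to nearest 4-QAM point
```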

Figure 3: Case study I (signal restoration): MSE according to (a) the dimension of latent space; (b) the size of training dataset; (c) $p_{cor}$; and (d) the depth of neural networks.
Figure 4: Case study II (symbol demodulation): SER according to (a) the dimension of latent space; (b) the size of training dataset; (c) SNR; and (d) the depth of neural networks.
Figure 5: Case study III (precise localization): Localization error according to (a) the dimension of latent space; (b) the size of training dataset; (c) $p_{NLoS}$; and (d) the depth of neural networks.

III-B2 Application of nlDAE

To handle the complex-valued data, we separate it into real and imaginary parts. $\Re$ and $\Im$ denote the operators capturing the real and imaginary parts of an input, respectively. Thus, $\tilde{\mathbf{x}}_{nl}^{(j)}$ is the regenerated $\mathbf{d}^{(j)} \odot \mathbf{h}^{(j)}$ obtained by denoising $\mathbf{y}^{(j)}$, which can be represented by

$\begin{split}\tilde{\mathbf{x}}_{nl}^{(j)} &= \Re(\mathbf{y}^{(j)}) - g_{\theta_{nl,R}^{\prime *}}(f_{\theta_{nl,R}^{*}}(\Re(\mathbf{y}^{(j)}))) \\ &\quad + i\big(\Im(\mathbf{y}^{(j)}) - g_{\theta_{nl,I}^{\prime *}}(f_{\theta_{nl,I}^{*}}(\Im(\mathbf{y}^{(j)})))\big),\end{split}$ (10)

where

$\theta_{nl,R}^{*}, \theta_{nl,R}^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(\Re(\mathbf{n}^{(i)}), g_{\theta^{\prime}}(f_{\theta}(\Re(\mathbf{y}^{(i)})))\big),$
$\theta_{nl,I}^{*}, \theta_{nl,I}^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(\Im(\mathbf{n}^{(i)}), g_{\theta^{\prime}}(f_{\theta}(\Im(\mathbf{y}^{(i)})))\big).$

Finally, the receiver estimates $\mathbf{h}$ with the predetermined pilot symbols, i.e., $\mathbf{d}[nK+1]$ for $n = 0, 1, \cdots$, and demodulates $\mathbf{d}$ based on the estimate of $\mathbf{h}$ and the regenerated $\tilde{\mathbf{x}}_{nl}$.
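A minimal sketch of the real/imaginary split in (10), assuming the PyTorch denoisers from the Sec. II sketch; nldae_R and nldae_I are illustrative names for the two trained nlDAE models:

```python
# Denoise a complex observation with two independent real-valued nlDAE models.
import torch

def denoise_complex(y, nldae_R, nldae_I):
    xr = y.real - nldae_R(y.real)   # subtract regenerated noise, real part
    xi = y.imag - nldae_I(y.imag)   # same for the imaginary part
    return torch.complex(xr, xi)    # regenerated d ⊙ h, cf. (10)
```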

III-B3 Experimental Parameters

The performance of the proposed nlDAE is evaluated with $L = 5000$. For the simulation parameters, we set 4-QAM, $P = 12$, $\Delta f = 15$ kHz, $L_{p} = 4$, and $K = 3$. We further assume that $\alpha \sim \mathcal{CN}(0,1)$ and $\tau \sim \mathcal{U}(0, 10^{-6})$. Furthermore, $P^{\prime} = 9$, SNR $= 5$ dB, and $M = 10000$ unless otherwise specified. We also provide the result of non-ML (i.e., only cubic interpolation).

III-C Case Study III: Precise Localization

The objective of this case study is to improve the localization quality by denoising the measured distance, which is represented by the quantized value of the mixture of the true distance and error factors.

III-C1 Model

Consider 2-D localization where $P$ reference nodes and a single target node are randomly distributed. We estimate the position of the target node with the knowledge of the locations of the $P$ reference nodes. Let $\mathbf{x} \in \mathbb{R}^{P}$ denote the vector of true distances from the $P$ reference nodes to the target node, where $X$ denotes the distance between two random points in a 2-D space. We consider three types of random variables for the noise added to the true distance, as listed below:

  • $N_{N}$: ranging error dependent on signal quality.

  • $N_{U}$: ranging error due to clock asynchronization.

  • $N_{B}$: non-line-of-sight (NLoS) event.

We assume that $N_{N}$, $N_{U}$, and $N_{B}$ follow the normal, uniform, and Bernoulli distributions, respectively. Hence, we can define the random variable for the noise $N$ as follows:

$N = N_{N} + N_{U} + R_{NLoS} N_{B},$ (11)

where $R_{NLoS}$ is the distance bias in the event of NLoS. Note that $N$ does not follow any known probability distribution because its density is the convolution of three different distributions. Besides, we assume that the distance is measured by time of arrival (ToA). Thus, we define the quantization function $\mathcal{Q}_{B}$ to represent the measured distance with resolution $B$, e.g., $\mathcal{Q}_{10}(23) = 20$. In addition, the localization method based on multi-dimensional scaling (MDS) is utilized to estimate the position of the target node [7].
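A minimal sketch of the composite noise (11) and the quantizer $\mathcal{Q}_{B}$, assuming NumPy and the parameters of Sec. III-C3:

```python
# One realization of the composite ranging noise (11) and the quantizer Q_B.
import numpy as np

P, R_NLoS, B = 12, 50.0, 10.0
n = (np.random.normal(0, 10, P)                  # N_N: signal-quality error, N(0, 10)
     + np.random.uniform(0, 20, P)               # N_U: clock error, U(0, 20)
     + R_NLoS * np.random.binomial(1, 0.2, P))   # N_B: NLoS event, Ber(0.2)
Q_B = lambda v: B * np.round(v / B)              # resolution-B quantizer: Q_10(23) = 20
# Training pairs for nlDAE, cf. (12): inputs Q_B(y), targets Q_B(n).
```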

III-C2 Application of nlDAE

In this case study, we consider the discrete values quantized by the function $\mathcal{Q}_{B}$. Here, $\tilde{\mathbf{x}}_{nl}^{(j)}$ can be represented as follows:

$\tilde{\mathbf{x}}_{nl}^{(j)} = \mathcal{Q}_{B}(\mathbf{y}^{(j)}) - g_{\theta_{nl,R}^{\prime *}}(f_{\theta_{nl,R}^{*}}(\mathcal{Q}_{B}(\mathbf{y}^{(j)}))),$ (12)

where

$\theta_{nl,R}^{*}, \theta_{nl,R}^{\prime *} = \operatorname*{arg\,min}_{\theta, \theta^{\prime}} \frac{1}{M} \sum_{i=1}^{M} \mathcal{L}\big(\mathcal{Q}_{B}(\mathbf{n}^{(i)}), g_{\theta^{\prime}}(f_{\theta}(\mathcal{Q}_{B}(\mathbf{y}^{(i)})))\big).$

Thus, $\tilde{\mathbf{x}}_{nl}$ is utilized for the estimation of the target node position in nlDAE-assisted MDS-based localization.

III-C3 Experimental Parameters

The performance of the proposed nlDAE is evaluated with $L = 5000$. In this simulation, 12 reference nodes and one target node are uniformly distributed in a $100 \times 100$ square. We assume that $N_{N} \sim \mathcal{N}(0, 10)$, $N_{U} \sim \mathcal{U}(0, 20)$, $N_{B} \sim \text{Ber}(0.2)$, and $R_{NLoS} = 50$. The distance resolution $B$ is set to 10 for the quantization function $\mathcal{Q}_{B}$. Note that $P^{\prime} = 9$, $p_{NLoS} = 0.2$, and $M = 10000$ unless otherwise specified. We also provide the result of non-ML (i.e., only MDS-based localization).

III-D Analysis of Experimental Results

Fig. 3(a), Fig. 4(a), and Fig. 5(a) show the performance of the three case studies with respect to $P^{\prime}$, respectively. nlDAE outperforms non-ML and DAE over the whole range of $P^{\prime}$. Particularly with small values of $P^{\prime}$, nlDAE continues to perform well, whereas DAE loses its merit. This means that nlDAE provides a good denoising performance even with an extremely small dimension of the latent space if the training dataset is sufficient.

The impact of the size of the training dataset is depicted in Fig. 3(b), Fig. 4(b), and Fig. 5(b). nlDAE starts to outperform non-ML with $M$ less than 100. Conversely, DAE requires about an order of magnitude larger $M$ to perform better than non-ML. Furthermore, nlDAE converges faster than DAE, thus requiring less training data.

In Fig. 3(c), Fig. 4(c), and Fig. 5(c), the impact of a noise-related parameter on each case study is illustrated. When the noise occurs according to a Bernoulli distribution in Fig. 3(c), the performance of the ML algorithms (both nlDAE and DAE) exhibits a concave behavior. This is because the variance of $\text{Ber}(p)$ is given by $p(1-p)$. A similar phenomenon is observed in Fig. 5(c) because the Bernoulli event of NLoS constitutes a part of the localization noise. As for non-ML, the performance worsens as the probability of noise occurrence increases in both cases. Fig. 4(c) shows that the SER performance of nlDAE improves rapidly as the SNR increases. In all experiments, nlDAE gives superior performance to the other schemes.

Thus far, the experiments have been conducted with a single hidden layer. Fig. 3(d), Fig. 4(d), and Fig. 5(d) show the effect of the depth of the neural network. The performance of nlDAE is almost invariant, which suggests that nlDAE is not sensitive to the number of hidden layers. On the other hand, the performance of DAE worsens quickly as the depth increases, owing to overfitting, in two of the cases.

In summary, nlDAE outperforms DAE across all experiments. nlDAE is more efficient for the underlying use cases than DAE because it requires a smaller latent space and less training data. Furthermore, nlDAE is more robust to changes in the design parameters of the neural network, e.g., the network depth.

IV Conclusion and Future Work

We introduced a new neural network based denoiser framework, namely nlDAE. It is a modification of DAE in that it learns the noise instead of the original data. The fundamental idea of nlDAE is that learning the noise can provide better performance depending on the stochastic characteristics (e.g., standard deviation) of the original data and the noise. We applied the proposed mechanism to practical problems for IoT devices such as signal restoration, symbol demodulation, and precise localization. The numerical results support that nlDAE is more efficient than DAE in terms of the required dimension of the latent space and the size of the training dataset, thus rendering it more suitable for capability-constrained conditions. The applicability of nlDAE to other domains, e.g., image inpainting, remains as future work. Furthermore, an information theoretical criterion for selecting between, or combining, DAE and nlDAE is an interesting direction for further research.

References

  • [1] U. Challita, H. Ryden, and H. Tullberg, “When machine learning meets wireless cellular networks: Deployment, challenges, and applications,” IEEE Communications Magazine, vol. 58, no. 6, pp. 12–18, 2020.
  • [2] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial neural networks-based machine learning for wireless networks: A tutorial,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3039–3071, 2019.
  • [3] A. Azari, M. Ozger, and C. Cavdar, “Risk-aware resource allocation for URLLC: Challenges and strategies with machine learning,” IEEE Communications Magazine, vol. 57, no. 3, pp. 42–48, 2019.
  • [4] Y. Sun, M. Peng, Y. Zhou, Y. Huang, and S. Mao, “Application of machine learning in wireless networks: Key techniques and open issues,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3072–3108, 2019.
  • [5] Y. Bengio, L. Yao, G. Alain, and P. Vincent, “Generalized denoising auto-encoders as generative models,” Advances in Neural Information Processing Systems, pp. 899–907, 2013.
  • [6] S. Coleri, M. Ergen, A. Puri, and A. Bahai, “Channel estimation techniques based on pilot arrangement in OFDM systems,” IEEE Transactions on Broadcasting, vol. 48, no. 3, pp. 223–229, 2002.
  • [7] I. Dokmanic, R. Parhizkar, J. Ranieri, and M. Vetterli, “Euclidean distance matrices: Essential theory, algorithms, and applications,” IEEE Signal Processing Magazine, vol. 32, no. 6, pp. 12–30, 2015.
  • [8] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 1096–1103.
  • [9] C. Marsh, “Introduction to continuous entropy,” Department of Computer Science, Princeton University, 2013.