
Noise Homogenization
via Multi-Channel Wavelet Filtering
for High-Fidelity Sample Generation in GANs

Shaoning Zeng
zsn@outlook.com
Bob Zhang (corresponding author, who supervised this work)
bobzhang@um.edu.mo
Department of Computer and Information Science
University of Macau, Taipa, Macau, China
Abstract

In the generator of a typical Generative Adversarial Network (GAN), an input noise is transformed into fake samples through a series of convolutional operations. However, current noise generation models rely merely on information from the pixel space, which makes it harder to approach the target distribution. Fortunately, the long-proven wavelet transformation can decompose multiple spectral components from images. In this work, we propose a novel multi-channel wavelet-based filtering method for GANs to cope with this problem. By embedding a wavelet deconvolution layer in the generator, the resultant GAN, called WaveletGAN, learns a multi-channel filtering that efficiently homogenizes the generated noise via an averaging operation, so as to generate high-fidelity samples. We conducted benchmark experiments on the Fashion-MNIST, KMNIST and SVHN datasets through an open GAN benchmark tool. WaveletGAN achieves the smallest FIDs on all three datasets, demonstrating its excellent performance in generating high-fidelity samples.

1 Introduction

In Generative Adversarial Networks (GANs) [1], the generator $G$ is designed to capture the data distribution as accurately as possible through adversarial training against the discriminator $D$, which will ultimately fail to distinguish the generated samples from the real ones. In this adversarial model, the problem receiving the most attention is how to train the model efficiently. This is because the fidelity of the generated fake samples has traditionally depended on a fine-tuning process driven by the feedback from $D$. For this reason, a great deal of research effort has been devoted to improving the training of the two components, $G$ and $D$. We argue that orchestrating the network architectures is one of the most promising directions for this problem.

Almost every single aspect of a GAN has been exploited to enhance sample generation, including architectures, loss functions, regularization, normalization, etc. First of all, most cutting-edge deep CNN models have been incorporated as the backbone network of GANs, to form $G$ and/or $D$. Among them, for example, ResNet [2] has been popular and has shown promising performance as a backbone network [3, 4]. Well-known loss functions like Least Squares [5], Hinge [6], and Wasserstein [7] are found in GANs for different purposes. Innovations on regularization [8, 9] and normalization [3, 9] for improving GANs have become commonplace as well. Recently, many interesting strategies in representation learning, including self-supervised learning [10] and self-attention learning [11], have been incorporated to design more promising GAN architectures. However, all these explorations remain on a single dimension of spectral analysis, i.e., the pixels of images. Many other signal processing techniques that support powerful spectral decomposition, e.g., wavelets for analyzing frequency information [12], are likely to enhance sample generation through multi-channel analysis. This is so far an open question to be answered.

Figure 1: WaveletGAN architecture using wavelet filtering to homogenize the generated noise.

In this work, we introduce a multi-channel wavelet-based filtering to orchestrate a novel GAN architecture, named WaveletGAN. The idea is easy to understand and simple to implement, as shown in Figure 1. When the noise is generated by G, a multi-channel wavelet filtering is injected to perform a spectral decomposition that considers both scale and frequency. The fake samples are then generated by incorporating this additional information. The filtering is implemented as a pluggable WaveletDeconv layer, which supports most GAN architectures and whose details are explained later. In this way, we intend to pass the output noise through a homogenization process, so as to improve the quality of the generated samples. This is confirmed by the lowest Fréchet Inception Distances (FIDs) [13] obtained by WaveletGAN in our benchmarking experiments on multiple image datasets. Our contributions in this work include:

  • Proposing a multi-channel wavelet filtering method in the generator, which outperforms most current GANs in generating high-fidelity samples.

  • Giving a simple implementation: the WaveletDeconv layer supports almost all types of GANs by averaging multiple spectral channels. It is intended to be a very cost-effective operation that homogenizes the noise and fully utilizes the enriched information, even though isolating the most useful information remains difficult.

  • Demonstrating the performance of WaveletGAN through compare_gan, an open-sourced GAN benchmarking tool by Google Research (https://github.com/google/compare_gan), on the Fashion-MNIST [14], KMNIST [15] and Street View House Numbers (SVHN) [16] datasets. The lowest FIDs are obtained by our WaveletGAN, which are superior to multiple current state-of-the-art GANs.

The paper is organized as follows. Related studies, mainly on GANs and wavelets in neural networks, are reviewed in Sec. 2. Afterwards, the architecture of WaveletGAN, as well as its formulation and analysis, is presented in Sec. 3. Sec. 4 describes the benchmarking settings and the experimental results. Conclusions and future work are given in Sec. 5.

2 Related Work

2.1 Generative Adversarial Networks

Improvements to GANs fall into three categories: carefully selecting a good loss function [7, 5], distinctively considering regularization [8] and/or normalization [3, 9], and painstakingly designing the neural architecture [2, 3, 17], for both the generator and the discriminator. Furthermore, many of the current improvements cover more than one of these categories. For example, regularizations like gradient penalty [9, 18], consensus constraint [19], and spectral normalization [3] are specifically applied to the discriminator, so as to mitigate gradient vanishing issues in training GANs. There are also similar strategies for the generator, like applying spectral normalization [3] to improve training dynamics [11]. These interacting choices make designing and training GANs complicated and extremely hard.

Following the broader trend in deep neural networks [1], architecture design has gained the most attention in GAN research. Some current methods focus on discriminator learning [20, 21], while others are more specific to the generator [22], or to both [23]. Correspondingly, GAN architectures with multiple generators have also been proposed [22], and many complicated models stack multiple architectures [21, 22]. Among these areas, another trend concerns the supervision conditions, i.e., conditional training with labels [11] versus unsupervised GANs trained without prior labels [10]. More recently, self-supervised constraints [10, 23] and the self-attention mechanism [11] were also presented to improve GANs.

Besides these, architectures that manipulate the noise generation have also attracted much attention. Multiplicative noise was introduced to reduce the uncertainty in GANs [24]. A decoder-encoder network was designed specifically to map random noise vectors to informative ones and feed them to the generator of the adversarial networks [25]. A smooth generator was proposed by manipulating the output noise to cope with perturbations during sample generation [26]. Therefore, we believe that noise manipulation offers promising potential for high-fidelity sample generation in GANs. However, existing approaches remain in the pixel space of the noise.

2.2 Wavelets in neural networks

Wavelet transformation, a well-proven and widely used time-frequency signal analysis tool [27], has also played an important role in constructing neural networks [28]. For example, it can approximate arbitrary nonlinear functions [28] and replace the radial basis function [29] in neural networks. Besides these, fuzzy wavelet neural networks [30] included a wavelet function to formulate dynamic systems for better performance. Recently, a deep wavelet network was proposed for image classification [31]. Multi-level wavelet deep CNNs were designed for image restoration [32] and fault diagnosis [33]. Wavelet transform has been integrated into a deep RNN model for natural gas demand forecasting [34]. Furthermore, various types of wavelet-like deep auto-encoders have been proposed to accelerate deep neural networks [35], as well as for different applications, including image classification [36], intelligent fault diagnosis [37], medical imaging [38], spherical signal detection [39], etc. Graph wavelets were constructed to sparsely represent a given class of signals in deep learning [40].

However, there has been no exploration of utilizing wavelets to improve image generation in GANs. It is well known that wavelet transformation is capable of filtering and performing deconvolution for signal processing [41], for example, capturing additional frequency information. We believe it can help homogenize the noise produced in the generator, making the target distribution easier to approximate and the generated samples more likely to be accepted by the discriminator. We design our proposed WaveletGAN based on this assumption.

3 WaveletGAN: Noise Homogenization for High-Fidelity Generation

3.1 GAN architectures

The formulation of GANs in the convolutional form [1] is

\min_{G}\max_{D}V(G,D)=\mathrm{E}_{\boldsymbol{x}\sim q_{data}}[\log D(\boldsymbol{x})]+\mathrm{E}_{\boldsymbol{z}\sim p_{G}}[\log(1-D(G(\boldsymbol{z})))], (1)

where $q_{data}$ is the data distribution, and $p_{G}$ is the (model) generator distribution to be learned through the adversarial min-max optimization, a fine-tuning game between $G$ and $D$. Given a noise $\boldsymbol{z}$, $\boldsymbol{x}^{\prime}=G(\boldsymbol{z})$ is the fake sample generated by $G$, which is then discriminated by $D$. In a simple implementation, the noise $\boldsymbol{z}$ is the output of the generator for an initial input noise $\boldsymbol{z}_{0}$, i.e., $\boldsymbol{z}=G(\boldsymbol{z}_{0})$. Then, the generated sample is obtained by adding the output noise to the real sample, i.e., $\boldsymbol{x}^{\prime}=\boldsymbol{x}+G(\boldsymbol{z}_{0})$. In this way, the formulation of this kind of simple GAN becomes

\min_{G}\max_{D}V(G,D)=\mathrm{E}_{\boldsymbol{x}\sim q_{data}}[\log D(\boldsymbol{x})]+\mathrm{E}_{\boldsymbol{z}\sim p_{G}}[\log(1-D(\boldsymbol{x}+G(\boldsymbol{z}_{0})))]. (2)

Different from the methods aiming at enhancing the discrimination ability of $D$, a few new mechanisms have been proposed to manipulate the output noise $\boldsymbol{z}$ [24, 25, 26]. Here, we take advantage of the frequency information decomposed by wavelet transformations.
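For concreteness, a minimal sketch (our illustration, not the compare_gan implementation) of the two loss terms in Eq. (2), assuming $D$ outputs probabilities and the fake sample is formed as $\boldsymbol{x}+G(\boldsymbol{z}_{0})$, is:

```python
import tensorflow as tf

def gan_losses(D, G, x_real, z0, eps=1e-8):
    """Loss terms of Eq. (2): D maximizes V(G, D); G minimizes the second term."""
    x_fake = x_real + G(z0)          # fake sample: real image plus the generated noise
    d_real = D(x_real)               # D(x), probabilities in (0, 1)
    d_fake = D(x_fake)               # D(x + G(z0)), probabilities in (0, 1)
    d_loss = -tf.reduce_mean(tf.math.log(d_real + eps)
                             + tf.math.log(1.0 - d_fake + eps))
    g_loss = tf.reduce_mean(tf.math.log(1.0 - d_fake + eps))
    return d_loss, g_loss
```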

The key (and only) step in our proposed method is an additional wavelet deconvolution layer, which performs a multi-channel wavelet filtering operation to homogenize the generated noise. Typically, the generator network works on a randomly initialized noise $\boldsymbol{z}_{0}$, i.e., the input to the generator. In conventional GANs, the generated noise is added to the real images to produce the fake samples before being fed to the discriminator. Technically, the wavelet deconvolution supports arbitrary forms of noise, treated as a signal to be analyzed by wavelets; hence our method supports any GAN model. Indeed, it can be implemented in one line of code (see the accompanying code for this work at https://github.com/zengsn/compare_gan). For simplicity, we give an implementation based on the GAN architecture provided in [3], which used ResNet [2] as the backbone network, to demonstrate the architecture of WaveletGAN.

Table 1: ResNet-based GAN architectures with wavelet deconvolution.

(a) Architecture for FMNIST and KMNIST.

Generator | Discriminator
$\boldsymbol{z}_{0}\in\mathbb{R}^{128}\sim\mathcal{N}(0,I)$ | $\boldsymbol{x}\in\mathbb{R}^{28\times 28}$
dense, 7 × 7 × 256 | ResBlock down 128
ResBlock up 256 | ResBlock down 128
ResBlock up 256 | ReLU
BN, ReLU, 3×3 conv 3 | Global sum pooling
WaveletDeconv, 5, average | dense → 1
Sigmoid |

(b) Architecture for SVHN.

Generator | Discriminator
$\boldsymbol{z}_{0}\in\mathbb{R}^{128}\sim\mathcal{N}(0,I)$ | $\boldsymbol{x}\in\mathbb{R}^{32\times 32\times 3}$
dense, 4 × 4 × 256 | ResBlock down 128
ResBlock up 256 | ResBlock down 128
ResBlock up 256 | ResBlock 128
ResBlock up 256 | ResBlock 128
BN, ReLU, 3×3 conv 3 | ReLU
WaveletDeconv, 5, average | Global sum pooling
Sigmoid | dense → 1

We implement two versions of the ResNet architecture, following [3], for the two types of input data. The generators accept the same initial noise $\boldsymbol{z}_{0}\in\mathbb{R}^{128}\sim\mathcal{N}(0,I)$, but the target outputs depend on the dataset. The first one, shown in Table 1-(a), is for training data $\boldsymbol{x}\in\mathbb{R}^{28\times 28}$, while the other supports RGB images $\boldsymbol{x}\in\mathbb{R}^{32\times 32\times 3}$, as shown in Table 1-(b). The difference lies only in the backbone network, i.e., ResNet, whose generator contains 2 or 3 ResBlocks, respectively. The backbone can easily be swapped to support other input scales; we omit this discussion because our wavelet-based noise homogenization is independent of the backbone network and depends only on the noise generation. As shown in Table 1, the WaveletDeconv layer comes right after the output noise finalized by a $3\times 3$ convolution. We set the number of channels to a fixed value of 5, which in turn produces 5 channels of analyzed results. An averaging operation is then performed on these resultant signals, as the noise homogenization step, to compose the new output noise for the subsequent sample generation.

3.2 Multi-channel wavelet filtering

The noise homogenization is implemented by introducing multi-channel wavelet filtering right after the noise output by the generator. First of all, we apply the continuous wavelet transformation (CWT) [42] to the signal, i.e., the noise $\boldsymbol{z}$ output by the generator (see Figure 1). The CWT is defined by a mother wavelet function $\mathbf{\Psi}$, which is scaled to form the wavelet functions. This transformation is well suited to processing a time series signal $x(t)$ (where $t=1\dots T$) [12] by convolving it with a specific wavelet function. According to wavelet theory [27], the mother wavelet must have small local support and satisfy the zero-mean and normalization properties. We choose the Mexican Hat wavelet as the mother wavelet, since it is commonly used when combining wavelet transformations with convolutional neural networks [43, 44, 45]; its definition can be derived from a Gaussian:

\psi(t)=\frac{2}{\pi^{1/4}\sqrt{3\sigma}}\left(\frac{t^{2}}{\sigma^{2}}-1\right)e^{-\frac{t^{2}}{2\sigma^{2}}}. (3)
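As a quick numerical check of Eq. (3) and the zero-mean prerequisite, the wavelet can be sampled on a grid ($\sigma=1$ is an illustrative choice):

```python
import numpy as np

def mexican_hat(t, sigma=1.0):
    """Mexican Hat mother wavelet of Eq. (3), derived from a Gaussian."""
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0 * sigma))
    return c * (t ** 2 / sigma ** 2 - 1.0) * np.exp(-t ** 2 / (2.0 * sigma ** 2))

t = np.linspace(-8.0, 8.0, 4001)
print(np.trapz(mexican_hat(t), t))  # ~0: the zero-mean property of a mother wavelet
```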

By scaling and translating, Eq. (3) can be rewritten as the wavelet function $\psi_{s,b}$:

\psi_{s,b}(t)=\frac{1}{\sqrt{s}}\psi\left(\frac{t-b}{s}\right), (4)

where $s>0$ is the scale coefficient and $b$ is the translation offset. Then, the CWT of a signal, i.e., the decomposition of $\boldsymbol{z}$ according to $s$ and $b$, becomes:

W_{\boldsymbol{z}}(s,b)=\int_{-\infty}^{\infty}\frac{1}{\sqrt{s}}\psi\left(\frac{t-b}{s}\right)\boldsymbol{z}(t)\,dt. (5)

In this way, the signal $\boldsymbol{z}$ is transformed from a single domain dimension ($t$) to two dimensions ($t$ and $s$) by convolution with a wavelet function $\mathbf{\Psi}$ at each scale $s$. This is the basis on which we implement a multi-channel wavelet filtering layer in a deep convolutional neural network. Note that this kind of wavelet deconvolution layer (a.k.a. WaveletDeconv) was first implemented for a time series-based phone recognition task [12], which showed that the gradients of the scales can be learned using back-propagation.
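For illustration, this decomposition can be sketched as a discrete convolution with a small filter bank; the scale values and filter width below are illustrative, while in the actual layer the scales are learned:

```python
import numpy as np
from scipy.signal import fftconvolve

def mexican_hat(t, sigma=1.0):
    c = 2.0 / (np.pi ** 0.25 * np.sqrt(3.0 * sigma))
    return c * (t ** 2 / sigma ** 2 - 1.0) * np.exp(-t ** 2 / (2.0 * sigma ** 2))

def wavelet_channels(z, scales, width=64):
    """Eq. (5), discretized: convolve z with psi_{s,0} of Eq. (4), one channel per scale."""
    t = np.arange(-width, width + 1, dtype=float)
    filters = [mexican_hat(t / s) / np.sqrt(s) for s in scales]
    return np.stack([fftconvolve(z, f, mode="same") for f in filters])

z = np.random.randn(28 * 28)                    # a flattened noise map
channels = wavelet_channels(z, scales=[1.0, 2.0, 3.0, 4.0, 5.0])
homogenized = channels.mean(axis=0)             # the averaging step of Sec. 3.1
print(channels.shape, homogenized.shape)        # (5, 784) (784,)
```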

Denoting by $E$ a differentiable loss function to be minimized by the neural network, typically the categorical cross entropy, the gradient yielded by the wavelet deconvolution layer with respect to the scale parameter $s$ for training the network is:

\frac{\delta E}{\delta s_{i}}=\sum_{k=1}^{K}\frac{\delta E}{\delta\psi_{s_{i},k}}\frac{\delta\psi_{s_{i},k}}{\delta s_{i}}, (6)

where $K$ is the width of the filter, learned during training. The scales are then dynamically updated according to the gradients:

s_{i}^{\prime}=s_{i}-\gamma\frac{\delta E}{\delta s_{i}}, (7)

where $\gamma$ is the learning rate of the optimizer. We omit the full derivation of Eq. 6, which can be found in [12]; we reuse the same implementation of WaveletDeconv from [12] in our model. Consequently, the formulation of our WaveletGAN becomes:

\min_{G}\max_{D}V(G,D)=\mathrm{E}_{\boldsymbol{x}\sim q_{data}}[\log D(\boldsymbol{x})]+\mathrm{E}_{\boldsymbol{z}\sim p_{G}}\left[\log\left(1-D\left(\boldsymbol{x}+W(G(\boldsymbol{z}_{0}))\right)\right)\right]. (8)

However, our WaveletDeconv differs from the original one in two respects. Firstly, we implement WaveletDeconv for image generation in GANs, where there is no time series information. Secondly, we support multi-channel filtering, rather than single-channel filtering, followed by an additional averaging operation that combines multiple scales of signals into one target noise signal, thereby homogenizing the output noise. In this way, the noise before and after processing has an identical shape, and hence WaveletGAN has no dependency on the network architecture. WaveletGAN simply acts like a novel preprocessing technique; it is therefore very easy to implement, can even be written in one line of code, and supports most GAN architectures.
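For concreteness, a minimal, self-contained Keras sketch of such a layer is given below. It uses trainable scales (updated by back-propagation, cf. Eqs. (6) and (7), here handled by automatic differentiation) and averages the filtered channels; the class name mirrors ours, but the exact interface in the accompanying code may differ:

```python
import numpy as np
import tensorflow as tf

class WaveletDeconv(tf.keras.layers.Layer):
    """Multi-channel Mexican Hat filtering with trainable scales; shape-preserving."""

    def __init__(self, n_scales=5, kernel_width=31, **kwargs):
        super().__init__(**kwargs)
        self.n_scales = n_scales
        self.kernel_width = kernel_width

    def build(self, input_shape):
        # Trainable scale parameters s_i, one per channel (cf. Eqs. (6)-(7)).
        init = np.linspace(1.0, float(self.n_scales), self.n_scales).astype("float32")
        self.scales = self.add_weight(name="scales", shape=(self.n_scales,),
                                      initializer=tf.constant_initializer(init),
                                      trainable=True)

    def _kernels(self):
        # Sample psi_{s,0} of Eq. (4) on a fixed grid, one filter per scale.
        half = self.kernel_width // 2
        t = tf.range(-half, half + 1, dtype=tf.float32)           # (W,)
        s = tf.reshape(tf.math.softplus(self.scales), (-1, 1))    # keep s_i > 0, (S, 1)
        u = t[tf.newaxis, :] / s                                  # (S, W)
        psi = (2.0 / (np.pi ** 0.25 * tf.sqrt(3.0 * s))) \
              * (u ** 2 - 1.0) * tf.exp(-u ** 2 / 2.0) / tf.sqrt(s)
        return tf.reshape(tf.transpose(psi), (self.kernel_width, 1, self.n_scales))

    def call(self, x):
        # x: (batch, H, W, 1). Flatten to a 1-D signal, filter at every scale,
        # then homogenize by averaging the channels; the output keeps x's shape.
        shape = tf.shape(x)
        signal = tf.reshape(x, (shape[0], -1, 1))                 # (batch, T, 1)
        filtered = tf.nn.conv1d(signal, self._kernels(), stride=1, padding="SAME")
        return tf.reshape(tf.reduce_mean(filtered, axis=-1, keepdims=True), shape)

# Usage, the single added line after the generator's final 3x3 convolution (Table 1):
# noise = WaveletDeconv(n_scales=5)(noise)
```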

4 Experiments

We conducted a set of experiments on GPUs using the GAN benchmarking tool compare_gan (which supports both GPUs and Google TPUs, though we ran the benchmarking experiments on GPUs only), to evaluate the performance of our WaveletGAN. As mentioned in the previous section, our implementation is based on ResNet, since we observed the smallest FIDs when using ResNet as the backbone network. In order to evaluate the quality of the generated samples, we show the results in qualitative and quantitative ways, i.e., sample visualization and FID. Sample visualization can reveal the detailed differences between the generated samples and the real samples, while FID is known to be an effective measurement of sample quality. Under these two evaluations, the performance of WaveletGAN and its improvements to sample generation can be established.

4.1 Datasets

The datasets utilized in the experiments include Fashion-MNIST [14] (a.k.a. FMNIST), KMNIST [15], and Street View House Numbers (SVHN) [16], which are all popular for benchmarking GANs. The configurations of all datasets are summarized in Tab. 2.

FMNIST and KMNIST are both MNIST-like datasets containing an equal number of gray images of $28\times 28$ pixels; the fashion items and Japanese letters each belong to 10 classes. The training and test sets are divided in advance, with 60,000 training samples and 10,000 test samples. Correspondingly, SVHN is an RGB image dataset with 73,257 training samples and 26,032 test samples of $32\times 32$ pixels and 3 channels. Because the input images are of different sizes, the GAN architectures follow the two distinct designs described in Sec. 3.
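For reference, all three datasets can be pulled with TensorFlow Datasets; the identifiers below are the tfds names, which may differ from compare_gan's internal configuration:

```python
import tensorflow_datasets as tfds

# Load each benchmark dataset and print its train/test sizes.
for name in ["fashion_mnist", "kmnist", "svhn_cropped"]:
    train, test = tfds.load(name, split=["train", "test"], as_supervised=True)
    print(name, train.cardinality().numpy(), test.cardinality().numpy())
```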

Regardless of the dataset, the GANs generate fake images matching the test samples after being trained on the training samples. The visualization, as well as the calculation of FIDs, is conducted on these generated images.

Table 2: Datasets and their configurations.

Name | Size | Classes | Train Set | Test Set
FMNIST [14] | 28×28 | 10 | 60,000 | 10,000
KMNIST [15] | 28×28 | 10 | 60,000 | 10,000
SVHN [16] | 32×32×3 | 10 | 73,257 | 26,032

4.2 Generated samples

Figure 2: The real and generated samples from each dataset.

In order to intuitively evaluate WaveletGAN, we plot generated samples in Fig. 2. The first row shows real samples (marked as Real) from each dataset for comparison, followed by 3 rows of fake samples (marked as RN) generated by the ResNet-based GAN and 3 rows of samples (marked as WG) generated by our WaveletGAN. In this way, 7 rows of samples are plotted for each dataset. Note that all samples were picked directly at evaluation time, with no fixed order. In our opinion, this random arrangement has little influence on judging the visualizations, so we did not reorder them manually.

However, we manually highlight some of the visibly occluded samples with red boxes. It is obvious that more samples generated by the ResNet-based GAN (rows 2 to 4) suffer from this problem than those output by our WaveletGAN. For example, many fake shoe samples from FMNIST (columns 8, 9, 12, and 14 of rows 2 to 4) are heavily occluded, or fail entirely. The same issue is found in KMNIST and SVHN as well. In contrast, WaveletGAN generates far fewer samples with these problems. This is merely a qualitative judgement, so we continue the evaluation with a quantitative metric.

4.3 Fréchet Inception Distances (FIDs)

Fréchet Inception Distance (FID) [13] is one of the most significant quantitative metrics for evaluating the performance of GANs. Although other measurements, e.g., Inception Score (IS) [46] and GILBO [47], can also be utilized for this quantitative evaluation, FID is known to be popular and effective for the job [10]. Therefore, we use FID as the sole metric in this work. Since our WaveletGAN relates only to the output noise, it technically supports both unsupervised and conditional settings in most GAN architectures. In order to calculate FIDs, we generated the same number of fake samples as test samples 3 times on each dataset, and then computed the average FID for each case.
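For clarity, the FID computation itself can be sketched as follows, given Inception activations of the real and generated samples (extracting the activations is handled by the benchmarking tool):

```python
import numpy as np
from scipy import linalg

def fid(act_real, act_fake):
    """FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2}) [13]."""
    mu_r, mu_f = act_real.mean(axis=0), act_fake.mean(axis=0)
    c_r = np.cov(act_real, rowvar=False)
    c_f = np.cov(act_fake, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_f)
    if np.iscomplexobj(covmean):        # drop tiny imaginary parts from numerical noise
        covmean = covmean.real
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(c_r + c_f - 2.0 * covmean))
```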

Table 3: Best FIDs obtained in unsupervised (Un-cond.) and conditional (Cond.) GANs.

GANs | FMNIST Un-cond. | FMNIST Cond. | KMNIST Un-cond. | KMNIST Cond. | SVHN Un-cond. | SVHN Cond.
DC-GAN [17] | 10.594 | 10.149 | 14.788 | 12.573 | 36.880 | 41.172
WGAN-GP [9] | 10.448 | 10.680 | 14.113 | 13.169 | 33.368 | 33.500
SN-DCGAN [3] | 15.085 | 14.274 | 11.200 | 10.490 | 17.702 | 17.402
SSGAN-ResNet [10] | 23.165 | 20.062 | 10.216 | 10.918 | 17.856 | 17.733
SSGAN-SNDCGAN [10] | 13.895 | 14.572 | 14.084 | 15.320 | 17.004 | 22.336
SN-ResNet [3] | 8.320 | 8.252 | 4.983 | 5.356 | 16.267 | 16.654
Our WaveletGAN | 8.089 | 7.945 | 4.874 | 5.227 | 16.106 | 16.256
Improvement Rate (%) | ↑2.78 | ↑3.73 | ↑2.20 | ↑2.42 | ↑0.99 | ↑2.39

The WaveletGAN row contains the best (lowest) FIDs on each dataset.
Improvement Rate (%) is calculated between WaveletGAN and SN-ResNet; e.g., on FMNIST (Un-cond.), the rate is $(8.320-8.089)/8.320=2.78\%$.

During the benchmarking, we compared our WaveletGAN with several state-of-the-art GANs, including DC-GAN [17], WGAN-GP [9], SN-DCGAN [3], and SSGAN [10], under the unsupervised (without labels) and conditional (using labels [48]) settings. Since our benchmarking experiments were conducted on compare_gan, we reused the same configurations from the tool and set the parameters according to the GAN benchmark settings in [10]. For example, the batch size is set to 64 for all models, and the dimension of the randomly initialized noise $\boldsymbol{z}_{0}$ is 128. The tool provides different structures of DC-GAN and SN-DCGAN, but spectral normalization (SN) [3] is designed as a configurable setting for all models. Therefore, we enabled SN on all models, including DC-GAN and WGAN-GP. For consistency, we set 5 sub-step iterations per discriminator training step, and the hinge loss function was used in all GANs except SN-ResNet. Besides this, Adam was set as the optimizer for all models with a learning rate $\gamma=0.0002$. All models were trained for 100,000 steps. More detailed settings of each model can also be found in our accompanying code.

The lowest FIDs achieved by these selected GANs are recorded in Table 3. The smallest FIDs on all datasets are obtained by our WaveletGAN, which, as mentioned, utilizes ResNet as the backbone network. For example, the best FID on KMNIST is 4.874 in the unsupervised mode, and the lowest FIDs on FMNIST and SVHN are 7.945 and 16.106, respectively. All three results are the lowest FIDs on the respective datasets. More impressively, these FIDs are much smaller than those of many other GANs, like DC-GAN, WGAN-GP and SN-DCGAN, which only produce FIDs larger than 10. Furthermore, the recently proposed self-supervised GAN (SSGAN) yields FIDs of 10.216 and above (on KMNIST), which are inferior to the results of our WaveletGAN.

Note that whether or not labels are used has little decisive impact on the results. Conditional GANs produce lower FIDs most of the time, as shown in Table 3, but there are exceptions. For example, the unsupervised SN-ResNet yields lower FIDs on KMNIST (4.983) and SVHN (16.267) than its conditional counterpart, and the same holds for our WaveletGAN, whose unsupervised FIDs on these datasets are lower than the conditional ones. We focus on improving sample generation in GANs, rather than on the effect of using labels, and the results suggest that wavelet-based filtering is independent of this choice. The point is that WaveletGAN is good at generating high-fidelity samples according to the quantitative FID metric.

4.4 Improvements

Technically, the wavelet-based noise homogenization supports any GAN structure, but our implementations use ResNet as the backbone network (see Table 1), because we observed the lowest FIDs with the SN-ResNet-based GAN implementation [3]. Therefore, we calculated the improvement rates between our WaveletGAN and SN-ResNet, as shown in the last row of Table 3.

The highest improvements were obtained in the conditional settings, with the most promising being 3.73% on FMNIST; the improvements on the other two datasets are at least 2.39%, which is equally impressive. Although the rate is slightly less than 1% on SVHN in the unsupervised setting, 2.78% and 2.20% are observed on FMNIST and KMNIST, respectively. This demonstrates that noise homogenization by wavelet-based filtering consistently improves sample generation in GANs.

5 Conclusions and Future Work

This work proposes a novel GAN architecture that, for the first time, incorporates multi-channel wavelet filtering in the generator. Implemented as a WaveletDeconv layer after the noise generation, which supports most GAN architectures, the noise homogenization enables high-fidelity sample generation in GANs. Our implementation of WaveletGAN, based on the ResNet backbone, generates high-fidelity fake samples and achieves the lowest FIDs on multiple image datasets. Benchmarks through the open-source GAN tool confirm that WaveletGAN performs better than many state-of-the-art GANs.

Despite this, there are still some known issues to be resolved. For example, wavelet transformation is a computationally intensive operation, which calls for special acceleration techniques on GPU devices. Our focus here is on the generator, but it would be interesting to see whether the same mechanism can help improve the discriminator. We will explore these ideas as part of our future work.

Acknowledgments and Disclosure of Funding

We implemented our method based on the GAN benchmark tool provided by Google Research at https://github.com/google/compare_gan, and we thank the team for providing the initial code base. The authors acknowledge the support of the University of Macau (File no. MYRG2018-00053-FST) and NVIDIA Corporation for the donation of the Titan Xp GPU used for this research.

References

  • [1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pages 2672–2680, 2014.
  • [2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • [3] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, pages 1–26, 2018.
  • [4] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8789–8797, 2018.
  • [5] Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2794–2802, 2017.
  • [6] Ngoc-Trung Tran, Tuan-Anh Bui, and Ngai-Man Cheung. Dist-gan: An improved gan using distance constraints. In Proceedings of the European Conference on Computer Vision (ECCV), pages 370–385, 2018.
  • [7] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
  • [8] William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M Dai, Shakir Mohamed, and Ian Goodfellow. Many paths to equilibrium: Gans do not need to decrease a divergence at every step. In International Conference on Learning Representations, pages 1–21, 2018.
  • [9] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
  • [10] Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, and Neil Houlsby. Self-supervised gans via auxiliary rotation loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12154–12163, 2019.
  • [11] Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. Self-attention generative adversarial networks. In International Conference on Machine Learning, pages 7354–7363, 2019.
  • [12] Haidar Khan and Bulent Yener. Learning filter widths of spectral decompositions with wavelets. In Advances in Neural Information Processing Systems, pages 4601–4612, 2018.
  • [13] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems, pages 6626–6637, 2017.
  • [14] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.
  • [15] Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature. arXiv:1812.01718, 2018.
  • [16] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2, page 5, 2011.
  • [17] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In International Conference on Learning Representations, pages 1–16, 2016.
  • [18] Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, and Thomas Hofmann. Stabilizing training of generative adversarial networks through regularization. In Advances in neural information processing systems, pages 2018–2028, 2017.
  • [19] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. The numerics of gans. In Advances in Neural Information Processing Systems, pages 1825–1835, 2017.
  • [20] Han Zhang, Zizhao Zhang, Augustus Odena, and Honglak Lee. Consistency regularization for generative adversarial networks. In International Conference on Learning Representations, pages 7354–7363, 2020.
  • [21] Tu Nguyen, Trung Le, Hung Vu, and Dinh Phung. Dual discriminator generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2670–2680, 2017.
  • [22] Quan Hoang, Tu Dinh Nguyen, Trung Le, and Dinh Phung. Mgan: Training generative adversarial nets with multiple generators. In International Conference on Learning Representations, pages 1–24, 2018.
  • [23] Ngoc-Trung Tran, Viet-Hung Tran, Bao-Ngoc Nguyen, Linxiao Yang, et al. Self-supervised gan: Analysis and improvement with multi-class minimax game. In Advances in Neural Information Processing Systems, pages 13232–13243, 2019.
  • [24] Xinhan Di and Pengqian Yu. Multiplicative noise channel in generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 1165–1172, 2017.
  • [25] Guoqiang Zhong, Wei Gao, Yongbin Liu, Youzhao Yang, Da-Han Wang, and Kaizhu Huang. Generative adversarial networks with decoder-encoder output noises. Neural Networks, 2020.
  • [26] Tianyu Guo, Chang Xu, Boxin Shi, Chao Xu, and Dacheng Tao. Smooth deep image generator from noises. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3731–3738, 2019.
  • [27] Ingrid Daubechies. The wavelet transform, time-frequency localization and signal analysis. IEEE Transactions on Information Theory, 36(5):961–1005, 1990.
  • [28] Qinghua Zhang and Albert Benveniste. Wavelet networks. IEEE transactions on Neural Networks, 3(6):889–898, 1992.
  • [29] Daniel WC Ho, Ping-Au Zhang, and Jinhua Xu. Fuzzy wavelet networks for function learning. IEEE Transactions on Fuzzy Systems, 9(1):200–211, 2001.
  • [30] Mehrnoosh Davanipoor, Maryam Zekri, and Farid Sheikholeslam. Fuzzy wavelet neural network with an accelerated hybrid learning algorithm. IEEE Transactions on Fuzzy Systems, 20(3):463–470, 2011.
  • [31] Salwa Said, Olfa Jemai, Salima Hassairi, Ridha Ejbali, Mourad Zaied, and Chokri Ben Amar. Deep wavelet network for image classification. In 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 000922–000927. IEEE, 2016.
  • [32] Pengju Liu, Hongzhi Zhang, Kai Zhang, Liang Lin, and Wangmeng Zuo. Multi-level wavelet-cnn for image restoration. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 773–782, 2018.
  • [33] Yan Han, Baoping Tang, and Lei Deng. Multi-level wavelet packet fusion in dynamic ensemble convolutional neural network for fault diagnosis. Measurement, 127:246–255, 2018.
  • [34] Huai Su, Enrico Zio, Jinjun Zhang, Mingjing Xu, Xueyi Li, and Zongjie Zhang. A hybrid hourly natural gas demand forecasting method based on the integration of wavelet transform and enhanced deep-rnn model. Energy, 178:585–597, 2019.
  • [35] Tianshui Chen, Liang Lin, Wangmeng Zuo, Xiaonan Luo, and Lei Zhang. Learning a wavelet-like auto-encoder to accelerate deep neural networks. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • [36] Huiwu Luo, Yuan Yan Tang, Robert P Biuk-Aghai, Xu Yang, Lina Yang, and Yi Wang. Wavelet-based extended morphological profile and deep autoencoder for hyperspectral image classification. International Journal of Wavelets, Multiresolution and Information Processing, 16(03):1850016–1–29, 2018.
  • [37] Zhiyi He, Haidong Shao, Ping Wang, Janet Jing Lin, Junsheng Cheng, and Yu Yang. Deep transfer multi-wavelet auto-encoder for intelligent fault diagnosis of gearbox with few target training samples. Knowledge-Based Systems, 191:105313, 2020.
  • [38] Pradeep Kumar Mallick, Seuc Ho Ryu, Sandeep Kumar Satapathy, Shruti Mishra, Gia Nhu Nguyen, and Prayag Tiwari. Brain mri image classification for cancer detection using deep wavelet autoencoder-based deep neural network. IEEE Access, 7:46278–46287, 2019.
  • [39] Edmond Q Wu, Gui-Rong Zhou, Li-Min Zhu, Chuan-Feng Wei, He Ren, and Richard SF Sheng. Rotated sphere haar wavelet and deep contractive auto-encoder network with fuzzy gaussian svm for pilot’s pupil center detection. IEEE Transactions on Cybernetics, Early Access:1–14, 2019.
  • [40] Raif Rustamov and Leonidas J Guibas. Wavelets on graphs via deep learning. In Advances in neural information processing systems, pages 998–1006, 2013.
  • [41] Jean-Luc Starck and Albert Bijaoui. Filtering and deconvolution by the wavelet transform. Signal processing, 35(3):195–211, 1994.
  • [42] David K Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Applied and Computational Harmonic Analysis, 30(2):129–150, 2011.
  • [43] Thomas Wiatowski and Helmut Bölcskei. A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Transactions on Information Theory, 64(3):1845–1866, 2017.
  • [44] JiaWei Ji, Ziqiang Zhang, Ding Kun, Ruixiao Zhang, and ZhiXin Ma. Research on gaussian-wavelet-type activation function of neural network hidden layer based on monte carlo method. In Proceedings of the 2019 International Conference on Robotics Systems and Vehicle Technology, pages 68–73, 2019.
  • [45] Hyeon Kyu Lee and Young-Seok Choi. Application of continuous wavelet transform and convolutional neural network in decoding motor imagery brain-computer interface. Entropy, 21(12):1199, 2019.
  • [46] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In Advances in neural information processing systems, pages 2234–2242, 2016.
  • [47] Alexander A Alemi and Ian Fischer. Gilbo: one metric to measure them all. In Advances in Neural Information Processing Systems, pages 7037–7046, 2018.
  • [48] Takeru Miyato and Masanori Koyama. cgans with projection discriminator. In International Conference on Learning Representations, pages 1–23, 2018.