
Global Image Sentiment Transfer

Jie An, Tianlang Chen, Songyang Zhang, and Jiebo Luo
Department of Computer Science, University of Rochester, Rochester, NY, USA
Email: {jan6, tchen45, jluo}@cs.rochester.edu, szhang83@ur.rochester.edu
Abstract

Transferring the sentiment of an image is an unexplored research topic in the area of computer vision. This work proposes a novel framework consisting of a reference image retrieval step and a global sentiment transfer step to transfer the sentiment of an image according to a given sentiment tag. The proposed image retrieval algorithm is based on the SSIM index. The reference images retrieved by the proposed algorithm are more content-related than those retrieved by an algorithm based on the perceptual loss, and therefore lead to better image sentiment transfer results. In addition, we propose a global sentiment transfer step, which employs an optimization algorithm to iteratively transfer the sentiment of images based on feature maps produced by the Densenet121 architecture. The proposed sentiment transfer algorithm can transfer the sentiment of an image while keeping the content structure of the input image intact. Qualitative and quantitative experiments demonstrate that the proposed sentiment transfer framework outperforms existing artistic and photorealistic style transfer algorithms in producing reliable sentiment transfer results with rich, fine, and exact details.

I Introduction

Transferring the sentiment of an image is still an unexplored research topic. Compared with well-known tasks such as two-domain image-to-image translation [1, 2, 3, 4] (e.g., winter → summer, cat → dog) and image style transfer (e.g., artistic style transfer, photorealistic style transfer), image sentiment transfer modifies an image at a higher level to change its overall feeling to people. For example, without modifying the content, a family portrait can be transferred into a more positive picture. The transferred image may give people a feeling of warmth and thus be more valuable to keep. As we live in an age of pressure, we argue that this research topic is significant, with strong potential to enrich people's lives.

Intuitively, image sentiment is an abstract concept. Compared with two-domain image-to-image translation, which commonly has a definite pattern for the transfer between two domains (e.g., cat → dog, horse → zebra), there are enormously many ways to transfer an image to a specific sentiment. To make the transfer controllable, a reference image should be fed into the model as guidance. Considering its similarity to the image style transfer task, we could leverage existing image style transfer models to perform reference-guided image sentiment transfer. However, this design is nontrivial to implement because the compatibility between the input image and the reference image is weaker for image sentiment transfer. Moreover, directly applying existing artistic and photorealistic style transfer models generally fails to create visually pleasing results in terms of detail preservation and artifact/distortion elimination. Unlike image style transfer, where an artistic/photorealistic style can be added to almost any input image, sentiment transfer between two content-unrelated images is risky. In the example of Fig. 1, the sentiment transfer result lacks photorealism because the reference image does not bring any content-related reference information to the input image.

Considering this, we propose a high-performance image sentiment transfer framework that starts with image retrieval. Given an input image and a sentiment tag provided by the user, instead of randomly sampling a reference image that carries the target sentiment tag, we retrieve the most suitable reference image based on the structural information of the input image. By leveraging the structural similarity (SSIM) index, the framework establishes a strong content relation between the input and the reference image. In Section IV, we demonstrate that this image retrieval step is crucial to improving the performance of image sentiment transfer.

To transfer the sentiment of the retrieved reference image to the input image, we design a novel global image sentiment transfer algorithm. Inspired by the image style transfer algorithm of Gatys et al. [5], we apply an optimization algorithm to deep features produced by a neural network pre-trained on the ImageNet [6] dataset to iteratively transfer the sentiment of the reference image to the input image. Different from existing style transfer algorithms, our method adopts the Densenet121 architecture as the feature extractor instead of the widely-used VGG19 architecture. We empirically find that Densenet121 outperforms VGG19 in terms of fine detail preservation and artifact/distortion elimination, and is therefore more suitable for sentiment transfer, where the produced image should be photorealistic.

Our main contributions are summarized as follows:

  • We are the first to explore the task of image sentiment transfer. We present an effective two-step framework for the task, consisting of image retrieval and reference-guided image sentiment transfer.

  • We introduce an effective reference image retrieval algorithm based on the SSIM index, which finds more content-related reference images than other methods.

  • We propose a global sentiment transfer algorithm based on the Densenet121 architecture, which can transfer the sentiment/style of an image while preserving its fine details.

Figure 1: Failed sentiment transfer case with a content-unrelated image as the reference image. The generated image is not photorealistic.

II Related Work

Visual sentiment understanding has been explored for many years. Most existing works focus on visual sentiment classification. To perform accurate classification of images with different sentiments, low-level features such as color [7, 8, 9], texture [9], and shape [10] were studied in early years. Later on, mid-level composition [9], sentributes [11], principles-of-art features [12], and high-level adjective-noun pairs (ANP) [13] were also considered. Most recently, owing to the rapid development of convolutional neural networks (CNNs) for extracting visual features, many approaches turn to CNN-based sentiment recognition. Some of them handle noisy data during the training process [14, 15, 16], while others explore visual sentiment at the region level [17, 18, 19, 20, 21]. However, compared with image sentiment classification, other sentiment-related tasks such as image sentiment generation/translation have not been well studied yet.

The tasks most related to ours are image-to-image translation and image style transfer. Image-to-image translation aims to learn a mapping between two different image domains. Early approaches need paired data to train the model and are essentially restricted to learning a deterministic one-to-one mapping [22, 23, 24], which prevents the generation of diverse outputs. CycleGAN [2] first proposes a cycle consistency loss that enables the model to be trained on unpaired data. Subsequent approaches such as MUNIT [1] and DRIT [3] further propose disentangled representations that enable diverse outputs. On the other hand, our task is related to image style transfer. A great number of approaches have been proposed for artistic style transfer [25, 26, 27, 28, 29] and photorealistic style transfer [30, 31, 32, 33, 34]. Different from the above approaches, we focus on image sentiment transfer, which requires a strong content relation between the input and the reference image. Therefore, we search for the reference image based on the sentiment tag provided by the user instead of directly asking the user to provide a reference. The proposed global sentiment transfer algorithm is based on the work by Gatys et al. [27]. However, our algorithm uses the Densenet121 [35] architecture instead of VGG19 [36] as the feature extractor, since we empirically find that Densenet121 preserves input details more faithfully than VGG19.

Searching for a reference image given a sentiment tag is related to the image retrieval task, which aims to find an image that is close to a given image. Most recent works measure the perceptual loss between images by comparing image features extracted from pre-trained convolutional neural networks [37]. Images with more similar style effects generally have a lower perceptual loss. Different from these perceptual-loss-based works, we find that the SSIM index [38] is more suitable for retrieving reference images for sentiment transfer, since it mainly captures the similarity of images in terms of content structure rather than stylization effects.

Figure 2: Framework of the proposed algorithm. Our method consists of two parts: a reference image retrieval algorithm and a global sentiment transfer approach based on the retrieved reference image.
Figure 3: Reference images retrieved by the proposed algorithm on the VSO dataset. The retrieved reference images have the same content as the input images but contain the opposite sentiments.

III Method

To transfer the global sentiment of an image, we propose a method that consists of a reference image retrieval step and a global sentiment transfer step. Fig. 2 shows the framework of the proposed algorithm. Given an input image, we first retrieve a reference image according to the target sentiment tag. Then a global sentiment transfer algorithm is employed to transfer the sentiment of the reference image to the input image. We describe the details of these two steps in the rest of this section.

III-A Reference Image Retrieval

Retrieving a reference image is the initial step of sentiment transfer. Given an input image, the proposed retrieval method aims to find a reference image according to a sentiment tag given by the user. To facilitate the following global sentiment transfer step, the retrieved image should have similar content to the input image but convey the target sentiment.

To achieve this, we propose an image retrieval algorithm based on the Visual Sentiment Ontology (VSO) [13] dataset. Each image in the VSO dataset is attached with an adjective-noun pair that describes its sentiment and semantic content, respectively. To retrieve a reference image according to a given sentiment tag, we first select a subset of the VSO dataset in which every image shares the content tag (noun) of the input image but carries the sentiment tag (adjective) of the given target. For example, in Fig. 2, the input image has the pair "Clear River" while images in the corresponding subset are labeled "Muddy River".

To find the reference image in the selected target subset that has the most similar semantic structure to the input image, inspired by [32, 34], we use the Structural Similarity Index (SSIM) [40] between the edge responses [41] of images to measure the semantic similarity between each image in the target subset and the input image. The SSIM index was originally designed for image/video quality assessment. We empirically find that SSIM is more suitable than the widely-used perceptual loss for measuring the semantic similarity between two images. For every image in the target subset, we compute the SSIM index between the candidate and the input image, and then pick the image with the highest SSIM index as the reference image for the input image.
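The following sketch illustrates one possible implementation of this retrieval step; it is not the original code. The helper names `edge_response` and `retrieve_reference` are illustrative, a Canny detector stands in for the HED edge responses of [41], scikit-image's `structural_similarity` computes the SSIM index, and candidate images are assumed to be pre-filtered to the target adjective-noun subset.

```python
# Sketch of the SSIM-based retrieval step. Candidate images are assumed to be
# pre-filtered to the target adjective-noun subset; Canny edges stand in for
# the HED edge responses [41] used in the paper.
import numpy as np
from skimage import io, color, feature, transform
from skimage.metrics import structural_similarity

def edge_response(path, size=(256, 256)):
    """Load an image, resize it, and return its edge map as a float array."""
    gray = color.rgb2gray(io.imread(path))
    gray = transform.resize(gray, size, anti_aliasing=True)
    return feature.canny(gray, sigma=2.0).astype(np.float64)

def retrieve_reference(input_path, candidate_paths):
    """Return the candidate whose edge map has the highest SSIM with the input's."""
    input_edges = edge_response(input_path)
    scores = [structural_similarity(input_edges, edge_response(p), data_range=1.0)
              for p in candidate_paths]
    best = int(np.argmax(scores))
    return candidate_paths[best], scores[best]

# Hypothetical usage: input labelled "clear river", target sentiment "muddy":
# ref, score = retrieve_reference("clear_river.jpg",
#                                 ["muddy_river_001.jpg", "muddy_river_002.jpg"])
```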

Figure 4: Illustration of the proposed global sentiment transfer algorithm. Here we use the Densenet121 architecture as the backbone network.

III-B Global Sentiment Transfer

Given the input image and a selected reference image that is structurally most similar to it, we propose a novel algorithm to transfer the sentiment of the input image according to the selected reference image. Our algorithm is an optimization method that iteratively transfers the sentiment by minimizing two objectives on deep features: the first objective keeps the details of the input intact, while the other constrains the sentiment of the produced image to be similar to that of the reference image.

A high-quality sentiment transfer result should have a sentiment similar to the reference image while keeping the content details of the input image intact. The key challenge in sentiment transfer is to measure the sentiment similarity between two images. Inspired by Gatys et al. [42, 5], we adopt the Gram loss on the deep features of the input and reference images produced by a neural network to measure sentiment similarity. This Gram-based loss term was originally used to measure style similarity; since sentiment can be regarded as an abstraction of style, we borrow the Gram loss term for sentiment transfer. Moreover, we compute the $l_2$ norm between the features of the transferred and input images as the content-consistency loss.

Figure 5: Comparison between style transfer results and sentiment transfer results. The style transfer results are produced by the VGG19 architecture while the sentiment transfer images are generated based on the Densenet121.

The sentiment/style transfer results created by the above-mentioned loss terms rely heavily on the deep features used to compute the objectives. Existing style transfer algorithms [42, 43, 44, 45, 31, 46, 47] all use features produced by the VGG19 network pre-trained on the ImageNet dataset [6]. However, optimization based on the feature maps of VGG19 inevitably changes details of the content image. Take the style transfer results of Gatys et al. [43] shown in Fig. 5 as an example: the style transfer algorithm based on VGG19 features changes details of the sea, sky, and plants in the input image.

As illustrated in Fig. 4, the proposed algorithm adopts the Densenet121 network pre-trained on the ImageNet dataset [6] as the feature extractor. We empirically find that Densenet121 can achieve sentiment transfer while avoiding damage to the details of the input content. Based on Densenet121, we first extract deep features of the input and the reference image at the ReLU layers following each pooling operator. We use $f_{s}^{i}$ and $f_{t}^{i}$, $i\in\{1,\dots,5\}$, to denote the feature maps of the input and reference image, respectively, where $s$ denotes source and $t$ denotes target. The use of Densenet121 has two main advantages. First, Densenet121 keeps the content information intact while transferring the sentiment from the reference to the input image. Second, Densenet121 has only about half the parameters of VGG19 (Densenet121: 6.952 vs. VGG19: 12.945) and is therefore more time-efficient in creating sentiment transfer results.
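As a concrete but hypothetical rendering of this feature extractor, the sketch below splits torchvision's Densenet121 into five stages and returns a rectified activation after each one. The exact tap points are our assumption, since the paper only specifies the activations following each pooling stage.

```python
# Hypothetical feature extractor: five feature levels from a frozen,
# ImageNet-pretrained Densenet121 (tap points are our assumption).
import torch
import torch.nn.functional as F
from torchvision import models

class DenseFeatures(torch.nn.Module):
    def __init__(self):
        super().__init__()
        net = models.densenet121(weights="IMAGENET1K_V1")  # pretrained=True on older torchvision
        features = net.features.eval()
        for p in features.parameters():
            p.requires_grad_(False)
        # torchvision's densenet121.features children:
        # conv0, norm0, relu0, pool0, denseblock1, transition1, denseblock2,
        # transition2, denseblock3, transition3, denseblock4, norm5
        blocks = list(features.children())
        self.stages = torch.nn.ModuleList([
            torch.nn.Sequential(*blocks[0:4]),    # stem: conv0, norm0, relu0, pool0
            torch.nn.Sequential(*blocks[4:6]),    # denseblock1 + transition1
            torch.nn.Sequential(*blocks[6:8]),    # denseblock2 + transition2
            torch.nn.Sequential(*blocks[8:10]),   # denseblock3 + transition3
            torch.nn.Sequential(*blocks[10:12]),  # denseblock4 + norm5
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(F.relu(x))  # rectified activation at each of the 5 levels
        return feats                 # [f^1, ..., f^5], each of shape (N, C_i, H_i, W_i)
```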

The overall loss function we use is

$\mathcal{L} = \alpha\cdot\mathcal{L}_{content} + \beta\cdot\mathcal{L}_{sentiment}$,   (1)
$\mathcal{L}_{content} = \|f^{4} - f^{4}_{s}\|_{2}$,   (2)
$\mathcal{L}_{sentiment} = \frac{1}{5}\sum_{i=1}^{5}\|\mathrm{Gram}(f^{i}) - \mathrm{Gram}(f_{t}^{i})\|_{2}$,   (3)

where $\mathrm{Gram}(f) = f\cdot f^{T}$ and $f^{i}$ denotes the feature maps of the transferred image in Densenet121. All feature maps $f$ have the shape $C\times(H\times W)$, where $C$ denotes the number of channels and $H, W$ denote the height and width of $f$, respectively. Note that the transferred image is the variable in the optimization process, i.e., we iteratively alter an image so that its structure looks the same as the input image while it carries the sentiment of the reference image.
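Assuming the five feature maps come from a module such as the `DenseFeatures` sketch above, Equations (1)-(3) could be written as follows. The use of the $C\times C$ Gram matrix and its normalization are our choices and interact with the values of $\alpha$ and $\beta$, so the constants should be treated as indicative.

```python
# Possible PyTorch form of Eqs. (1)-(3); the C x C Gram matrix and its
# normalisation are our choices, not stated explicitly in the paper.
import torch

def gram(f):
    """Gram matrix of a single feature map f with shape (C, H, W)."""
    c, h, w = f.shape
    f = f.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)   # normalisation keeps magnitudes comparable across levels

def total_loss(feats, feats_s, feats_t, alpha=1.0, beta=1e6):
    """Eq. (1): feats / feats_s / feats_t are the 5-level features of the
    optimised, input (source), and reference (target) images."""
    content = torch.norm(feats[3] - feats_s[3], p=2)              # Eq. (2), 4th feature level
    sentiment = sum(torch.norm(gram(f[0]) - gram(ft[0]), p=2)     # Eq. (3), averaged over levels
                    for f, ft in zip(feats, feats_t)) / len(feats)
    return alpha * content + beta * sentiment
```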

IV Experiment

In this section, we first describe the experimental settings. Then we compare the proposed image retrieval and global sentiment transfer algorithms against other image retrieval and image style transfer algorithms, respectively. Finally, we demonstrate the effectiveness of the proposed algorithm by both visual and quantitative evaluation. All the source code and the trained model will be made available to the public.

TABLE I: Selected global sentiment datasets from VSO images.
Positive Sentiments | Negative Sentiments
Warm home | Dark room
Clear river | Muddy water
Clear water | Muddy river
Clear mountain | Misty mountains
Scenic mountain | Rough hill
Clear lake | Misty lake
Lovely city | Harsh landscape
Bright city | Poor city
Great city | -

IV-A Experimental Settings

Global Sentiment Dataset.  To demonstrate the effectiveness of the proposed global sentiment transfer framework, we collect images from the VSO dataset whose sentiments are global. For example, an image described as "beautiful bird" is not selected, since "bird" is only a regional object. Table I shows the adjective-noun pairs of the selected subsets used in our experiments. We employ nine subsets with positive sentiments and eight subsets with negative sentiments.

Global Sentiment Transfer Settings.  To transfer the sentiment of the selected reference image to the input image, we use the optimization-based iterative method described above, with a Densenet121 pre-trained on the ImageNet dataset as the feature extractor. We use the Adam algorithm [48] to iteratively optimize the output image given the input image and the retrieved reference image. To balance the content and sentiment loss terms, we set $\alpha=1$ and $\beta=1,000,000$ in Equation 1. For each input-reference pair, we run the optimization for 500 iterations.
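A minimal sketch of this optimization loop is given below, reusing the `DenseFeatures` and `total_loss` sketches from Section III-B. The learning rate and the 512x512 working resolution are our assumptions; the paper specifies only Adam, 500 iterations, $\alpha=1$, and $\beta=1,000,000$.

```python
# Minimal optimisation-loop sketch; reuses the DenseFeatures and total_loss
# sketches above. Learning rate and working resolution are assumptions.
import torch
from torchvision import transforms
from PIL import Image

prep = transforms.Compose([
    transforms.Resize((512, 512)),                       # assumed working resolution
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],     # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def sentiment_transfer(input_path, reference_path, steps=500, lr=0.01,
                       alpha=1.0, beta=1e6, device=None):
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    extractor = DenseFeatures().to(device)
    x_in = prep(Image.open(input_path).convert("RGB")).unsqueeze(0).to(device)
    x_ref = prep(Image.open(reference_path).convert("RGB")).unsqueeze(0).to(device)

    with torch.no_grad():
        feats_s = extractor(x_in)    # content targets (Eq. 2)
        feats_t = extractor(x_ref)   # sentiment targets (Eq. 3)

    out = x_in.clone().requires_grad_(True)              # initialise from the input image
    optimizer = torch.optim.Adam([out], lr=lr)           # Adam, as in the paper
    for _ in range(steps):                               # 500 iterations by default
        optimizer.zero_grad()
        loss = total_loss(extractor(out), feats_s, feats_t, alpha, beta)
        loss.backward()
        optimizer.step()
    return out.detach()                                  # still in normalised colour space
```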

IV-B Reference Image Retrieval

Fig. 6 shows image retrieval results based on the perceptual loss and on the SSIM index. Since the perceptual loss mainly measures the similarity between two images in terms of style effects, the image retrieved based on the perceptual loss generally has a similar sentiment/style but distinct content. On the contrary, the image retrieved by the algorithm based on the SSIM index contains the same content but a different sentiment compared with the input image. Take Fig. 6 for example: the image retrieved by the perceptual loss (b) has the same blue style as the input image (a), but the content structures of (a) and (b) are completely different. In contrast, the input (c) and the image retrieved by the SSIM index (d) have a similar content structure. Generally, the global sentiment transfer algorithm generates a better result if the reference image has a more similar content structure to the input image. Therefore, Fig. 6 demonstrates that the proposed image retrieval method based on the SSIM index outperforms the algorithm based on the perceptual loss.

Figure 6: Comparison between the proposed image retrieval method based on the SSIM index and the image retrieval method based on the perceptual loss. The proposed retrieval method focuses on finding images with the most relevant content, while the algorithm based on the perceptual loss measures the distance between images mainly in terms of style similarity.

IV-C Visual Comparison

Figure 7: Visual comparison between the results produced by the proposed global sentiment transfer algorithm and the state-of-the-art universal style transfer algorithms. All the compared results are produced by running the officially-released code of the corresponding algorithm.
TABLE II: Comparison of mean SSIM score on the validation set. Higher SSIM score means better detail preservation ability.
Method | Gatys et al. [5] | WCT [49] | AdaIN [26] | StyleNAS [34] | Ours
SSIM ↑ | 0.7019 | 0.2443 | 0.5301 | 0.6653 | 0.8719

Since this work presents the first global sentiment transfer algorithm for arbitrary input images, we compare the results produced by our algorithm with state-of-the-art artistic [50, 26, 49] and photorealistic [34] style transfer algorithms to demonstrate the effectiveness of the proposed global sentiment transfer method. Other photorealistic style transfer algorithms such as [31, 30, 32] are not compared since these methods need a segmentation map or post-processing to assist the style transfer; such pre- or post-processing steps are not needed by ours or the compared methods. To make a fair comparison, each compared style transfer algorithm uses the image retrieved by our SSIM-based retrieval algorithm as the reference image. Fig. 7 shows the sentiment transfer results of our method and the style transfer results of the state-of-the-art universal style transfer algorithms. The results of the artistic style transfer algorithms (e.g., StyleSwap [50], WCT [49], AdaIN [26]) usually have distorted content details, which helps create an artistic feeling but is undesirable for faithful sentiment transfer. The photorealistic style transfer algorithm [34] can preserve the content information but may create significant artifacts in images; take Fig. 7 (f) for example. Fig. 7 (g) shows the result produced by our global sentiment transfer algorithm. Our method successfully achieves sentiment transfer while keeping the content details intact. Moreover, the produced results have significantly fewer artifacts compared with the state-of-the-art photorealistic style transfer algorithm.

IV-D Quantitative Comparison

We quantitatively demonstrate the effectiveness of the proposed algorithm by computing the SSIM index between the input image and the produced result. Here the SSIM score measures how well each compared algorithm preserves fine details of the content. We collect 46 input-reference image pairs to form a validation set and obtain the sentiment/style transfer results of all compared algorithms on this set. Table II shows the mean SSIM score of each compared algorithm on this validation set. The proposed global sentiment transfer algorithm achieves a higher mean SSIM score than the other style transfer methods, which demonstrates that our method has a stronger ability to preserve fine details of the input.
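For reference, a mean SSIM score of this kind could be computed as in the sketch below; the working resolution and exact SSIM settings are our assumptions, since the paper does not specify them.

```python
# Sketch of the mean-SSIM evaluation; resolution and SSIM settings are
# assumptions (requires scikit-image >= 0.19 for channel_axis).
import numpy as np
from skimage import io, transform
from skimage.metrics import structural_similarity

def mean_ssim(input_paths, output_paths, size=(512, 512)):
    scores = []
    for in_path, out_path in zip(input_paths, output_paths):
        a = transform.resize(io.imread(in_path), size, anti_aliasing=True)   # floats in [0, 1]
        b = transform.resize(io.imread(out_path), size, anti_aliasing=True)
        scores.append(structural_similarity(a, b, channel_axis=-1, data_range=1.0))
    return float(np.mean(scores))
```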

V Conclusion

In this work, we propose a high-performance global image sentiment transfer framework consisting of a reference image retrieval step and a global sentiment transfer step. In the reference image retrieval step, we adopt the SSIM index instead of the widely-used perceptual loss to measure the structural distance between images, which captures content similarity rather than style effects. In the global sentiment transfer step, we use the Densenet121 network pre-trained on the ImageNet dataset as the feature extractor and employ an image style transfer framework to iteratively transfer sentiment based on the features produced by the Densenet121 architecture. Our qualitative and quantitative experiments demonstrate that the proposed algorithm outperforms existing style transfer algorithms in terms of sentiment transfer effects and input detail preservation.

References

  • [1] X. Huang, M.-Y. Liu, S. Belongie, and J. Kautz, “Multimodal unsupervised image-to-image translation,” in ECCV, 2018.
  • [2] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in ICCV, 2017.
  • [3] H.-Y. Lee, H.-Y. Tseng, J.-B. Huang, M. Singh, and M.-H. Yang, “Diverse image-to-image translation via disentangled representations,” in ECCV, 2018.
  • [4] H. Tang, D. Xu, G. Liu, W. Wang, N. Sebe, and Y. Yan, “Cycle in cycle generative adversarial networks for keypoint-guided image generation,” in ACM MM, 2019.
  • [5] L. A. Gatys, A. S. Ecker, and M. Bethge, “A neural algorithm of artistic style,” arXiv preprint arXiv:1508.06576, 2015.
  • [6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012.
  • [7] X. Alameda-Pineda, E. Ricci, Y. Yan, and N. Sebe, “Recognizing emotions from abstract paintings using non-linear matrix completion,” in CVPR, 2016.
  • [8] A. Sartori, D. Culibrk, Y. Yan, and N. Sebe, “Who’s afraid of itten: Using the art theory of color combination to analyze emotions in abstract paintings,” in ACM MM, 2015.
  • [9] J. Machajdik and A. Hanbury, “Affective image classification using features inspired by psychology and art theory,” in ACM MM, 2010.
  • [10] X. Lu, P. Suryanarayan, R. B. Adams Jr, J. Li, M. G. Newman, and J. Z. Wang, “On shape and the computability of emotions,” in ACM MM, 2012.
  • [11] J. Yuan, S. Mcdonough, Q. You, and J. Luo, “Sentribute: image sentiment analysis from a mid-level perspective,” in Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2013.
  • [12] S. Zhao, Y. Gao, X. Jiang, H. Yao, T.-S. Chua, and X. Sun, “Exploring principles-of-art features for image emotion recognition,” in ACM MM, 2014.
  • [13] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale visual sentiment ontology and detectors using adjective noun pairs,” in ACM MM, 2013.
  • [14] J. Yang, D. She, Y.-K. Lai, and M.-H. Yang, “Retrieving and classifying affective images via deep metric learning,” in AAAI, 2018.
  • [15] J. Yang, D. She, and M. Sun, “Joint image emotion classification and distribution learning via deep convolutional neural network.” in IJCAI, 2017.
  • [16] Q. You, J. Luo, H. Jin, and J. Yang, “Robust image sentiment analysis using progressively trained and domain transferred deep networks,” in AAAI, 2015.
  • [17] J. Yang, D. She, Y.-K. Lai, P. L. Rosin, and M.-H. Yang, “Weakly supervised coupled networks for visual sentiment analysis,” in CVPR, 2018.
  • [18] K. Song, T. Yao, Q. Ling, and T. Mei, “Boosting image sentiment analysis with visual attention,” Neurocomputing, 2018.
  • [19] S. Zhao, Z. Jia, H. Chen, L. Li, G. Ding, and K. Keutzer, “Pdanet: Polarity-consistent deep attention network for fine-grained visual emotion regression,” in ACM MM, 2019.
  • [20] T. Rao, X. Li, H. Zhang, and M. Xu, “Multi-level region-based convolutional neural network for image emotion classification,” Neurocomputing, 2019.
  • [21] Q. You, H. Jin, and J. Luo, “Visual sentiment analysis by attending on local image regions,” in AAAI, 2017.
  • [22] L. Karacan, Z. Akata, A. Erdem, and E. Erdem, “Learning to generate images of outdoor scenes from attributes and semantic layouts,” arXiv preprint arXiv:1612.00215, 2016.
  • [23] P. Sangkloy, J. Lu, C. Fang, F. Yu, and J. Hays, “Scribbler: Controlling deep image synthesis with sketch and color,” in CVPR, 2017.
  • [24] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in CVPR, 2017.
  • [25] J. Liao, Y. Yao, L. Yuan, G. Hua, and S. B. Kang, “Visual attribute transfer through deep image analogy,” arXiv preprint arXiv:1705.01088, 2017.
  • [26] X. Huang and S. J. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in ICCV, 2017.
  • [27] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in CVPR, 2016.
  • [28] L. A. Gatys, A. S. Ecker, M. Bethge, A. Hertzmann, and E. Shechtman, “Controlling perceptual factors in neural style transfer,” in CVPR, 2017.
  • [29] D. Kotovenko, A. Sanakoyeu, S. Lang, and B. Ommer, “Content and style disentanglement for artistic style transfer,” in ICCV, 2019.
  • [30] Y. Li, M.-Y. Liu, X. Li, M.-H. Yang, and J. Kautz, “A closed-form solution to photorealistic image stylization,” in ECCV, 2018.
  • [31] F. Luan, S. Paris, E. Shechtman, and K. Bala, “Deep photo style transfer,” in CVPR, 2017.
  • [32] J. Yoo, Y. Uh, S. Chun, B. Kang, and J.-W. Ha, “Photorealistic style transfer via wavelet transforms,” in ICCV, 2019.
  • [33] S. Bae, S. Paris, and F. Durand, “Two-scale tone management for photographic look,” in ACM Transactions on Graphics, 2006.
  • [34] J. An, H. Xiong, J. Huan, and J. Luo, “Ultrafast photorealistic style transfer via neural architecture search,” in AAAI, 2020.
  • [35] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in CVPR, 2017.
  • [36] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [37] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable effectiveness of deep features as a perceptual metric,” in CVPR, 2018.
  • [38] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” TIP, 2004.
  • [39] K.-N. Lianos, J. L. Schonberger, M. Pollefeys, and T. Sattler, “Vso: Visual semantic odometry,” in ECCV, 2018.
  • [40] M.-J. Chen and A. C. Bovik, “Fast structural similarity index algorithm,” Journal of Real-Time Image Processing, 2011.
  • [41] S. Xie and Z. Tu, “Holistically-nested edge detection,” in ICCV, 2015.
  • [42] L. Gatys, A. S. Ecker, and M. Bethge, “Texture synthesis using convolutional neural networks,” in NeurIPS, 2015.
  • [43] L. A. Gatys, A. S. Ecker, and M. Bethge, “Image style transfer using convolutional neural networks,” in CVPR, 2016.
  • [44] L. A. Gatys, M. Bethge, A. Hertzmann, and E. Shechtman, “Preserving color in neural artistic style transfer,” arXiv preprint arXiv:1606.05897, 2016.
  • [45] E. Risser, P. Wilmot, and C. Barnes, “Stable and controllable neural texture synthesis and style transfer using histogram losses,” arXiv preprint arXiv:1701.08893, 2017.
  • [46] S. Li, X. Xu, L. Nie, and T.-S. Chua, “Laplacian-steered neural style transfer,” in ACM MM, 2017.
  • [47] Y. Li, N. Wang, J. Liu, and X. Hou, “Demystifying neural style transfer,” arXiv preprint arXiv:1701.01036, 2017.
  • [48] D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [49] Y. Li, C. Fang, J. Yang, Z. Wang, X. Lu, and M.-H. Yang, “Universal style transfer via feature transforms,” in NeurIPS, 2017.
  • [50] T. Q. Chen and M. Schmidt, “Fast patch-based style transfer of arbitrary style,” arXiv preprint arXiv:1612.04337, 2016.