
Joint Depth Estimation and Mixture of Rain Removal From a Single Image

Yongzhen Wang, Xuefeng Yan, Yanbiao Niu, Lina Gong, Yanwen Guo, Member, IEEE,
and Mingqiang Wei, Senior Member, IEEE
Y. Wang, X. Yan, Y. Niu, and L. Gong are with the School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China (e-mail: wangyz@nuaa.edu.cn, yxf@nuaa.edu.cn, niuyanbiao@126.com, gonglina@nuaa.edu.cn). Y. Guo is with the National Key Lab for Novel Software Technology, Nanjing University, Nanjing 210000, China (e-mail: ywguo@nju.edu.cn). M. Wei is with the Shenzhen Research Institute, Nanjing University of Aeronautics and Astronautics, Shenzhen 518038, China (e-mail: mingqiang.wei@gmail.com).
Abstract

Rainy weather significantly deteriorates the visibility of scene objects, particularly when images are captured through outdoor camera lenses or windshields. Through careful observation of numerous rainy photos, we have found that images are generally affected by various rainwater artifacts such as raindrops, rain streaks, and rainy haze, which degrade image quality at both near and far distances and together form a complex, intertwined process of image degradation. However, current deraining techniques can address only one or two types of rainwater, making the removal of the mixture of rain (MOR) challenging. In this study, we propose an effective image deraining paradigm for Mixture of rain REmoval, called DEMore-Net, which takes full account of the MOR effect. Going beyond the existing deraining wisdom, DEMore-Net is a joint learning paradigm that integrates depth estimation and MOR removal tasks to achieve superior rain removal. Depth information offers meaningful distance-based guidance that helps DEMore-Net remove different types of rainwater more effectively. Moreover, this study explores normalization approaches in image deraining tasks and introduces a new Hybrid Normalization Block (HNB) to enhance the deraining performance of DEMore-Net. Extensive experiments conducted on synthetic datasets and real-world MOR photos fully validate the superiority of the proposed DEMore-Net. Code is available at https://github.com/yz-wang/DEMore-Net.

Index Terms:
DEMore-Net, Image deraining, Depth estimation, Mixture of rain, Joint learning, Hybrid normalization block

I Introduction

Rain, being one of the most ubiquitous weather phenomena, unavoidably induces conspicuous image quality degradation and hinders the performance of various outdoor vision systems [1, 2, 3]. The visibility of the captured scenes is significantly deteriorated by the presence of rain streaks in the atmosphere, as well as the detrimental effect of raindrops that fall onto the camera lens. Moreover, the accumulation of minuscule raindrops in the distant background also engenders a haze-like effect, further exacerbating the visual degradation. We designate this intricate, entangled process of image deterioration, caused by these three forms of rainwater, as a Mixture of Rain (MOR).

Image deraining aims at restoring sharp images from their rainy counterparts, which is a challenging ill-posed problem. Existing deraining techniques roughly fall into two categories: rain streak removal and raindrop removal. Early rain streak removal wisdom commonly exploits hand-crafted priors with empirical observations, such as sparse coding [4], low-rank representation [5], and the Gaussian Mixture Model (GMM) [6]. Notwithstanding the commendable outcomes achieved by these prior-based deraining algorithms, their capacity to model and remove rain is restricted. The advent of deep learning techniques in the field of image enhancement [7, 8] has led to a proliferation of learning-based approaches for rain streak and raindrop removal [9, 10, 11, 12, 13, 14]. Despite producing promising results on various datasets, most of these approaches focus on addressing only one type of rainwater and are therefore insufficient for effectively handling the MOR challenge.

Recently, this issue has been picked up by several learning-based deraining studies [15, 16, 17, 18]. These algorithms argue that more than one type of rainwater is present in rainy images and attempt to remove them in one go. Hu et al. [15] first observed that rainy images contain both rain streaks and rainy haze, which often co-occur during image capture. To address this issue, they develop DAF-Net to remove rain streaks and haze simultaneously. Afterward, several deraining paradigms have been proposed for removing both forms of rainwater artifacts with some success. However, since these efforts only consider two types of rainwater and ignore the MOR effect, they cannot adequately address the challenging MOR problem. To our knowledge, there is presently only one deraining model that considers the removal of all three forms of rainwater, namely MBA-RainGAN [18]. Despite achieving satisfactory results in some rainy scenarios, the deraining capacity of MBA-RainGAN is still limited since it ignores depth information, which can provide additional beneficial guidance for the network to remove different forms of rainwater. As illustrated in Fig. 1, compared with the state-of-the-art (SOTA) deraining algorithms, the proposed DEMore-Net, guided by the scene depth, produces a much clearer and perceptually more pleasing derained result.

Figure 1: Image deraining results on a MOR image from the RainCityscapes++ [18] dataset. From (a) to (h): (a) the input rainy image, and the deraining results of (b) CMFNet [19], (c) Syn2Real [20], (d) CCN [17], (e) CCN [17] + SGID [21] (cascading combination method), (f) MBA-RainGAN [18], (g) our DEMore-Net, respectively, and (h) the ground truth.

This work goes beyond previous image deraining techniques by considering the importance of depth information and developing a unified deraining paradigm called DEMore-Net that integrates depth estimation and image deraining. DEMore-Net leverages a joint learning framework to simultaneously perform image deraining and depth estimation and encourages the two tasks to collaborate and promote each other. The estimated depth map serves as a guide for removing different forms of rainwater in a targeted, distance-dependent manner, while the image deraining module helps the depth estimation network produce more accurate depth predictions. Additionally, a novel Hybrid Normalization Block (HNB) is proposed to advance the stability of network training and enhance its learning and generalization capacities, thereby helping the model tackle the intractable MOR problem. Moreover, to further advance the MOR removal capacity of the model, an efficient feature enhancement network (self-calibrated convolutions [22]) is introduced in the design of our DEMore-Net to generate more discriminative feature representations. Extensive experiments on a synthetic MOR testbed (RainCityscapes++ [18]) and real-world MOR photos demonstrate that our DEMore-Net outperforms the SOTA image deraining algorithms significantly.

Overall, the main contributions are summarized as follows:

  • A novel unified deraining paradigm is proposed for the removal of the mixture of rain, called DEMore-Net, which combines depth estimation and image deraining tasks in a joint learning framework. DEMore-Net is trained in an end-to-end fashion to simultaneously learn depth estimation and MOR removal, thus encouraging the two sub-tasks to benefit from each other.

  • We propose a novel Hybrid Normalization Block (HNB) to enhance the stability of the network during the training phase and facilitate its learning and generalization capacities for better recovery of MOR images.

  • We compare the proposed DEMore-Net with 14 state-of-the-art image deraining approaches through extensive experiments. The results are evaluated in terms of full-reference and no-reference image quality assessments, visual quality, and human subjective surveys. As observed, the proposed DEMore-Net achieves SOTA performance on both synthetic MOR testbeds and real MOR photos.

The subsequent sections of this paper are structured as follows. In Section II, a brief review of the relevant literature is provided, which is divided into two categories: single-type rain removal and multi-type rain removal approaches. Section III presents a detailed overview of DEMore-Net, which is proposed for removing the mixture of rain. Section IV describes the implemented experiments and discusses the results, followed by conclusions in Section V.

II Related Work

In this section, we briefly categorize the discussion into two aspects: single-type rain removal and multi-type rain removal algorithms.

II-A Single-Type Rain Removal

Rain streak removal. Conventional rain streak removal efforts resort to exploiting hand-crafted priors based on image statistics to restore the rainy images [4, 5, 6, 23, 24]. Kang et al. [4] employ dictionary learning and sparse coding to decompose the rainy images into different components and then reconstruct the clean images. Chen et al. [5] develop a generalized low-rank appearance model to remove rain streaks from images. Similarly, Chang et al. [23] exploit a low-rank image decomposition model to restore the rain-free images from their rainy versions. Zhu et al. [24] propose a joint optimization framework with three image priors for rain streak removal from rainy images.

Recently, learning-based approaches have demonstrated their superiority for rain streak removal [9, 10, 25, 26, 27]. Yang et al. [9] develop a multi-task framework to detect and remove rain streaks jointly. Zhang et al. [10] create a density-aware image deraining network to simultaneously predict rain density and remove rain streaks. Li et al. [25] develop a recurrent squeeze-and-excitation contextual dilated network for single image deraining and achieve very promising results. Zhang et al. [26] exploit a conditional GAN-based model with additional regularization for rain streak removal. Ahn et al. [27] develop a two-step rain removal method that first estimates the rain density and rain streak intensity and then removes them.

Raindrop removal. Since most rain streak removal algorithms cannot be directly applied to raindrop removal, many approaches have been proposed for raindrop detection and removal [28, 29, 12, 30, 13, 14]. Kurihata et al. [28] employ the well-known PCA to learn the shape of raindrops and attempt to match the rainy regions. Roser et al. [29] propose an algorithm for monocular raindrop detection in single images. Eigen et al. [31] develop a convolutional network to remove raindrops from a single image. Later, You et al. [32] exploit spatio-temporal information for video raindrop removal. More recently, Qian et al. [12] propose an attentive GAN-based network for single-image raindrop removal. Quan et al. [30] create a CNN-based network to remove raindrops by using shape-driven attention and channel re-calibration. Yan et al. [14] propose a two-stage video-based raindrop removal approach that first employs a single-image module to produce initial clean results and then refines them using temporal constraints.

II-B Multi-Type Rain Removal

Very recently, several image deraining approaches [15, 16, 33, 17, 18] have noticed the presence of more than one type of rainwater artifact in rainy images and attempted to remove them in one go. Hu et al. [15] first establish a new dataset with rain streaks and rainy haze, and then develop an end-to-end model to remove them simultaneously. Guo et al. [16] propose an integrated multi-task framework to handle the joint raindrop and haze removal problem by combining the atmospheric scattering model and a deep neural network. Zhang et al. [33] create a dual-branch neural network for removing both rain streaks and raindrops. Quan et al. [17] propose an effective cascaded network for removing raindrops and rain streaks simultaneously via neural architecture search. Shen et al. [18] develop a multi-branch attention GAN-based framework to simultaneously remove rain streaks, raindrops, and rainy haze, which is the only study that considers the MOR effect among existing rain removal algorithms. However, these methods either ignore the MOR effect or do not exploit the depth information of the scene, which can be regarded as important prior knowledge to guide the network in removing different types of rainwater artifacts, thus limiting their rain removal capacity in these complex scenarios.

III DEMore-Net

Scene depth accurately characterizes the different forms of rainwater artifacts according to distance, and it can therefore serve as important prior knowledge to guide the network in removing these artifacts in a targeted manner. Unfortunately, prevailing image deraining methods tend to overlook this significant guidance information, thus limiting their deraining performance in various scenarios. To address this limitation, we propose a unified and efficient deraining paradigm that seamlessly integrates the depth estimation and image deraining tasks for MOR removal, termed DEMore-Net. In the following, we first present the imaging model of MOR images, which are degraded by three forms of rainwater. Next, an overview of DEMore-Net is given to reveal how we address the entangled MOR removal problem. Subsequently, we describe the network architecture of the MOR removal and depth estimation networks, showcasing how they operate in our unified deraining paradigm. Finally, the proposed Hybrid Normalization Block (HNB) is introduced for better restoring the MOR images.

Figure 2: Overview of DEMore-Net. It comprises three components: a generator consisting of a MOR removal network and a depth-guided non-local (DGNL) module, a depth estimation network, and a discriminator. DRB refers to dilated residual blocks, HNB refers to the Hybrid Normalization Block, and SC Conv denotes self-calibrated convolutions.

III-A MOR Imaging Model

Rain Streak Model. We adopt the widely used rain model [34] to define a rain streak degraded image $R_s$ as a superposition of the clean background $B$ and the accumulated rain streak layer $S$. The observed $R_s(x)$ at pixel $x$ can be expressed as follows:

$R_s(x) = B(x) + S(x)$,  (1)

where

$S(x) = S_{\text{pattern}}(x) \odot t_r(x)$.  (2)

In the above equations, $S_{\text{pattern}}(x) \in [0, 1]$ denotes the intensity of uniformly-distributed rain streaks in the image; $\odot$ refers to pixel-wise multiplication; and $t_r(x)$ denotes the intensity map of the rain streaks, which depends on the scene depth $d(x)$ and can be expressed as:

$t_r(x) = e^{-\alpha d(x)}$,  (3)

where $\alpha$ denotes the attenuation coefficient, which controls the intensity of the rain streaks. Specifically, objects close to the camera lens are mainly affected by raindrops, and the rain streak intensity is small there. As objects recede from the camera, the rain streak intensity gradually increases to its maximum and then falls to zero as $d(x)$ grows. At that point, the image is mainly degraded by the rainy haze.

Raindrop Model. A raindrop degraded image $R_d$ can be modeled as a combination of the clean background $B$ and the impact of the raindrops $D$, and the observed $R_d(x)$ can be expressed as [12]:

$R_d(x) = (1 - M(x)) \odot B(x) + D(x)$,  (4)

where $M(x) \in \{0, 1\}$ denotes whether pixel $x$ is corrupted by raindrops or not, and $D(x) \in [0, 1]$ refers to the raindrop layer.

Rainy Haze Model. Unlike rain streaks, the visual intensity of rainy haze increases exponentially with the scene depth $d(x)$. We adopt the standard atmosphere scattering model [35] to simulate the image degradation process caused by rainy haze, and the observed hazy image $R_h(x)$ at pixel $x$ can be formulated as:

$R_h(x) = B(x)t(x) + A(1 - t(x))$,  (5)

where

$t(x) = e^{-\beta d(x)}$.  (6)

In the above equations, $A$ represents the global atmosphere light, $t(x)$ denotes the atmospheric transmission, and $\beta$ denotes the atmosphere scattering parameter. Note that a larger $\beta$ suggests a thicker rainy haze and vice versa.

MOR Model. In rainy weather, especially under heavy rain conditions, raindrops, rain streaks, and rainy haze often appear simultaneously during outdoor image capture. Therefore, a MOR image $R_{mor}(x)$ can be formulated as follows:

$R_{mor}(x) = \big((1 - M(x)) \odot (B(x) + S(x)) + \hat{A}D(x)\big)t(x) + (1 - t(x))\hat{A}$,  (7)

where $\hat{A}$ refers to the light conditions and color cast from active light sources. Unlike the fixed global atmosphere light $A$, $\hat{A}$ enables us to model nighttime MOR images, which expands the application scenarios of the MOR imaging model. Note that rain streaks and rainy haze may also change the illumination conditions, which in turn affects the transparency of raindrops during image capture. That is, the three rainwater artifacts interact with each other to form the final MOR effect, which is a quite complex and entangled process.
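To make the imaging model concrete, below is a minimal NumPy sketch of Eq. (7). The streak pattern, raindrop mask and layer, depth map, and the parameter values are illustrative assumptions, not values prescribed by the paper.

```python
import numpy as np

def synthesize_mor(B, S_pattern, M, D, depth, alpha=1.0, beta=0.05, A_hat=0.8):
    """Compose a MOR image per Eq. (7).

    B: clean image, (H, W, 3) in [0, 1]; S_pattern, M, D, depth: (H, W) maps.
    alpha, beta, A_hat: illustrative attenuation/scattering/light parameters.
    """
    t_r = np.exp(-alpha * depth)[..., None]   # rain streak intensity map, Eq. (3)
    t = np.exp(-beta * depth)[..., None]      # atmospheric transmission, Eq. (6)
    S = S_pattern[..., None] * t_r            # depth-attenuated streak layer, Eq. (2)
    M3, D3 = M[..., None], D[..., None]
    # Eq. (7): streak-corrupted scene, occluded by raindrops, attenuated by haze
    R = ((1.0 - M3) * (B + S) + A_hat * D3) * t + (1.0 - t) * A_hat
    return np.clip(R, 0.0, 1.0)
```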

III-B Overview of DEMore-Net

As observed in Eq. (7), scene depth plays a crucial role in the degradation process of MOR images. To fully exploit the informative guidance provided by the scene depth, we present a highly effective and unified deraining paradigm called DEMore-Net. As illustrated in Fig. 2, DEMore-Net consists of two principal modules: the MOR removal sub-network and the depth estimation sub-network. This joint learning framework enables the MOR removal and depth estimation tasks to collaborate and benefit from each other. DEMore-Net operates in an end-to-end fashion, taking a MOR photo as input and producing the corresponding MOR-free image and depth map as outputs. To produce clearer and more realistic MOR-free images, we introduce an adversarial training strategy in our MOR removal sub-network.

The pipeline of DEMore-Net can be described as follows. First, the input MOR image is processed in parallel by the MOR removal sub-network and the depth estimation sub-network for preliminary image restoration and depth prediction. Then, the predicted depth map is utilized by the MOR removal sub-network as guidance for better restoring the MOR image. To achieve this, a depth-guided non-local (DGNL) [36] module is employed to fuse the depth information with the coarsely restored features, resulting in a refined MOR-free image. Finally, a discriminator evaluates the authenticity of the restored MOR-free image, determining whether it is a real clean image or a fake image generated by the MOR removal network. This allows for the generation of more realistic images.
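The data flow above can be summarized with the following hedged PyTorch sketch; the class and attribute names are placeholders for the sub-networks of Fig. 2, not the authors' actual implementation.

```python
import torch.nn as nn

class DEMoreNetSketch(nn.Module):
    """Joint deraining + depth estimation pipeline (schematic)."""

    def __init__(self, depth_net, mor_net, dgnl, refine_head):
        super().__init__()
        self.depth_net = depth_net      # depth estimation sub-network
        self.mor_net = mor_net          # MOR removal sub-network (coarse features)
        self.dgnl = dgnl                # depth-guided non-local fusion module
        self.refine_head = refine_head  # convs mapping fused features to an image

    def forward(self, rainy):
        depth = self.depth_net(rainy)     # predicted depth map
        coarse = self.mor_net(rainy)      # coarsely restored image features
        fused = self.dgnl(coarse, depth)  # fuse features with depth guidance
        clean = self.refine_head(fused)   # refined MOR-free image
        return clean, depth               # a discriminator judges `clean` during training
```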

III-C Network Architecture

Overall, DEMore-Net comprises three components: 1) a depth estimation sub-network that employs an encoder-decoder network with multiple feature enhancement modules to produce the depth map; 2) a MOR removal sub-network that produces the MOR-free image using the estimated depth map as additional guidance; and 3) a discriminator that utilizes adversarial training strategies to improve the quality of the restored MOR-free images.

Depth Estimation Network. Inspired by the success of the encoder-decoder architecture and ResNet in the image restoration field [12, 37, 38], we exploit an encoder-decoder network as our depth estimation network and introduce an up-to-date multi-scale feature extraction network (self-calibrated convolutions [22]) to further improve the prediction accuracy of depth maps. The self-calibrated convolution module is an improved convolutional unit that constructs long-range spatial and inter-channel dependencies for each spatial location in the feature space, thereby enriching the output features and helping CNNs generate more discriminative representations. Herein, we employ self-calibrated convolutions as our feature enhancement module to improve the depth estimation network's performance.

As depicted in Fig. 2, the depth estimation network adopts 11 convolutional blocks to extract features from the input MOR image and then outputs the depth map in an end-to-end manner. Each of the first 10 convolutional blocks contains a convolutional operation, a residual block, a hybrid normalization, a Scaled Exponential Linear Unit (SELU), and a self-calibrated convolution module. Note that all the normalization layers used in this work are the proposed hybrid normalization block, which will be described in the next section. After extracting the image features, we adopt a convolutional operation and a sigmoid activation function (i.e., the 11th convolutional block) to output the final depth map in a supervised manner. Here, we adopt the simple $L_1$ loss to train the depth estimation network, which can be expressed as:

$L_1(D) = \frac{1}{WH}\sum_{w=1}^{W}\sum_{h=1}^{H}\left|D(x)^{w,h} - \hat{y}^{w,h}\right|_1$,  (8)

where $D(x)$ is the predicted depth map and $\hat{y}$ is the ground truth depth map; $W$ and $H$ denote the width and height of the depth map, respectively.

MOR Removal Network. Considering that adversarial training encourages the generated images to be close to the clean images (i.e., the ground truth) and makes them more realistic, we build the MOR removal sub-network on a standard generative adversarial network. As demonstrated in Fig. 2, the proposed generator consists of a convolution head to extract image features, a DenseNet-shape body to extract deeper features and restore the clean features simultaneously, and a 1×1 convolution tail to convert channel numbers and output the restored clean image features. The DenseNet-shape body is a stack of dilated residual blocks (DRB) with skip-connections, each of which contains two 3×3 dilated convolutions and a non-linear ReLU layer. Although applying dilated convolutions can enlarge the field of view of the model without adding extra computation, it may easily cause the well-known gridding problem. To address this issue, we follow [36] to combine dilated convolutions with conventional convolutions and set the dilation rates of these 11 DRBs to $\{1, 1, 2, 2, 4, 8, 4, 2, 2, 1, 1\}$, as sketched below. Finally, the depth-guided non-local (DGNL) module is employed to fuse the predicted clean image features with the depth map, thus helping the model to better remove different types of rainwater according to the distance.
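The sketch below shows one plausible PyTorch realization of a DRB and the dilation schedule above; the channel width is an assumption, and the dense skip-connections of the full body are omitted for brevity.

```python
import torch.nn as nn

class DRB(nn.Module):
    """Dilated residual block: two 3x3 dilated convs with a ReLU in between."""

    def __init__(self, channels, dilation):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
        )

    def forward(self, x):
        return x + self.body(x)  # residual skip-connection

# Mixing conventional (rate 1) and dilated convolutions alleviates gridding.
dilations = [1, 1, 2, 2, 4, 8, 4, 2, 2, 1, 1]
body = nn.ModuleList(DRB(64, d) for d in dilations)  # 64 channels is illustrative
```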

Figure 3: The architecture of the depth-guided non-local network.

The function of the discriminator is to determine whether an input image is a real MOR-free image or a fake image produced by the restoration network, thereby encouraging the generator to produce high-quality images. For adversarial training, we adopt 3 convolutional layers and a linear layer to construct our discriminator. Considering that the Least-Squares GAN (LSGAN) loss is more effective in improving training stability than the vanilla GAN loss, we employ LSGAN to train our MOR removal network. The adversarial losses can be formulated as:

$L_{adv}(G) = E_{G(x)\sim P_{fake}}\left[(D(G(x)) - 1)^2\right]$,  (9)

$L_{adv}(D) = E_{y\sim P_{real}}\left[(D(y) - 1)^2\right] + E_{G(x)\sim P_{fake}}\left[(D(G(x)))^2\right]$,  (10)

where $y$ refers to the ground truth images and $G(x)$ represents the restored MOR-free images.
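These two objectives translate directly into code. A minimal sketch, assuming `disc` maps an image batch to unbounded scores:

```python
def lsgan_g_loss(disc, fake):
    # Eq. (9): push D(G(x)) toward the "real" label 1
    return ((disc(fake) - 1.0) ** 2).mean()

def lsgan_d_loss(disc, real, fake):
    # Eq. (10): real samples toward 1, generated samples toward 0
    return ((disc(real) - 1.0) ** 2).mean() + (disc(fake.detach()) ** 2).mean()
```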

Depth-Guided Non-Local Module. The depth-guided non-local (DGNL) module [36] is an effective feature fusion network that builds the relationship between each pair of spatial locations based on the depth map and hence enriches the output features. In addition, as mentioned previously, both the type and visual intensity of rainwater artifacts in MOR images depend on the scene depth, as does the process of MOR removal. Therefore, we adopt the DGNL module to fuse the depth information with the restored clean image features to help our DEMore-Net better recover MOR-free images.

As depicted in Fig. 3, the DGNL module extends the non-local neural network [39] and aggregates the depth information as additional guidance to enrich the final output features. In light of this, we regard the depth information as prior knowledge and leverage the DGNL module to fuse the depth map with the restored image features, thus helping DEMore-Net remove different types of rainwater in a depth-guided manner. After fusing the depth and image features, we employ two convolutional operations to enlarge the feature map and output the final MOR-free images.
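For intuition, here is a heavily simplified sketch of depth-guided non-local attention: a standard non-local block [39] whose affinity matrix is biased by pairwise depth similarity. The depth-bias term is our illustrative assumption; the exact formulation follows [36].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGuidedNonLocalSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels // 2, 1)
        self.phi = nn.Conv2d(channels, channels // 2, 1)
        self.g = nn.Conv2d(channels, channels // 2, 1)
        self.out = nn.Conv2d(channels // 2, channels, 1)

    def forward(self, feat, depth):
        b, c, h, w = feat.shape
        q = self.theta(feat).flatten(2).transpose(1, 2)  # B x HW x C/2
        k = self.phi(feat).flatten(2)                    # B x C/2 x HW
        v = self.g(feat).flatten(2).transpose(1, 2)      # B x HW x C/2
        affinity = torch.bmm(q, k)                       # feature affinity, B x HW x HW
        d = F.interpolate(depth, size=(h, w)).flatten(2).transpose(1, 2)  # B x HW x 1
        depth_bias = -torch.cdist(d, d)  # down-weight pairs at different depths
        attn = F.softmax(affinity + depth_bias, dim=-1)
        y = torch.bmm(attn, v).transpose(1, 2).reshape(b, c // 2, h, w)
        return feat + self.out(y)        # residual fusion
```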

Figure 4: The architecture of the proposed hybrid normalization block (HNB). In our design, Batch Normalization and Instance Normalization are adopted to preserve content-related features while learning features that are invariant to appearance changes, thus enhancing the MOR removal capacity of the network.

III-D Hybrid Normalization Block

Normalization has become a crucial component in various high-level vision tasks but is rarely used in low-level vision tasks, especially Batch Normalization (BN). In fact, the work [40] has demonstrated that Instance Normalization (IN) can learn features that are not affected by appearance changes, such as brightness, colors, and styles, whilst BN is important for preserving content-related features. Both BN and IN are essential, as they can promote each other to benefit vision tasks. Inspired by this, we carefully integrate BN and IN as building blocks and develop a Hybrid Normalization Block (HNB) to advance the network performance in MOR removal tasks. Specifically, HNB leverages both BN and IN in one normalization operation, thus improving the network's learning and generalization capacities. Moreover, the additional parameters and computational costs introduced by HNB are negligible.

As depicted in Fig. 4, given a feature map $F_{in} \in \mathbb{R}^{C_{in}\times H\times W}$, a 3×3 convolution is first adopted to produce the intermediate feature map $F_{mid} \in \mathbb{R}^{C_{out}\times H\times W}$. Then, HNB divides $F_{mid}$ into two parts, $F_{mid_1}$ and $F_{mid_2}$ ($F_{mid_1}, F_{mid_2} \in \mathbb{R}^{C_{out}/2\times H\times W}$). The first part $F_{mid_1}$ is normalized by BN with learnable parameters and the second part $F_{mid_2}$ is normalized by IN, after which the two are concatenated along the channel dimension. HNB thus employs BN on half of the channels and IN on the other half, so that content-related features are preserved while appearance-invariant features are learned. After concatenation, the output feature map $F_{out} \in \mathbb{R}^{C_{out}\times H\times W}$ is obtained by passing the features through a SELU layer.
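A minimal PyTorch sketch of HNB under these definitions (whether the IN half carries affine parameters is our assumption):

```python
import torch
import torch.nn as nn

class HybridNormBlock(nn.Module):
    """3x3 conv, then BN on half of the channels and IN on the other half."""

    def __init__(self, c_in, c_out):
        super().__init__()
        assert c_out % 2 == 0
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(c_out // 2)                     # content-related features
        self.inorm = nn.InstanceNorm2d(c_out // 2, affine=True)  # appearance-invariant features
        self.act = nn.SELU(inplace=True)

    def forward(self, x):
        f = self.conv(x)
        f1, f2 = torch.chunk(f, 2, dim=1)  # split channels into two halves
        f = torch.cat([self.bn(f1), self.inorm(f2)], dim=1)
        return self.act(f)
```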

III-E Loss Functions

To train the proposed DEMore-Net effectively, we take into account all the positive factors that can improve the quality of the restored MOR-free images. Besides the depth estimation loss and adversarial loss, we also adopt a reconstruction loss, the Dark Channel (DC) loss [41], and the Total Variation (TV) loss [42] to train our DEMore-Net. The total loss function can be expressed as:

$L_{Total} = L_1(D) + \lambda_1 L_{adv}(G) + L_{rec} + \lambda_2 L_{dc} + \lambda_3 L_{tv}$,  (11)

where $\lambda_i$ ($i = 1, 2, 3$) are loss weights; we set $\lambda_1 = 0.1$, $\lambda_2 = 0.01$, and $\lambda_3 = 0.01$ in our experiments.
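In code, the objective is simply a weighted sum; a sketch assuming the individual terms are computed as in the following subsections:

```python
def total_loss(l1_depth, l_adv_g, l_rec, l_dc, l_tv,
               lambda1=0.1, lambda2=0.01, lambda3=0.01):
    # Eq. (11) with the loss weights used in the paper's experiments
    return l1_depth + lambda1 * l_adv_g + l_rec + lambda2 * l_dc + lambda3 * l_tv
```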

Reconstruction Loss. The reconstruction loss in this work is a combination of the multi-scale structural similarity (MS-SSIM) loss [43] and the $L_1$ loss, which is formulated as follows:

$L_{rec} = \alpha \cdot L_{ms\text{-}ssim} + (1 - \alpha) \cdot L_1(R)$,  (12)

where

$L_{ms\text{-}ssim} = 1 - \prod_{m=1}^{M}\left(\frac{2\mu_{G(x)}\mu_y + C_1}{\mu_{G(x)}^2 + \mu_y^2 + C_1}\right)^{\beta_m}\left(\frac{2\sigma_{G(x)y} + C_2}{\sigma_{G(x)}^2 + \sigma_y^2 + C_2}\right)^{\gamma_m}$,  (13)

$L_1(R) = \frac{1}{CWH}\sum_{c=1}^{C}\sum_{w=1}^{W}\sum_{h=1}^{H}\left|G(x)^{c,w,h} - y^{c,w,h}\right|_1$.  (14)

In the above equations, $\alpha$ is a hyperparameter, empirically set to $\alpha = 0.1$ in our experiments. $M$ is the total number of scales and is set to 5 according to [43]; $\mu_{G(x)}$, $\mu_y$ and $\sigma_{G(x)}$, $\sigma_y$ represent the means and standard deviations of $G(x)$ and $y$, and $\sigma_{G(x)y}$ refers to their covariance. $\beta_m$ and $\gamma_m$ denote the relative importance of the two components, and both are simply set to 1 in our experiments. Two small constants, $C_1 = 0.0001$ and $C_2 = 0.0009$, are added to avoid instability from division by zero.
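A hedged sketch of Eq. (12), assuming the third-party pytorch_msssim package (pip install pytorch-msssim) supplies the MS-SSIM term of Eq. (13):

```python
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # third-party MS-SSIM implementation (assumption)

def reconstruction_loss(restored, target, alpha=0.1):
    l_ms_ssim = 1.0 - ms_ssim(restored, target, data_range=1.0)  # Eq. (13)
    l1 = F.l1_loss(restored, target)                             # Eq. (14)
    return alpha * l_ms_ssim + (1.0 - alpha) * l1                # Eq. (12)
```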

Dark Channel Loss. In light of the effectiveness of the dark channel prior [44] in the image dehazing field, we apply the Dark Channel loss to help DEMore-Net deal with the rainy haze in MOR images, which is formulated as:

$L_{dc} = \left\|\mathbf{D}_{G(x)}\right\|_1$,  (15)

where $\mathbf{D}_{G(x)}$ represents the vector form of the dark channel of the restored MOR-free images $G(x)$. Since the dark channel operation is a highly non-convex and non-linear term, it cannot be directly embedded into learning networks. Following [41], we adopt the look-up table scheme to implement the forward and backward steps of the dark channel operation.
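Since the look-up table scheme is implementation-specific, the sketch below substitutes a common differentiable approximation, computing the local minimum as a negated max-pooling; this is not the authors' exact scheme.

```python
import torch.nn.functional as F

def dark_channel_loss(img, patch=15):
    # per-pixel minimum over the three color channels
    min_c = img.min(dim=1, keepdim=True).values
    # local minimum over a patch via min(x) = -maxpool(-x)
    dark = -F.max_pool2d(-min_c, kernel_size=patch, stride=1, padding=patch // 2)
    return dark.abs().mean()  # L1 norm of the dark channel, Eq. (15)
```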

Total Variation Loss. We adopt the total variation loss [42] to reduce the difference between adjacent pixel values and make the restored images more natural, which can be expressed as:

$L_{tv} = \left\|\nabla_h G(x) + \nabla_v G(x)\right\|_1$,  (16)

where $\nabla_h$ and $\nabla_v$ denote the horizontal and vertical differential operation matrices, respectively.
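Eq. (16) reduces to a few lines; following the equation, the two gradient fields are summed before the L1 norm is taken (cropped to a common size so they can be added):

```python
def tv_loss(img):
    dh = img[..., :, 1:] - img[..., :, :-1]  # horizontal forward differences
    dv = img[..., 1:, :] - img[..., :-1, :]  # vertical forward differences
    return (dh[..., :-1, :] + dv[..., :, :-1]).abs().mean()  # Eq. (16)
```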

IV Experiments

In this section, comprehensive experiments are conducted on both the synthetic RainCityscapes++ benchmark [18] and real-world rainy images to evaluate the MOR removal capacity of DEMore-Net and competing algorithms. Moreover, we also perform an ablation study to evaluate the effectiveness of each component in DEMore-Net. The details are as follows.

IV-A Implementation Details

Dataset. To the best of our knowledge, the RainCityscapes++ dataset [18] stands as the sole synthetic MOR benchmark to encompass raindrops, rain streaks, and rainy haze in one consolidated dataset. It comprises a total of 8580 synthetic MOR images, of which 7580 are used for training and 1000 for testing. Hence, we adopt the RainCityscapes++ dataset to train and evaluate the performance of our proposed DEMore-Net.

Training Details. DEMore-Net is implemented in PyTorch 1.7 on a system with an Intel Core i9-12900KF CPU, 32 GB RAM, and an NVIDIA GeForce RTX 3090 GPU. We adopt the Adam optimizer with a batch size of 2 to train our model, where the momentum parameters $\beta_1$ and $\beta_2$ are set to 0.9 and 0.999. The total number of iterations and the initial learning rate are empirically set to 300,000 and $5\times 10^{-4}$, respectively. Considering that the original image size of the RainCityscapes++ dataset is too large (2048×1024 pixels), we resize the images to 720×480 pixels for both training and inference.
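A sketch of the corresponding optimizer setup; whether the generator and depth branch share one optimizer, and the discriminator's settings, are our assumptions:

```python
import itertools
import torch

def build_optimizers(generator, depth_net, discriminator):
    # Adam with beta1 = 0.9, beta2 = 0.999 and initial learning rate 5e-4
    g_params = itertools.chain(generator.parameters(), depth_net.parameters())
    opt_g = torch.optim.Adam(g_params, lr=5e-4, betas=(0.9, 0.999))
    # assumption: the discriminator uses the same optimizer settings
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=5e-4, betas=(0.9, 0.999))
    return opt_g, opt_d
```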

Evaluation Settings. To quantitatively assess the performance of DEMore-Net, we employ the average Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity index (SSIM) as the evaluation metrics, which are the most widely used objective evaluation indexes in the field of image restoration. The proposed DEMore-Net is compared with various SOTA image deraining approaches. These algorithms can be divided into two categories, namely, single-type rain removal methods and multi-type rain removal methods. For single-type rain removal, we compare with raindrop, rain streak, and rainy haze removal approaches, including AttentiveGAN [12], CMFNet [19], MSPFN [45], Syn2Real [20], DGCN [46], MPRNet [37], LD-Net [47], and SGID [21]. For multi-type rain removal, we compare with DAF-Net [15], DGNL-Net [36], CCN [17], MBA-RainGAN [18], and cascades of different single-type rain removal models (i.e., raindrop + rain streak + rainy haze removal).

TABLE I: Comparison of DEMore-Net with SOTA image deraining algorithms on the RainCityscapes++ test set. RainD, RainS, and RainH are abbreviations of Raindrop, Rain Streak, and Rainy Haze, respectively.
Method Publication Removal Type PSNR↑ SSIM↑
AttentiveGAN [12] CVPR’18 Raindrop 24.03 0.842
CMFNet [19] arXiv’22 Raindrop 28.59 0.891
MSPFN [45] CVPR’20 Rain Streak 22.19 0.704
Syn2Real [20] CVPR’20 Rain Streak 27.15 0.898
DGCN [46] AAAI’21 Rain Streak 23.64 0.794
MPRNet [37] CVPR’21 Rain Streak 22.87 0.762
LD-Net [47] TIP’21 Rainy Haze 20.06 0.741
SGID [21] TIP’22 Rainy Haze 21.87 0.761
DAF-Net [15] CVPR’19 RainS + RainH 25.21 0.852
DGNL-Net [36] TIP’21 RainS + RainH 26.89 0.864
CCN [17] CVPR’21 RainD + RainS 27.51 0.901
MBA-RainGAN [18] ICASSP’22 RainD + RainS + RainH 29.16 0.913
DGNL-Net + CMFNet - RainD + RainS + RainH 24.89 0.866
CCN + SGID - RainD + RainS + RainH 24.43 0.867
DEMore-Net ours RainD + RainS + RainH 35.46 0.959

IV-B Comparison with State-of-the-arts

Comparison on Synthetic Dataset. For quantitative evaluation, we report the averaged PSNR and SSIM metrics of 14 state-of-the-art deraining algorithms on the RainCityscapes++ test set in Table I. For a fair comparison, all compared methods are retrained on the RainCityscapes++ training set. As can be seen, since single-type rain removal algorithms can only remove one type of rainwater artifact, they cannot achieve satisfactory results. Surprisingly, cascading combinations of different rain removal methods (raindrop + rain streak + rainy haze) also usually yield poor results. We argue this is because different restoration methods may introduce additional artifacts or lose details during processing (see Fig. 5), and these degradations accumulate, lowering the quality of the final restored images. Compared with these SOTA approaches, the proposed DEMore-Net achieves the best performance with 35.46 PSNR and 0.959 SSIM.

Figure 5: Image deraining results on the RainCityscapes++ dataset. From (a) to (h): (a) the rainy image, and the deraining results of (b) CMFNet [19], (c) Syn2Real [20], (d) CCN [17], (e) CCN + SGID, (f) MBA-RainGAN [18], (g) our DEMore-Net, respectively, and (h) the ground truth. As observed, the proposed DEMore-Net can produce high-quality MOR-free images with well-preserved details.
Figure 6: Image deraining results on the real-world rainy images. Our DEMore-Net can generate both MOR-free and more natural images.

Moreover, Fig. 5 exhibits a visual comparison on synthetic MOR images from the RainCityscapes++ dataset. As observed, Syn2Real cannot completely remove the rainy haze in the image, and some haze remains in the sky area. Although CMFNet and CCN succeed in removing the rain to some extent, they cannot preserve the details of the image well, making the image look somewhat blurry. CCN+SGID shows that cascading CCN with an additional dehazing algorithm still cannot cope with the intractable MOR problem; compared with the original CCN approach, it introduces additional artifacts and tends to darken the restored images. The deraining results of MBA-RainGAN are fairly good but still cannot completely remove the rainy haze in sky regions. In contrast, our DEMore-Net produces much clearer and more natural MOR-free images.

Comparison on Real-World MOR Images. To evaluate the effectiveness of DEMore-Net in real-world rainy scenarios, we compare our method with SOTA rain removal approaches on real-world MOR images. Fig. 6 displays 4 real-world rainy images and the deraining results of different algorithms. As observed, Syn2Real cannot produce satisfactory deraining results due to color distortion. Although CMFNet and CCN largely overcome the issue of color distortion, they introduce additional artifacts into the image, making the deraining results look unreal and unnatural. The results of CCN+SGID look very dark, and some rain remains. MBA-RainGAN can remove most of the rain streaks and raindrops in the image, but still cannot effectively address the rainy haze in some regions. Compared with these SOTA methods, our DEMore-Net produces the most realistic, perceptually pleasing, and high-quality MOR-free images.

To better understand the deraining capacity of DEMore-Net in real-world rainy scenarios, a user study is performed to quantitatively evaluate the deraining results. To be more specific, we first prepare 50 real-world rainy images from an existing real-world rainy dataset (DDN-SIRR [11]) or from the Internet. Then, we apply 5 representative SOTA deraining algorithms and our DEMore-Net to restore these 50 images. Next, 10 participants (5 males/5 females) are recruited and asked to score the deraining results on a scale from 1 (worst) to 5 (best). We show each participant the 300 derained images (50 images × 6 methods) in random order, without revealing the corresponding deraining algorithms. The experimental results are reported in Table II, demonstrating that our DEMore-Net can cope well with real-world MOR removal tasks.

TABLE II: User study results. Mean ratings are given ± standard deviation for each deraining algorithm.
Method Rating (mean & standard dev.)
CMFNet [19] 3.34 ± 0.45
Syn2Real [20] 3.16 ± 0.55
CCN [17] 3.13 ± 0.74
CCN + SGID 2.75 ± 0.82
MBA-RainGAN [18] 3.42 ± 0.57
DEMore-Net (ours) 3.74 ± 0.33
TABLE III: Quantitative evaluations (NIQE/SSEQ/BRISQUE/PI) with SOTA deraining approaches on 50 real-world rainy images. Red and blue indicate the 1st and 2nd ranks, respectively.
Method NIQE↓ SSEQ↓ BRISQUE↓ PI↓
Rainy image 4.582 28.456 23.276 3.259
Syn2Real [20] 3.894 28.559 19.142 2.594
CMFNet [19] 4.495 27.322 21.934 3.144
CCN [17] 3.961 26.382 22.574 3.106
CCN + SGID 4.553 28.450 23.052 3.077
MBA-RainGAN [18] 4.409 19.626 17.938 2.939
DEMore-Net 3.714 19.480 18.875 2.555

For quantitative comparison, 4 well-known no-reference image quality assessment indexes are employed to evaluate the performance of real-world MOR removal: NIQE, SSEQ, BRISQUE, and PI [48]. All these indexes are evaluated on the above-mentioned 50 real-world rainy images. As tabulated in Table III, our DEMore-Net achieves the best performance in terms of NIQE, SSEQ, and PI, showing that the restored MOR-free images produced by our approach are much clearer and more realistic. Furthermore, DEMore-Net also achieves the second-best performance in BRISQUE. In a nutshell, our DEMore-Net wins three of the four indexes, fully validating its superiority on real-world rain removal tasks.

IV-C Ablation Study

The proposed DEMore-Net exhibits superior MOR removal capacity compared to 14 SOTA deraining algorithms. To evaluate the contribution of each part of DEMore-Net, an ablation study is performed to validate different components, including the DenseNet structure, self-calibrated convolutions, the depth estimation branch, the adversarial training strategy, and the Hybrid Normalization Block (HNB).

We first adopt 11 dilated residual blocks (DRB) with 2 convolutional operations to construct the base model of the MOR removal network (without the depth estimation branch) and then train this model with the aforementioned implementation details. The normalization approach used here is group normalization [49]. Next, different components are gradually added to our base model:

  1. base model + DenseNet structure → $V_1$,

  2. $V_1$ + self-calibrated convolutions → $V_2$,

  3. $V_2$ + depth estimation branch (including DGNL) → $V_3$,

  4. $V_3$ + adversarial training strategy → $V_4$,

  5. $V_4$ + Hybrid Normalization Block → $V_5$ (full model).

All these models are retrained with the same training strategy as before and evaluated on the RainCityscapes++ dataset. The experimental results of these models are reported in Table IV.

As exhibited in Table IV, each component in our DEMore-Net contributes to MOR removal, especially the well-designed depth estimation branch, which achieves a 4.96 dB PSNR and 0.057 SSIM improvement over variant $V_2$. Additionally, the proposed HNB significantly advances the performance of our network in terms of PSNR, while the adversarial training strategy and self-calibrated convolutions improve the model in terms of SSIM. The adoption of the DenseNet structure also improves the deraining performance of the model. As observed, the combination of these five components produces the best MOR removal performance, indicating that the five components complement each other.

TABLE IV: Ablation study on DEMore-Net. As observed, $V_5$ (our full model) achieves the best performance.
Variants | Base | $V_1$ | $V_2$ | $V_3$ | $V_4$ | $V_5$
DenseNet | w/o | ✓ | ✓ | ✓ | ✓ | ✓
SC Conv | w/o | w/o | ✓ | ✓ | ✓ | ✓
Depth Estimation Branch | w/o | w/o | w/o | ✓ | ✓ | ✓
Adversarial Training | w/o | w/o | w/o | w/o | ✓ | ✓
HNB | w/o | w/o | w/o | w/o | w/o | ✓
PSNR | 24.38 | 25.60 | 26.86 | 31.82 | 33.43 | 35.46
SSIM | 0.845 | 0.855 | 0.869 | 0.926 | 0.941 | 0.959

IV-D Application

Typically, the accuracy of object detectors drops significantly in rainy conditions due to obvious visibility degradation. To demonstrate that our DEMore-Net can benefit vision-based applications, we adopt a pre-trained YOLOXs [50] detector to detect objects on real-world rainy images and the corresponding deraining results of different algorithms. As depicted in Fig. 7, after removing the rain, the detection confidence is greatly improved. Clearly, DEMore-Net outperforms the other image deraining approaches, fully demonstrating the effectiveness of our approach in real-world deraining tasks.

Figure 7: Object detection results on the real-world rainy images and images restored by different deraining algorithms.

IV-E Efficiency Analysis

Considering that efficiency is crucial for computer vision systems, we evaluate the computational performance of some typical deraining algorithms and report their average running times in Table V. All the methods are run on an NVIDIA GeForce RTX 3090 GPU. It can be observed that DEMore-Net takes on average about 0.07 s to process a rainy image from the RainCityscapes++ dataset (720×480 pixels), giving our DEMore-Net the second-fastest inference speed among the 13 representative deraining approaches.

TABLE V: Average runtime of different deraining algorithms tested on the RainCityscapes++ dataset.
Method Platform Average time
AttentiveGAN [12] PyTorch (GPU) 0.25 s
CMFNet [19] PyTorch (GPU) 0.22 s
MSPFN [45] TensorFlow (GPU) 0.68 s
Syn2Real [20] PyTorch (GPU) 0.37 s
DGCN [46] TensorFlow (GPU) 0.28 s
MPRNet [37] PyTorch (GPU) 0.23 s
LD-Net [47] PyTorch (GPU) 0.35 s
SGID [21] PyTorch (GPU) 0.52 s
DAF-Net [15] Caffe (GPU) 0.05 s
DGNL-Net [36] PyTorch (GPU) 0.09 s
CCN [17] PyTorch (GPU) 0.36 s
MBA-RainGAN [18] PyTorch (GPU) 0.43 s
DEMore-Net (ours) PyTorch (GPU) 0.07 s
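For reference, per-image GPU runtimes of this kind are typically measured by synchronizing the device around the timed loop so that asynchronous CUDA kernels are counted; the harness below is a generic sketch, not the authors' benchmarking code.

```python
import time
import torch

@torch.no_grad()
def average_runtime(model, images, device="cuda"):
    """images: iterable of CHW tensors, e.g., 720x480 rainy inputs."""
    model.eval().to(device)
    torch.cuda.synchronize()
    start = time.time()
    for img in images:
        model(img.unsqueeze(0).to(device))
    torch.cuda.synchronize()
    return (time.time() - start) / len(images)
```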

V Conclusion

This paper presents a highly efficient unified paradigm for MOR removal, named DEMore-Net. The proposed methodology leverages a joint learning framework to perform depth estimation and image deraining simultaneously. By treating the depth map as essential prior knowledge, the network is guided to remove various types of rainwater more effectively. Additionally, the deraining module's output features are shared to improve the depth prediction in the other branch, resulting in a collaborative and mutually beneficial approach. To enhance the model's stability and generalization abilities, a novel Hybrid Normalization Block (HNB) is developed in the network design. Moreover, to further improve the MOR removal capacity of DEMore-Net, the self-calibrated convolution network is introduced as a feature enhancement module to produce more discriminative feature representations. Both quantitative and qualitative evaluations demonstrate that DEMore-Net surpasses 14 contemporary SOTA image deraining algorithms.

References

  • [1] S. Cobreces, E. J. Bueno, D. Pizarro, F. J. Rodriguez, and F. Huerta, “Grid impedance monitoring system for distributed power generation electronic interfaces,” IEEE Trans. Instrum. Meas., vol. 58, no. 9, pp. 3112–3121, Sep. 2009.
  • [2] N. Buch, S. A. Velastin, and J. Orwell, “A review of computer vision techniques for the analysis of urban traffic,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 920–939, Sep. 2011.
  • [3] S. S. M. Ali, B. George, L. Vanajakshi, and J. Venkatraman, “A multiple inductive loop vehicle detection system for heterogeneous and lane-less traffic,” IEEE Trans. Instrum. Meas., vol. 61, no. 5, pp. 1353–1360, May. 2012.
  • [4] L.-W. Kang, C.-W. Lin, and Y.-H. Fu, “Automatic single-image-based rain streaks removal via image decomposition,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1742–1755, Apr. 2012.
  • [5] Y.-L. Chen and C.-T. Hsu, “A generalized low-rank appearance model for spatio-temporally correlated rain streaks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 1968–1975.
  • [6] Y. Li, R. T. Tan, X. Guo, J. Lu, and M. S. Brown, “Rain streak removal using layer priors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2736–2744.
  • [7] N. Singh and A. K. Bhandari, “Principal component analysis-based low-light image enhancement using reflection model,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10, Jul. 2021.
  • [8] Z. Zhou, Z. Shi, and W. R. Ren, “Linear contrast enhancement network for low-illumination image enhancement,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, Dec. 2022.
  • [9] W. Yang, R. T. Tan, J. Feng, J. Liu, Z. Guo, and S. Yan, “Deep joint rain detection and removal from a single image,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1357–1366.
  • [10] H. Zhang and V. M. Patel, “Density-aware single image de-raining using a multi-stream dense network,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 695–704.
  • [11] W. Wei, D. Meng, Q. Zhao, Z. Xu, and Y. Wu, “Semi-supervised transfer learning for image rain removal,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 3877–3886.
  • [12] R. Qian, R. T. Tan, W. Yang, J. Su, and J. Liu, “Attentive generative adversarial network for raindrop removal from a single image,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 2482–2491.
  • [13] Q. Luo, K. Liu, J. Su, C. Yang, W. Gui, L. Liu, and O. Silvén, “Waterdrop removal from hot-rolled steel strip surfaces based on progressive recurrent generative adversarial networks,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–11, Jul. 2021.
  • [14] W. Yan, L. Xu, W. Yang, and R. T. Tan, “Feature-aligned video raindrop removal with temporal constraints,” IEEE Trans. Image Process., vol. 31, pp. 3440–3448, May. 2022.
  • [15] X. Hu, C.-W. Fu, L. Zhu, and P.-A. Heng, “Depth-attentional features for single-image rain removal,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 8022–8031.
  • [16] Y. Guo, J. Chen, X. Ren, A. Wang, and W. Wang, “Joint raindrop and haze removal from a single image,” IEEE Trans. Image Process., vol. 29, pp. 9508–9519, Oct. 2020.
  • [17] R. Quan, X. Yu, Y. Liang, and Y. Yang, “Removing raindrops and rain streaks in one go,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9147–9156.
  • [18] Y. Shen, Y. Feng, W. Wang, D. Liang, J. Qin, H. Xie, and M. Wei, “MBA-RainGAN: A multi-branch attention generative adversarial network for mixture of rain removal,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May. 2022, pp. 3418–3422.
  • [19] C.-M. Fan, T.-J. Liu, and K.-H. Liu, “Compound multi-branch feature fusion for real image restoration,” arXiv preprint arXiv:2206.02748, 2022.
  • [20] R. Yasarla, V. A. Sindagi, and V. M. Patel, “Syn2real transfer learning for image deraining using gaussian processes,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 2726–2736.
  • [21] H. Bai, J. Pan, X. Xiang, and J. Tang, “Self-guided image dehazing using progressive feature fusion,” IEEE Trans. Image Process., vol. 31, pp. 1217–1229, Jan. 2022.
  • [22] J.-J. Liu, Q. Hou, M.-M. Cheng, C. Wang, and J. Feng, “Improving convolutional networks with self-calibrated convolutions,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 10096–10105.
  • [23] Y. Chang, L. Yan, and S. Zhong, “Transformed low-rank model for line pattern noise removal,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 1726–1734.
  • [24] L. Zhu, C.-W. Fu, D. Lischinski, and P.-A. Heng, “Joint bi-layer optimization for single-image rain streak removal,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2526–2534.
  • [25] X. Li, J. Wu, Z. Lin, H. Liu, and H. Zha, “Recurrent squeeze-and-excitation context aggregation net for single image deraining,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 254–269.
  • [26] H. Zhang, V. Sindagi, and V. M. Patel, “Image de-raining using a conditional generative adversarial network,” IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 11, pp. 3943–3956, Nov. 2019.
  • [27] N. Ahn, S. Y. Jo, and S. Kang, “Eagnet: Elementwise attentive gating network-based single image de-raining with rain simplification,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 2, pp. 608–620, Mar. 2022.
  • [28] H. Kurihata, T. Takahashi, I. Ide, Y. Mekada, H. Murase, Y. Tamatsu, and T. Miyahara, “Rainy weather recognition from in-vehicle camera images for driver assistance,” in IEEE Proceedings of Intelligent Vehicles Symposium, Jan. 2005, pp. 205–210.
  • [29] M. Roser and A. Geiger, “Video-based raindrop detection for improved image registration,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops (ICCVW), Sep. 2009, pp. 570–577.
  • [30] Y. Quan, S. Deng, Y. Chen, and H. Ji, “Deep learning for seeing through window with raindrops,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 2463–2471.
  • [31] D. Eigen, D. Krishnan, and R. Fergus, “Restoring an image taken through a window covered with dirt or rain,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 633–640.
  • [32] S. You, R. T. Tan, R. Kawakami, Y. Mukaigawa, and K. Ikeuchi, “Adherent raindrop modeling, detection and removal in video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1721–1733, Sep. 2016.
  • [33] K. Zhang, D. Li, W. Luo, and W. Ren, “Dual attention-in-attention model for joint rain streak and raindrop removal,” IEEE Trans. Image Process., vol. 30, pp. 7608–7619, Sep. 2021.
  • [34] Y. Luo, Y. Xu, and H. Ji, “Removing rain from a single image via discriminative sparse coding,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 3397–3405.
  • [35] S. G. Narasimhan and S. K. Nayar, “Chromatic framework for vision in bad weather,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2000, pp. 598–605.
  • [36] X. Hu, L. Zhu, T. Wang, C.-W. Fu, and P.-A. Heng, “Single-image real-time rain removal based on depth-guided non-local features,” IEEE Trans. Image Process., vol. 30, pp. 1759–1770, Jan. 2021.
  • [37] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang, and L. Shao, “Multi-stage progressive image restoration,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 14821–14831.
  • [38] Y. Wang, X. Yan, D. Guan, M. Wei, Y. Chen, X.-P. Zhang, and J. Li, “Cycle-SNSPGAN: Towards real-world image dehazing via cycle spectral normalized soft likelihood estimation patch GAN,” IEEE Trans. Intell. Transp. Syst., May. 2022.
  • [39] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2018, pp. 7794–7803.
  • [40] X. Pan, P. Luo, J. Shi, and X. Tang, “Two at once: Enhancing learning and generalization capacities via ibn-net,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 464–479.
  • [41] L. Li, Y. Dong, W. Ren, J. Pan, C. Gao, N. Sang, and M.-H. Yang, “Semi-supervised image dehazing,” IEEE Trans. Image Process., vol. 29, pp. 2766–2779, Nov. 2019.
  • [42] H. A. Aly and E. Dubois, “Image up-sampling using total-variation regularization with a new observation model,” IEEE Trans. Image Process., vol. 14, no. 10, pp. 1647–1659, Oct. 2005.
  • [43] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
  • [44] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2341–2353, Dec. 2011.
  • [45] K. Jiang, Z. Wang, P. Yi, C. Chen, B. Huang, Y. Luo, J. Ma, and J. Jiang, “Multi-scale progressive fusion network for single image deraining,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 8346–8355.
  • [46] X. Fu, Q. Qi, Z.-J. Zha, Y. Zhu, and X. Ding, “Rain streak removal via dual graph convolutional network,” in Proc. Conf. Artif. Intell. (AAAI), vol. 35, no. 2, Feb. 2021, pp. 1352–1360.
  • [47] H. Ullah, K. Muhammad, M. Irfan, S. Anwar, M. Sajjad, A. S. Imran, and V. H. C. de Albuquerque, “Light-DehazeNet: A novel lightweight CNN architecture for single image dehazing,” IEEE Trans. Image Process., vol. 30, pp. 8968–8982, Oct. 2021.
  • [48] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor, “The 2018 PIRM challenge on perceptual image super-resolution,” in Proc. Eur. Conf. Comput. Vis. Workshops (ECCVW), vol. 11133, Sep. 2018, pp. 334–355.
  • [49] Y. Wu and K. He, “Group normalization,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Sep. 2018, pp. 3–19.
  • [50] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO series in 2021,” arXiv preprint arXiv:2107.08430, 2021.