
A Categorized Reflection Removal Dataset with Diverse Real-world Scenes

Chenyang Lei¹*    Xuhua Huang¹,²    Chenyang Qi¹    Yankun Zhao¹
Wenxiu Sun³    Qiong Yan³    Qifeng Chen¹

¹HKUST    ²CMU    ³SenseTime
*Equal contribution
Abstract

Due to the lack of a large-scale reflection removal dataset with diverse real-world scenes, many existing reflection removal methods are trained on synthetic data plus a small amount of real-world data, which makes it difficult to evaluate the strengths and weaknesses of different reflection removal methods thoroughly. Furthermore, existing real-world benchmarks and datasets do not categorize image data based on the types and appearances of reflection (e.g., smoothness, intensity), making it hard to analyze reflection removal methods. Hence, we construct a new reflection removal dataset that is categorized, diverse, and real-world (CDR). A pipeline based on RAW data is used to capture perfectly aligned input images and transmission images. The dataset is constructed using diverse glass types under various environments to ensure diversity. By analyzing several reflection removal methods and conducting extensive experiments on our dataset, we show that state-of-the-art reflection removal methods generally perform well on blurry reflection but fail to obtain satisfactory performance on other types of real-world reflection. We believe our dataset can help develop novel methods to remove real-world reflection better. Our dataset is available at https://alexzhao-hugga.github.io/Real-World-Reflection-Removal/.

[Figure 1 image grid omitted: four rows of example results (two rows with blurry reflection, two with sharp reflection). Columns: Input, Zhang et al. [40], Wei et al. [32], CoRRN [30], BDN [37], Wen et al. [33].]
Figure 1: The performance of existing methods on different types of reflection is quite different. Most algorithms can remove the blurry reflection but cannot remove the sharp reflection well.

1 Introduction

Reflection removal is the task of removing undesirable reflection artifacts from a photograph. Existing deep-learning-based approaches for reflection removal have demonstrated superior performance on synthetic data, but we find that their performance degrades severely on diverse real-world data, as shown in Fig. 1. Most existing learning-based methods [5, 40] are trained on synthetic data created under various assumptions, so their performance is limited by the domain gap between real-world and synthetic data. Moreover, the assumptions used to create the synthetic data are often simplified, and thus these approaches achieve sub-optimal performance on real-world data.

The existing real-world benchmark dataset SIR2 facilitates research in reflection removal, but it has weaknesses, as reported in the paper [29]. Most of its images contain only flat or small objects (i.e., postcards and solid objects) under controlled lighting. These images do not represent the scenes of our daily life, where object distances, scales, and natural illumination vary greatly. In the SIR2 dataset, only 55 pairs of images with ground-truth transmission were collected in the wild. There are also other real-world datasets [40, 15] with high-quality ground truth, but they contain only a small number of images.

To address the limitations of existing reflection removal datasets, we present CDR, a large reflection removal dataset that contains diverse scenes in the real world. Compared with prior work, CDR has several advantages, as shown in Table 1. In terms of the number of real-world images, ours is much larger and more diverse than existing datasets. We construct the dataset following several principles to ensure image quality and diversity. First, we capture our data in the wild, since such data resembles images captured in daily life. Second, to ensure perfect alignment between the transmission and input mixed images, we obtain the transmission by subtracting the reflection from the mixed image in the raw data space [14]. Third, we capture images through different glasses in diverse scenes to ensure the diversity of our dataset. We believe that the performance of a reflection removal method is related to the smoothness of reflection, and thus we carefully categorize the collected data into different types of reflection, splitting the data based on the smoothness of reflection (i.e., sharp or blurry). In our experiments, all evaluated methods are sensitive to the smoothness of reflection.

We hope the CDR dataset can facilitate the research in reflection removal. The CDR dataset can provide a more extensive evaluation for reflection removal methods. In addition, the detailed categorization can help analyze the bottlenecks of existing methods.

2 Background

2.1 Single Image Reflection Removal

Most single image reflection removal methods [5, 40, 37, 38] rely on various assumptions. Considering image gradients, Arvanitopoulos et al. [3] propose the idea of suppressing the reflection, and Yang et al. [38] propose a faster method based on convex optimization. These methods fail to remove sharp reflection. Under the assumption that the transmission is in focus, Punnappurath et al. [23] design a method based on dual-pixel camera input. CEILNet [5], Zhang et al. [40], and BDN [37] assume that reflection is out of focus and synthesize images to train their neural networks. CEILNet [5] estimates target edges first and uses them as guidance to predict the transmission layer. Zhang et al. [40] use perceptual and adversarial losses to capture the difference between reflection and transmission. BDN [37] estimates the reflection image, which is then used to estimate the transmission layer. These methods [40, 5, 37] work well when the reflection is more defocused than the transmission but fail otherwise. For these deep learning based approaches, training data is critical for good performance. To bridge the gap between synthetic and real-world data, Zhang et al. [40] and Wei et al. [32] collect some real-world images for training. However, their images suffer from misalignment between the transmission and input images, and the datasets are small. Wei et al. [32] propose to calculate losses on high-level features that are less sensitive to small misalignment. To obtain more realistic and diverse data, Wen et al. [33] and Ma et al. [20] propose to synthesize data using a deep neural network and achieve better performance and generalization. Kim et al. [11] propose a physics-based method to render reflections and achieve better performance than using synthetic images.

| Dataset | Glass types | Categorized | Scene-level data | Alignment | Reflection image | Curved glass | Data type | Training set |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SIR2-Wild [29] | 3 | No | 55 | Calibrated | Yes | No | RGB | No |
| Zhang et al. [40] | 1 | No | <110 | Calibrated | No | No | RGB | No |
| Nature [15] | 2 | No | <220 | Misaligned | No | No | RGB | No |
| Ours | >200 | Yes | 1,063 | Perfect | Yes | Yes | RGB & Raw | Yes |
Table 1: Comparison between our dataset and existing datasets [29, 40, 15]. Scene-level data: the number of image sets captured in the wild instead of in lab environments.

2.2 Reflection Removal with Multiple Images

Utilizing multiple images as input provides additional information, which makes it possible to relax some of the strict assumptions used in prior work. A number of approaches [28, 25, 24, 16, 8, 9, 36, 27, 2, 18] remove reflection by exploiting the relative motion between reflection and transmission across multiple images captured under camera movement.

Other methods take a sequence of images under specific conditions or camera settings. For example, pairs of flash and no-flash images [1, 13], near-infrared cameras [10], light field cameras [31], dual-pixel cameras [23], and polarization cameras [19, 22, 26, 12, 6] can be used.

Different from supervised learning models, Double-DIP [7] separates a mixed image into reflection and transmission layers based on internal self-similarities in multiple superpositions, in an unsupervised fashion.

Although methods based on multiple images do not rely on strict assumptions about the appearance of reflections (e.g., blurry reflection, ghosting effects), they need additional requirements on data [36] or special devices [14], which may limit their broader application. Therefore, in this work, we focus on building a dataset and evaluating single image reflection removal methods.

2.3 SIR2 benchmark dataset

Wan et al. [29] propose the SIR2 benchmark dataset for single image reflection removal. This dataset contains images taken in controlled scenes and in the wild. To solve the misalignment problem, they calibrate the alignment between the mixed image M and the background B. The dataset was captured with three glasses of different thicknesses, various combinations of aperture sizes, and different exposure times to improve the diversity of this dataset.

As described by Wan et al. [29], most of their controlled scenes contain only flat objects (postcards) or objects of similar scale (solid objects). However, real-world scenes contain objects at different depths, and natural environment illumination varies greatly, while the controlled scenes are mostly captured in an indoor office environment. To address this limitation, 55 pairs of images with ground-truth reflection and transmission are captured in the wild, but 55 pairs are far from large scale. Also, this dataset does not provide a standard split into training, validation, and test sets.

3 CDR Dataset

In this section, we describe the features of our CDR dataset for reflection removal, which stands for "Categorized, Diverse, and Real-world." A triplet {M, R, T} is collected in each scene, where M is the mixed image, R is the reflection image, and T is the transmission image.

[Figure 2 images omitted.]
Figure 2: Due to refraction, a spatial shift and an intensity difference exist between B and T. The difference map visualizes the misalignment between B and T. The sum of the reflection R and the transmission T equals the mixed image M in the raw data space.
[Figure 3 image grid omitted: {M, R, T} triplets on curved glass, on colored (red) glass, and in two dynamic-transmission scenes.]
Figure 3: More examples of data diversity. In addition to varied glass types, we are also able to capture dynamic scenes, which enriches the scene diversity.
[Figure 4 images omitted: two example pairs, Ours-M1/R1 and Ours-M2/R2.]
Figure 4: With the M-R pipeline proposed by Lei et al. [14], we can utilize a diverse set of glasses existing in our daily life (e.g., the curved and colored glass of a telephone booth, or a glass door).

This dataset is collected mainly with three cameras: a DSLR (digital single-lens reflex) camera, the Canon EOS 50D; a MILC (mirrorless interchangeable-lens camera), the Nikon Z6; and a smartphone camera, the Huawei Mate 30. In total, we provide 1,063 triplets in our dataset.

We collected the real-world data in the wild. Compared with data captured in a controlled environment, real-world data captured in the wild contains objects at various distances and under varying illumination, as reported by SIR2 [29]. We improve our dataset in various aspects. Table 1 summarizes the main differences between the CDR dataset and existing datasets [29, 40, 15]. Specifically, our main advantages are as follows:

(a) Data categorization. We notice that the performance of a reflection removal model is related to the appearance of reflection. To facilitate in-depth analysis, we split all the images according to the reflection types.

(b) Perfect alignment. We provide perfect alignment between the mixed image M and the transmission T. Existing datasets [29, 40] have misalignment issues between M and T. Misalignment not only degrades a model's performance but also makes evaluation less accurate: a reflection removal model trained on misaligned paired data often generates blurry images [32], and even a single-pixel shift in an image can affect the evaluation metrics PSNR and SSIM significantly.
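To make the last point concrete, here is a minimal sketch (NumPy only; the psnr helper and the random test image are ours, not part of the dataset tooling) showing how a one-pixel shift alone collapses PSNR. A natural image would show a milder but still substantial drop, since neighboring pixels are more correlated than in random data.

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images with values in [0, peak]."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.random((256, 256))             # stand-in for a ground-truth transmission
shifted = np.roll(img, shift=1, axis=1)  # identical content, shifted by one pixel

print(psnr(img, img))      # inf: identical images
print(psnr(img, shifted))  # low: misalignment alone wrecks the metric
```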

(c) Diversity. We provide much more diverse data by utilizing different types of glasses existing in our daily life, including the colored and curved glasses shown in Fig. 3. We capture various objects in different environments and lighting conditions with three different types of cameras. As mentioned before, when we capture images in the wild, we also guarantee the diversity of the smoothness and intensity of reflection.

(d) Large-scale data. Our dataset is significantly larger than existing datasets or benchmarks. In total, our dataset contains 1,063 triplets {M, R, T}. We believe our evaluation is more accurate because there is no misalignment between the mixed image and the ground truth.

(e) Other advantages. Compared with existing datasets [40, 32, 15], we provide the reflection image, since many works have demonstrated the effectiveness of using reflection [37, 14]. We also provide raw images instead of only RGB images for future study in reflection removal.

Our CDR dataset is publicly available, which can be used for training and evaluation. The detailed categories can help researchers understand the strengths and weaknesses of existing methods. We hope it can accelerate the research in reflection removal.

[Figure 5 diagram omitted.]
Figure 5: The post-processing pipeline. The ground-truth transmission T is obtained in the RAW space. Then all the RAW images are passed through an "ISP" to obtain the corresponding RGB images. Finally, the regions of interest are cropped out.
[Figure 6 images omitted. Panels: RGB M, RGB R, RGB M-R, Gamma M-R, Raw M-R.]
Figure 6: If M-R is applied in any space other than the raw data space, undesirable residuals appear. "RGB M-R": compute M-R on RGB images. "Gamma M-R": use M^2.2 - R^2.2 to reduce the impact of gamma correction. "Raw M-R": compute M-R on raw data.
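The effect illustrated in Fig. 6 can be checked numerically. Below is a toy calculation under an idealized gamma-only ISP; this is an assumption for illustration, since real ISPs also apply tone curves and white balance, which is why even "Gamma M-R" leaves residuals in practice.

```python
import numpy as np

t_lin, r_lin = 0.30, 0.20   # hypothetical linear (raw-space) T and R intensities
m_lin = t_lin + r_lin       # in the raw space, M = T + R holds exactly

gamma = lambda x: np.power(x, 1 / 2.2)  # idealized gamma-only "ISP"

print(gamma(m_lin) - gamma(r_lin))  # "RGB M-R": ~0.249, far from correct
print(gamma(t_lin))                 # correct RGB T: ~0.578

# "Gamma M-R" undoes the idealized gamma first and recovers t_lin exactly in
# this toy model; a real ISP is not a pure power curve, so residuals remain.
print(gamma(m_lin) ** 2.2 - gamma(r_lin) ** 2.2)  # 0.30 = t_lin
```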

3.1 Data Acquisition

To collect diverse data with perfect alignment in the wild, we adopt the M-R pipeline proposed by Lei et al. [14], where T is obtained by M-R. Different from Lei et al. [14], which implements M-R on raw data obtained by a polarization sensor, we use ordinary cameras that provide raw data to construct the CDR dataset.

Fig. 4 shows our data collection pipeline. The first step is to find an appropriate glass. Since we do not need to remove the glass to obtain the background B, as other methods do [29, 32, 40], we can utilize immovable glasses in the real world. In the second step, we place a piece of black cloth behind the glass to block the transmission and obtain the reflection R. Finally, we remove the cloth to collect the mixed image M. Please refer to Fig. 3 for some captured example images.

Several details matter in the capture process:

  • To ensure perfect alignment between M and R, we fix the camera on a tripod while taking images.

  • To ensure consistent exposure between M and R, all cameras are set to manual mode with fixed settings, including ISO and exposure time.

  • To reduce the noise level in M and R as much as possible, we capture data with a long exposure time and a small ISO.

  • The objects in the reflection (not the transmission) need to be static in both M and R to ensure perfect alignment in M-R.

3.2 Post Processing

Fig. 5 shows the overall pipeline of our post-processing step. With raw M and raw R, we calculate raw T by T = M - R in the raw data space. Note that M - R should not be computed in the RGB space, because the linearity between light intensities and RGB values does not hold there, as shown in Fig. 6. In T = M - R, negative values may appear due to noise; they are set to zero directly. The black level of the camera is then added back so that its ISP can be applied to T. After these steps, the raw data format of T is the same as that of M and R.
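A minimal sketch of this subtraction step, assuming M and R are already loaded as uint16 Bayer arrays (e.g., via a raw loader such as rawpy) and the camera black level is known; the function name and the default black level of 512 are hypothetical placeholders.

```python
import numpy as np

def compute_raw_transmission(raw_m: np.ndarray, raw_r: np.ndarray,
                             black_level: int = 512) -> np.ndarray:
    """Compute raw T = M - R in the linear raw space, as described above.

    raw_m, raw_r: uint16 Bayer arrays captured with identical camera settings.
    black_level : camera-specific offset (512 is only a placeholder).
    """
    # Remove the black level so values are linear in light intensity.
    m = raw_m.astype(np.int32) - black_level
    r = raw_r.astype(np.int32) - black_level
    # Subtract in the linear space; clip noise-induced negatives to zero.
    t = np.clip(m - r, 0, None)
    # Add the black level back so a standard ISP can still process T.
    return (t + black_level).astype(np.uint16)
```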

Since most existing reflection removal algorithms take RGB images as input, we need to convert raw images from the raw data space to the RGB space. However, the camera's default ISP is not public. Therefore, we implement our own image signal processing (ISP) pipeline and use the same ISP to generate the RGB images for M, T, and R. As for the metadata of T, we simply apply the metadata of M to T directly.
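The paper does not publish its ISP, so the following hypothetical minimal ISP (linearization, naive half-resolution RGGB demosaicing, white balance, and gamma encoding) only indicates the kind of pipeline involved; all numeric parameters are placeholders that a real pipeline would read from the raw metadata of M.

```python
import numpy as np

def simple_isp(bayer: np.ndarray, black_level: int = 512, white_level: int = 16383,
               wb_gains=(2.0, 1.0, 1.5)) -> np.ndarray:
    """Hypothetical minimal ISP producing a gamma-encoded RGB image in [0, 1].

    Assumes an RGGB Bayer pattern with even height/width; demosaicing is done
    at half resolution for brevity rather than by interpolation.
    """
    # Linearize to [0, 1] using the black and white levels.
    x = (bayer.astype(np.float64) - black_level) / (white_level - black_level)
    x = np.clip(x, 0.0, 1.0)
    # Naive half-resolution demosaic for an RGGB pattern.
    r = x[0::2, 0::2]
    g = 0.5 * (x[0::2, 1::2] + x[1::2, 0::2])
    b = x[1::2, 1::2]
    # Apply white-balance gains, then a simple gamma in place of a tone curve.
    rgb = np.stack([r * wb_gains[0], g * wb_gains[1], b * wb_gains[2]], axis=-1)
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / 2.2)
```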

[Figure 7 images omitted. Panels: (a) M: our ISP, (b) M: Lightroom, (c) M: camera output; (d) M: our ISP1, (e) T: our ISP1, (f) T: our ISP2.]
Figure 7: The first row shows that our ISP generates results similar to Lightroom and the camera output. The second row shows that different ISPs can be applied to T and achieve similar results. Note that Lightroom is professional ISP software from Adobe.

After obtaining the RGB T images, we crop the area of interest for T. We eliminate triplets in the following two cases: (1) if additional glasses exist behind the covered glass (i.e., glass appears in the background), the obtained T still contains glass; (2) if there is no glass in M, then no black cloth can be placed to block the transmission, and the obtained T would equal 0, which contradicts the ground truth.

Different from SIR2, which classifies images as bright or dark scenes according to absolute intensity, focus, or glass thickness, we categorize images according to three criteria: relative intensity, the smoothness of R and T, and the ghosting effect. First, we observe that the impact of reflection is determined by its relative intensity rather than its absolute intensity. We calculate the mean intensity ratio for each pair of R and T and categorize the pairs as weak, moderate, or strong reflection. Second, we categorize the data based on the smoothness of the reflection and transmission: BRST (blurry reflection and sharp transmission), SRST (sharp reflection and sharp transmission), and BRBT (blurry reflection and blurry transmission). In fact, the BRST type is the one generally assumed in previous work. Finally, we group the data with ghosting effects into a separate class. A sketch of the first two criteria is given below.
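As a rough illustration of the first two criteria: the mean intensity ratio follows the description above, while the thresholds and the gradient-based smoothness proxy are our assumptions, since the exact cut-offs are not spelled out here.

```python
import numpy as np

def intensity_class(refl: np.ndarray, trans: np.ndarray,
                    weak_th: float = 0.3, strong_th: float = 0.7) -> str:
    """Relative-intensity criterion: mean intensity of R over mean of T.
    The thresholds are illustrative placeholders."""
    ratio = float(refl.mean()) / max(float(trans.mean()), 1e-8)
    if ratio < weak_th:
        return "weak"
    return "moderate" if ratio < strong_th else "strong"

def mean_gradient(img: np.ndarray) -> float:
    """Mean gradient magnitude as a simple sharpness proxy (lower = blurrier)."""
    gy, gx = np.gradient(img.astype(np.float64))
    return float(np.hypot(gx, gy).mean())

def smoothness_class(refl: np.ndarray, trans: np.ndarray,
                     sharp_th: float = 0.02) -> str:
    """Label a pair, e.g. 'BRST' = blurry reflection, sharp transmission."""
    r = "SR" if mean_gradient(refl) >= sharp_th else "BR"
    t = "ST" if mean_gradient(trans) >= sharp_th else "BT"
    return r + t
```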

4 Experiments

We first describe our experimental setup. Images from all three cameras are provided for training, which reduces the impact of the domain gap. Following previous work [29], we choose PSNR, SSIM, and NCC as our main evaluation metrics; a sketch of the NCC formulation we assume follows below.
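PSNR and SSIM implementations are widely available (e.g., in scikit-image); NCC is less commonly packaged, so here is a sketch of the zero-mean normalized cross-correlation commonly used in this literature. Whether the benchmark uses exactly this variant is an assumption on our part.

```python
import numpy as np

def ncc(pred: np.ndarray, gt: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between prediction and ground
    truth; 1.0 means perfect correlation up to brightness/contrast shifts."""
    p = pred.astype(np.float64).ravel()
    g = gt.astype(np.float64).ravel()
    p -= p.mean()
    g -= g.mean()
    denom = np.linalg.norm(p) * np.linalg.norm(g)
    return float(p @ g / denom) if denom > 0 else 0.0
```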

The baselines for comparison in our experiments include CEILNet [5], CoRRN [30], BDN [37], Zhang et al. [40], Wei et al. [32], Yang et al. [38], Arvanitopoulos et al. [3], Li et al. [17], IBCLN [15], and Kim et al. [11]. All these methods take a single RGB image as input. For learning-based methods, we use their pre-trained models by default, as some methods do not provide training code. For Yang et al. [38], different thresholds may result in different output images; therefore, in addition to the original threshold, multiple thresholds are tried, and the best result is reported.

4.1 Evaluation

Each cell lists PSNR / SSIM / NCC.

| Method | All | SRST | BRST | Non-ghosting |
| --- | --- | --- | --- | --- |
| Li et al. [17] | 12.73 / 0.650 / 0.721 | 12.26 / 0.565 / 0.644 | 13.19 / 0.723 / 0.789 | 12.56 / 0.624 / 0.703 |
| Arvan. et al. [3] | 19.63 / 0.753 / 0.788 | 18.24 / 0.680 / 0.691 | 20.91 / 0.816 / 0.873 | 19.00 / 0.727 / 0.752 |
| Yang et al. [38] | 19.42 / 0.767 / 0.782 | 18.10 / 0.680 / 0.676 | 20.65 / 0.841 / 0.874 | 18.78 / 0.738 / 0.744 |
| CEILNet [5] | 17.96 / 0.708 / 0.757 | 16.17 / 0.596 / 0.654 | 19.49 / 0.802 / 0.847 | 17.24 / 0.673 / 0.720 |
| Zhang et al. [40] | 15.20 / 0.694 / 0.703 | 13.52 / 0.590 / 0.612 | 16.58 / 0.780 / 0.785 | 14.48 / 0.662 / 0.677 |
| BDN [37] | 18.97 / 0.758 / 0.745 | 19.04 / 0.713 / 0.642 | 19.06 / 0.799 / 0.836 | 18.62 / 0.733 / 0.698 |
| Wei et al. [32] | 21.01 / 0.762 / 0.756 | 19.52 / 0.672 / 0.631 | 22.36 / 0.839 / 0.864 | 20.50 / 0.731 / 0.713 |
| CoRRN [30] | 20.22 / 0.774 / 0.764 | 20.32 / 0.699 / 0.656 | 20.08 / 0.838 / 0.859 | 20.37 / 0.750 / 0.723 |
| IBCLN [15] | 19.85 / 0.764 / 0.735 | 18.33 / 0.671 / 0.613 | 21.14 / 0.842 / 0.846 | 19.23 / 0.735 / 0.687 |
| Kim et al. [11] | 21.00 / 0.760 / 0.769 | 19.27 / 0.676 / 0.654 | 22.61 / 0.833 / 0.871 | 20.42 / 0.731 / 0.726 |

| Method | Weak reflection | Moderate reflection | Strong reflection | Ghosting |
| --- | --- | --- | --- | --- |
| Li et al. [17] | 14.36 / 0.779 / 0.841 | 12.47 / 0.636 / 0.709 | 8.89 / 0.309 / 0.401 | 13.36 / 0.742 / 0.785 |
| Arvan. et al. [3] | 23.52 / 0.878 / 0.941 | 18.43 / 0.744 / 0.765 | 13.56 / 0.397 / 0.423 | 21.88 / 0.844 / 0.919 |
| Yang et al. [38] | 23.18 / 0.903 / 0.937 | 18.28 / 0.754 / 0.755 | 13.50 / 0.402 / 0.425 | 21.72 / 0.870 / 0.917 |
| CEILNet [5] | 21.34 / 0.862 / 0.910 | 17.02 / 0.685 / 0.731 | 12.06 / 0.341 / 0.397 | 20.51 / 0.836 / 0.886 |
| Zhang et al. [40] | 17.20 / 0.827 / 0.822 | 15.10 / 0.685 / 0.688 | 9.33 / 0.311 / 0.402 | 17.81 / 0.806 / 0.797 |
| BDN [37] | 21.10 / 0.867 / 0.909 | 18.25 / 0.746 / 0.711 | 16.15 / 0.485 / 0.411 | 20.20 / 0.850 / 0.909 |
| Wei et al. [32] | 24.89 / 0.901 / 0.929 | 19.42 / 0.737 / 0.714 | 17.00 / 0.450 / 0.423 | 22.80 / 0.871 / 0.908 |
| CoRRN [30] | 20.50 / 0.890 / 0.928 | 21.01 / 0.768 / 0.735 | 15.12 / 0.433 / 0.394 | 19.70 / 0.861 / 0.911 |
| IBCLN [15] | 23.17 / 0.899 / 0.908 | 18.98 / 0.752 / 0.714 | 13.81 / 0.395 / 0.290 | 22.07 / 0.867 / 0.906 |
| Kim et al. [11] | 25.03 / 0.897 / 0.946 | 19.66 / 0.740 / 0.730 | 15.25 / 0.431 / 0.416 | 23.10 / 0.865 / 0.925 |
Table 2: Quantitative results for different methods on our dataset; each cell lists PSNR / SSIM / NCC. A detailed analysis is presented in the text.

One interesting observation is that the performance of most existing single image reflection removal methods is highly related to the type of reflection.

Quantitative results. Table 2 shows the performance of the evaluated methods in terms of PSNR, SSIM, and NCC. We also split the data according to the smoothness, relative intensity, and ghosting effect of the reflection and report separate results to analyze the impact of different image patterns. We find that learning-free methods [3, 38] achieve good performance in cases that follow their model assumptions. Although these methods usually rank poorly on average, they can perform quite well when the reflection is blurry, weak, or has a ghosting effect.

In the following, we analyze the impact of different factors on the results.

1) Impact of real data. The methods [32, 30] trained on real data achieve better performance on general real-scene data. Since Kim et al. [11] synthesize physically-based data for training, they also achieve good performance on real-world data. These cases include SRST data and reflections of moderate or strong intensity. As the table shows, these methods [11, 32, 30] are often the first, second, and third best-performing methods.

2) The smoothness of reflection/transmission. The results on different smoothness combinations of reflection and transmission are consistent with previously adopted assumptions. All methods perform relatively well on the BRST (blurry reflection and sharp transmission) set. However, when the assumption does not hold (e.g., on the SRST set), the performance degrades heavily. This phenomenon occurs for all evaluated algorithms. Note that in our dataset, almost 50% of the mixed images M are SRST.

3) Ghosting effect. Another assumption about reflection is the ghosting effect. Although most deep learning methods do not synthesize ghosting reflection data, they still achieve better performance on this kind of data.

4) Reflection intensity. The difficulty of reflection removal increases significantly as the intensity of the reflection increases. However, the importance of reflection removal also increases in this case. When the reflection is very weak, we may not even need to remove it because it is almost invisible; when the reflection is moderate or strong, the image quality suffers heavily.

[Figure 8 image grids omitted. BRST example, panels: Input, Yang et al. [38], Arvanitopoulos et al. [3], CEILNet [5], CoRRN [30], Zhang et al. [40], BDN [37], Wei et al. [32]. SRST example, panels: Input, GT, Zhang et al. [40], Wei et al. [32], CoRRN [30], BDN [37], Yang et al. [38], Li et al. [17].]
Figure 8: Most methods cannot remove sharp reflection. This is probably because learning-based methods are trained on synthetic data where R is blurry, and learning-free methods often assume the reflection is blurry. However, sharp reflection is quite common in the real world. Figure best viewed in the electronic version.

Qualitative results. We present qualitative results in Fig. 8 to analyze the results further. The perceptual performance is consistent with the quantitative results on different reflection types.

The performance on SRST is not satisfactory: most methods cannot remove most of the reflection. For BRST, most benchmarked methods can remove the reflection when it is weak, and learning-free methods [38, 3] achieve good performance in this case. However, reflections that are both blurry and weak are quite rare.

Another common problem is the degradation of image quality, as shown in Fig. 8. In some cases, a method may remove the reflection almost completely, but the transmission is modified at the same time.

4.2 Open Problems and Discussion

From the quantitative evaluation and perceptual results, we find that state-of-the-art single image reflection removal methods are still far from perfect, although they have achieved great performance on synthetic data or real-world data in a controlled environment.

From our experiments, we find that evaluation on synthetic data is flawed: improvements on synthetic data, or on real-world data captured in a controlled environment, do not necessarily represent real improvement. If we want to apply reflection removal methods in daily life, we should aim for excellent performance on real-world data collected in the wild.

We believe strong assumptions in reflection removal methods should be relaxed. A strong assumption may make a method perform well on a certain type of reflection but fail on other types. As analyzed above, most methods achieve satisfactory results on blurry reflection but perform poorly on sharp reflection. Distinguishing sharp reflection from transmission is admittedly a difficult task.

4.3 Reflection Removal on Raw Images

It is unclear whether single image reflection removal can benefit from raw images. However, we keep the raw data in our dataset since using raw images has achieved impressive results on low-level computer vision tasks, including low-light image enhancement [4], super-resolution [35], image denoising [39], and ISP modeling [21, 34]. We leave the study of raw images for single image reflection removal to future work. To the best of our knowledge, our dataset is the first to contain raw images for single image reflection removal.

5 Conclusion

In this work, we propose CDR, a new dataset for single image reflection removal. Compared with other reflection removal datasets, our dataset is categorized according to reflection types, has perfect alignment, and contains diverse scenes. We carefully categorize the captured images into different classes and analyze the performance of state-of-the-art methods. The experimental results show that the performance of these methods is highly related to the appearance and intensity of reflection. When the adopted assumptions do not hold on real-world images, the methods based on these assumptions cannot achieve top performance. We believe researchers can utilize our benchmark to conduct research on real-world data in the wild. In addition to RGB images, the raw data is also provided for future study.

References

  • [1] Amit Agrawal, Ramesh Raskar, Shree K. Nayar, and Yuanzhen Li. Removing photography artifacts using gradient projection and flash-exposure sampling. TOG, 2005.
  • [2] Jean-Baptiste Alayrac, Joao Carreira, and Andrew Zisserman. The visual centrifuge: Model-free layered video representations. In CVPR, 2019.
  • [3] Nikolaos Arvanitopoulos, Radhakrishna Achanta, and Sabine Süsstrunk. Single image reflection suppression. In CVPR, 2017.
  • [4] Chen Chen, Qifeng Chen, Jia Xu, and Vladlen Koltun. Learning to see in the dark. In CVPR, 2018.
  • [5] Qingnan Fan, Jiaolong Yang, Gang Hua, Baoquan Chen, and David Wipf. A generic deep architecture for single image reflection removal and image smoothing. In ICCV, 2017.
  • [6] H. Farid and E. H. Adelson. Separating reflections and lighting using independent components analysis. In CVPR, 1999.
  • [7] Yossi Gandelsman, Assaf Shocher, and Michal Irani. "Double-DIP": Unsupervised image decomposition via coupled deep-image-priors. In CVPR, 2019.
  • [8] Xiaojie Guo, Xiaochun Cao, and Yi Ma. Robust separation of reflection from multiple images. In CVPR, 2014.
  • [9] Byeong-Ju Han and Jae-Young Sim. Reflection removal using low-rank matrix completion. In CVPR, 2017.
  • [10] Y. Hong, Y. Lyu, S. Li, and B. Shi. Near-infrared image guided reflection removal. In ICME, 2020.
  • [11] Soomin Kim, Yuchi Huo, and Sung-Eui Yoon. Single image reflection removal with physically-based training images. In CVPR, 2020.
  • [12] Naejin Kong, Yu-Wing Tai, and Joseph S. Shin. A physically-based approach to reflection separation: from physical modeling to constrained optimization. TPAMI, 2014.
  • [13] Chenyang Lei and Qifeng Chen. Robust reflection removal with reflection-free flash-only cues. In CVPR, 2021.
  • [14] Chenyang Lei, Xuhua Huang, Mengdi Zhang, Wenxiu Sun, Qiong Yan, and Qifeng Chen. Polarized reflection removal with perfect alignment in the wild. In CVPR, 2020.
  • [15] Chao Li, Yixiao Yang, Kun He, Stephen Lin, and John E. Hopcroft. Single image reflection removal through cascaded refinement. In CVPR, 2020.
  • [16] Yu Li and Michael S Brown. Exploiting reflection change for automatic reflection removal. In ICCV, 2013.
  • [17] Yu Li and Michael S Brown. Single image layer separation using relative smoothness. In CVPR, 2014.
  • [18] Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang, Yung-Yu Chuang, and Jia-Bin Huang. Learning to see through obstructions. In CVPR, 2020.
  • [19] Youwei Lyu, Zhaopeng Cui, Si Li, Marc Pollefeys, and Boxin Shi. Reflection separation using a pair of unpolarized and polarized images. In NeurIPS, 2019.
  • [20] Daiqian Ma, Renjie Wan, Boxin Shi, Alex C. Kot, and Ling-Yu Duan. Learning to jointly generate and separate reflections. In ICCV, 2019.
  • [21] Hao Ouyang, Zifan Shi, Chenyang Lei, Ka Lung Law, and Qifeng Chen. Neural camera simulators. In CVPR, 2021.
  • [22] Patrick Wieschollek, Orazio Gallo, Jinwei Gu, and Jan Kautz. Separating reflection and transmission images in the wild. In ECCV, 2018.
  • [23] Abhijith Punnappurath and Michael S. Brown. Reflection removal using a dual-pixel sensor. In CVPR, 2019.
  • [24] Bernard Sarel and Michal Irani. Separating transparent layers through layer information exchange. In ECCV, 2004.
  • [25] Bernard Sarel and Michal Irani. Separating transparent layers of repetitive dynamic behaviors. In ICCV, 2005.
  • [26] Yoav Schechner, Joseph Shamir, and Nahum Kiryati. Polarization and statistical analysis of scenes containing a semireflector. JOSA A, 2000.
  • [27] Chao Sun, Shuaicheng Liu, Taotao Yang, Bing Zeng, Zhengning Wang, and Guanghui Liu. Automatic reflection removal using gradient intensity and motion cues. In ACM-MM, 2016.
  • [28] Richard Szeliski, Shai Avidan, and P Anandan. Layer extraction from multiple images containing reflections and transparency. In CVPR, 2000.
  • [29] Renjie Wan, Boxin Shi, Ling-Yu Duan, Ah-Hwee Tan, and Alex C Kot. Benchmarking single-image reflection removal algorithms. In ICCV, 2017.
  • [30] Renjie Wan, Boxin Shi, Haoliang Li, Ling-Yu Duan, Ah-Hwee Tan, and Alex Kot Chichung. Corrn: Cooperative reflection removal network. TPAMI, 2019.
  • [31] Qiaosong Wang, Haiting Lin, Yi Ma, Sing Bing Kang, and Jingyi Yu. Automatic layer separation using light field imaging. arXiv preprint arXiv:1506.04721, 2015.
  • [32] Kaixuan Wei, Jiaolong Yang, Ying Fu, David Wipf, and Hua Huang. Single image reflection removal exploiting misaligned training data and network enhancements. In CVPR, 2019.
  • [33] Qiang Wen, Yinjie Tan, Jing Qin, Wenxi Liu, Guoqiang Han, and Shengfeng He. Single image reflection removal beyond linearity. In CVPR, 2019.
  • [34] Yazhou Xing, Zian Qian, and Qifeng Chen. Invertible image signal processing. In CVPR, 2021.
  • [35] Xiangyu Xu, Yongrui Ma, and Wenxiu Sun. Towards real scene super-resolution with raw images. In CVPR, 2019.
  • [36] Tianfan Xue, Michael Rubinstein, Ce Liu, and William T Freeman. A computational approach for obstruction-free photography. TOG, 2015.
  • [37] Jie Yang, Dong Gong, Lingqiao Liu, and Qinfeng Shi. Seeing deeply and bidirectionally: a deep learning approach for single image reflection removal. In ECCV, 2018.
  • [38] Yang Yang, Wenye Ma, Yin Zheng, Jian-Feng Cai, and Weiyu Xu. Fast single image reflection suppression via convex optimization. In CVPR, 2019.
  • [39] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, and Ling Shao. Cycleisp: Real image restoration via improved data synthesis. In CVPR, 2020.
  • [40] Xuaner Zhang, Ren Ng, and Qifeng Chen. Single image reflection separation with perceptual losses. In CVPR, 2018.