Zero-shot denoising via neural compression:
Theoretical and algorithmic framework

Ali Zafari (equal contribution)
ali.zafari@rutgers.edu
   Xi Chen (equal contribution)
xi.chen15@rutgers.edu
   Shirin Jalali
shirin.jalali@rutgers.edu

Department of Electrical and Computer Engineering
Rutgers University
Abstract

Zero-shot denoising aims to denoise observations without access to training samples or clean reference images. This setting is particularly relevant in practical imaging scenarios involving specialized domains such as medical imaging or biology. In this work, we propose the Zero-Shot Neural Compression Denoiser (ZS-NCD), a novel denoising framework based on neural compression. ZS-NCD treats a neural compression network as an untrained model, optimized directly on patches extracted from a single noisy image. The final reconstruction is then obtained by aggregating the outputs of the trained model over overlapping patches. Thanks to the built-in entropy constraints of compression architectures, our method naturally avoids overfitting and does not require manual regularization or early stopping. Through extensive experiments, we show that ZS-NCD achieves state-of-the-art performance among zero-shot denoisers for both Gaussian and Poisson noise, and generalizes well to both natural and non-natural images. Additionally, we provide new finite-sample theoretical results that characterize upper bounds on the achievable reconstruction error of general maximum-likelihood compression-based denoisers. These results further establish the theoretical foundations of compression-based denoising. Our code is available at: https://github.com/Computational-Imaging-RU/ZS-NCDenoiser.

1 Introduction

Background and motivation

Denoising is a fundamental problem in classical signal processing and has recently gained renewed attention from the machine learning community. Let $\bm{x}=(x_1,\ldots,x_n)\in\mathbb{R}_+^n$ denote a non-negative signal of length $n$. In many systems, $\bm{x}$ is not directly observable. Instead, we observe a noisy version $\bm{y}=(y_1,\ldots,y_n)$, where the observations are conditionally independent given $\bm{x}$, and each entry is distributed according to a common conditional distribution:

$$\bm{y}\sim\prod_{i=1}^{n}p(y_i\mid x_i).$$

We assume that the noise mechanism is memoryless (independent across coordinates) and homogeneous (identical across entries). The goal of a denoising algorithm is to estimate $\bm{x}$ from the noisy observations $\bm{y}$. Given its prevalence in imaging and data acquisition systems, denoising has been a central topic in signal processing for decades. Classical denoising methods rely on explicit structural assumptions about the underlying signal $\bm{x}$, often hand-crafted by domain experts [1, 2, 3, 4, 5, 6, 7, 8]. In contrast, recent advances in machine learning have enabled a new class of data-driven denoising algorithms. These methods learn the optimal denoising function from data, leveraging statistical patterns directly from signal and noise distributions.

While learning-based approaches achieve state-of-the-art performance and often outperform classical methods in controlled settings, they face significant challenges in practice:

  1. Supervision requirement: Most learning-based methods require a training set of paired samples $\{(\bm{y}_i,\bm{x}_i)\}_{i=1}^{m}$, where $\bm{x}_i$ is a clean signal and $\bm{y}_i$ is its noisy counterpart. In practical scenarios such as medical imaging, the clean signals $\{\bm{x}_i\}_{i=1}^{m}$ are often unavailable or prohibitively expensive to obtain.

  2. Data efficiency: These methods usually need large amounts of training data. Acquiring sufficient samples is difficult or costly, particularly in domains with strict data acquisition constraints.

To mitigate the reliance on paired clean and noisy samples, several self-supervised denoising methods have been developed that learn directly from noisy observations, without access to clean ground truth signals [9, 10, 11, 12, 13]. While these approaches alleviate the supervision requirement, they typically depend on access to large collections of noisy data and often yield suboptimal performance compared to methods trained with clean targets. Moreover, the absence of clean supervision necessitates the use of complex neural architectures and training schemes, which can make these methods computationally demanding and difficult to optimize in practice.

These challenges have sparked growing interest in zero-shot denoisers, which aim to recover clean signals from noisy observations without access to paired data or extensive noisy training data. Such methods are particularly appealing in domains where acquiring clean data is infeasible, and they offer the potential for general-purpose, deployable denoisers that adapt to each individual input.

Figure 1: Zero-Shot Neural Compression Denoiser (ZS-NCD). Learning phase: a neural compression model (architecture shown in Fig. 5 of the supplementary material) is trained on overlapping patches extracted from a single noisy image. Denoising phase: each pixel is reconstructed by averaging predictions across neighboring patches processed by the trained model.
From neural compression to zero-shot denoising

Denoising algorithms, ranging from classical signal processing techniques to deep learning methods, fundamentally rely on the assumption that real-world signals are highly structured. Compression-based denoising leverages this same principle, but rather than directly solving the inverse problem, it performs lossy compression on the noisy observation $\bm{y}$, under the hypothesis that the clean signal $\bm{x}$ lies in a lower-complexity subspace and is therefore more compressible.

In lossy compression, the goal is to represent signals from a target class using discrete encodings with minimal distortion. When applied to noisy data, the intuition is that a lossy compressor—operating at a distortion level matched to the noise—will favor reconstructions close to the original clean signal. While this approach has a strong theoretical foundation [14, 15], classical compression-based denoisers have shown limited empirical success, particularly for natural image denoising.

In this work, we revisit this idea in light of recent progress in neural compression, where learned encoders and decoders have demonstrated strong rate-distortion performance across a variety of image domains [16, 17]. Building on this foundation, we propose a zero-shot denoising method that we call the Zero-Shot Neural Compression Denoiser (ZS-NCD). Unlike traditional neural compression models that are trained on large corpora of clean high-resolution images, ZS-NCD learns directly from a single noisy input image. Specifically, we extract overlapping patches from the noisy image, and train a neural compression network on those patches alone—without any clean supervision or prior dataset. Once trained, the denoiser is applied to all patches from the same image, and the final output is obtained by averaging the predictions in overlapping regions. This approach is illustrated in Figure 1.

Despite relying solely on the noisy input and operating without supervision, ZS-NCD achieves state-of-the-art performance among zero-shot denoising methods across diverse noise models, and remains robust even on inputs that lie outside the natural image distribution. Figure 2 compares it against the baselines: ZS-NCD shows superior denoising performance and training stability.

Figure 2: Zero-shot denoising of Kodim05 with AWGN ($\sigma=25$). Left: PSNR versus training iterations for zero-shot denoisers. BM3D [18] and Restormer [19] are included as a classical baseline and as a supervised empirical upper bound, respectively. Right: Visual reconstructions with PSNR/SSIM above each image. Compression-based denoising based on JPEG-2K [20] achieves inferior performance. Learning-based zero-shot denoisers often struggle with either overfitting or high bias. DIP [21] and DD [22] require early stopping to avoid overfitting. ZS-N2S [12] and S2S [23] struggle with high-resolution color images, and ZS-N2N [24] often produces noisy outputs with potential overfitting. BM3D tends to oversmooth the denoised image. In contrast, ZS-NCD avoids these issues.
Paper contributions

This paper introduces a zero-shot image denoising framework based on neural compression. Our main contributions are:

  • A zero-shot denoising algorithm using neural compression. We propose a fully unsupervised method that trains a neural compression network on image patches of the noisy input. It does not rely on clean images, paired datasets, or prior training on the target distribution. It is architecture-agnostic and leverages only the structure present in the observed noisy image.

  • Theoretical results connecting denoising and compression performance. We establish finite-sample upper bounds on the reconstruction error of the proposed compression-based maximum likelihood denoisers, for both Gaussian and Poisson noise models.

  • Extensive empirical validation. We demonstrate that our method achieves state-of-the-art performance among zero-shot denoising techniques across a range of noise models and datasets.

2 Related work

Self-supervised and zero-shot denoising

Supervised learning-based denoisers such as DnCNN [25] and Restormer [19] achieve state-of-the-art performance across various noise models, but require large datasets of paired clean and noisy images, which is often impractical in real-world settings. To avoid clean images, self-supervised methods have been proposed, including Noise2Noise [10], Noise2Self [12], Noise2Void [11], Noise2Same [26] and Noise2Score [13], which only use noisy images for training. However, their reliance on large noisy datasets remains a limitation. Zero-shot denoisers address this by training on a single noisy image. These include (i) untrained networks like DIP [27] and Deep Decoder [22], and (ii) single-image adaptations of self-supervised methods, e.g., ZS-N2N [24], ZS-N2S [12] and its augmented variant with ensembling, S2S [23]. DIP-based models avoid masking and leverage full-image context, but require early stopping or under-parameterization to avoid overfitting. Self-supervised variants suffer from masking-induced information loss. Hybrid approaches, such as the masked pretraining-based method [28], use external datasets for training and perform zero-shot inference, thus falling outside the zero-shot setting studied here.

Neural compression

Learning-based lossy compression, often referred to as neural compression, uses an autoencoder architecture combined with an entropy model to estimate and constrain the bitrate at the bottleneck [29, 30]. These methods have significantly outperformed traditional codecs, particularly in image [29, 31, 32, 33, 34] and video [35, 36, 37] compression. In addition to these standard settings, several works have also explored applying neural compression to noisy data by adapting neural compression models for more efficient encoding of noisy images [38, 39, 40, 41].

Compression-based denoising

Compression-based denoising leverages the insight that structured signals are inherently more compressible than their noisy counterparts. This connection was formalized by Donoho [14], who introduced the minimum Kolmogorov complexity estimator, and further refined by Weissman et al. [15], who showed that, under certain conditions on both the signal and the noise, optimal lossy compression of a noisy signal, followed by suitable post-processing, can asymptotically achieve optimal denoising performance. Prior to these theoretical developments, early empirical methods such as wavelet-based schemes [42, 43, 44] and MDL-inspired heuristics [45] had explored this principle. Nevertheless, traditional compression-based denoisers have generally underperformed in high-dimensional settings such as image denoising.

Learning-based joint compression and denoising using neural compression has been explored in recent works [46, 47], where the goal is to achieve lower rate in compression. The empirical application of training neural compression for AWGN denoising was also proposed in [48]. Training the neural compression models in these works requires a dataset of images. In contrast, our proposed ZS-NCD is a two-step denoiser based on neural compression, trained on a single noisy image. It achieves state-of-the-art performance across both AWGN and Poisson noise models. Moreover, we contribute new theoretical results that advance the foundations of compression-based denoising.

3 Compression-based denoising: Theoretical foundations

Lossy compression

Let $\mathcal{Q}\subset\mathbb{R}^{n}$ denote the signal class of interest, such as vectorized natural images of a fixed size. A lossy compression code for $\mathcal{Q}$ is defined by an encoder-decoder pair $(f,g)$, with $f:\mathcal{Q}\to\{1,\ldots,2^{R}\}$ and $g:\{1,\ldots,2^{R}\}\to\mathbb{R}^{n}$. The performance of a lossy code is characterized by: i) Rate $R$, the number of bits used to index a codeword, so that the code has at most $2^{R}$ distinct codewords; ii) Distortion $\delta$, defined as the worst-case per-symbol mean squared error (MSE) over the signal class:

$$\delta=\sup_{\bm{x}\in\mathcal{Q}}\frac{1}{n}\|\bm{x}-g(f(\bm{x}))\|_{2}^{2}.$$

The set of reconstructions produced by the decoder forms the codebook:

$$\mathcal{C}=\{g(i):i=1,\ldots,2^{R}\}\subset\mathbb{R}^{n}.$$
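To make these definitions concrete, the following Python sketch builds a toy lossy code for the unit cube $[0,1]^n$ by uniformly quantizing each coordinate. The signal class, the quantizer, and the helper names (`make_uniform_code`, `enumerate_codebook`, `per_symbol_mse`) are illustrative assumptions and are not part of the proposed method. The encoder returns a tuple of cell indices, which plays the role of the index in $\{1,\ldots,2^{R}\}$. With `bits` bits per coordinate, the rate is $R=n\cdot\texttt{bits}$ and the worst-case per-symbol distortion is $(\Delta/2)^2$, where $\Delta=2^{-\texttt{bits}}$ is the cell width.

```python
import itertools


def make_uniform_code(n, bits):
    """Toy lossy code for Q = [0, 1]^n: each coordinate is quantized to 2**bits cells."""
    levels = 2 ** bits
    step = 1.0 / levels

    def encode(x):
        # Cell index per coordinate; clipping puts x_i = 1.0 into the last cell.
        return tuple(min(int(v / step), levels - 1) for v in x)

    def decode(index):
        # Each cell is reconstructed by its midpoint.
        return [(i + 0.5) * step for i in index]

    rate = n * bits  # total bits R, so the codebook has 2**R codewords
    return encode, decode, rate


def enumerate_codebook(n, bits):
    """List every reconstruction g(i); only feasible for very small n and bits."""
    _, decode, _ = make_uniform_code(n, bits)
    return [decode(idx) for idx in itertools.product(range(2 ** bits), repeat=n)]


def per_symbol_mse(x, x_hat):
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


encode, decode, rate = make_uniform_code(n=2, bits=2)
codebook = enumerate_codebook(n=2, bits=2)
# The distortion is attained at cell edges, for example at the all-zeros signal.
worst_case = per_symbol_mse([0.0, 0.0], decode(encode([0.0, 0.0])))
```

In this toy setting, halving the cell width (one extra bit per coordinate) adds $n$ bits to the rate and divides the distortion by four, which is the kind of rate-distortion trade-off the analysis below relies on.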
Compression-based denoising

We propose compression-based denoising as a structured maximum likelihood (ML) estimation. Given a noisy observation $\bm{y}\sim\prod_{i=1}^{n}p(y_i\mid x_i)$ and a lossy compression code $(f,g)$ for $\mathcal{Q}$, the compression-based ML denoiser solves

$$\hat{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\mathcal{L}(\mathbf{c};\bm{y}),\quad\text{where}\quad\mathcal{L}(\mathbf{c};\bm{y}):=-\sum_{i=1}^{n}\log p(y_i\mid c_i).$$

This formulation leverages the fact that clean signals, by virtue of their structure, are more compressible than their noisy counterparts. Therefore, the most likely codeword under the noise model, when selected from a codebook designed to represent clean signals, serves as a natural denoising estimate. This ML-based view unifies denoising across noise models and provides a principled way to select reconstructions from a discrete, structure-aware prior.

In the case of AWGN, $\bm{y}=\bm{x}+\bm{z}$, where $\bm{z}\sim\mathcal{N}(\mathbf{0},\sigma_z^{2}I_n)$, the described denoiser simplifies to:

$$\hat{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\|\bm{y}-\mathbf{c}\|_{2}^{2}. \tag{1}$$

That is, denoising corresponds to projecting the noisy observation onto the nearest codeword.
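The simplification follows because, for Gaussian noise, $-\log p(y_i\mid c_i)=\frac{(y_i-c_i)^2}{2\sigma_z^2}+\text{const}$, so minimizing the negative log-likelihood over the codebook is the same as minimizing the squared distance. The Python sketch below illustrates this projection on a tiny hand-written codebook. The codebook, the observation, and the function name `awgn_ml_denoise` are hypothetical and chosen only for illustration; an exhaustive search of this kind is feasible only for very small codebooks.

```python
def awgn_ml_denoise(y, codebook):
    """Return the codeword closest to y in squared Euclidean distance, as in (1)."""
    return min(
        codebook,
        key=lambda c: sum((yi - ci) ** 2 for yi, ci in zip(y, c)),
    )


# Hypothetical codebook of piecewise-constant signals of length 4.
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# A noisy observation of the step signal [0, 0, 1, 1].
y = [0.1, -0.2, 0.9, 1.3]
x_hat = awgn_ml_denoise(y, codebook)
```

Here the squared distances to the three codewords are 2.55, 2.35 and 0.15, so the step-shaped codeword is selected.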

Poisson noise commonly arises in low-light and photon-limited imaging scenarios. In this setting, each $y_i$ is modeled as a Poisson random variable with mean $\alpha x_i$: $y_i\sim\mathrm{Poisson}(\alpha x_i)$. Under this model, the compression-based ML denoiser simplifies to

$$\hat{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\sum_{i=1}^{n}\left(\alpha c_i-y_i\log c_i\right). \tag{2}$$

While the loss function in (2) is statistically well-motivated, it is more sensitive to optimization issues than its Gaussian counterpart due to the curvature and nonlinearity of the log term. To improve robustness and simplify optimization, we also consider an alternative loss based on a normalized squared error between $\mathbf{c}$ and the rescaled observations:

$$\hat{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\left\|\mathbf{c}-\frac{1}{\alpha}\bm{y}\right\|_{2}^{2}. \tag{3}$$
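Objective (2) follows from the Poisson likelihood: $-\log p(y_i\mid c_i)=\alpha c_i-y_i\log(\alpha c_i)+\log(y_i!)$, and dropping the terms that do not depend on $c_i$ leaves $\alpha c_i-y_i\log c_i$. As a small illustration, the Python sketch below evaluates both (2) and the surrogate (3) over a hand-written codebook. The codebook, the counts, and the function names are assumptions made for this example only.

```python
import math


def poisson_nll(c, y, alpha):
    """Loss in (2): sum_i alpha * c_i - y_i * log(c_i); requires every c_i > 0."""
    return sum(alpha * ci - yi * math.log(ci) for ci, yi in zip(c, y))


def rescaled_sq_error(c, y, alpha):
    """Loss in (3): squared distance between c and the rescaled counts y / alpha."""
    return sum((ci - yi / alpha) ** 2 for ci, yi in zip(c, y))


def codebook_denoise(y, codebook, loss, alpha):
    """Pick the codeword minimizing the given loss (exhaustive search)."""
    return min(codebook, key=lambda c: loss(c, y, alpha))


# Hypothetical codebook with strictly positive entries.
codebook = [
    [0.2, 0.2, 0.2],
    [0.8, 0.8, 0.8],
    [0.2, 0.8, 0.8],
]
alpha = 10.0
y = [1, 9, 7]  # photon counts, roughly alpha * [0.2, 0.8, 0.8]

x_hat_ml = codebook_denoise(y, codebook, poisson_nll, alpha)
x_hat_mse = codebook_denoise(y, codebook, rescaled_sq_error, alpha)
```

On this example both losses select the same codeword. This is only a single illustrative case and does not by itself show that the two estimators agree in general.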
Theoretical analysis

We begin by analyzing the performance of compression-based ML denoising under AWGN. The following result provides a non-asymptotic upper bound on the reconstruction error in terms of the compression rate and distortion. All proofs can be found in Appendix 6.

Theorem 1.

Assume that $\bm{x}\in\mathcal{Q}$ and let $(f,g)$ denote a lossy compression code for $\mathcal{Q}$ that operates at rate $R$ and distortion $\delta$. Consider $\bm{y}=\bm{x}+\bm{z}$, where $\bm{z}\sim\mathcal{N}(\mathbf{0},\sigma_z^{2}I_n)$. Let $\hat{\bm{x}}$ denote the output of the compression-based denoiser defined by $(f,g)$ as in (1). Then,

$$\frac{1}{\sqrt{n}}\|\bm{x}-\hat{\bm{x}}\|_{2}\leq\sqrt{\delta}+2\sigma_z\sqrt{\frac{(2\ln 2)R}{n}}\,(1+2\sqrt{\eta}), \tag{4}$$

with probability larger than $1-2^{-\eta R+2}$.

This bound decomposes the denoising error into two terms: a distortion term $\sqrt{\delta}$, which reflects the approximation quality of the compression code, and a rate-dependent term that scales with the square root of the code rate $R$. The latter captures the likelihood concentration around the clean signal in high-probability regions of the noise distribution. Notably, the result holds non-asymptotically and does not assume the code is optimal, only that it provides a distortion-$\delta$ covering of $\mathcal{Q}$. This highlights that even non-ideal compression codes can enable effective denoising, provided the rate-distortion tradeoff is well-calibrated.
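To get a feel for the two terms, the right-hand side of (4) and the associated probability can be evaluated numerically. The Python sketch below does this directly from the statement of Theorem 1; the function name `awgn_bound` and the numerical values are illustrative assumptions, not settings used in the experiments.

```python
import math


def awgn_bound(delta, sigma_z, rate_bits, n, eta):
    """Evaluate the right-hand side of (4) and the probability 1 - 2^(-eta*R + 2).

    delta     : distortion of the code (worst-case per-symbol MSE)
    sigma_z   : noise standard deviation
    rate_bits : rate R of the code, in bits
    n         : signal length
    eta       : free parameter trading bound tightness against probability
    """
    distortion_term = math.sqrt(delta)
    rate_term = (
        2.0 * sigma_z
        * math.sqrt(2.0 * math.log(2.0) * rate_bits / n)
        * (1.0 + 2.0 * math.sqrt(eta))
    )
    probability = 1.0 - 2.0 ** (-eta * rate_bits + 2.0)
    return distortion_term + rate_term, probability


# Hypothetical operating point: 100 bits for a length-10000 signal.
bound, prob = awgn_bound(delta=1e-4, sigma_z=0.1, rate_bits=100, n=10_000, eta=0.25)
```

For a fixed code, increasing $\eta$ raises the probability with which the bound holds but loosens the rate-dependent term through the factor $(1+2\sqrt{\eta})$.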

To better understand the implications of Theorem 1, in the following corollary, we focus on the special case of $k$-sparse signals.

Corollary 1 (AWGN, sparse signals).

Let $\mathcal{Q}_n$ denote the set of $k$-sparse vectors in $\mathbb{R}^{n}$ satisfying $\|\bm{x}\|_{2}\leq 1$. Fix a parameter $\eta\in(0,1)$, and suppose $\bm{y}=\bm{x}+\bm{z}$ where $\bm{z}\sim\mathcal{N}(0,\sigma_z^{2}I_n)$. Then, there exists a family of compression codes such that, when used with the denoiser defined in (1), the estimate $\hat{\bm{x}}$ satisfies

$$\frac{1}{n\sigma_z^{2}}\|\hat{\bm{x}}-\bm{x}\|_{2}^{2}\leq C\cdot\frac{k\log_{2}n}{n}+\gamma_n,$$

with probability at least $1-\frac{4}{(kn^{3k/2})^{\eta}}$. Here, $\gamma_n=o(1)$ and $C=4\ln 2\,(1+2\sqrt{\eta})^{2}$.

To contextualize this result, consider the asymptotic setting where $n\to\infty$ and $\sigma_z^{2}\to 0$. For an i.i.d. Bernoulli–Gaussian source with $\mathbb{P}(x_i\neq 0)=p$, the minimum mean squared error (MMSE), normalized by the noise power $\sigma_z^{2}$, converges to $p$ [49]. In contrast, our result is non-asymptotic and worst-case: it shows that the MSE of compression-based denoising is bounded by $C\cdot\frac{k\log n}{n}$. In the probabilistic setting where $k/n\approx p$, the two results exhibit consistent scaling with sparsity.

We next extend our analysis to a signal-dependent noise model. Poisson noise is particularly relevant in imaging applications such as microscopy and astronomy, where photon counts vary with signal intensity. Unlike Gaussian noise, Poisson observations induce a non-linear likelihood surface, making analysis more delicate. Theorems 2 and 3 establish performance guarantees for compression-based Poisson denoising, using both the exact ML formulation and a practical squared-error surrogate.

Theorem 2.

Consider the same setup of lossy compression as in Theorem 1. Assume that for any $\bm{x}\in\mathcal{Q}$, $x_i\in(x_{\min},x_{\max})$, where $0<x_{\min}<x_{\max}<1$. Assume that $y_1,\ldots,y_n$ are independent with $y_i\sim\mathrm{Poisson}(\alpha x_i)$. Let $\hat{\bm{x}}$ denote the solution of (2). Let $C_1=x_{\max}^{5}/x_{\min}^{2}$ and $C_2=\frac{x_{\max}^{2}}{x_{\min}^{3}}\beta\sqrt{\frac{4}{\ln 2}}\,(\sqrt{1+\eta}+\sqrt{\eta})$. Then, with probability larger than $1-2^{-\eta R+2}$,

$$\frac{1}{n}\|\bm{x}-\hat{\bm{x}}\|_{2}^{2}\leq C_1\delta+C_2\sqrt{\frac{R}{n\alpha}}. \tag{5}$$
Theorem 3.

Consider the same setup as in Theorem 2. Let $\hat{\bm{x}}$ denote the solution of (3). Let $C=\left(4\sqrt{\ln 2}\right)\left(\sqrt{1+\eta}+\sqrt{\eta}+1\right)$. Then, with probability larger than $1-2^{-\eta R+2}$,

$$\frac{1}{n}\|\bm{x}-\hat{\bm{x}}\|_{2}^{2}\leq\delta+C\sqrt{\frac{R}{n\alpha}}. \tag{6}$$
Remark 1.

Theorems 2 and 3 show that, in the case of Poisson noise, there is no loss in performance when the more computationally efficient MSE loss function is minimized instead of the ML loss function. This result is also consistent with our simulations reported later in Section 5.

4 Zero-shot compression-based denoiser

We refer to a general class of learning-based denoisers that operate by compressing noisy images using neural compression as the Neural Compression Denoiser (NCD). In this framework, denoising is achieved by identifying a low-complexity reconstruction from the output of a neural compression model. In the previous section, we characterized the performance of such denoisers in a setting where the compression code is fixed in advance, either learned from external data or designed using classical methods, and applied independently of the noisy input. This setup is not zero-shot, as it relies on prior training or code design. Inspired by this idea, we now propose a fully unsupervised variant: the Zero-Shot Neural Compression Denoiser (ZS-NCD). In ZS-NCD, a neural compression network is trained directly on patches extracted from a single noisy image, without access to clean targets or external data. This section describes the ZS-NCD architecture and optimization procedure in detail.

Proposed zero-shot denoiser: ZS-NCD

Let $\mathcal{P}_{(i,j)}:\mathbb{R}^{h\times w}\to\mathbb{R}^{k\times k}$ denote the patch extraction operator, which returns a $k\times k$ patch whose top-left corner is at pixel $(i,j)$. Let $f_{\bm{\theta}_1}$ and $g_{\bm{\theta}_2}$ denote the encoder and decoder networks, parameterized by weights $\bm{\theta}_1$ and $\bm{\theta}_2$, respectively. Define $\mathcal{I}$ as the set of all coordinates $(i,j)\in\{1,\ldots,h-k+1\}\times\{1,\ldots,w-k+1\}$ from which a valid $k\times k$ patch can be extracted.

Given a single noisy image $\bm{y}$, the ZS-NCD is trained to minimize the following patchwise objective:

$$(\hat{\bm{\theta}}_1,\hat{\bm{\theta}}_2)=\operatorname*{arg\,min}_{(\bm{\theta}_1,\bm{\theta}_2)}\sum_{(i,j)\in\mathcal{I}}\left(\mathcal{L}_K\big(g_{\bm{\theta}_2}(f_{\bm{\theta}_1}(\mathcal{P}_{(i,j)}(\bm{y}))),\mathcal{P}_{(i,j)}(\bm{y})\big)-\lambda\log\mathbb{P}\big(f_{\bm{\theta}_1}(\mathcal{P}_{(i,j)}(\bm{y}))\big)\right), \tag{7}$$

where $\mathbb{P}\big(f_{\bm{\theta}_1}(\mathcal{P}_{(i,j)}(\bm{y}))\big)$ denotes the likelihood (or entropy model) of the latent code produced by the encoder, and $\lambda>0$ is a hyperparameter controlling the trade-off between fidelity and compressibility. In (7), $K=k^{2}$, and the function $\mathcal{L}_K:\mathbb{R}^{K}\times\mathbb{R}^{K}\to\mathbb{R}_{+}$ is a distortion loss determined by the noise model, as defined in Section 3. For example, in the AWGN case, $\mathcal{L}_K$ corresponds to the squared $\ell_2$ norm of the difference between a noisy patch and its neural compression reconstruction. Note that $f_{\bm{\theta}_1}$ maps the input into a discrete latent space; this quantization step is non-differentiable and thus incompatible with standard gradient-based optimization. To address this, we follow the neural compression framework of [29], using a continuous relaxation during training (e.g., uniform noise injection) and applying actual discretization only at test time. The entropy model $\mathbb{P}$ is a factorized, non-parametric density [31].
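The structure of a single summand of (7) and the uniform-noise relaxation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the training code: the encoder and decoder are omitted (the latent and the reconstruction are passed in directly), and a fixed logistic density stands in for the learned factorized, non-parametric density of [31]. All function names and the `scale` parameter are hypothetical.

```python
import math
import random


def quantize(latent, training, rng):
    """Uniform-noise relaxation in training, hard rounding at test time."""
    if training:
        # Additive U(-0.5, 0.5) noise is a differentiable proxy for rounding.
        return [z + rng.uniform(-0.5, 0.5) for z in latent]
    return [float(round(z)) for z in latent]


def logistic_cdf(t, scale):
    return 1.0 / (1.0 + math.exp(-t / scale))


def neg_log_likelihood(latent, scale=2.0):
    """-log P(latent) under a factorized model (assumed logistic, for illustration).

    Each coordinate gets the probability mass of the unit-width bin centred on it.
    Intended for moderate latent magnitudes, where that mass does not underflow.
    """
    total = 0.0
    for z in latent:
        mass = logistic_cdf(z + 0.5, scale) - logistic_cdf(z - 0.5, scale)
        total -= math.log(mass)
    return total


def patch_objective(patch, recon, latent, lam, scale=2.0):
    """One summand of (7) for AWGN: squared error minus lam * log-likelihood."""
    distortion = sum((p - r) ** 2 for p, r in zip(patch, recon))
    return distortion + lam * neg_log_likelihood(latent, scale)


rng = random.Random(0)
latent = [0.4, 1.6, -2.2]
train_latent = quantize(latent, training=True, rng=rng)   # noisy, continuous
test_latent = quantize(latent, training=False, rng=rng)   # integers
```

Latents that the entropy model considers unlikely incur a larger penalty, so increasing `lam` pushes the optimization toward more compressible codes at the expense of fidelity to the noisy patch.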

After training, the denoised image is obtained by applying the encoder and decoder to each patch and averaging the overlapping outputs. For each pixel $(i,j)$, let $\mathcal{I}_{(i,j)}\subset\mathcal{I}$ denote the set of patch locations $(i',j')$ such that $\mathcal{P}_{(i',j')}$ includes the pixel $(i,j)$. The final estimate at location $(i,j)$ is given by

$$\hat{x}_{(i,j)}=\frac{1}{|\mathcal{I}_{(i,j)}|}\sum_{(i',j')\in\mathcal{I}_{(i,j)}}\left.g_{\bm{\theta}_2}(f_{\bm{\theta}_1}(\mathcal{P}_{(i',j')}(\bm{y})))\right|_{(i-i',j-j')}, \tag{8}$$

where $|\mathcal{I}_{(i,j)}|$ denotes the number of patches covering pixel $(i,j)$, and $\left.\cdot\right|_{(a,b)}$ denotes the $(a,b)$-th pixel of the patch output. For interior pixels away from the boundary, $|\mathcal{I}_{(i,j)}|=K$. As shown later in Section 5, this aggregation of reconstructed patches significantly enhances denoising performance.
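The aggregation rule (8) can be sketched as follows in plain Python, using nested lists for images and 0-based indexing. The `model` argument is a placeholder for the trained composition $g_{\bm{\theta}_2}\circ f_{\bm{\theta}_1}$; the `patch_mean_model` stand-in and all function names are hypothetical and only serve to make the example runnable.

```python
def extract_patch(img, i, j, k):
    """k-by-k patch whose top-left corner is at (i, j), 0-based."""
    return [row[j:j + k] for row in img[i:i + k]]


def denoise_by_patch_averaging(img, k, model):
    """Apply `model` to every valid patch and average overlapping outputs, as in (8)."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]   # running sum of patch predictions
    cnt = [[0] * w for _ in range(h)]     # number of patches covering each pixel
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out = model(extract_patch(img, i, j, k))
            for a in range(k):
                for b in range(k):
                    acc[i + a][j + b] += out[a][b]
                    cnt[i + a][j + b] += 1
    est = [[acc[r][c] / cnt[r][c] for c in range(w)] for r in range(h)]
    return est, cnt


def patch_mean_model(patch):
    """Crude stand-in for a compression model: replace the patch by its mean."""
    k = len(patch)
    mean = sum(sum(row) for row in patch) / (k * k)
    return [[mean] * k for _ in range(k)]


image = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
estimate, counts = denoise_by_patch_averaging(image, k=2, model=patch_mean_model)
```

The `counts` array makes the coverage statement explicit: with $k=2$, interior pixels are covered by $K=k^2=4$ patches, edge pixels by 2, and corner pixels by 1.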

Algorithm 1 Finding Lagrangian coefficient $\lambda$
1:  Initialize: $\lambda^{(0)}$
2:  for iteration $k=0,1,\dotsc,K_{\max}-1$ do
3:     Estimate $\hat{\bm{x}}^{(k)}$ from (8), and update $\lambda$:
       $$\lambda^{(k+1)}:=\begin{cases}2\lambda^{(k)},&\text{if }\|\hat{\bm{x}}^{(k)}-\bm{y}\|^{2}>n\sigma_z^{2}\\ \lambda^{(k)}/2,&\text{otherwise}\end{cases}$$
4:  end for
5:  Return: $\lambda^{*}$
Figure 3: Effect of $\lambda$ in denoising Mouse Nuclei image.
Setting the hyperparameter $\lambda$

The ZS-NCD objective in (7) includes a hyperparameter, $\lambda$, which balances reconstruction fidelity and compressibility. Interpreted through the lens of lossy compression, varying $\lambda$ allows the model to explore different rate-distortion trade-offs. However, in the context of denoising, our goal is not compression but accurate signal recovery from the noisy observation $\bm{y}$. This raises the central question: how should $\lambda$ be selected to optimize denoising performance?

Let $\hat{\bm{x}}$ denote the output of the ZS-NCD denoiser, and consider the AWGN model $\bm{y}=\bm{x}+\bm{z}$ with $\bm{z}\sim\mathcal{N}(0,\sigma_z^{2}I_n)$. Then,

$$\frac{1}{n}\mathbb{E}\left[\|\bm{y}-\hat{\bm{x}}\|_{2}^{2}\right]=\frac{1}{n}\mathbb{E}\left[\|\bm{y}-\bm{x}+\bm{x}-\hat{\bm{x}}\|_{2}^{2}\right]=\sigma_z^{2}+\frac{1}{n}\|\bm{x}-\hat{\bm{x}}\|_{2}^{2}+\frac{2}{n}\mathbb{E}\left[\bm{z}^{\top}(\bm{x}-\hat{\bm{x}})\right], \tag{9}$$

where the first term is the noise variance and the second is the true denoising error. While $\bm{z}$ and $\hat{\bm{x}}$ are not fully independent, they are intuitively weakly correlated in successful denoising regimes, where the estimate $\hat{\bm{x}}$ depends only indirectly on the noise. Thus, the cross term is expected to be small: $\frac{1}{n}\mathbb{E}\left[\bm{z}^{\top}(\bm{x}-\hat{\bm{x}})\right]\approx 0$. This approximation suggests that $\frac{1}{n}\|\bm{y}-\hat{\bm{x}}\|_{2}^{2}$ should be close to $\sigma_z^{2}$ when $\hat{\bm{x}}$ is a high-quality estimate. Based on this insight, we propose a simple and effective heuristic for choosing $\lambda$: select the value that makes $\frac{1}{n}\|\bm{y}-\hat{\bm{x}}\|_{2}^{2}$ closest to the known noise variance $\sigma_z^{2}$. This procedure can be implemented efficiently via a tree-based search strategy, as described in Algorithm 1. To apply Algorithm 1, one needs an estimate of the noise power $\sigma_z^{2}$. This is a well-studied problem, and robust algorithms exist for estimating the noise power [2, 50]. For example, in [2], it is shown that the noise power can be estimated from the median of the absolute differences of wavelet coefficients. In addition, the performance of ZS-NCD is relatively robust to the choice of $\lambda$. For instance, on the Nuclei dataset (Figure 3), ZS-NCD outperforms the state-of-the-art zero-shot learning-based denoiser, ZS-Noise2Noise, across a wide range of $\lambda$ values, for both $\sigma_z=10$ and $\sigma_z=20$. A similar approach can be used in the case of Poisson noise as well (see Section A.2).
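The selection heuristic itself, picking the $\lambda$ whose residual power is closest to $\sigma_z^{2}$, can be illustrated independently of any neural network. The Python sketch below scans a fixed grid of candidates rather than running the doubling and halving search of Algorithm 1, and it replaces ZS-NCD with a toy smoothing denoiser whose strength grows with $\lambda$. The denoiser, the test signal, the grid, and all function names are assumptions made purely for illustration.

```python
import random


def toy_denoiser(y, lam, half_window=5):
    """Stand-in for ZS-NCD: blend each sample with its local mean.

    The blend weight w = lam / (1 + lam) grows with lam, so larger lam means
    stronger smoothing and a larger residual ||y - x_hat||^2.
    """
    w = lam / (1.0 + lam)
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        local_mean = sum(y[lo:hi]) / (hi - lo)
        out.append((1.0 - w) * y[i] + w * local_mean)
    return out


def residual_power(y, x_hat):
    """(1/n) * ||y - x_hat||_2^2."""
    return sum((a - b) ** 2 for a, b in zip(y, x_hat)) / len(y)


def select_lambda(y, sigma2, candidates, denoiser):
    """Return the candidate whose residual power is closest to sigma2."""
    return min(
        candidates,
        key=lambda lam: abs(residual_power(y, denoiser(y, lam)) - sigma2),
    )


rng = random.Random(0)
sigma = 0.1
x = [0.2] * 50 + [0.8] * 50                      # clean step signal
y = [v + rng.gauss(0.0, sigma) for v in x]       # AWGN observation
grid = [2.0 ** e for e in range(-6, 7)]          # candidate values of lambda
lam_star = select_lambda(y, sigma ** 2, grid, toy_denoiser)
```

In practice each candidate would require retraining the compression model, which is why a search that visits only a few values of $\lambda$ is preferable to an exhaustive scan.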

Table 1: Denoising performance comparison under AWGN and Poisson noise. Average PSNR (dB) / SSIM are reported. Best results are in bold, second-best are underlined.

Columns 3-5: AWGN, $\mathcal{N}(0,\sigma^{2})$. Columns 6-8: Poisson, $\mathrm{Poisson}(\alpha\bm{x})/\alpha$.

$\sigma$ or $\alpha$ | Method | AWGN Set11     | AWGN Set13     | AWGN Kodak24   | Poisson Set11  | Poisson Set13  | Poisson Kodak24
-------------------- | ------ | -------------- | -------------- | -------------- | -------------- | -------------- | ---------------
15                   | JPEG2K | 27.45 / 0.7699 | 26.69 / 0.7543 | 27.86 / 0.7457 | 22.35 / 0.5882 | 21.76 / 0.5494 | 22.56 / 0.5249
                     | BM3D   | 32.22 / 0.8992 | 31.15 / 0.8808 | 32.37 / 0.8754 | 26.66 / 0.7505 | 25.64 / 0.6912 | 27.04 / 0.6900
                     | DIP    | 29.11 / 0.7990 | 30.31 / 0.8570 | 31.42 / 0.8454 | 23.69 / 0.5863 | 25.14 / 0.6916 | 26.37 / 0.6761
                     | DD     | 28.83 / 0.8215 | 29.22 / 0.8371 | 28.71 / 0.8016 | 24.37 / 0.6629 | 24.96 / 0.7006 | 25.59 / 0.6679
                     | S2S    | 26.81 / 0.8158 | 20.61 / 0.6879 | 23.08 / 0.7695 | 21.75 / 0.6872 | 19.23 / 0.6553 | 22.52 / 0.7418
                     | ZS-N2S | 28.92 / 0.8495 | 18.18 / 0.5690 | 18.68 / 0.5540 | 25.06 / 0.7051 | 21.23 / 0.6066 | 22.24 / 0.6170
                     | ZS-N2N | 30.01 / 0.8169 | 30.95 / 0.8701 | 32.30 / 0.8650 | 24.04 / 0.5766 | 25.37 / 0.6878 | 26.80 / 0.6757
                     | ZS-NCD | 31.35 / 0.8580 | 31.93 / 0.8983 | 33.18 / 0.9026 | 25.65 / 0.7132 | 26.44 / 0.7434 | 27.64 / 0.7432
25                   | JPEG2K | 24.91 / 0.6997 | 24.32 / 0.6676 | 25.43 / 0.6550 | 23.03 / 0.6108 | 22.65 / 0.5952 | 23.58 / 0.5680
                     | BM3D   | 30.83 / 0.8659 | 28.81 / 0.8213 | 29.98 / 0.8092 | 22.70 / 0.5741 | 22.17 / 0.5992 | 24.13 / 0.5931
                     | DIP    | 26.60 / 0.7128 | 27.85 / 0.7837 | 28.90 / 0.7738 | 24.94 / 0.6512 | 26.13 / 0.7289 | 27.49 / 0.7243
                     | DD     | 26.93 / 0.7530 | 27.40 / 0.7832 | 27.62 / 0.7496 | 25.48 / 0.7022 | 26.04 / 0.7373 | 26.56 / 0.7060
                     | S2S    | 23.32 / 0.7306 | 17.95 / 0.5998 | 20.69 / 0.6949 | 23.40 / 0.7355 | 20.18 / 0.6927 | 23.09 / 0.7674
                     | ZS-N2S | 27.30 / 0.7971 | 20.39 / 0.6200 | 20.89 / 0.6156 | 26.01 / 0.7478 | 21.19 / 0.6312 | 21.47 / 0.6277
                     | ZS-N2N | 27.18 / 0.7173 | 28.36 / 0.8001 | 29.54 / 0.7798 | 25.40 / 0.6432 | 26.75 / 0.7455 | 28.21 / 0.7374
                     | ZS-NCD | 28.93 / 0.8079 | 29.33 / 0.8351 | 30.60 / 0.8144 | 27.10 / 0.7431 | 27.60 / 0.7827 | 28.77 / 0.7677
50                   | JPEG2K | 22.05 / 0.5794 | 21.43 / 0.5295 | 22.17 / 0.5055 | 24.77 / 0.6811 | 24.25 / 0.6696 | 25.52 / 0.6608
                     | BM3D   | 28.25 / 0.8049 | 25.78 / 0.7134 | 27.06 / 0.7047 | 23.09 / 0.5787 | 23.00 / 0.6281 | 24.49 / 0.6008
                     | DIP    | 23.46 / 0.5783 | 24.82 / 0.6748 | 25.90 / 0.6494 | 26.30 / 0.7004 | 27.72 / 0.7845 | 29.12 / 0.7845
                     | DD     | 24.01 / 0.6584 | 24.56 / 0.6779 | 24.98 / 0.6413 | 26.87 / 0.7455 | 27.43 / 0.7867 | 27.71 / 0.7543
                     | S2S    | 17.41 / 0.5200 | 14.21 / 0.3938 | 17.00 / 0.5325 | 25.70 / 0.7896 | 21.75 / 0.7365 | 23.88 / 0.8014
                     | ZS-N2S | 24.74 / 0.6883 | 20.62 / 0.5880 | 20.05 / 0.5774 | 27.08 / 0.7855 | 20.75 / 0.6033 | 20.25 / 0.5993
                     | ZS-N2N | 23.52 / 0.5457 | 24.67 / 0.6444 | 25.82 / 0.6151 | 27.26 / 0.7216 | 28.57 / 0.8112 | 30.13 / 0.8076
                     | ZS-NCD | 25.58 / 0.7144 | 25.87 / 0.7269 | 27.89 / 0.7464 | 28.44 / 0.7914 | 29.09 / 0.8223 | 30.60 / 0.8235

5 Experiments

In this section, we evaluate the denoising performance of ZS-NCD on both synthetic and real-world noise, across natural and microscopy images. We compare against representative zero-shot denoisers, including both traditional and learning-based methods. All baselines are dataset-free, i.e., they operate solely on the noisy image to be denoised. For non-learning methods, we include JPEG-2K and BM3D. Although rarely used as a denoising baseline, JPEG-2K provides a useful point of comparison from the perspective of compression-based denoising, as it represents a fixed, pre-defined compression code. For learning-based methods, we evaluate Deep Image Prior (DIP) [21], Deep Decoder (DD) [22], Zero-Shot Noise2Self (ZS-N2S) [12], Self2Self (S2S) [23], and Zero-Shot Noise2Noise (ZS-N2N) [24].

Due to instability in training for several baselines, we report their best achieved performance (with early stopping or model selection), whereas ZS-NCD is evaluated at its final training iteration, without manual tuning or stopping criteria.

Natural images with synthetic noise

We consider two synthetic noise models: AWGN $\mathcal{N}(0,\sigma_z^2)$, where $\sigma_z$ is the standard deviation of the Gaussian distribution, and Poisson noise defined as $\mathrm{Poisson}(\alpha x)$, where $\alpha$ is the scale factor. Note that Poisson noise is signal-dependent, with $\mathbb{E}[\bm{y}]=\alpha\bm{x}$. To rescale the noisy image to the range of the clean image, we follow the literature and assume that the scale $\alpha$ is known, normalizing the noisy image as $\bm{y}/\alpha$ in the experiments of this section. We evaluate on the grayscale Set11 [25], RGB Set13 [51] (center-cropped to $192\times 192$), and Kodak24 [52] datasets. Table 1 presents the denoising performance of various methods. BM3D achieves the strongest results on grayscale images, though it relies on accurate knowledge of the noise power parameter. Existing learning-based zero-shot denoisers, in contrast, often exhibit inconsistent performance across noise levels and image resolutions. For example, ZS-N2S and Self2Self degrade on high-resolution images, likely due to the limitations of training with masked pixels. ZS-N2N performs well on high-resolution images from Kodak24 but suffers on lower-resolution images in Set13 ($192\times 192$), as it is trained to map between two downscaled versions of the same noisy image. In comparison, ZS-NCD maintains robust performance across different noise levels and image sizes. The more realistic case of not having access to the noise parameter $\alpha$ is discussed in Appendix A.2. In both noise regimes, we use MSE as the loss function. However, for Poisson noise, minimizing the negative log-likelihood is also a natural choice. We defer the results using this loss to Appendix A.2.
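The two noise models above can be simulated in a few lines. The following is a minimal sketch, not the authors' pipeline: the helper names (`add_awgn`, `add_poisson`, `psnr`), the random stand-in image, and the assumption that $\sigma_z=25$ refers to a 0 to 255 intensity scale (so it becomes $25/255$ for images in $[0,1]$) are all illustrative choices.

```python
import numpy as np

def add_awgn(x, sigma_z, rng):
    """Return y = x + z, with z i.i.d. N(0, sigma_z^2) per pixel."""
    return x + rng.normal(0.0, sigma_z, size=x.shape)

def add_poisson(x, alpha, rng):
    """Draw y ~ Poisson(alpha * x) per pixel and return the rescaled image y / alpha."""
    y = rng.poisson(alpha * x)
    return y / alpha

def psnr(x, x_hat, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
# Hypothetical stand-in for a clean image with intensities in [0, 1].
x = rng.uniform(0.1, 0.9, size=(64, 64))
y_gauss = add_awgn(x, 25 / 255, rng)   # assumes sigma_z = 25 on a 0..255 scale
y_pois = add_poisson(x, 50.0, rng)     # alpha = 50 chosen only for illustration
print(psnr(x, y_gauss), psnr(x, y_pois))
```

Because $\mathbb{E}[\bm{y}/\alpha]=\bm{x}$, the rescaled Poisson observation is an unbiased but signal-dependent noisy version of the clean image, which is why it can be compared to $\bm{x}$ with the same PSNR helper.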

Table 2: First two rows: denoising performance (average PSNR / SSIM over 6 images) under AWGN $\mathcal{N}(0,\sigma^{2}I)$ on Mouse Nuclei fluorescence microscopy images (image size $128\times 128$), at noise levels $\sigma=10$ and $\sigma=20$. Last row: real camera denoising performance on the PolyU dataset, with images cropped to $512\times 512$; we report the average PSNR / SSIM over 6 random images.

| $\sigma$ | JPEG2K | BM3D | DIP | DD | ZS-N2N | ZS-N2S | S2S | ZS-NCD |
|---|---|---|---|---|---|---|---|---|
| 10 | 32.89 / 0.8294 | 38.65 / 0.9640 | 36.43 / 0.8789 | 37.33 / 0.9533 | 36.17 / 0.9319 | 31.26 / 0.8812 | 12.63 / 0.2966 | 38.23 / 0.9508 |
| 20 | 28.57 / 0.6986 | 34.96 / 0.9296 | 32.32 / 0.7889 | 33.50 / 0.9092 | 32.25 / 0.8532 | 30.41 / 0.8600 | 10.09 / 0.1559 | 34.71 / 0.9093 |
| Unknown | 32.89 / 0.8294 | 35.71 / 0.9506 | 35.43 / 0.9408 | 34.83 / 0.9395 | 34.07 / 0.9028 | 23.61 / 0.8344 | 35.66 / 0.9527 | 35.84 / 0.9534 |

Fluorescence microscopy and real camera images

To evaluate performance in low-data and domain-shift settings, we test ZS-NCD on Mouse Nuclei fluorescence microscopy images [53], which differ significantly from natural images in structure and texture. We also assess real-world denoising using the PolyU dataset [54], which contains high-resolution images captured by Canon, Nikon, and Sony cameras. Ground-truth images are obtained by averaging multiple captures, while the noisy inputs are single-shot acquisitions. Results are shown in Table 2. ZS-NCD consistently outperforms other learning-based zero-shot denoisers, demonstrating robustness to unknown noise models and non-natural image distributions.

Table 3: AWGN denoising on Set11 (PSNR / SSIM).

| ZS-NCD | $\sigma=25$ | $\sigma=50$ |
|---|---|---|
| Conv | 28.93 / 0.8079 | 25.58 / 0.7144 |
| MLP | 29.52 / 0.8363 | 25.89 / 0.7306 |

Robustness to overfitting.

Most learning-based zero-shot methods are prone to overfitting due to the lack of clean targets and the use of overparameterized networks.

In contrast, ZS-NCD, grounded in compression-based denoising theory, avoids this issue thanks to its entropy constraint. To further highlight this key aspect of ZS-NCD, we replace the convolutional encoder-decoder ($\approx 0.4$M parameters) with a fully connected MLP ($\approx 2.3$M parameters) and observe that, using the same $\lambda$, performance improves rather than degrades (see Table 3).

Effect of overlapping patch aggregation.

As described in Section 4 and illustrated in Fig. 1, ZS-NCD denoises each pixel by aggregating outputs from overlapping patches, where each patch is first compressed and then decompressed using a learned neural compression model. Intuitively, one might expect the most accurate reconstruction for a given pixel to come from the patch in which it lies at the center, as this location benefits from the largest available spatial context, which has been observed in [55].

This observation leads to the question: does averaging over overlapping reconstructions improve denoising quality, or would it suffice to use only the patch in which the pixel appears at a fixed position (e.g., the center)? From a computational perspective, both strategies are equivalent, since every patch is processed in both; in the averaging scheme, however, each patch contributes to all the pixels it covers.

To investigate this, we conducted an ablation in which, instead of averaging, each pixel $(i,j)$ is reconstructed solely from one of the $k\times k$ patches in which it appears, using a fixed location in the patch (e.g., top-left, center, etc.). The results are shown in Fig. 4, where each heatmap entry reports the PSNR obtained by using only that specific location in the patch for reconstruction. As expected, performance is best when the pixel is centrally located, and degrades as it moves toward the patch boundaries.

Figure 4: Denoising AWGN ($\sigma_z=25$) on image Parrot using ZS-NCD, where only a single pixel from each overlapping patch (stride 1) is retained after compression. Each heatmap value indicates the PSNR achieved when denoising is based solely on the pixel at that specific location within each patch.

However, the key observation is that averaging across all overlapping reconstructions yields a substantial performance gain. For instance, in denoising Parrot (from Set11 dataset), the best single-location reconstruction achieves 25.90 dB (center), while averaging achieves 28.14 dB, a gain of over 2 dB. This highlights the denoising benefit of combining multiple noisy views of each pixel, consistent with principles from ensembling and variance reduction.
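The two aggregation strategies compared in this ablation can be sketched as follows. This is an illustrative toy, not the ZS-NCD implementation: `aggregate_patches` is a hypothetical helper, and `patch_fn` stands in for the learned compress-then-decompress model (here replaced by a crude function that outputs the patch mean).

```python
import numpy as np

def aggregate_patches(y, k, patch_fn, position=None):
    """Reconstruct an image from all stride-1 k-by-k patches of y.

    position=None  : average every overlapping patch output (averaging scheme).
    position=(r, c): keep only the output pixel at in-patch location (r, c).
    """
    h, w = y.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out = patch_fn(y[i:i + k, j:j + k])
            if position is None:
                acc[i:i + k, j:j + k] += out
                cnt[i:i + k, j:j + k] += 1
            else:
                r, c = position
                acc[i + r, j + c] += out[r, c]
                cnt[i + r, j + c] += 1
    # Pixels that no patch covers (image borders in single-position mode)
    # fall back to the noisy input.
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), y)

rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.5)                      # toy constant image
noisy = clean + rng.normal(0.0, 0.1, size=clean.shape)
mean_fn = lambda p: np.full_like(p, p.mean())       # stand-in for the compression model
avg = aggregate_patches(noisy, 5, mean_fn)
ctr = aggregate_patches(noisy, 5, mean_fn, position=(2, 2))
print(np.mean((avg - clean) ** 2), np.mean((ctr - clean) ** 2))
```

Both modes evaluate `patch_fn` on every patch, so their cost is the same; the averaging mode simply reuses all $k\times k$ outputs of each patch instead of discarding all but one.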

6 Proof of Theorems

6.1 Auxiliary lemmas

Before proving the main theorems, we state two lemmas that will be used later in the proofs.

Lemma 1.

Assume that $0<\alpha_m\leq\alpha_1,\alpha_2\leq\alpha_M<\infty$. Then,

$$\left(\frac{\alpha_m}{\alpha_M}\right)^{2}\frac{(\alpha_2-\alpha_1)^{2}}{2\alpha_1}\leq D_{\text{KL}}(\mathrm{Poisson}(\alpha_1)\,\|\,\mathrm{Poisson}(\alpha_2))\leq\left(\frac{\alpha_M}{\alpha_m}\right)^{2}\frac{(\alpha_2-\alpha_1)^{2}}{2\alpha_1}.$$
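As a quick numerical sanity check of Lemma 1, the closed-form Poisson KL divergence can be compared against the two bounds on a small grid. This is only a sketch; the function names and the grid of rates are illustrative assumptions.

```python
import math

def kl_poisson(a1, a2):
    """D_KL(Poisson(a1) || Poisson(a2)) in nats, from the closed form."""
    return a2 - a1 + a1 * math.log(a1 / a2)

def lemma1_bounds(a1, a2, a_min, a_max):
    """Lower and upper bounds of Lemma 1 for rates in [a_min, a_max]."""
    base = (a2 - a1) ** 2 / (2.0 * a1)
    return (a_min / a_max) ** 2 * base, (a_max / a_min) ** 2 * base

# Illustrative grid of rates inside [0.5, 4.0].
for a1 in (0.5, 1.0, 2.0, 4.0):
    for a2 in (0.5, 1.0, 2.0, 4.0):
        lo, hi = lemma1_bounds(a1, a2, 0.5, 4.0)
        print(a1, a2, lo, kl_poisson(a1, a2), hi)
```

The lemma essentially says that, for rates bounded away from zero and infinity, the Poisson KL divergence behaves like a scaled squared difference of the rates.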
Lemma 2.

Consider independent Poisson random variables $Y_1,\ldots,Y_n$, where $Y_i\sim\mathrm{Poisson}(\alpha_i)$. Consider $w_1,\ldots,w_n\in\mathbb{R}$. Let $\sigma_n^{2}=\sum_{i=1}^{n}w_i^{2}\alpha_i$ and $w_M\triangleq\max_{i\in\{1,\ldots,n\}}|w_i|$. Then, for any $t\in\left[0,\frac{2\sigma_n^{2}}{3w_M}\right]$,

$$\mathbb{P}\left(\sum_{i=1}^{n}w_iY_i\geq\sum_{i=1}^{n}w_i\alpha_i+t\right)\leq\exp\left(-\frac{t^{2}}{2(\sigma_n^{2}+w_Mt/3)}\right), \tag{10}$$

and

$$\mathbb{P}\left(\sum_{i=1}^{n}w_iY_i\leq\sum_{i=1}^{n}w_i\alpha_i-t\right)\leq\exp\left(-\frac{t^{2}}{2(\sigma_n^{2}+w_Mt/3)}\right). \tag{11}$$
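The tail bounds of Lemma 2 can be checked empirically with a small Monte Carlo experiment. The sketch below is illustrative only: the rates, weights, number of trials, and the choice of $t$ at half of the admissible maximum are assumptions made for the example.

```python
import numpy as np

def bernstein_bound(t, sigma2, w_max):
    """Right-hand side of (10) and (11)."""
    return np.exp(-t ** 2 / (2.0 * (sigma2 + w_max * t / 3.0)))

rng = np.random.default_rng(0)
n, trials = 50, 20000
alpha = rng.uniform(1.0, 5.0, size=n)      # illustrative Poisson rates alpha_i
w = rng.uniform(-1.0, 1.0, size=n)         # illustrative weights w_i (both signs)
sigma2 = np.sum(w ** 2 * alpha)            # sigma_n^2
w_max = np.max(np.abs(w))                  # w_M
t = 0.5 * (2.0 * sigma2 / (3.0 * w_max))   # inside [0, 2 sigma_n^2 / (3 w_M)]

Y = rng.poisson(alpha, size=(trials, n))   # each row is one draw of (Y_1, ..., Y_n)
S = Y @ w                                  # weighted sums
mu = np.sum(w * alpha)
upper_freq = np.mean(S >= mu + t)          # empirical upper-tail frequency
lower_freq = np.mean(S <= mu - t)          # empirical lower-tail frequency
bound = bernstein_bound(t, sigma2, w_max)
print(upper_freq, lower_freq, bound)
```

In such experiments the empirical frequencies typically sit well below the bound, which is expected since the bound is derived from a Chernoff argument and is not tight.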

6.1.1 Proof of Lemma 1

Proof.

$$D_{\text{KL}}(\mathrm{Poisson}(\alpha_1)\,\|\,\mathrm{Poisson}(\alpha_2))=\alpha_2-\alpha_1+\alpha_1\log\frac{\alpha_1}{\alpha_2}=\alpha_2-\alpha_1-\alpha_1\log\left(1+\frac{\alpha_2-\alpha_1}{\alpha_1}\right). \tag{12}$$

Using Taylor's theorem,

$$\log(1+u)=u+f^{\prime\prime}(\alpha)\frac{u^{2}}{2}, \tag{13}$$

where $f(u)=\log(1+u)$ and $\alpha$ lies between $0$ and $u$. Note that

$$f^{\prime\prime}(u)=-\frac{1}{(1+u)^{2}}.$$

Letting $u=\frac{\alpha_2-\alpha_1}{\alpha_1}$, for any $\alpha$ between $0$ and $u$ the quantity $1+\alpha$ lies between $1$ and $\alpha_2/\alpha_1$, and hence

$$-\left(\frac{\alpha_M}{\alpha_m}\right)^{2}\leq f^{\prime\prime}(\alpha)\leq-\left(\frac{\alpha_m}{\alpha_M}\right)^{2}. \tag{14}$$

Combining (12), (13) and (14) yields the desired result.

6.1.2 Proof of Lemma 2

Proof.

Define

$$\mu_n=\mathbb{E}\left[\sum_{i=1}^{n}w_iY_i\right]=\sum_{i=1}^{n}w_i\alpha_i,$$

and

$$\sigma_n^{2}=\mathbb{E}\left[\left(\sum_{i=1}^{n}w_i(Y_i-\alpha_i)\right)^{2}\right]=\sum_{i=1}^{n}w_i^{2}\alpha_i.$$

Consider $s>0$. Using the Chernoff bound, we have

$$\begin{aligned}\mathbb{P}\left(\sum_{i=1}^{n}w_iY_i\geq\sum_{i=1}^{n}w_i\alpha_i+t\right)&\leq\frac{\prod_{i=1}^{n}\mathbb{E}\left[\exp(sw_i(Y_i-\alpha_i))\right]}{\exp(st)}\\&=\exp\left(\sum_{i=1}^{n}\alpha_i\left({\rm e}^{sw_i}-1-sw_i\right)-st\right).\end{aligned} \tag{15}$$

Note that for $u\in(-1,1)$, ${\rm e}^{u}-1-u\leq\frac{u^{2}}{2(1-u/3)}$. Assuming that $s<\frac{1}{w_M}$, we have $|sw_i|<1$ for all $i$. Therefore,

$$\begin{aligned}\exp\left(\sum_{i=1}^{n}\alpha_i\left({\rm e}^{sw_i}-1-sw_i\right)-st\right)&\leq\exp\left(\sum_{i=1}^{n}\alpha_i\frac{(sw_i)^{2}}{2(1-sw_i/3)}-st\right)\\&\leq\exp\left(\sum_{i=1}^{n}\alpha_i\frac{(sw_i)^{2}}{2(1-sw_M/3)}-st\right)\\&=\exp\left(\frac{s^{2}\sigma_n^{2}}{2(1-sw_M/3)}-st\right).\end{aligned} \tag{16}$$

We evaluate this bound at $s=\frac{t}{\sigma_n^{2}+w_Mt/3}$, which satisfies $s<\frac{1}{w_M}$ because $t\leq\frac{2\sigma_n^{2}}{3w_M}$. Since $1-sw_M/3=\frac{\sigma_n^{2}}{\sigma_n^{2}+w_Mt/3}$, it follows that

$$\frac{s^{2}\sigma_n^{2}}{2(1-sw_M/3)}-st=-\frac{st}{2}=-\frac{t^{2}}{2(\sigma_n^{2}+w_Mt/3)}, \tag{17}$$

which, combined with (15) and (16), proves (10).

The lower-tail bound (11) follows from (10) by symmetry. Indeed, $\sum_{i=1}^{n}w_iY_i\leq\mu_n-t$ holds if and only if $\sum_{i=1}^{n}(-w_i)Y_i\geq\sum_{i=1}^{n}(-w_i)\alpha_i+t$, and replacing each $w_i$ with $-w_i$ changes neither $\sigma_n^{2}=\sum_{i=1}^{n}w_i^{2}\alpha_i$ nor $w_M=\max_i|w_i|$. Since the derivation of (10) places no restriction on the signs of the weights, applying it with weights $-w_1,\ldots,-w_n$ gives

$$\mathbb{P}\left(\sum_{i=1}^{n}w_iY_i\leq\mu_n-t\right)\leq\exp\left(-\frac{t^{2}}{2(\sigma_n^{2}+w_Mt/3)}\right),$$

which is (11).

6.2 Proof of Theorem 1

Proof.

Recall that $\bm{y}=\bm{x}+\bm{z}$, where $\bm{z}$ is i.i.d. $\mathcal{N}(0,\sigma_z^{2})$, and

$$\hat{\bm{x}}=\operatorname*{arg\,min}_{\mathbf{c}\in\mathcal{C}}\|\mathbf{c}-\bm{y}\|_2^{2},\qquad\tilde{\bm{x}}=\operatorname*{arg\,min}_{\mathbf{c}\in\mathcal{C}}\|\mathbf{c}-\bm{x}\|_2^{2}.$$

Since both $\hat{\bm{x}}$ and $\tilde{\bm{x}}$ are in $\mathcal{C}$,

\begin{align}
\|\hat{\bm{x}}-\bm{y}\|_2^{2}&\leq\|\tilde{\bm{x}}-\bm{y}\|_2^{2} \tag{22}\\
\|(\hat{\bm{x}}-\bm{x})-\bm{z}\|_2^{2}&\leq\|(\tilde{\bm{x}}-\bm{x})-\bm{z}\|_2^{2} \tag{23}\\
\|\hat{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{z},\hat{\bm{x}}-\bm{x}\rangle+\|\bm{z}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{z},\tilde{\bm{x}}-\bm{x}\rangle+\|\bm{z}\|^{2} \tag{24}\\
\|\hat{\bm{x}}-\bm{x}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{z},\tilde{\bm{x}}-\bm{x}\rangle+2\langle\bm{z},\hat{\bm{x}}-\bm{x}\rangle \tag{25}\\
\|\hat{\bm{x}}-\bm{x}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}+2\left|\langle\bm{z},\tilde{\bm{x}}-\bm{x}\rangle\right|+2\left|\langle\bm{z},\hat{\bm{x}}-\bm{x}\rangle\right|. \tag{26}
\end{align}

Let $\mathbf{e}=\hat{\bm{x}}-\bm{x}$ denote the error of the compression-based estimate of the ground truth $\bm{x}$ from its noisy version $\bm{y}$, and let $\bm{d}=\tilde{\bm{x}}-\bm{x}$ denote the distortion from compressing the ground truth $\bm{x}$ with the compression code $\mathcal{C}$. Then we have

\begin{align}
\|\mathbf{e}\|^{2}&\leq\|\bm{d}\|^{2}+2|\langle\bm{z},\mathbf{e}\rangle|+2|\langle\bm{z},\bm{d}\rangle| \tag{27}\\
&=\|\bm{d}\|^{2}+2\|\mathbf{e}\|\left|\left\langle\bm{z},\frac{\mathbf{e}}{\|\mathbf{e}\|}\right\rangle\right|+2\|\bm{d}\|\left|\left\langle\bm{z},\frac{\bm{d}}{\|\bm{d}\|}\right\rangle\right|. \tag{28}
\end{align}

For any possible reconstruction $\mathbf{c}\in\mathcal{C}$, we define the error vector $\mathbf{e}^{(\mathbf{c})}=\mathbf{c}-\bm{x}$. Given $t_1,t_2>0$, define events $\mathcal{E}_1$ and $\mathcal{E}_2$ as

$$\mathcal{E}_1=\left\{\left|\sum_{i=1}^{n}z_i\frac{e_i^{(\mathbf{c})}}{\|\mathbf{e}^{(\mathbf{c})}\|}\right|\leq t_1:\;\forall\,\mathbf{c}\in\mathcal{C}\right\}, \tag{29}$$

and

$$\mathcal{E}_2=\left\{\left|\sum_{i=1}^{n}z_i\frac{e_i^{(\tilde{\bm{x}})}}{\|\mathbf{e}^{(\tilde{\bm{x}})}\|}\right|\leq t_2\right\}, \tag{30}$$

respectively. Conditioned on $\mathcal{E}_1\cap\mathcal{E}_2$, it follows from (28) that

$$\|\mathbf{e}\|^{2}\leq\|\bm{d}\|^{2}+2t_1\|\mathbf{e}\|+2t_2\|\bm{d}\|. \tag{31}$$

Therefore,

\begin{align}
\|\mathbf{e}\|^{2}-2t_1\|\mathbf{e}\|+t_1^{2}&\leq\|\bm{d}\|^{2}+2t_2\|\bm{d}\|+t_2^{2}+(t_1^{2}-t_2^{2}), \tag{32}\\
\left|\|\mathbf{e}\|-t_1\right|&\leq\sqrt{(\|\bm{d}\|+t_2)^{2}+(t_1^{2}-t_2^{2})}, \tag{33}
\end{align}

and finally,

$$\|\mathbf{e}\|_2\leq\|\bm{d}\|_2+t_1+t_2+\sqrt{t_1^{2}-t_2^{2}}, \tag{34}$$

where the last line follows because $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$, for all $a,b>0$. To finish the proof we need to bound $\mathbb{P}\left((\mathcal{E}_1\cap\mathcal{E}_2)^{c}\right)$ and set parameters $t_1$ and $t_2$.

Note that for each $\mathbf{c}$, $\frac{\mathbf{e}^{(\mathbf{c})}}{\|\mathbf{e}^{(\mathbf{c})}\|}$ is a unit vector in $\mathbb{R}^{n}$. Therefore, $\sum_{i=1}^{n}z_i\frac{e_i^{(\mathbf{c})}}{\|\mathbf{e}^{(\mathbf{c})}\|}\sim\mathcal{N}(0,\sigma_z^{2})$. Hence,

$$\mathbb{P}\left(\left|\sum_{i=1}^{n}z_i\frac{e_i^{(\mathbf{c})}}{\|\mathbf{e}^{(\mathbf{c})}\|}\right|\geq t\right)\leq 2\exp\left(-\frac{t^{2}}{2\sigma_z^{2}}\right). \tag{35}$$

Therefore, applying the union bound and noting that $|\mathcal{C}|\leq 2^{R}$,

$$\mathbb{P}\left(\mathcal{E}_1^{c}\right)\leq 2^{R+1}\exp\left(-\frac{t_1^{2}}{2\sigma_z^{2}}\right), \tag{36}$$

and

$$\mathbb{P}\left(\mathcal{E}_2^{c}\right)\leq 2\exp\left(-\frac{t_2^{2}}{2\sigma_z^{2}}\right). \tag{37}$$

For $\eta\in(0,1)$, set

$$t_1=\sigma_z\sqrt{(2\ln 2)R(1+\eta)},$$

and

$$t_2=\sigma_z\sqrt{(2\ln 2)R\eta}.$$

Then,

$$\mathbb{P}\left(\mathcal{E}_1^{c}\cup\mathcal{E}_2^{c}\right)\leq 2^{-\eta R+2}. \tag{38}$$

Using the selected values of $t_1$ and $t_2$ in (34) yields the desired result, i.e.,

$$\frac{1}{\sqrt{n}}\|\bm{x}-\hat{\bm{x}}\|_2\leq\sqrt{\delta}+2\sigma_z\sqrt{\frac{(2\ln 2)R}{n}}\,(1+2\sqrt{\eta}),$$

where we have used the fact that $\frac{1}{n}\|\bm{d}\|_2^{2}\leq\delta$. ∎
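To make the bound just derived concrete, the sketch below runs the minimum-distance denoiser on a toy code and compares the realized error with the right-hand side of the bound. Everything here is an illustrative assumption: the codebook is a set of $2^{R}$ random vectors (not a trained neural compressor), and the values of $n$, $R$, $\sigma_z$, and $\eta$ are arbitrary.

```python
import numpy as np

def nearest_codeword(codebook, v):
    """Return the codeword minimizing the squared Euclidean distance to v."""
    d = np.sum((codebook - v) ** 2, axis=1)
    return codebook[np.argmin(d)]

def theorem1_rhs(delta, sigma_z, R, n, eta):
    """sqrt(delta) + 2 sigma_z sqrt((2 ln 2) R / n) (1 + 2 sqrt(eta))."""
    return (np.sqrt(delta)
            + 2.0 * sigma_z * np.sqrt(2.0 * np.log(2.0) * R / n)
            * (1.0 + 2.0 * np.sqrt(eta)))

rng = np.random.default_rng(0)
n, R, sigma_z, eta = 16, 10, 0.1, 0.9
codebook = rng.uniform(0.0, 1.0, size=(2 ** R, n))   # hypothetical code with 2^R codewords
x = rng.uniform(0.0, 1.0, size=n)                    # clean signal
y = x + rng.normal(0.0, sigma_z, size=n)             # AWGN observation

x_tilde = nearest_codeword(codebook, x)              # best codeword for the clean signal
x_hat = nearest_codeword(codebook, y)                # compression-based denoiser output
delta = np.mean((x_tilde - x) ** 2)                  # per-sample distortion of the code at x
lhs = np.linalg.norm(x - x_hat) / np.sqrt(n)
rhs = theorem1_rhs(delta, sigma_z, R, n, eta)
print(lhs, rhs)
```

A random codebook has a large distortion $\delta$, so the example illustrates the structure of the bound (distortion plus a rate-dependent noise term) rather than good denoising performance.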

6.3 Proof of Theorem 2

Proof.

Recall that $\hat{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\mathcal{L}(\mathbf{c};\bm{y})$. Let

$$\tilde{\bm{x}}=\arg\min_{\mathbf{c}\in\mathcal{C}}\|\bm{x}-\mathbf{c}\|_2.$$

Since both $\hat{\bm{x}}$ and $\tilde{\bm{x}}$ are in $\mathcal{C}$, we have $\mathcal{L}(\hat{\bm{x}};\bm{y})\leq\mathcal{L}(\tilde{\bm{x}};\bm{y})$, or

$$\sum_{i=1}^{n}\left(\alpha\hat{x}_i-y_i\log\hat{x}_i\right)\leq\sum_{i=1}^{n}\left(\alpha\tilde{x}_i-y_i\log\tilde{x}_i\right). \tag{39}$$

Given the input signal $\bm{x}\in\mathbb{R}^{n}$ and $\mathbf{c}\in\mathcal{C}$, let $\mathrm{Poisson}(\alpha x_i)$ and $\mathrm{Poisson}(\alpha c_i)$ denote the distributions corresponding to independent Poisson random variables with respective means $\alpha x_i$ and $\alpha c_i$. Note that with the chain rule for relative entropy we have

$$D_{\text{KL}}(\mathrm{Poisson}(\alpha\bm{x})\,\|\,\mathrm{Poisson}(\alpha\mathbf{c}))=\sum_{i=1}^{n}\left(\alpha(c_i-x_i)+\alpha x_i\log\frac{x_i}{c_i}\right). \tag{40}$$

Adding $\sum_{i}(-\alpha x_i+\alpha x_i\log x_i)$ to both sides of (39), it follows that

$$\sum_{i=1}^{n}\left(\alpha(\hat{x}_i-x_i)-y_i\log\hat{x}_i+\alpha x_i\log x_i\right)\leq\sum_{i=1}^{n}\left(\alpha(\tilde{x}_i-x_i)-y_i\log\tilde{x}_i+\alpha x_i\log x_i\right), \tag{41}$$

or

$$\begin{aligned}&D_{\text{KL}}(\mathrm{Poisson}(\alpha\bm{x})\,\|\,\mathrm{Poisson}(\alpha\hat{\bm{x}}))+\sum_{i=1}^{n}(\alpha x_i-y_i)\log\hat{x}_i\\&\qquad\leq D_{\text{KL}}(\mathrm{Poisson}(\alpha\bm{x})\,\|\,\mathrm{Poisson}(\alpha\tilde{\bm{x}}))+\sum_{i=1}^{n}(\alpha x_i-y_i)\log\tilde{x}_i.\end{aligned} \tag{42}$$

Given $t_1,t_2>0$, define events $\mathcal{E}_1$ and $\mathcal{E}_2$ as

$$\mathcal{E}_1=\left\{\sum_{i=1}^{n}(y_i-\alpha x_i)\log c_i\leq t_1:\;\forall\,\mathbf{c}\in\mathcal{C}\right\} \tag{43}$$

and

$$\mathcal{E}_2=\left\{\sum_{i=1}^{n}(y_i-\alpha x_i)\log\tilde{x}_i\geq-t_2\right\}, \tag{44}$$

respectively. Conditioned on $\mathcal{E}_1\cap\mathcal{E}_2$,

$$D_{\text{KL}}(\mathrm{Poisson}(\alpha\bm{x})\,\|\,\mathrm{Poisson}(\alpha\hat{\bm{x}}))\leq D_{\text{KL}}(\mathrm{Poisson}(\alpha\bm{x})\,\|\,\mathrm{Poisson}(\alpha\tilde{\bm{x}}))+t_1+t_2, \tag{45}$$

and consequently from Lemma 1,

$$\frac{x_{\min}^{2}}{x_{\max}^{3}}\alpha\|\bm{x}-\hat{\bm{x}}\|_2^{2}\leq\frac{x_{\max}^{2}}{x_{\min}^{3}}\alpha\|\bm{x}-\tilde{\bm{x}}\|_2^{2}+(t_1+t_2). \tag{46}$$

To finish the proof, we bound $\mathbb{P}\left((\mathcal{E}_1\cap\mathcal{E}_2)^{c}\right)$ and set $t_1$ and $t_2$.

To bound $\mathbb{P}\left(\mathcal{E}_1^{c}\right)$, we apply Lemma 2, where for each $\mathbf{c}$, we set $w_i(\mathbf{c})=\log\frac{1}{c_i}$. Then,

$$\sigma_n^{2}(\mathbf{c})=\sum_{i=1}^{n}\left(\log\frac{1}{c_i}\right)^{2}\alpha x_i\leq\alpha\beta^{2}\sum_{i=1}^{n}x_i,$$

and

$$w_M=\max_i|w_i|\leq\beta,$$

where

$$\beta=\log\left(\frac{1}{x_{\min}}\right).$$

Therefore, using the union bound, it follows that

$$\begin{aligned}\mathbb{P}\left(\mathcal{E}_1^{c}\right)&\leq 2^{R}\exp\left(-\frac{t_1^{2}}{2(\alpha\beta^{2}\sum_{i=1}^{n}x_i+\beta t_1/3)}\right)\\&\leq 2^{R}\exp\left(-\frac{t_1^{2}}{2(n\alpha\beta^{2}x_{\max}+\beta t_1/3)}\right).\end{aligned} \tag{47}$$

To bound $\mathbb{P}\left(\mathcal{E}_2^{c}\right)$, we again apply Lemma 2, with $w_i=\log\frac{1}{\tilde{x}_i}$, and derive

$$\begin{aligned}\mathbb{P}\left(\mathcal{E}_2^{c}\right)&\leq\exp\left(-\frac{t_2^{2}}{2(\alpha\beta^{2}\sum_{i=1}^{n}x_i+\beta t_2/3)}\right)\\&\leq\exp\left(-\frac{t_2^{2}}{2(n\alpha\beta^{2}x_{\max}+\beta t_2/3)}\right).\end{aligned} \tag{48}$$

Setting $t_1$ and $t_2$ such that they are both smaller than $3n\alpha\beta$, and noting that $x_{\max}<1$, the union bound gives

$$\mathbb{P}\left(\mathcal{E}_1^{c}\cup\mathcal{E}_2^{c}\right)\leq 2^{R}\exp\left(-\frac{t_1^{2}}{4n\alpha\beta^{2}}\right)+\exp\left(-\frac{t_2^{2}}{4n\alpha\beta^{2}}\right). \tag{49}$$

Choosing $t_1=\beta\sqrt{\frac{4}{\ln 2}nR(1+\eta)\alpha}$ and $t_2=\beta\sqrt{\frac{4}{\ln 2}nR\eta\alpha}$, it follows that

$$\mathbb{P}\left(\mathcal{E}_1^{c}\cup\mathcal{E}_2^{c}\right)\leq 2^{-\eta R}. \tag{50}$$

6.4 Proof of Theorem 3

Proof.

Recall that

$$\hat{\bm{x}}=\operatorname*{arg\,min}_{\mathbf{c}\in\mathcal{C}}\|\mathbf{c}-\bm{y}/\alpha\|_2^{2},\qquad\tilde{\bm{x}}=\operatorname*{arg\,min}_{\mathbf{c}\in\mathcal{C}}\|\mathbf{c}-\bm{x}\|_2^{2}.$$

Following a setup similar to that of Section 6.2, we get

\begin{align*}
\|\hat{\bm{x}}-\bm{y}/\alpha\|^{2}&\leq\|\tilde{\bm{x}}-\bm{y}/\alpha\|^{2}\\
\|\hat{\bm{x}}-\bm{x}+\bm{x}-\bm{y}/\alpha\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}+\bm{x}-\bm{y}/\alpha\|^{2}\\
\|\hat{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{y}/\alpha-\bm{x},\hat{\bm{x}}-\bm{x}\rangle+\|\bm{y}/\alpha-\bm{x}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{y}/\alpha-\bm{x},\tilde{\bm{x}}-\bm{x}\rangle+\|\bm{y}/\alpha-\bm{x}\|^{2}\\
\|\hat{\bm{x}}-\bm{x}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}-2\langle\bm{y}/\alpha-\bm{x},\tilde{\bm{x}}-\bm{x}\rangle+2\langle\bm{y}/\alpha-\bm{x},\hat{\bm{x}}-\bm{x}\rangle\\
\|\hat{\bm{x}}-\bm{x}\|^{2}&\leq\|\tilde{\bm{x}}-\bm{x}\|^{2}+2\left|\langle\bm{y}/\alpha-\bm{x},\tilde{\bm{x}}-\bm{x}\rangle\right|+2\left|\langle\bm{y}/\alpha-\bm{x},\hat{\bm{x}}-\bm{x}\rangle\right|.
\end{align*}

Defining $\mathbf{e}$, $\bm{d}$ and $\mathbf{e}^{(\mathbf{c})}$, $\mathbf{c}\in\mathcal{C}$, as done in the proof of Theorem 1, we have

$$\|\mathbf{e}\|^{2}\leq\|\bm{d}\|^{2}+2\left|\langle\bm{y}/\alpha-\bm{x},\mathbf{e}\rangle\right|+2\left|\langle\bm{y}/\alpha-\bm{x},\bm{d}\rangle\right|. \tag{51}$$

Define events

$$\mathcal{E}_1=\left\{\left|\sum_{i=1}^{n}(y_i-\alpha x_i)e_i^{(\mathbf{c})}\right|\leq t_1:\;\forall\,\mathbf{c}\in\mathcal{C}\right\}, \tag{52}$$

and

$$\mathcal{E}_2=\left\{\left|\sum_{i=1}^{n}(y_i-\alpha x_i)e_i^{(\tilde{\bm{x}})}\right|\leq t_2\right\}. \tag{53}$$

Conditioned on $\mathcal{E}_1\cap\mathcal{E}_2$, it follows that

$$\|\mathbf{e}\|^{2}\leq\|\bm{d}\|^{2}+2(t_1+t_2)/\alpha. \tag{54}$$

Using Lemma 2 with $y_i\sim\mathrm{Poisson}(\alpha x_i)$ and $w_i=e_i^{(\mathbf{c})}$, it follows that

$$\mathbb{P}\left(\left|\sum_{i=1}^{n}(y_i-\alpha x_i)e_i^{(\mathbf{c})}\right|\geq t\right)\leq 2\exp\left(-\frac{t^{2}}{2(\sigma_n^{2}+w_Mt/3)}\right), \tag{55}$$

where

$$\sigma_n^{2}=\sum_{i=1}^{n}w_i^{2}\alpha_i=\alpha\sum_{i=1}^{n}\left(e_i^{(\mathbf{c})}\right)^{2}x_i\leq n\alpha x_{\max}^{3}, \tag{56}$$

and

$$w_M=\max_{i\in\{1,\ldots,n\}}|w_i|\leq x_{\max}.$$

Using the union bound and noting that $|\mathcal{C}|\leq 2^{R}$, we have

$$\mathbb{P}\left(\mathcal{E}_1^{c}\right)\leq 2^{R+1}\exp\left(-\frac{t_1^{2}}{2(nx_{\max}^{3}\alpha+t_1x_{\max}/3)}\right) \tag{57}$$

and

$$\mathbb{P}\left(\mathcal{E}_2^{c}\right)\leq 2\exp\left(-\frac{t_2^{2}}{2(nx_{\max}^{3}\alpha+t_2x_{\max}/3)}\right). \tag{58}$$

Setting $t_1$ and $t_2$ such that they are both smaller than $3nx_{\max}^{2}\alpha$, and noting that $x_{\max}<1$, the union bound gives

$$\mathbb{P}\left(\mathcal{E}_1^{c}\cup\mathcal{E}_2^{c}\right)\leq 2^{R+1}\exp\left(-\frac{t_1^{2}}{4n\alpha}\right)+2\exp\left(-\frac{t_2^{2}}{4n\alpha}\right). \tag{59}$$

For $\eta\in(0,1)$, set

$$t_1=2\sqrt{n(\ln 2)(1+\eta)\alpha R},$$

and

$$t_2=2\sqrt{n\eta(\ln 2)\alpha R}.$$

Then,

$$\mathbb{P}\left(\mathcal{E}_1^{c}\cup\mathcal{E}_2^{c}\right)\leq 2^{-\eta R+2}. \tag{60}$$

Using the selected values of $t_1$ and $t_2$ in (54) yields the desired result, i.e.,

$$\begin{aligned}\|\mathbf{e}\|_2^{2}&\leq\|\bm{d}\|_2^{2}+4\sqrt{n(\ln 2)(1+\eta)\frac{R}{\alpha}}+4\sqrt{n\eta(\ln 2)\frac{R}{\alpha}}\\&\leq\|\bm{d}\|_2^{2}+\sqrt{\frac{nR}{\alpha}}\left(4\sqrt{\ln 2}\right)\left(\sqrt{1+\eta}+\sqrt{\eta}+1\right).\end{aligned}$$
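The choice of $t_1$ and $t_2$ in this proof can be verified numerically: plugging them into the right-hand side of (59) should reproduce $2^{-\eta R+2}$ from (60), and the additive error term in (54) is $2(t_1+t_2)/\alpha$. The sketch below does only this arithmetic; the function names and parameter values are illustrative, and the side condition that $t_1,t_2$ stay below $3nx_{\max}^{2}\alpha$ is not checked.

```python
import math

def theorem3_thresholds(n, R, alpha, eta):
    """Thresholds t1, t2 chosen in the proof of Theorem 3."""
    t1 = 2.0 * math.sqrt(n * math.log(2.0) * (1.0 + eta) * alpha * R)
    t2 = 2.0 * math.sqrt(n * eta * math.log(2.0) * alpha * R)
    return t1, t2

def failure_probability_bound(n, R, alpha, t1, t2):
    """Right-hand side of (59)."""
    return (2.0 ** (R + 1) * math.exp(-t1 ** 2 / (4.0 * n * alpha))
            + 2.0 * math.exp(-t2 ** 2 / (4.0 * n * alpha)))

def excess_error(t1, t2, alpha):
    """Additive term 2 (t1 + t2) / alpha appearing in (54)."""
    return 2.0 * (t1 + t2) / alpha

# Illustrative values only.
n, R, alpha, eta = 1024, 20, 50.0, 0.5
t1, t2 = theorem3_thresholds(n, R, alpha, eta)
p = failure_probability_bound(n, R, alpha, t1, t2)
print(t1, t2, p, 2.0 ** (-eta * R + 2), excess_error(t1, t2, alpha))
```

Note that the excess term scales as $\sqrt{nR/\alpha}$, so a larger Poisson scale $\alpha$ (higher photon count) shrinks the gap between $\|\mathbf{e}\|_2^{2}$ and $\|\bm{d}\|_2^{2}$.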

6.5 Proof of Corollary 1

Proof.

First, we need to design a lossy compression code for the set of signals in $\mathcal{Q}_n$, defined as

$$\mathcal{Q}_n=\left\{\bm{x}\in\mathbb{R}^{n}:\|\bm{x}\|_0\leq k,\ \|\bm{x}\|_2\leq 1\right\}. \tag{61}$$

For a $k$-sparse $\bm{x}\in\mathcal{Q}_n$, let $\bm{x}^{(k)}\in\mathbb{R}^{k}$ denote the $k$-dimensional vector derived from the non-zero coordinates of $\bm{x}$. Let $\mathcal{B}^{k}_2=\left\{\bm{a}\in\mathbb{R}^{k}:\|\bm{a}\|_2\leq 1\right\}$; then $\bm{x}^{(k)}\in\mathcal{B}^{k}_2$. We define a lossy compression code with rate $R$ and distortion $\delta$ as follows:

  1. To encode the number of non-zero entries and their locations in $\bm{x}\in\mathcal{Q}_n$, one spends at most $\log_2k+k\log_2n$ bits.

  2. To encode the values of the non-zero coordinates, note that the number of balls of radius $\varepsilon$ required to cover $\mathcal{B}^{k}_2$, denoted $N(\mathcal{B}^{k}_2,\varepsilon)$, is bounded as

    $$\left(\frac{1}{\varepsilon}\right)^{k}\leq N(\mathcal{B}^{k}_2,\varepsilon)\leq\left(\frac{2}{\varepsilon}+1\right)^{k}.$$

    This implies that with $\log_2N(\mathcal{B}^{k}_2,\varepsilon)$ bits, where

    $$k\log_2\frac{1}{\varepsilon}\leq\log_2N(\mathcal{B}^{k}_2,\varepsilon)\leq k\log_2\left(\frac{2}{\varepsilon}+1\right), \tag{62}$$

    we can encode the values of the non-zero entries of $\bm{x}\in\mathcal{Q}_n$, or equivalently all the entries of $\bm{x}^{(k)}\in\mathcal{B}^{k}_2$, with error less than $\varepsilon$.
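The two bit budgets in the construction above are simple to evaluate. The following sketch computes them for given $n$, $k$, and $\varepsilon$; the function names and the example values are illustrative assumptions.

```python
import math

def support_bits(n, k):
    """Upper bound on bits for the count and locations of non-zeros: log2 k + k log2 n."""
    return math.log2(k) + k * math.log2(n)

def value_bits_bounds(k, eps):
    """Lower and upper bounds on log2 N(B_2^k, eps) from the covering-number inequality (62)."""
    return k * math.log2(1.0 / eps), k * math.log2(2.0 / eps + 1.0)

# Illustrative example: a length-1024 signal with 8 non-zeros, quantized to radius 0.1.
n, k, eps = 1024, 8, 0.1
s = support_bits(n, k)
lo, hi = value_bits_bounds(k, eps)
print(s, lo, hi, s + hi)   # s + hi is the total rate spent by the code in this example
```

For sparse signals the support term $k\log_2n$ usually dominates unless $\varepsilon$ is very small, which is what drives the $k\log_2 n$ scaling in the final bound.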

Note that, as defined in Section 3, we measure the distortion of a lossy compression code as squared error. Therefore, to achieve the defined distortion $\delta$, we need to set $\varepsilon=\sqrt{\delta}$, which implies that the overall rate is bounded as

$$\log_2k+k\log_2n+k\log_2\left(\frac{1}{\sqrt{\delta}}\right)\leq R\leq\log_2k+k\log_2n+k\log_2\left(\frac{2}{\sqrt{\delta}}+1\right). \tag{63}$$

Using the upper bound on the rate in (63) in combination with Theorem 1, we have

$$\begin{aligned}\frac{1}{\sqrt{n}}\|\bm{x}-\hat{\bm{x}}\|_2&\leq\sqrt{\delta}+2\sigma_z\sqrt{\frac{(2\ln 2)\left(\log_2k+k\log_2n+k\log_2\left(\frac{2}{\sqrt{\delta}}+1\right)\right)}{n}}\,(1+2\sqrt{\eta})\\&\leq\sqrt{\delta}+2\sigma_z\sqrt{(2\ln 2)\left(\frac{\log_2k+k\log_2n+\frac{k}{2}\log_2\left(\frac{1}{\delta}\right)+k\log_2(2+\sqrt{\delta})}{n}\right)}\,(1+2\sqrt{\eta}).\end{aligned}$$

Setting $\delta=\frac{1}{n}$, with probability at least $1-2^{-\eta R+2}$ we have

$$\begin{aligned}\frac{1}{\sqrt{n}}\|\hat{\bm{x}}-\bm{x}\|&\leq\frac{1}{\sqrt{n}}+\sigma_zC\sqrt{\frac{\log_2k+\frac{k}{2}\log_2n+k\log_2(2+1/\sqrt{n})}{n}}\\&=\frac{1}{\sqrt{n}}+\sigma_zC\sqrt{\frac{k\log_2n}{n}}\sqrt{1+\frac{2\log_2k}{k\log_2n}+\frac{2\log_2(2+1/\sqrt{n})}{\log_2n}}\\&\leq\frac{1}{\sqrt{n}}+\sigma_zC\sqrt{\frac{k\log_2n}{n}}\left(1+\frac{\log_2k}{k\log_2n}+\frac{\log_2(2+1/\sqrt{n})}{\log_2n}\right)\\&=\sigma_zC\sqrt{\frac{k\log_2n}{n}}+\left(\frac{1}{\sqrt{n}}+\frac{\sigma_zC\log_2k}{2\sqrt{n}}+\frac{\sigma_zC\sqrt{k}\log_2(2+1/\sqrt{n})}{\sqrt{n}}\right),\end{aligned}$$

where $C=2\sqrt{\ln 2}\,(1+2\sqrt{\eta})$. Squaring both sides of the last inequality gives

$$\frac{1}{n}\|\hat{\bm{x}}-\bm{x}\|^{2}\leq\sigma_z^{2}C^{2}\frac{k\log_2n}{n}+o(1), \tag{64}$$

which holds with probability at least $1-\frac{4}{\left(kn^{3k/2}\right)^{\eta}}$, using the lower bound on the rate in (63). ∎

7 Conclusions

We have studied maximum-likelihood compression-based denoising and provided a theoretical characterization of its performance under both AWGN and Poisson noise. Furthermore, we introduced ZS-NCD, a new zero-shot neural-compression-based denoiser, and demonstrated that it achieves state-of-the-art performance among zero-shot methods in both AWGN and Poisson denoising.

Acknowledgment

A.Z., X.C., S.J. were supported by NSF CCF-2237538.

References

  • [1] Norbert Wiener. Extrapolation, interpolation, and smoothing of stationary time series. The MIT press, 1964.
  • [2] David L Donoho and Iain M Johnstone. Ideal spatial adaptation by wavelet shrinkage. biometrika, 81(3):425–455, 1994.
  • [3] Stéphane Mallat. A wavelet tour of signal processing. Elsevier, 1999.
  • [4] David L Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory, 41(3):613–627, 1995.
  • [5] Javier Portilla, Vasily Strela, Martin J Wainwright, and Eero P Simoncelli. Image denoising using scale mixtures of gaussians in the wavelet domain. IEEE Transactions on Image processing, 12(11):1338–1351, 2003.
  • [6] Michael Elad and Michal Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image processing, 15(12):3736–3745, 2006.
  • [7] Stefan Roth and Michael J Black. Fields of experts. International Journal of Computer Vision, 82:205–229, 2009.
  • [8] Shuhang Gu, Lei Zhang, Wangmeng Zuo, and Xiangchu Feng. Weighted nuclear norm minimization with application to image denoising. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2862–2869, 2014.
  • [9] Shakarim Soltanayev and Se Young Chun. Training deep learning based denoisers without ground truth data. Advances in neural information processing systems, 31, 2018.
  • [10] Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. Noise2noise: Learning image restoration without clean data. arXiv preprint arXiv:1803.04189, 2018.
  • [11] Alexander Krull, Tim-Oliver Buchholz, and Florian Jug. Noise2void-learning denoising from single noisy images. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2129–2137, 2019.
  • [12] Joshua Batson and Loic Royer. Noise2self: Blind denoising by self-supervision. In International conference on machine learning, pages 524–533. PMLR, 2019.
  • [13] Kwanyoung Kim and Jong Chul Ye. Noise2score: tweedie’s approach to self-supervised image denoising without clean images. Advances in Neural Information Processing Systems, 34:864–874, 2021.
  • [14] David Leigh Donoho. The kolmogorov sampler. Department of Statistics, Stanford University, 2002.
  • [15] Tsachy Weissman and Erik Ordentlich. The empirical distribution of rate-constrained source codes. IEEE transactions on information theory, 51(11):3718–3733, 2005.
  • [16] Johannes Ballé, Philip A Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, and George Toderici. Nonlinear transform coding. IEEE Journal of Selected Topics in Signal Processing, 15(2):339–353, 2020.
  • [17] Yibo Yang, Stephan Mandt, Lucas Theis, et al. An introduction to neural data compression. Foundations and Trends® in Computer Graphics and Vision, 15(2):113–200, 2023.
  • [18] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen Egiazarian. Image denoising by sparse 3-d transform-domain collaborative filtering. IEEE Transactions on image processing, 16(8):2080–2095, 2007.
  • [19] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5728–5739, 2022.
  • [20] David S Taubman, Michael W Marcellin, and Majid Rabbani. Jpeg2000: Image compression fundamentals, standards and practice. Journal of Electronic Imaging, 11(2):286–287, 2002.
  • [21] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Deep image prior. International Journal of Computer Vision, 128(7):1867–1888, 2020.
  • [22] Reinhard Heckel and Paul Hand. Deep decoder: Concise image representations from untrained non-convolutional networks. In International Conference on Learning Representations, 2019.
  • [23] Yuhui Quan, Mingqin Chen, Tongyao Pang, and Hui Ji. Self2self with dropout: Learning self-supervised denoising from single image. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1890–1898, 2020.
  • [24] Youssef Mansour and Reinhard Heckel. Zero-shot noise2noise: Efficient image denoising without any data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14018–14027, 2023.
  • [25] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on image processing, 26(7):3142–3155, 2017.
  • [26] Yaochen Xie, Zhengyang Wang, and Shuiwang Ji. Noise2same: Optimizing a self-supervised bound for image denoising. Advances in neural information processing systems, 33:20320–20330, 2020.
  • [27] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. Deep image prior. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9446–9454, 2018.
  • [28] Xiaoxiao Ma, Zhixiang Wei, Yi Jin, Pengyang Ling, Tianle Liu, Ben Wang, Junkang Dai, and Huaian Chen. Masked pre-training enables universal zero-shot denoiser. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
  • [29] Johannes Ballé, Valero Laparra, and Eero P. Simoncelli. End-to-end optimized image compression. In International Conference on Learning Representations, 2017.
  • [30] Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Huszár. Lossy image compression with compressive autoencoders. In International Conference on Learning Representations, 2017.
  • [31] Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston. Variational image compression with a scale hyperprior. In International Conference on Learning Representations, 2018.
  • [32] David Minnen, Johannes Ballé, and George D Toderici. Joint autoregressive and hierarchical priors for learned image compression. In Advances in neural information processing systems, volume 31, 2018.
  • [33] Yinhao Zhu, Yang Yang, and Taco Cohen. Transformer-based transform coding. In International conference on learning representations, 2022.
  • [34] Jinming Liu, Heming Sun, and Jiro Katto. Learned image compression with mixed transformer-cnn architectures. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14388–14397, 2023.
  • [35] Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Balle, Sung Jin Hwang, and George Toderici. Scale-space flow for end-to-end optimized video compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8503–8512, 2020.
  • [36] Fabian Mentzer, George Toderici, David Minnen, Sergi Caelles, Sung Jin Hwang, Mario Lucic, and Eirikur Agustsson. VCT: A video compression transformer. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.
  • [37] Jiahao Li, Bin Li, and Yan Lu. Neural video compression with feature modulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 26099–26108, 2024.
  • [38] Michela Testolina, Evgeniy Upenik, and Touradj Ebrahimi. Towards image denoising in the latent space of learning-based compression. In Applications of Digital Image Processing XLIV, volume 11842, pages 412–422. SPIE, 2021.
  • [39] Saeed Ranjbar Alvar, Mateen Ulhaq, Hyomin Choi, and Ivan V Bajić. Joint image compression and denoising via latent-space scalability. Frontiers in Signal Processing, 2:932873, 2022.
  • [40] Benoit Brummer and Christophe De Vleeschouwer. On the importance of denoising when learning to compress images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2440–2448, 2023.
  • [41] Yuxin Xie, Li Yu, Farhad Pakdaman, and Moncef Gabbouj. Joint end-to-end image compression and denoising: Leveraging contrastive learning and multi-scale self-onns. arXiv preprint arXiv:2402.05582, 2024.
  • [42] Naoki Saito. Simultaneous noise suppression and signal compression using a library of orthonormal bases and the minimum description length criterion. In Wavelet Analysis and Its Applications, volume 4, pages 299–324. Elsevier, 1994.
  • [43] S Grace Chang, Bin Yu, and Martin Vetterli. Image denoising via lossy compression and wavelet thresholding. In Proceedings of International Conference on Image Processing, volume 1, pages 604–607. IEEE, 1997.
  • [44] S Grace Chang, Bin Yu, and Martin Vetterli. Adaptive wavelet thresholding for image denoising and compression. IEEE transactions on image processing, 9(9):1532–1546, 2000.
  • [45] Balas K Natarajan. Filtering random noise from deterministic signals via data compression. IEEE transactions on signal processing, 43(11):2595–2605, 1995.
  • [46] Léo Larigauderie, Michela Testolina, and Touradj Ebrahimi. On combining denoising with learning-based image decoding. In Applications of Digital Image Processing XLV, volume 12226, pages 193–206. SPIE, 2022.
  • [47] Zhihao Li, Yufei Wang, Alex Kot, and Bihan Wen. Compress clean signal from noisy raw image: A self-supervised approach. In Forty-first International Conference on Machine Learning, 2024.
  • [48] Ali Zafari, Xi Chen, and Shirin Jalali. Decompress: Denoising via neural compression. arXiv preprint arXiv:2503.22015, 2025.
  • [49] Yihong Wu and Sergio Verdú. MMSE dimension. IEEE Transactions on Information Theory, 57(8):4857–4879, 2011.
  • [50] Guangyong Chen, Fengyuan Zhu, and Pheng Ann Heng. An efficient statistical method for image noise level estimation. In Proceedings of the IEEE international conference on computer vision, pages 477–485, 2015.
  • [51] Roman Zeyde, Michael Elad, and Matan Protter. On single image scale-up using sparse-representations. In International conference on curves and surfaces, pages 711–730. Springer, 2010.
  • [52] Eastman Kodak. Kodak lossless true color image suite (PhotoCD PCD0992). URL http://r0k.us/graphics/kodak, 6:2, 1993.
  • [53] Tim-Oliver Buchholz, Mangal Prakash, Deborah Schmidt, Alexander Krull, and Florian Jug. Denoiseg: joint denoising and segmentation. In European Conference on Computer Vision, pages 324–337. Springer, 2020.
  • [54] Jun Xu, Hui Li, Zhetong Liang, David Zhang, and Lei Zhang. Real-world noisy image denoising: A new benchmark. arXiv preprint arXiv:1804.02603, 2018.
  • [55] Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, and Hongwei Qin. Checkerboard context model for efficient learned image compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14771–14780, 2021.

Appendix A Appendix

In this section, we provide details of the network structures and experimental settings. We also present additional experiments on Poisson denoising using the MLE and MSE loss functions with an unknown noise level.

A.1 Network structure

For our experiments we use 3 convolutional layers in the encoder, with 128 channels in the first two encoder layers (and in the last two decoder layers); see Figure 5. The number of channels in the last encoder layer (and the first decoder layer) is 32 for color images and 16 for grayscale images. The MLP-based network of the ablation study in Section 5 has 3 fully connected layers in the encoder, with 1024 hidden units in the first two encoder layers (and in the last two decoder layers). The number of hidden units in the last encoder layer (and the first decoder layer) is 16. As the activation function we use GDN [29] for the Conv network and ReLU for the MLP.

Figure 5: Neural compression network used for denoising. Conv and FC denote convolutional and fully connected layers, respectively. GDN and ReLU are activation functions.

A.2 Additional experiments on Poisson denoising

Unknown noise level

In a more realistic setting, the noise level $\alpha$ of the Poisson noise model is unknown, so normalizing by the true $\alpha$ is not practical. We propose to use an estimate $\hat{\alpha}$ for normalization in both MSE-based and MLE-based denoising. Since $\mathbb{E}[y_{i}]=\alpha x_{i}$, for a single noisy image with $n$ pixels we have $\frac{1}{n}\sum_{i=1}^{n}y_{i}\approx\alpha\frac{1}{n}\sum_{i=1}^{n}x_{i}$. In most cases we can assume $\mathbb{E}[\bm{x}]\approx 0.5$, which gives $\hat{\alpha}\approx 2\cdot\frac{1}{n}\sum_{i=1}^{n}y_{i}$. We then use the estimated noise level $\hat{\alpha}$ to normalize both the MSE-based and MLE-based optimization for Poisson denoising. See Table 4 for the result of this noise parameter estimation on a sample image.
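
The estimator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_alpha` is hypothetical, and the clean image is a synthetic uniform array in $[0,1]$ standing in for a real image whose mean intensity is close to 0.5.

```python
import numpy as np


def estimate_alpha(y: np.ndarray, assumed_mean: float = 0.5) -> float:
    """Estimate alpha from y_i ~ Poisson(alpha * x_i) via mean(y) ~= alpha * mean(x).

    With assumed_mean = 0.5 this is alpha_hat = 2 * mean(y).
    """
    return float(np.mean(y)) / assumed_mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a clean image (assumption, not data from the paper).
    x = rng.uniform(0.0, 1.0, size=(256, 256))
    for alpha in (15.0, 25.0, 50.0):
        y = rng.poisson(alpha * x)  # unnormalized Poisson counts
        print(alpha, estimate_alpha(y))
```

The estimate is biased whenever the true mean intensity differs from 0.5, which matches the behavior in Table 4, where $\hat{\alpha}$ falls somewhat below the true $\alpha$.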

Setting the hyperparameter $\lambda$ under Poisson noise

Given an estimate of the noise parameter $\alpha$, we can decompose the MSE between the normalized observation $\bm{y}/\alpha$ and the denoised image $\hat{\bm{x}}$ as

\begin{align}
\frac{1}{n}\mathbb{E}\left[\|\bm{y}/\alpha-\hat{\bm{x}}\|_{2}^{2}\right]
&=\frac{1}{n}\mathbb{E}\left[\|\bm{y}/\alpha-\bm{x}+\bm{x}-\hat{\bm{x}}\|_{2}^{2}\right]\nonumber\\
&=\frac{1}{n\alpha}\sum_{i=1}^{n}x_{i}+\frac{1}{n}\|\bm{x}-\hat{\bm{x}}\|_{2}^{2}+\frac{2}{n}\mathbb{E}\left[(\bm{y}/\alpha-\bm{x})^{T}(\bm{x}-\hat{\bm{x}})\right]. \tag{65}
\end{align}

Following the same logic as in Section 4, if $\hat{\bm{x}}$ is a close estimate of $\bm{x}$, the second and third terms in (65) are close to zero, so $\frac{1}{n}\mathbb{E}\left[\|\bm{y}/\alpha-\hat{\bm{x}}\|_{2}^{2}\right]\approx\frac{1}{2\alpha}$ under the assumption $\mathbb{E}[\bm{x}]\approx 0.5$ (the same assumption used to estimate $\alpha$). We can therefore use $\|\hat{\bm{x}}^{(k)}-\bm{y}\|^{2}>\frac{1}{2\alpha}$ as the decision rule in Algorithm 1 to set the hyperparameter $\lambda$. Empirically, the value of MSE($\bm{y}/\alpha$, $\hat{\bm{x}}$) observed when training our networks is close to $\frac{1}{2\alpha}$, as reported in Table 4. This indicates that $\frac{1}{2\alpha}$ is a good approximation of the MSE and can serve as the threshold for selecting $\lambda$. When $\alpha$ is not known, we compute the estimate $\hat{\alpha}$ and use $\frac{1}{2\hat{\alpha}}$ as the threshold in the search for $\lambda$.
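
The threshold used above can be checked numerically with a short Python sketch. It assumes images scaled to $[0,1]$ (peak value 1 when converting MSE to PSNR) and uses a synthetic uniform image in place of real data; the helper names are hypothetical.

```python
import math

import numpy as np


def poisson_mse_threshold(alpha: float, assumed_mean: float = 0.5) -> float:
    """Approximation of (1/n) E||y/alpha - x||^2 = mean(x)/alpha.

    Equals 1/(2*alpha) when the mean intensity is assumed to be 0.5.
    """
    return assumed_mean / alpha


def mse_to_psnr_db(mse: float, peak: float = 1.0) -> float:
    """Convert an MSE value to PSNR in dB for a given peak intensity (assumed 1)."""
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=(256, 256))  # synthetic clean image (assumption)
    alpha = 25.0
    y = rng.poisson(alpha * x)
    # Energy of the normalized Poisson noise around the clean image.
    empirical = float(np.mean((y / alpha - x) ** 2))
    print(empirical, poisson_mse_threshold(alpha))
    print(mse_to_psnr_db(poisson_mse_threshold(alpha)))
```

On the synthetic image the empirical noise energy lands close to $\frac{1}{2\alpha}$, and for $\alpha=50$ the threshold corresponds to 20 dB, in line with the $1/(2\alpha)$ column of Table 4.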

Table 4: Estimation of the Poisson noise parameter for the Barbara image in Set11 (MSE values are reported in terms of PSNR).

| true $\alpha$ | estimated $\hat{\alpha}$ | empirical MSE($\bm{y}/\alpha$, $\hat{\bm{x}}$) [dB] | $1/(2\alpha)$ [dB] | $1/(2\hat{\alpha})$ [dB] |
|---|---|---|---|---|
| 25 | 23.02 | 17.12 | 16.98 | 16.63 |
| 50 | 46.05 | 19.66 | 20.00 | 19.64 |

MSE and likelihood estimation without knowing the true $\alpha$

In Table 5, we compare the MSE and MLE distortion measures for Poisson denoising when using the estimate $\hat{\alpha}$ described above.

Table 5: Minimizing the Poisson negative log-likelihood (NLL) vs. the MSE with estimated $\hat{\alpha}$ for the Cameraman image in Set11. PSNR / SSIM are reported.

| $\alpha$ | MSE (with estimated $\hat{\alpha}$) | NLL (with estimated $\hat{\alpha}$) |
|---|---|---|
| 15 | 23.41 / 0.7554 | 23.13 / 0.7567 |
| 50 | 25.22 / 0.7961 | 24.88 / 0.7460 |

A.3 Study on factors in patch-wise compression affecting denoising

In this section, we explain the intuition behind why learning compression networks and denoising on overlapping patches is feasible. As empirically observed in [55], pixels near the center of a patch are compressed better, and can therefore be denoised better. To study how each patch containing a given pixel contributes to its denoising, we design the following experiment: in the denoising phase we denoise all overlapping patches, but construct the final denoised image from a single pixel at a fixed location in each patch, instead of averaging over all patches as in (8). Figure 6 shows the resulting denoising performance for each pixel location. The PSNR at each location is obtained using only that pixel of every overlapping patch, with patches extracted at stride 1. Boundary pixels give lower PSNR, which is consistent with the earlier finding that central pixels are better compressed.

Next, we analyze the effect of the patch size in both the learning and denoising phases. Since scalar quantization is applied and the entropy model is learned on the latent code of the patches, the compression performance on the latent code is affected by both the patch size and the number of downsampling operations in the CNN-based encoder. We apply 3 downsampling operators to patches of size 8 and 16, which gives latent codes of size $1\times 1\times n_{b}$ and $2\times 2\times n_{b}$, respectively; the denoising performance at each pixel location is shown in Figure 6 (Left) and (Middle). Increasing the number of downsampling operators to 4 for patch size 16 gives a latent code of size $1\times 1\times n_{b}$; the corresponding performance is shown in Figure 6 (Right). We find that the spatial size of the quantized latent code matters: given the limitation of scalar quantization, the decoder's reconstruction at a given location is restricted to the latent code element it is correlated with, as can be observed in the figure.

Motivated by this, we perform both the learning and denoising phases patch-wise with a suitable network structure: all pixels in each patch are used, and the overlapping areas are averaged to reduce the variance of the compression-based estimates.
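
The two aggregation strategies compared in this study can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the patches passed in stand for already-denoised patches, and border pixels that a fixed patch location never reaches are left as NaN here (how the experiment handles them is not stated in the text).

```python
import numpy as np


def extract_patches(img: np.ndarray, p: int, stride: int = 1):
    """Return all p x p patches of a 2-D image and their top-left coordinates."""
    h, w = img.shape
    coords = [(i, j)
              for i in range(0, h - p + 1, stride)
              for j in range(0, w - p + 1, stride)]
    patches = np.stack([img[i:i + p, j:j + p] for i, j in coords]).astype(float)
    return patches, coords


def aggregate_average(patches: np.ndarray, coords, shape) -> np.ndarray:
    """Average every overlapping patch estimate at each pixel."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    p = patches.shape[-1]
    for patch, (i, j) in zip(patches, coords):
        acc[i:i + p, j:j + p] += patch
        cnt[i:i + p, j:j + p] += 1
    return acc / np.maximum(cnt, 1)


def aggregate_single_location(patches: np.ndarray, coords, shape, r: int, c: int) -> np.ndarray:
    """Build the image from pixel (r, c) of each patch only; uncovered pixels stay NaN."""
    out = np.full(shape, np.nan)
    for patch, (i, j) in zip(patches, coords):
        out[i + r, j + c] = patch[r, c]
    return out


if __name__ == "__main__":
    img = np.arange(36, dtype=float).reshape(6, 6)
    patches, coords = extract_patches(img, p=3)
    # With untouched patches, both aggregations reproduce the input where covered.
    print(np.allclose(aggregate_average(patches, coords, img.shape), img))
    center = aggregate_single_location(patches, coords, img.shape, 1, 1)
    print(int(np.sum(~np.isnan(center))))
```

Replacing the untouched patches with the outputs of a trained compression network, and sweeping `(r, c)` over all patch locations, would give one PSNR value per location, which is the kind of map described for Figure 6.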

Figure 6: Denoising the image Parrot under AWGN ($\sigma_{z}=25$) using only a single pixel from each overlapping patch (stride 1). The PSNR at each pixel location is the final denoising performance obtained using only the pixel at that location in each patch. Left: patch size $8\times 8$, downsampling factor 8 in $f_{\theta_{1}}$; Middle: patch size $16\times 16$, downsampling factor 8 in $f_{\theta_{1}}$; Right: patch size $16\times 16$, downsampling factor 16 in $f_{\theta_{1}}$.
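
The latent sizes associated with the three panels follow from simple arithmetic, sketched below in Python. The sketch assumes that each downsampling operator halves the spatial resolution, which is consistent with 3 operators corresponding to a downsampling factor of 8; the function name is hypothetical.

```python
def latent_spatial_size(patch_size: int, num_downsamplings: int) -> int:
    """Spatial side length of the latent code, assuming each operator halves resolution."""
    factor = 2 ** num_downsamplings
    if patch_size % factor != 0:
        raise ValueError("patch size must be divisible by the downsampling factor")
    return patch_size // factor


if __name__ == "__main__":
    # (patch size, number of downsampling operators) for Left, Middle, Right panels.
    for patch_size, num_down in ((8, 3), (16, 3), (16, 4)):
        side = latent_spatial_size(patch_size, num_down)
        print(f"patch {patch_size}x{patch_size}, factor {2 ** num_down}: latent {side}x{side}")
```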

Appendix B Additional numerical results

In this section, we provide the full numerical denoising results of all denoisers on all test images. All experiments were run on an Nvidia RTX 6000 Ada GPU with 48 GB of memory. It takes 40 minutes to denoise a grayscale image of size $256\times 256$, and 50 minutes for an RGB image of size $512\times 768$. The Adam optimizer is used to train the networks for 20K steps. For the Conv-based network, the initial learning rate of $5\times 10^{-3}$ is decreased to $5\times 10^{-4}$ after 16K steps. The learning rate for the MLP-based networks is $1\times 10^{-3}$.
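
The learning-rate schedule for the Conv-based network can be written as a small Python helper. This is a sketch under one assumption: the decrease is treated as a single step drop at 16K steps, since the text does not say whether the decay is abrupt or gradual. The function name is hypothetical.

```python
def conv_learning_rate(step: int,
                       total_steps: int = 20_000,
                       drop_step: int = 16_000,
                       lr_init: float = 5e-3,
                       lr_final: float = 5e-4) -> float:
    """Learning rate at a given training step (assumed single step drop)."""
    if not 0 <= step < total_steps:
        raise ValueError("step must lie in [0, total_steps)")
    return lr_init if step < drop_step else lr_final


if __name__ == "__main__":
    for step in (0, 15_999, 16_000, 19_999):
        print(step, conv_learning_rate(step))
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update.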

B.1 Set11 Dataset

For AWGN noise levels $\sigma_{z}=(15,25,50)$ we set $\lambda=(300,850,3000)$. As in the Kodak and other experiments, training runs for 20K gradient steps. For Poisson denoising with $\alpha=(15,25,50)$ we set $\lambda=(3000,1500,1000)$. We report the detailed results of AWGN denoising in Table 6, and of Poisson denoising in Table 7.

Table 6: Set11 denoising performance comparison (PSNR/SSIM) under AWGN $\mathcal{N}(0,\sigma_{z}^{2}I)$. Images C.man through Parrot are of size $256\times 256$; Barbara through Couple are of size $512\times 512$.
$\sigma$ Method C.man House Peppers Starfish Monarch Airplane Parrot Barbara Boats Pirate Couple Average
15 BM3D 31.86/0.8991 35.04/0.8889 32.74/0.9119 31.20/0.9081 31.82/0.9396 31.11/0.9031 31.44/0.9017 32.98/0.9248 32.12/0.8596 31.97/0.8742 32.14/0.8800 32.22/0.8992
JPEG2K 27.12/0.7474 29.48/0.7621 27.96/0.7907 26.75/0.8077 26.74/0.8166 26.58/0.7664 27.30/0.7778 26.76/0.7690 27.87/0.7390 27.92/0.7471 27.44/0.7449 27.45/0.7699
DIP 27.94/0.7417 31.39/0.8111 29.80/0.8273 29.58/0.8605 29.93/0.8767 28.14/0.8047 28.37/0.7794 27.65/0.7538 29.48/0.7798 29.27/0.7817 28.65/0.7727 29.11/0.7990
DD 29.41/0.8099 32.83/0.8406 26.97/0.8488 29.39/0.8739 30.01/0.8957 26.44/0.8228 29.32/0.8447 24.48/0.7089 29.45/0.7883 29.78/0.8085 29.06/0.7938 28.83/0.8215
ZS-N2N 30.14/0.8133 32.19/0.8138 30.58/0.8264 29.52/0.8639 30.15/0.8551 29.98/0.8298 30.19/0.8290 27.70/0.7772 30.06/0.7900 30.06/0.7957 29.59/0.7913 30.01/0.8169
ZS-N2S 27.66/0.8272 31.08/0.8442 29.46/0.8675 28.83/0.8810 28.77/0.8961 27.34/0.8591 27.67/0.8528 28.75/0.8534 29.52/0.8139 29.41/0.8181 29.60/0.8311 28.92/0.8495
S2S 20.29/0.6769 32.96/0.8633 23.96/0.8387 25.50/0.8250 30.05/0.9269 28.10/0.8611 20.20/0.7132 30.35/0.8865 27.74/0.7871 29.97/0.8192 25.82/0.7754 26.81/0.8158
ZS-NCD 30.83/0.8554 34.45/0.8835 32.20/0.8844 31.34/0.8749 31.83/0.8966 30.07/0.8552 30.40/0.8464 31.14/0.8826 31.09/0.8014 30.82/0.8302 30.67/0.8279 31.35/0.8580
ZS-NCD (MLP) 31.18/0.8680 34.86/0.8887 32.43/0.9009 31.37/0.9053 32.04/0.9263 30.79/0.8848 31.02/0.8839 32.55/0.9123 31.82/0.8468 31.35/0.8552 31.55/0.8638 31.91/0.8851
25 BM3D 30.04/0.8512 34.09/0.8631 31.49/0.8948 29.93/0.8744 30.55/0.9285 29.15/0.8693 29.86/0.8664 32.27/0.9096 30.80/0.8148 30.26/0.8154 30.71/0.8375 30.83/0.8659
JPEG2K 24.49/0.6976 27.26/0.7269 24.93/0.7206 24.18/0.7167 24.06/0.7561 23.91/0.7126 24.38/0.7162 24.09/0.6825 25.61/0.6577 25.76/0.6566 25.28/0.6534 24.91/0.6997
DIP 25.23/0.6043 28.93/0.7545 27.39/0.7579 26.39/0.7777 27.47/0.8169 25.57/0.6983 26.29/0.7409 24.75/0.6356 27.05/0.6843 27.06/0.6857 26.52/0.6847 26.60/0.7128
DD 27.24/0.7521 30.48/0.8023 25.39/0.7591 26.86/0.8051 27.69/0.8526 24.93/0.7120 27.29/0.7863 23.81/0.6455 27.49/0.7163 27.87/0.7380 27.13/0.7131 26.93/0.7530
ZS-N2N 27.32/0.7089 29.36/0.7276 27.46/0.7240 26.61/0.7821 27.20/0.7634 27.02/0.7463 27.16/0.7149 25.49/0.6854 27.26/0.6779 27.48/0.6931 26.63/0.6673 27.18/0.7173
ZS-N2S 26.24/0.7843 29.23/0.8073 27.77/0.8233 27.61/0.8463 27.35/0.8569 25.86/0.8023 26.27/0.7997 26.43/0.7759 28.23/0.7580 27.52/0.7526 27.74/0.7617 27.30/0.7971
S2S 16.93/0.5998 29.12/0.8275 21.88/0.7666 21.14/0.6974 25.93/0.8606 24.12/0.7350 17.09/0.6069 25.79/0.7980 23.94/0.7061 27.32/0.7403 23.29/0.6979 23.32/0.7306
ZS-NCD 28.78/0.8237 32.14/0.8547 29.62/0.8406 28.48/0.8134 29.02/0.8494 27.77/0.8126 28.14/0.8007 28.39/0.8192 28.85/0.7444 28.64/0.7630 28.37/0.7648 28.93/0.8079
ZS-NCD (MLP) 29.08/0.8259 32.63/0.8525 29.85/0.8574 28.73/0.8547 29.42/0.8861 28.37/0.8477 28.75/0.8431 30.01/0.8658 29.59/0.7843 29.15/0.7859 29.10/0.7958 29.52/0.8363
50 BM3D 27.58/0.8045 32.25/0.8456 28.84/0.8487 26.86/0.7989 28.02/0.8813 26.33/0.8098 27.46/0.8200 29.12/0.8318 28.21/0.7306 28.10/0.7354 28.02/0.7472 28.25/0.8049
JPEG2K 21.49/0.5880 24.24/0.6444 21.72/0.6077 21.39/0.5784 20.86/0.6414 21.11/0.6021 21.29/0.6035 21.65/0.5377 22.83/0.5318 23.29/0.5356 22.67/0.5025 22.05/0.5794
DIP 22.73/0.5846 25.67/0.6475 23.81/0.5987 22.99/0.6406 23.06/0.6293 22.64/0.5522 23.02/0.5811 22.38/0.5316 23.90/0.5371 24.43/0.5524 23.40/0.5064 23.46/0.5783
DD 23.89/0.6487 27.27/0.7282 22.95/0.7276 23.44/0.6700 23.55/0.7319 22.52/0.6652 23.87/0.6471 22.72/0.5980 24.47/0.6050 25.19/0.6340 24.30/0.5872 24.01/0.6584
ZS-N2N 23.36/0.5324 25.17/0.5167 23.86/0.5669 22.92/0.6186 22.95/0.6010 23.39/0.5988 22.87/0.5136 22.62/0.5150 23.93/0.5138 24.30/0.5330 23.30/0.4930 23.52/0.5457
ZS-N2S 24.65/0.6966 26.72/0.7091 25.24/0.7297 24.05/0.7102 24.82/0.7618 24.04/0.7467 24.00/0.7078 22.81/0.5916 25.46/0.6512 25.55/0.6469 24.80/0.6197 24.74/0.6883
S2S 14.23/0.4809 21.14/0.6396 17.80/0.5763 15.71/0.4176 18.33/0.5955 15.70/0.4828 13.66/0.4446 17.60/0.4883 18.69/0.5264 19.55/0.5354 19.12/0.5325 17.41/0.5200
ZS-NCD 25.55/0.7616 28.62/0.7995 26.31/0.7604 24.59/0.6925 25.53/0.7585 24.65/0.7338 25.31/0.7228 24.06/0.6525 25.61/0.6538 25.92/0.6707 25.19/0.6519 25.58/0.7144
ZS-NCD (MLP) 25.61/0.7342 29.13/0.7881 26.30/0.7702 24.85/0.7426 25.12/0.7795 24.84/0.7497 25.24/0.7596 25.55/0.7146 26.26/0.6698 26.25/0.6717 25.65/0.6560 25.89/0.7306

Table 7: Set11 denoising performance comparison (PSNR/SSIM) under Poisson noise $\mathrm{Poisson}(\alpha\bm{x})/\alpha$. Images C.man through Parrot are of size $256\times 256$; Barbara through Couple are of size $512\times 512$.
$\alpha$ Method C.man House Peppers Starfish Monarch Airplane Parrot Barbara Boats Pirate Couple Average
15 BM3D 26.64/0.7651 29.39/0.7668 27.13/0.7914 24.93/0.7519 26.32/0.8265 24.79/0.6730 26.26/0.7866 27.24/0.7860 26.82/0.6977 27.07/0.7048 26.67/0.7056 26.66/0.7505
JPEG2K 21.98/0.6032 24.35/0.6106 22.12/0.6213 21.52/0.5887 21.24/0.6493 20.87/0.5818 22.02/0.6378 21.94/0.5688 23.09/0.5346 23.75/0.5510 23.01/0.5237 22.35/0.5882
DIP 22.85/0.5382 26.32/0.6528 24.23/0.6138 23.23/0.6696 23.54/0.6875 22.07/0.4938 22.81/0.5723 22.59/0.5503 24.18/0.5533 24.95/0.5811 23.83/0.5362 23.69/0.5863
DD 24.45/0.6261 27.59/0.7453 23.20/0.7269 23.86/0.7164 24.66/0.7286 22.22/0.6055 24.38/0.6830 22.89/0.6081 24.64/0.6023 25.61/0.6516 24.58/0.5986 24.37/0.6629
ZS-N2N 24.19/0.5818 25.41/0.5346 24.65/0.6016 23.12/0.6520 23.92/0.6441 23.12/0.5565 23.83/0.5821 23.05/0.5503 24.40/0.5403 24.87/0.5684 23.94/0.5305 24.04/0.5766
ZS-N2S 24.94/0.7241 27.29/0.7317 25.71/0.7431 24.41/0.7417 25.37/0.7968 23.05/0.7051 24.98/0.7315 22.87/0.6087 26.08/0.6696 25.94/0.6655 25.09/0.6389 25.06/0.7051
S2S 23.53/0.7325 22.01/0.7409 22.73/0.7300 18.20/0.6010 21.81/0.7813 16.18/0.5010 20.27/0.7304 22.10/0.7261 23.49/0.6529 24.72/0.6782 24.17/0.6843 21.75/0.6872
ZS-NCD 25.73/0.7660 28.87/0.8015 26.54/0.7745 24.65/0.6988 25.86/0.7791 24.21/0.6568 25.52/0.7356 24.11/0.6562 25.48/0.6510 25.93/0.6705 25.29/0.6552 25.65/0.7132
25 BM3D 22.69/0.5154 22.82/0.4765 22.74/0.5930 22.11/0.6947 23.44/0.7213 19.60/0.3788 23.05/0.5991 23.09/0.6508 22.77/0.5123 23.89/0.5979 23.45/0.5753 22.70/0.5741
JPEG2K 22.54/0.6267 24.97/0.6773 22.87/0.6076 22.26/0.6378 22.55/0.6641 21.59/0.5649 22.62/0.6373 22.55/0.5976 23.71/0.5685 24.20/0.5801 23.49/0.5566 23.03/0.6108
DIP 24.21/0.5976 27.06/0.6553 25.76/0.6945 24.41/0.7312 25.21/0.7384 23.58/0.6290 24.69/0.6608 23.11/0.5903 25.24/0.6130 26.10/0.6528 24.91/0.5998 24.94/0.6512
DD 25.59/0.6695 28.47/0.7606 24.20/0.7348 25.14/0.7667 26.16/0.8022 23.24/0.6306 25.89/0.7289 23.27/0.6257 25.84/0.6517 26.76/0.6961 25.77/0.6573 25.48/0.7022
ZS-N2N 25.54/0.6334 27.14/0.6234 25.82/0.6522 24.33/0.7158 25.51/0.7109 24.53/0.6274 25.55/0.6617 24.09/0.6173 25.60/0.6018 26.11/0.6354 25.16/0.5958 25.40/0.6432
ZS-N2S 26.22/0.7776 27.81/0.7643 26.55/0.7768 24.82/0.7795 26.48/0.8254 24.77/0.7463 25.44/0.7839 23.24/0.6387 27.25/0.7143 27.13/0.7200 26.44/0.6986 26.01/0.7478
S2S 25.09/0.7572 24.10/0.7398 24.91/0.7733 19.11/0.6491 23.64/0.8226 17.93/0.6279 21.13/0.7692 24.01/0.7860 25.30/0.7058 26.45/0.7232 25.77/0.7360 23.40/0.7355
ZS-NCD 27.17/0.7635 30.09/0.8109 27.92/0.7932 26.27/0.7600 27.28/0.8093 24.93/0.6116 26.74/0.7551 26.24/0.7393 27.30/0.6993 27.32/0.7192 26.83/0.7123 27.10/0.7431
50 BM3D 22.94/0.5314 22.89/0.4548 23.22/0.5844 23.06/0.7150 23.87/0.6990 20.51/0.4115 23.65/0.6136 23.54/0.6545 22.82/0.5205 23.97/0.6087 23.47/0.5723 23.09/0.5787
JPEG2K 24.23/0.6635 26.87/0.6796 24.96/0.7042 24.08/0.7240 24.05/0.7568 23.40/0.6387 24.57/0.7077 23.95/0.6804 25.37/0.6414 25.73/0.6483 25.25/0.6475 24.77/0.6811
DIP 25.34/0.6348 28.88/0.7369 27.59/0.7559 26.18/0.7891 26.58/0.7728 24.69/0.6457 26.04/0.7062 23.88/0.6158 26.61/0.6712 27.21/0.7015 26.34/0.6747 26.30/0.7004
DD 27.24/0.7398 30.16/0.7784 25.44/0.7615 26.78/0.8127 27.82/0.8527 24.39/0.6568 27.39/0.7735 23.86/0.6518 27.30/0.7096 28.03/0.7476 27.16/0.7162 26.87/0.7455
ZS-N2N 27.63/0.7210 29.30/0.7113 27.92/0.7424 26.35/0.7857 27.38/0.7723 26.26/0.7037 27.68/0.7443 25.50/0.6883 27.31/0.6832 27.68/0.7075 26.84/0.6783 27.26/0.7216
ZS-N2S 26.82/0.8041 29.32/0.7832 27.75/0.8192 26.62/0.8243 28.03/0.8624 26.05/0.8097 26.56/0.8196 23.60/0.6606 27.92/0.7493 27.68/0.7557 27.54/0.7521 27.08/0.7855
S2S 26.72/0.8220 27.19/0.8106 27.83/0.8300 20.47/0.7126 26.38/0.8780 21.09/0.6607 22.61/0.8001 27.07/0.8457 27.39/0.7620 28.31/0.7754 27.64/0.7885 25.70/0.7896
ZS-NCD 28.24/0.8093 31.90/0.8488 29.44/0.8410 28.02/0.8064 28.78/0.8504 26.99/0.7422 27.93/0.7961 27.60/0.7920 28.16/0.7273 28.11/0.7466 27.72/0.7456 28.44/0.7914

B.2 Set13 Dataset

All images are center-cropped to size $192\times 192$. For this set of images we set $\lambda=(100,200,800)$ for AWGN noise levels $\sigma_{z}=(15,25,50)$, and for Poisson denoising we set $\lambda=(900,500,200)$ for noise levels $\alpha=(15,25,50)$. We report the detailed results of AWGN denoising in Table 8, and of Poisson denoising in Table 9.

Table 8: Set13 denoising performance comparison (PSNR/SSIM) under AWGN $\mathcal{N}(0,\sigma_{z}^{2}I)$.
$\sigma$ Method Baboon Barbara Bridge Coastguard Comic Face Flowers Foreman Man Monarch Peppers PPT3 Zebra Average
15 BM3D 28.56/0.7797 33.07/0.9151 30.39/0.8723 30.18/0.8799 28.74/0.9289 30.28/0.7665 29.62/0.9040 35.83/0.9369 29.88/0.8323 31.13/0.9361 31.77/0.8199 34.49/0.9588 31.02/0.9198 31.15/0.8808
JPEG2K 25.12/0.6539 27.01/0.7674 26.27/0.7426 25.79/0.7323 24.86/0.8017 27.35/0.6482 25.52/0.7986 30.86/0.8397 25.97/0.6983 25.63/0.7649 28.27/0.7161 28.00/0.8183 26.33/0.8243 26.69/0.7543
DIP 27.25/0.7498 30.92/0.8403 30.18/0.8692 30.79/0.9036 28.25/0.9091 29.62/0.7611 28.85/0.8817 33.81/0.8947 29.99/0.8333 31.18/0.9085 30.30/0.7782 32.22/0.8908 30.63/0.9203 30.31/0.8570
DD 26.35/0.7029 24.27/0.7066 29.16/0.8508 28.80/0.8421 26.44/0.8932 29.59/0.7398 27.34/0.8634 34.87/0.9274 28.68/0.8073 30.02/0.9151 31.07/0.8003 33.09/0.9277 30.22/0.9052 29.22/0.8371
ZS-N2N 28.63/0.7992 28.45/0.7803 32.08/0.9041 31.54/0.9141 28.70/0.9018 30.64/0.8048 29.67/0.8955 34.02/0.8817 31.63/0.8801 31.65/0.9120 31.28/0.8089 32.84/0.9009 31.27/0.9277 30.95/0.8701
ZS-N2S 20.92/0.5844 21.14/0.5730 21.37/0.6468 20.78/0.4893 15.80/0.4987 22.03/0.5641 17.49/0.5154 8.43/0.3637 22.69/0.6725 12.21/0.6124 21.60/0.6438 11.58/0.4462 20.37/0.7871 18.18/0.5690
S2S 22.36/0.5810 30.39/0.8769 22.74/0.7485 22.72/0.7108 17.44/0.7015 17.23/0.4383 21.14/0.7121 16.78/0.8102 15.65/0.4463 26.92/0.8838 24.20/0.7398 15.48/0.7528 14.84/0.5400 20.61/0.6879
ZS-NCD 28.10/0.7831 33.85/0.9208 31.49/0.9051 32.65/0.9345 29.23/0.9355 30.51/0.7891 29.69/0.9077 35.85/0.9381 31.49/0.8867 33.21/0.9445 31.60/0.8281 35.07/0.9601 32.37/0.9448 31.93/0.8983
25 BM3D 26.56/0.6892 30.70/0.8824 28.06/0.7955 27.59/0.7664 25.90/0.8698 28.87/0.6978 26.93/0.8396 33.51/0.9087 27.35/0.7428 28.72/0.9056 30.11/0.7823 31.66/0.9324 28.61/0.8643 28.81/0.8213
JPEG2K 23.98/0.5711 24.14/0.6801 24.14/0.6400 23.53/0.5801 21.36/0.7014 26.17/0.5692 22.77/0.6809 28.30/0.7960 23.51/0.5667 23.03/0.7101 26.47/0.6729 25.23/0.7707 23.58/0.7389 24.32/0.6676
DIP 25.70/0.6734 27.27/0.7021 27.79/0.7999 27.86/0.8126 25.11/0.8382 28.16/0.6599 26.03/0.8164 31.17/0.8384 27.27/0.7446 29.02/0.8785 28.82/0.7313 29.20/0.8106 28.65/0.8820 27.85/0.7837
DD 25.56/0.6589 23.56/0.6676 26.94/0.7758 26.77/0.7542 24.76/0.8397 28.37/0.6819 25.72/0.8058 32.22/0.8849 27.09/0.7366 27.68/0.8566 29.42/0.7615 29.94/0.8937 28.11/0.8641 27.40/0.7832
ZS-N2N 26.85/0.7287 26.73/0.7136 29.00/0.8282 28.30/0.8312 25.95/0.8462 28.71/0.7250 26.67/0.8248 31.31/0.8156 28.88/0.7940 29.03/0.8683 28.95/0.7282 29.78/0.8205 28.54/0.8768 28.36/0.8001
ZS-N2S 19.22/0.4963 15.44/0.3973 22.05/0.5859 21.55/0.5581 16.64/0.5734 25.76/0.6423 17.98/0.6324 20.90/0.6792 22.30/0.6170 21.29/0.7778 22.45/0.6535 17.04/0.6502 22.41/0.7964 20.39/0.6200
S2S 18.66/0.4947 24.97/0.7969 19.43/0.6054 20.69/0.5890 16.30/0.6168 15.08/0.3846 17.57/0.5267 15.53/0.7605 14.11/0.3892 21.34/0.7849 22.40/0.6795 13.60/0.6805 13.69/0.4890 17.95/0.5998
ZS-NCD 26.54/0.7128 30.65/0.8388 28.78/0.8332 29.46/0.8718 26.83/0.8847 29.14/0.7367 27.05/0.8393 32.99/0.8870 28.83/0.8027 30.18/0.8917 29.62/0.7656 31.56/0.8898 29.63/0.9023 29.33/0.8351
50 BM3D 24.66/0.5953 26.99/0.7854 25.31/0.6606 24.45/0.5245 22.30/0.7299 27.19/0.6140 23.52/0.7068 29.84/0.8400 24.77/0.6272 25.62/0.8492 27.34/0.7139 27.51/0.8657 25.59/0.7613 25.78/0.7134
JPEG2K 21.99/0.4807 21.81/0.5541 21.55/0.4898 21.41/0.3641 18.50/0.5254 23.43/0.4603 19.92/0.5254 24.51/0.6873 20.74/0.4169 19.95/0.5819 23.66/0.5810 21.26/0.6490 19.81/0.5670 21.43/0.5295
DIP 24.07/0.5778 23.06/0.5943 25.05/0.6773 24.10/0.6179 21.64/0.7142 26.07/0.5804 22.84/0.6871 28.34/0.7754 24.60/0.6173 25.68/0.7922 26.06/0.6398 25.60/0.7060 25.55/0.7926 24.82/0.6748
DD 23.87/0.5745 23.06/0.6436 24.01/0.6402 23.43/0.4732 21.37/0.6956 26.83/0.6127 22.73/0.6819 29.12/0.8434 24.08/0.6011 24.09/0.7616 26.82/0.7055 25.40/0.8137 24.41/0.7659 24.56/0.6779
ZS-N2N 24.41/0.5999 23.99/0.5573 25.19/0.6726 24.45/0.6513 21.87/0.7014 25.90/0.5732 22.72/0.6619 27.10/0.6419 24.91/0.6111 25.11/0.7571 25.35/0.5654 25.14/0.6243 24.62/0.7594 24.67/0.6444
ZS-N2S 22.32/0.5704 16.65/0.5507 22.29/0.6475 21.75/0.4859 15.72/0.3989 25.49/0.6110 18.56/0.6121 24.16/0.7812 22.50/0.5681 19.83/0.6154 19.54/0.5440 16.26/0.4869 23.00/0.7724 20.62/0.5880
S2S 14.08/0.3567 17.56/0.5046 14.95/0.3650 17.25/0.3204 13.55/0.3265 12.73/0.2916 13.31/0.2712 13.82/0.5715 12.16/0.2727 14.59/0.5444 17.16/0.4959 11.42/0.5186 12.16/0.2807 14.21/0.3938
ZS-NCD 24.32/0.6035 26.84/0.7333 25.70/0.7172 25.76/0.7385 23.14/0.7768 26.90/0.6401 23.74/0.7212 28.46/0.7716 25.46/0.6710 26.29/0.8075 26.40/0.6722 27.25/0.7856 26.10/0.8113 25.87/0.7269
Table 9: Set13 denoising performance comparison (PSNR/SSIM) under Poisson noise $\mathrm{Poisson}(\alpha{\bm{x}})/\alpha$.
$\alpha$ Method Baboon Barbara Bridge Coastguard Comic Face Flowers Foreman Man Monarch Peppers PPT3 Zebra Average
15 BM3D 24.46/0.5651 26.63/0.7712 25.18/0.6462 24.45/0.5186 22.07/0.7101 27.46/0.6342 23.51/0.7017 29.05/0.7747 24.98/0.6405 25.60/0.8101 27.53/0.7041 26.16/0.7288 26.19/0.7802 25.64/0.6912
JPEG2K 22.01/0.4873 21.82/0.5515 21.77/0.5163 21.69/0.3727 18.47/0.5237 24.97/0.5289 20.42/0.5841 23.83/0.6996 21.69/0.4723 20.21/0.5809 24.17/0.6012 20.73/0.5858 21.04/0.6381 21.76/0.5494
DIP 24.30/0.5907 23.05/0.6010 25.46/0.7069 24.21/0.5996 21.79/0.7282 26.99/0.6150 23.41/0.7456 28.03/0.7686 25.10/0.6444 26.34/0.8136 26.26/0.6510 25.44/0.7088 26.37/0.8180 25.14/0.6916
DD 23.89/0.5750 23.16/0.6479 24.83/0.6873 23.80/0.5544 21.52/0.7107 27.22/0.6387 23.41/0.7339 29.25/0.8522 25.03/0.6514 24.67/0.7745 27.09/0.7156 25.12/0.7640 25.43/0.8020 24.96/0.7006
ZS-N2N 24.82/0.6330 24.11/0.5768 25.82/0.7281 25.35/0.6946 21.99/0.7055 27.11/0.6779 23.67/0.7490 27.20/0.6491 26.60/0.7290 25.99/0.7740 26.01/0.6047 25.02/0.6004 26.13/0.8190 25.37/0.6878
ZS-N2S 21.39/0.5390 17.46/0.4084 22.23/0.6428 21.83/0.5714 17.53/0.5771 25.14/0.6103 17.97/0.5082 24.33/0.7854 22.94/0.5959 21.15/0.7157 21.53/0.5467 19.43/0.6122 23.08/0.7725 21.23/0.6066
S2S 16.58/0.5042 21.66/0.6523 18.07/0.6269 22.14/0.6206 15.01/0.5319 24.18/0.6548 18.87/0.6736 14.80/0.7562 24.78/0.7188 17.73/0.7807 21.51/0.6626 12.61/0.5630 22.09/0.7728 19.23/0.6553
ZS-NCD 24.66/0.5807 27.63/0.7844 26.43/0.7443 26.40/0.7436 23.17/0.7781 27.53/0.6510 24.30/0.7680 29.58/0.8302 26.15/0.6997 26.46/0.7991 27.34/0.7196 26.68/0.7217 27.45/0.8440 26.44/0.7434
25 BM3D 20.61/0.4799 21.75/0.5241 22.93/0.6748 22.04/0.5620 19.94/0.6637 24.88/0.6358 22.35/0.7387 22.34/0.5167 23.11/0.5790 21.99/0.6799 22.75/0.5533 20.01/0.4239 23.48/0.7582 22.17/0.5992
JPEG2K 22.59/0.5060 22.38/0.5732 22.81/0.5904 22.12/0.4455 19.59/0.6087 25.30/0.5533 21.51/0.6444 25.19/0.7343 22.10/0.4841 21.52/0.6226 24.78/0.6181 22.07/0.6515 22.44/0.7057 22.65/0.5952
DIP 24.85/0.6243 24.02/0.5860 26.63/0.7646 25.41/0.6907 22.75/0.7640 27.42/0.6386 24.61/0.7935 29.42/0.8138 26.07/0.7119 27.02/0.8225 27.36/0.6839 26.86/0.7348 27.25/0.8467 26.13/0.7289
DD 24.51/0.6017 23.30/0.6625 25.78/0.7389 24.96/0.6226 22.80/0.7680 28.18/0.6808 24.65/0.7872 30.34/0.8723 25.91/0.6824 26.31/0.8306 28.07/0.7242 26.92/0.7823 26.80/0.8309 26.04/0.7373
ZS-N2N 25.81/0.6876 25.11/0.6387 27.17/0.7874 26.65/0.7562 23.45/0.7635 28.28/0.7277 24.95/0.8033 28.68/0.7031 28.10/0.7878 27.67/0.8276 27.47/0.6713 26.90/0.6864 27.43/0.8511 26.75/0.7455
ZS-N2S 19.29/0.4409 21.36/0.5393 21.65/0.6879 20.44/0.4223 16.56/0.4988 25.41/0.6417 15.88/0.5870 25.29/0.8037 19.35/0.6591 21.09/0.6962 24.71/0.6801 22.47/0.7677 22.00/0.7806 21.19/0.6312
S2S 17.40/0.5115 24.01/0.7441 18.56/0.6458 22.99/0.6967 15.48/0.5679 25.07/0.6826 19.74/0.7193 15.06/0.7385 26.12/0.7550 18.86/0.8094 23.26/0.6959 12.85/0.6302 22.89/0.8084 20.18/0.6927
ZS-NCD 25.23/0.6145 29.09/0.8189 27.60/0.7969 27.66/0.8053 24.50/0.8223 28.31/0.6908 25.65/0.8231 30.99/0.8571 27.21/0.7491 27.82/0.8309 28.40/0.7464 27.98/0.7490 28.42/0.8712 27.60/0.7827
50 BM3D 21.78/0.5516 22.44/0.5350 23.85/0.7114 22.60/0.6103 21.36/0.7094 25.06/0.6670 23.51/0.7921 23.08/0.4721 23.86/0.6477 22.82/0.6871 23.24/0.5494 21.35/0.4511 24.07/0.7814 23.00/0.6281
JPEG2K 23.83/0.5611 23.92/0.6612 24.26/0.6565 23.53/0.5775 21.09/0.6866 26.46/0.6041 22.88/0.7170 27.56/0.7817 23.77/0.5931 23.00/0.7123 26.46/0.6628 24.44/0.7320 24.00/0.7584 24.25/0.6696
DIP 25.67/0.6702 26.62/0.6890 28.12/0.8173 27.79/0.8136 24.29/0.8131 28.63/0.7149 26.30/0.8449 30.97/0.8395 27.29/0.7409 29.10/0.8779 28.24/0.7107 28.64/0.7849 28.74/0.8816 27.72/0.7845
DD 25.39/0.6439 23.57/0.6720 27.23/0.7930 26.69/0.7441 24.40/0.8285 28.69/0.7060 26.08/0.8358 32.04/0.8838 27.45/0.7575 27.94/0.8613 29.45/0.7641 29.37/0.8636 28.33/0.8740 27.43/0.7867
ZS-N2N 27.06/0.7422 26.44/0.7151 29.33/0.8535 28.72/0.8361 25.47/0.8308 29.57/0.7822 26.98/0.8566 30.77/0.7918 30.02/0.8491 29.50/0.8730 29.03/0.7330 29.21/0.7853 29.34/0.8968 28.57/0.8112
ZS-N2S 19.68/0.5011 19.80/0.4843 22.87/0.6603 19.48/0.3374 16.44/0.4783 18.21/0.4958 18.06/0.5203 25.22/0.8178 22.22/0.6118 21.95/0.7717 22.69/0.6847 19.84/0.7204 23.22/0.7586 20.75/0.6033
S2S 19.15/0.5499 27.76/0.8445 19.57/0.6641 23.63/0.7330 16.26/0.6303 26.79/0.7083 20.99/0.7695 15.34/0.7890 27.87/0.8113 21.54/0.8363 26.51/0.7211 13.43/0.6657 23.93/0.8512 21.75/0.7365
ZS-NCD 26.05/0.6622 30.15/0.8172 29.25/0.8571 29.83/0.8761 25.99/0.8641 29.36/0.7458 27.52/0.8790 32.51/0.8739 29.05/0.8147 29.27/0.8500 29.64/0.7647 29.53/0.7775 30.03/0.9081 29.09/0.8223
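
The Poisson observation model $\mathrm{Poisson}(\alpha{\bm{x}})/\alpha$ used in Table 9 can be illustrated with a short simulation. The following is a minimal sketch, not the authors' evaluation code: it assumes intensities are scaled to $[0,1]$ and PSNR is computed with peak value 1 (neither is stated in this appendix), and the helper names `sample_poisson`, `add_poisson_noise`, and `psnr` are hypothetical. It uses only the Python standard library, with Knuth's multiplication method for Poisson sampling, which is adequate for the small rates arising here.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson(rate) sample with Knuth's multiplication method (small rates only)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def add_poisson_noise(clean, alpha, rng):
    """Return y_i = Poisson(alpha * x_i) / alpha for each intensity x_i in `clean`."""
    return [sample_poisson(alpha * x, rng) / alpha for x in clean]

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB; `peak` = 1.0 is an assumption about the scaling."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

rng = random.Random(0)
clean = [rng.random() for _ in range(10_000)]  # hypothetical flattened image in [0, 1]
for alpha in (15, 25, 50):
    noisy = add_poisson_noise(clean, alpha, rng)
    print(f"alpha={alpha}: noisy PSNR = {psnr(clean, noisy):.2f} dB")
```

Under this model the noise variance of each entry is $x_i/\alpha$, so a larger $\alpha$ corresponds to a less noisy observation, which matches the trend of the PSNR values across the three $\alpha$ blocks of the table.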

B.3 Kodak24 Dataset

For Gaussian denoising, we set $\lambda=(75,150,750)$ for noise levels $\sigma_{z}=(15,25,50)$; for Poisson denoising, we set $\lambda=(750,300,150)$ for $\alpha=(15,25,50)$. For BM3D under Poisson noise with $\alpha=(15,25,50)$, we set $\sigma_{\rm BM3D}=(50,25,15)$. We report the detailed results for AWGN denoising in Table 10 and for Poisson denoising in Table 11.
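
For readers reproducing these runs, the settings above can be collected in a small lookup table. This is only an illustrative sketch: the dictionary and function names are hypothetical and are not taken from the released code; the numeric values are the ones stated in the paragraph above. The BM3D parameter for the Gaussian case is not given in this section, so the sketch returns `None` for it.

```python
# Kodak24 settings transcribed from the text; all identifiers here are hypothetical.
KODAK24_LAMBDA = {
    "gaussian": {15: 75, 25: 150, 50: 750},   # sigma_z -> lambda
    "poisson": {15: 750, 25: 300, 50: 150},   # alpha   -> lambda
}
BM3D_SIGMA_POISSON = {15: 50, 25: 25, 50: 15}  # alpha -> sigma_BM3D

def kodak24_settings(noise_type, level):
    """Return (lambda, sigma_BM3D or None) for a noise type and level listed above."""
    if noise_type not in KODAK24_LAMBDA or level not in KODAK24_LAMBDA[noise_type]:
        raise ValueError(f"no setting listed for {noise_type!r} at level {level!r}")
    lam = KODAK24_LAMBDA[noise_type][level]
    # sigma_BM3D is only stated for the Poisson experiments in this section.
    bm3d_sigma = BM3D_SIGMA_POISSON[level] if noise_type == "poisson" else None
    return lam, bm3d_sigma

print(kodak24_settings("gaussian", 25))
print(kodak24_settings("poisson", 25))
```

In both noise models the listed $\lambda$ grows as the observation becomes noisier (larger $\sigma_{z}$, smaller $\alpha$).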

Table 10: Kodak24 denoising performance comparison (PSNR/SSIM) under AWGN $\mathcal{N}(0,\sigma_{z}^{2}I)$.
Method ($\sigma_{z}$) 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Average
JPEG2K (15) 25.20/0.7120 29.23/0.6996 29.81/0.7678 29.09/0.7299 25.55/0.7728 26.71/0.7343 28.97/0.7880 25.24/0.7730 29.29/0.7756 29.27/0.7540 27.34/0.7033 29.28/0.7219 24.73/0.7477 26.65/0.7137 29.23/0.7490 28.29/0.7205 28.86/0.7709 26.54/0.7195 27.76/0.7400 29.28/0.8055 27.21/0.7749 27.81/0.6983 30.82/0.8039 26.38/0.7215 27.86/0.7457
BM3D (15) 29.46/0.8549 33.06/0.8266 35.19/0.9096 33.30/0.8583 30.37/0.9023 31.00/0.8664 34.59/0.9384 30.37/0.9019 34.59/0.9083 34.38/0.8952 31.70/0.8475 34.12/0.8611 27.97/0.8295 30.60/0.8425 33.84/0.8781 32.69/0.8647 33.46/0.8898 30.48/0.8558 32.18/0.8568 33.84/0.8936 31.46/0.8932 31.60/0.8355 35.90/0.9205 30.83/0.8800 32.37/0.8754
DIP (15) 29.91/0.8724 31.98/0.7986 33.53/0.8588 32.07/0.8180 30.00/0.8875 30.85/0.8605 33.21/0.8851 29.54/0.8838 33.00/0.8551 32.71/0.8348 30.58/0.8144 32.79/0.8203 28.16/0.8619 30.37/0.8531 32.22/0.8231 32.26/0.8486 31.99/0.8492 29.78/0.8235 30.96/0.8111 32.66/0.8676 30.77/0.8396 30.82/0.8107 34.09/0.8716 29.94/0.8403 31.42/0.8454
DD (15) 26.10/0.7586 29.56/0.7297 30.98/0.8323 30.11/0.8023 26.17/0.8114 27.28/0.7715 30.92/0.8905 24.65/0.7766 31.34/0.8565 31.57/0.8480 28.45/0.7793 32.09/0.8177 23.42/0.6831 27.77/0.7891 29.31/0.7730 29.31/0.7730 30.93/0.8688 27.14/0.7826 28.25/0.7786 28.87/0.8549 28.03/0.8158 29.23/0.7820 31.88/0.8775 25.72/0.7853 28.71/0.8016
ZS-N2N (15) 31.13/0.8951 33.19/0.8418 34.04/0.8582 32.93/0.8463 30.38/0.8890 31.96/0.8826 33.51/0.8872 30.77/0.9045 33.75/0.8532 33.01/0.8367 32.15/0.8571 33.48/0.8290 28.98/0.8885 31.14/0.8681 32.96/0.8401 33.23/0.8687 33.21/0.8743 30.44/0.8587 32.65/0.8736 33.86/0.8856 32.51/0.8849 31.83/0.8416 33.85/0.8395 30.29/0.8548 32.30/0.8650
ZS-N2S (15) 18.03/0.4389 25.76/0.6821 17.06/0.6117 25.30/0.6915 18.35/0.5545 10.16/0.3461 22.51/0.7272 17.54/0.5125 15.42/0.6011 19.79/0.6228 23.15/0.6176 8.06/0.2929 19.24/0.3889 22.86/0.6371 23.91/0.7191 23.05/0.6010 16.88/0.5174 21.44/0.5637 11.59/0.5257 5.47/0.0930 22.77/0.6687 20.22/0.5615 22.33/0.7443 17.36/0.5761 18.68/0.5540
S2S (15) 25.73/0.7941 25.01/0.7193 23.30/0.8177 27.41/0.8106 22.73/0.7274 19.32/0.7144 29.75/0.9184 19.82/0.7318 29.36/0.8927 25.72/0.8723 22.71/0.7244 22.15/0.7985 19.75/0.5547 24.55/0.7461 17.09/0.7066 27.15/0.8337 21.30/0.7254 24.15/0.7016 28.58/0.7811 10.63/0.6711 23.49/0.8304 23.60/0.7693 22.17/0.8469 18.37/0.7790 23.08/0.7695
ZS-NCD (15) 31.28/0.9059 33.93/0.8669 35.61/0.9215 34.19/0.8876 31.63/0.9286 32.07/0.9009 35.34/0.9382 31.27/0.9222 35.03/0.9088 34.71/0.9097 32.40/0.8895 34.89/0.8939 28.61/0.8931 31.95/0.8978 34.37/0.8951 34.04/0.9004 34.04/0.9089 31.25/0.8860 33.26/0.8956 34.13/0.9096 32.45/0.8973 32.52/0.8762 35.83/0.9236 31.48/0.9046 33.18/0.9026
JPEG2K (25) 22.70/0.5639 27.54/0.6312 27.86/0.7130 27.10/0.6704 25.55/0.7728 24.21/0.6104 26.01/0.6863 22.11/0.6519 26.88/0.7440 26.57/0.7022 24.92/0.5912 27.20/0.6589 21.73/0.5638 24.45/0.5960 26.89/0.7150 26.60/0.6330 26.37/0.6632 24.10/0.5928 25.43/0.6575 26.43/0.7676 24.76/0.7002 25.40/0.5919 28.75/0.7779 23.70/0.6015 25.43/0.6550
BM3D (25) 26.98/0.7554 31.29/0.7717 32.74/0.8618 31.23/0.7994 27.56/0.8236 28.42/0.7789 31.87/0.9026 27.74/0.8497 32.20/0.8715 31.92/0.8480 29.29/0.7690 32.07/0.8068 25.21/0.6973 28.19/0.7513 31.75/0.8337 30.33/0.7813 31.03/0.8382 27.82/0.7616 30.14/0.7936 31.72/0.8522 28.89/0.8366 29.37/0.7491 33.59/0.8899 28.09/0.7966 29.98/0.8092
DIP (25) 26.90/0.7759 29.87/0.7204 30.91/0.7859 30.03/0.7510 26.40/0.7875 27.91/0.7697 30.77/0.8496 26.51/0.8059 30.41/0.7959 30.28/0.7784 27.77/0.7115 30.79/0.7586 25.52/0.7763 27.74/0.7625 30.16/0.7568 29.96/0.7803 29.72/0.7923 26.80/0.7322 28.77/0.7591 30.54/0.8216 28.41/0.7963 28.38/0.7238 31.72/0.8261 27.00/0.7388 28.90/0.7738
DD (25) 25.26/0.7193 29.52/0.7298 29.58/0.7641 28.73/0.7327 25.97/0.7902 26.42/0.7245 29.27/0.8238 23.99/0.7493 29.50/0.7825 29.41/0.7732 27.37/0.7223 30.11/0.7407 22.97/0.6570 26.79/0.7397 29.39/0.7752 28.17/0.7014 29.63/0.8146 26.21/0.7263 26.99/0.7077 28.03/0.8136 27.01/0.7501 28.00/0.7173 30.10/0.8084 25.12/0.7364 27.62/0.7496
ZS-N2N (25) 28.30/0.8249 30.67/0.7618 31.10/0.7676 30.28/0.7562 27.66/0.8231 29.01/0.7975 30.48/0.8057 27.82/0.8385 30.81/0.7545 30.24/0.7380 29.31/0.7645 30.69/0.7276 26.66/0.8113 28.32/0.7801 30.39/0.7501 30.46/0.7826 30.62/0.8000 28.07/0.7713 29.76/0.7902 30.95/0.8091 29.45/0.8093 29.09/0.7451 30.98/0.7461 27.99/0.7707 29.54/0.7798
ZS-N2S (25) 17.06/0.4493 26.46/0.6886 26.94/0.7692 24.96/0.7041 20.70/0.6776 21.31/0.5206 20.89/0.6756 16.54/0.5188 20.46/0.6864 22.14/0.7248 23.98/0.6349 10.42/0.4087 17.29/0.4015 22.67/0.6045 25.05/0.7175 25.18/0.6550 24.09/0.7115 21.99/0.5551 20.16/0.6362 5.76/0.3971 21.54/0.6619 22.72/0.5972 24.98/0.7995 18.52/0.5813 20.89/0.6156
S2S (25) 23.60/0.7118 20.04/0.6040 21.16/0.7692 23.86/0.7267 19.02/0.5867 17.49/0.7154 23.04/0.8244 17.92/0.6895 27.46/0.8618 24.39/0.8275 19.65/0.6347 21.04/0.7575 18.72/0.4512 22.24/0.5982 15.36/0.6307 25.01/0.7566 17.56/0.6245 20.93/0.5421 25.52/0.7214 10.09/0.6607 22.32/0.7695 22.60/0.6994 20.52/0.8077 17.67/0.7054 20.72/0.6949
ZS-NCD (25) 28.88/0.8364 31.87/0.7865 32.78/0.8226 30.38/0.7368 28.81/0.8615 29.58/0.8146 32.01/0.8524 28.87/0.8661 32.17/0.8171 32.18/0.8286 30.03/0.7961 32.64/0.8248 26.88/0.8315 29.50/0.8210 31.75/0.7962 31.14/0.7939 31.33/0.8108 29.02/0.8012 30.71/0.8008 31.77/0.8283 29.79/0.7852 30.14/0.7875 33.06/0.8351 29.21/0.8189 30.60/0.8144
JPEG2K (50) 20.49/0.3922 24.11/0.5988 24.31/0.6804 23.21/0.5842 19.99/0.4471 21.63/0.4500 22.79/0.5132 19.11/0.4891 22.96/0.4615 22.86/0.4371 21.92/0.4196 23.57/0.6529 19.47/0.3673 21.92/0.4560 22.37/0.4966 24.41/0.5511 22.58/0.4844 21.55/0.4266 22.13/0.4597 21.65/0.6122 21.96/0.4778 22.64/0.4896 23.40/0.7331 21.11/0.4517 22.17/0.5055
BM3D (50) 24.13/0.5866 29.05/0.6999 30.00/0.7943 28.58/0.7121 24.05/0.6675 25.50/0.6364 28.28/0.8224 24.25/0.7340 29.04/0.7971 28.61/0.7567 26.42/0.6604 29.58/0.7402 22.18/0.5110 25.41/0.6189 29.16/0.7708 27.76/0.6630 27.83/0.7463 24.82/0.6048 27.70/0.7210 28.92/0.7997 25.88/0.7341 26.97/0.6380 30.42/0.8371 24.92/0.6603 27.06/0.7047
DIP (50) 23.92/0.6204 27.50/0.6301 28.26/0.6983 27.23/0.6295 23.31/0.6552 24.92/0.6190 27.26/0.7310 22.73/0.6684 27.57/0.6856 28.32/0.7120 25.16/0.5884 27.95/0.6415 22.05/0.5876 24.69/0.6148 27.49/0.6667 26.78/0.6076 26.58/0.6644 24.01/0.5880 25.56/0.6271 27.48/0.7482 25.27/0.6876 25.56/0.5805 28.28/0.7283 23.67/0.6063 25.90/0.6494
DD (50) 22.80/0.5694 26.73/0.6471 27.70/0.7500 26.03/0.6491 22.99/0.6464 23.74/0.5735 25.83/0.6793 22.06/0.6528 26.30/0.7166 25.49/0.6800 24.86/0.5904 25.87/0.6003 21.31/0.5467 24.32/0.6093 26.48/0.6855 26.30/0.6032 26.06/0.7191 23.95/0.5820 24.49/0.5819 26.19/0.7493 24.27/0.6003 25.44/0.5967 27.24/0.7664 22.98/0.5957 24.98/0.6413
ZS-N2N (50) 24.35/0.6495 27.75/0.6501 27.35/0.5999 26.63/0.5774 23.65/0.6601 25.18/0.6166 26.54/0.6457 23.63/0.6966 27.08/0.5835 26.60/0.5708 25.61/0.5946 27.12/0.5673 23.06/0.6429 24.71/0.6124 27.01/0.6080 26.54/0.5792 26.80/0.6368 24.47/0.5828 25.95/0.6185 27.29/0.6791 25.51/0.6433 25.61/0.5731 26.87/0.5712 24.44/0.6037 25.82/0.6151
ZS-N2S (50) 19.72/0.5883 24.42/0.4968 26.65/0.7305 15.11/0.4906 15.88/0.5239 22.01/0.5405 25.43/0.7356 17.24/0.5660 16.06/0.5159 17.70/0.5480 23.87/0.6170 11.75/0.4637 14.02/0.3956 22.95/0.6008 25.16/0.6968 26.24/0.6367 25.18/0.7260 21.98/0.5817 14.07/0.5146 24.56/0.7294 12.48/0.3895 23.78/0.6031 17.04/0.6914 17.89/0.4751 20.05/0.5774
S2S (50) 19.95/0.4689 21.03/0.6290 17.13/0.6781 17.53/0.5853 14.73/0.3175 15.47/0.5126 20.89/0.6709 14.94/0.3918 21.78/0.7174 20.36/0.6616 15.85/0.4717 17.66/0.6762 15.68/0.2853 15.75/0.3939 12.24/0.5437 19.63/0.5559 13.91/0.4744 14.55/0.3322 18.09/0.5861 9.24/0.5644 18.79/0.5944 20.52/0.5298 16.35/0.6779 15.94/0.4615 17.00/0.5325
ZS-NCD (50) 26.01/0.7245 29.30/0.7205 30.20/0.7927 28.87/0.7287 25.60/0.7608 26.72/0.7133 29.05/0.8173 25.47/0.7785 29.75/0.7946 29.40/0.7790 27.18/0.7023 29.80/0.7531 23.91/0.6930 26.58/0.7086 29.38/0.7633 28.67/0.7191 28.76/0.7668 26.05/0.6914 28.22/0.7414 29.07/0.7850 27.17/0.7449 27.62/0.6922 30.48/0.8184 26.18/0.7240 27.89/0.7464
Table 11: Kodak24 denoising performance comparison (PSNR/SSIM) under Poisson noise $\mathrm{Poisson}(\alpha\bm{x})/\alpha$.
Method ($\alpha$) 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Average
JPEG2K (15) 20.82/0.4172 25.15/0.6187 24.68/0.6894 23.43/0.5872 20.65/0.5202 21.66/0.4510 23.38/0.5545 19.29/0.5129 23.08/0.4707 23.18/0.4742 22.71/0.4834 23.41/0.6520 19.63/0.3773 22.63/0.5025 22.96/0.5573 24.58/0.5536 23.96/0.6000 22.39/0.5079 22.53/0.4785 21.03/0.5747 22.38/0.5103 22.79/0.4402 23.87/0.5752 21.32/0.4894 22.56/0.5249
BM3D (15) 24.18/0.5811 29.14/0.7007 30.21/0.8007 28.57/0.7097 24.18/0.6679 25.36/0.5866 28.76/0.8351 24.16/0.7193 29.14/0.7945 28.80/0.7620 26.59/0.6664 29.27/0.7134 22.06/0.4881 25.54/0.6146 28.84/0.7103 27.81/0.6562 28.18/0.7652 24.88/0.6056 27.66/0.7234 27.53/0.6116 25.90/0.7294 27.02/0.6352 30.39/0.8314 24.84/0.6514 27.04/0.6900
DIP (15) 24.26/0.6524 28.16/0.6587 28.70/0.7194 27.69/0.6423 24.37/0.7237 25.08/0.6237 27.75/0.7301 23.07/0.6912 27.66/0.6731 27.22/0.6660 25.76/0.6102 28.09/0.6682 22.64/0.6166 25.45/0.6635 28.03/0.6902 27.81/0.6691 27.56/0.7317 25.14/0.6761 25.90/0.6539 27.38/0.7276 25.82/0.7273 26.11/0.6091 28.85/0.7469 24.23/0.6551 26.37/0.6761
DD (15) 23.52/0.6317 27.46/0.6668 28.10/0.7644 26.64/0.6408 24.10/0.7155 24.21/0.5925 26.69/0.7177 22.50/0.6769 26.61/0.7161 26.03/0.6516 25.62/0.6349 25.96/0.5870 21.86/0.5989 25.00/0.6526 27.46/0.7212 26.63/0.6179 27.29/0.7523 25.14/0.6708 24.86/0.6072 26.23/0.7622 24.78/0.6287 26.01/0.6168 27.85/0.7655 23.62/0.6385 25.59/0.6679
ZS-N2N (15) 25.37/0.7129 29.02/0.7162 28.48/0.6631 27.79/0.6496 25.11/0.7471 25.80/0.6537 27.80/0.6982 24.17/0.7357 27.61/0.6054 27.20/0.5986 26.81/0.6704 27.29/0.5673 23.90/0.7183 25.92/0.6925 28.08/0.6750 27.91/0.6668 28.38/0.7375 26.19/0.7150 26.65/0.6696 27.83/0.7085 26.47/0.6953 26.44/0.6213 27.93/0.6353 25.10/0.6638 26.80/0.6757
ZS-N2S (15) 18.17/0.4762 23.25/0.5907 27.16/0.7496 24.53/0.6645 17.77/0.5489 22.42/0.5608 25.09/0.7159 15.75/0.3956 14.63/0.4183 22.84/0.6716 23.80/0.5989 24.71/0.6360 19.27/0.4577 22.51/0.5822 25.66/0.7098 26.49/0.6555 25.15/0.7713 22.36/0.6068 22.47/0.6694 24.92/0.7338 23.38/0.7072 23.98/0.6279 17.06/0.6786 20.40/0.5816 22.24/0.6170
S2S (15) 25.03/0.6917 28.20/0.7579 24.92/0.8189 26.33/0.7688 22.75/0.7643 17.10/0.6688 27.81/0.8779 17.18/0.6822 29.57/0.8712 24.12/0.7921 26.54/0.7360 18.80/0.7180 18.86/0.5165 23.71/0.6809 16.07/0.7693 24.51/0.7464 28.09/0.8149 23.76/0.7199 21.90/0.7145 10.74/0.7206 22.82/0.7762 22.77/0.6811 20.68/0.8252 18.13/0.6895 22.52/0.7418
ZS-NCD (15) 25.57/0.6860 29.16/0.7145 29.78/0.7999 29.01/0.7387 26.00/0.7874 26.21/0.6907 29.49/0.8493 25.77/0.7955 29.52/0.8022 29.13/0.7845 27.18/0.7206 29.44/0.7200 24.25/0.6846 26.65/0.7160 29.33/0.7604 26.32/0.7011 28.24/0.7903 25.60/0.6883 28.05/0.7459 27.71/0.6871 27.18/0.7588 27.51/0.6865 29.98/0.8425 25.69/0.7370 27.64/0.7432
JPEG2K (25) 21.37/0.4580 25.46/0.5785 25.60/0.6174 24.96/0.5626 22.01/0.6152 22.53/0.5087 24.46/0.6331 20.53/0.5871 24.45/0.5683 24.50/0.5692 23.64/0.5353 23.96/0.4892 20.38/0.4859 23.24/0.5303 24.52/0.6224 24.67/0.5549 25.02/0.6481 23.29/0.5701 23.46/0.5558 22.91/0.6487 23.16/0.5770 23.80/0.5007 25.79/0.6751 22.15/0.5401 23.58/0.5680
BM3D (25) 23.34/0.6677 26.19/0.6206 26.16/0.6566 25.34/0.5730 24.78/0.7690 22.26/0.5441 25.47/0.6135 22.19/0.7149 23.16/0.4315 24.18/0.5200 25.92/0.6327 20.97/0.3004 22.27/0.6244 24.40/0.6526 23.17/0.5810 25.33/0.5530 26.47/0.7299 26.02/0.7379 24.06/0.5471 20.88/0.3715 23.44/0.5145 24.16/0.5468 24.92/0.6625 23.97/0.6702 24.13/0.5931
DIP (25) 25.53/0.7181 28.94/0.6890 29.84/0.7694 28.81/0.6975 25.70/0.7741 25.92/0.6635 29.02/0.7859 24.40/0.7418 28.92/0.7396 28.41/0.7008 26.77/0.6669 28.88/0.6895 23.78/0.7052 26.65/0.7228 29.15/0.7423 28.60/0.7042 28.70/0.7762 26.27/0.7327 27.24/0.6992 28.76/0.7739 26.81/0.7320 27.00/0.6533 30.31/0.8012 25.41/0.7034 27.49/0.7243
DD (25) 24.50/0.6806 28.46/0.6993 28.85/0.7584 27.68/0.6746 24.91/0.7541 25.37/0.6639 27.98/0.7857 23.26/0.7120 27.68/0.7230 27.41/0.6725 26.63/0.6822 27.09/0.6650 22.51/0.6348 26.11/0.7108 28.33/0.7418 27.30/0.6488 28.62/0.7882 25.93/0.7216 25.81/0.6453 26.91/0.7499 25.76/0.6668 26.93/0.6640 28.85/0.7997 24.57/0.7007 26.56/0.7060
ZS-N2N (25) 26.74/0.7730 30.22/0.7610 29.87/0.7276 29.15/0.7157 26.59/0.8015 27.29/0.7272 29.32/0.7582 25.78/0.7910 29.07/0.6747 28.63/0.6673 28.19/0.7284 28.70/0.6385 25.27/0.7764 27.30/0.7542 29.35/0.7294 29.35/0.7345 29.69/0.7854 27.48/0.7688 28.20/0.7411 29.34/0.7639 28.05/0.7667 27.73/0.6866 29.42/0.7026 26.39/0.7239 28.21/0.7374
ZS-N2S (25) 19.86/0.5123 27.15/0.6869 27.66/0.7773 25.01/0.6794 16.99/0.5539 20.20/0.5373 22.75/0.6374 18.34/0.5867 23.41/0.6613 23.10/0.6715 23.82/0.6477 8.29/0.4261 18.89/0.4529 22.97/0.6329 25.75/0.7498 25.59/0.6726 24.26/0.7312 22.07/0.5632 20.01/0.6144 23.02/0.7161 19.75/0.6136 23.78/0.6429 17.86/0.7240 14.82/0.5742 21.47/0.6277
S2S (25) 26.06/0.7395 29.57/0.7862 25.56/0.8400 27.51/0.7961 23.50/0.8066 17.38/0.7114 29.35/0.9017 17.50/0.7124 27.17/0.8418 24.95/0.8210 27.41/0.7671 19.83/0.7251 19.36/0.5811 24.83/0.7263 16.35/0.7570 25.28/0.7770 29.17/0.8445 24.14/0.7567 22.71/0.7435 10.55/0.7026 23.30/0.8048 23.17/0.7129 21.39/0.8396 18.05/0.7221 23.09/0.7674
ZS-NCD (25) 27.28/0.7950 28.73/0.6916 31.52/0.8338 29.67/0.7383 28.10/0.8620 27.49/0.7226 30.51/0.8113 26.62/0.8213 29.38/0.6909 30.35/0.7690 29.00/0.7796 27.91/0.6114 25.54/0.7812 28.23/0.7943 30.12/0.7591 30.14/0.7763 30.21/0.8430 28.22/0.8104 29.15/0.7564 27.91/0.6373 28.02/0.7346 28.70/0.7297 31.31/0.8256 27.82/0.8012 28.77/0.7677
JPEG2K (50) 22.81/0.5792 27.64/0.6435 28.17/0.7405 27.37/0.6702 23.25/0.6815 24.13/0.5989 26.21/0.6923 22.05/0.6527 26.79/0.7217 26.61/0.6947 25.33/0.6203 26.58/0.6314 21.87/0.5654 24.74/0.6083 26.84/0.7123 26.93/0.6316 26.76/0.7079 24.65/0.6460 25.30/0.6374 25.51/0.7465 24.89/0.6912 25.48/0.5745 28.69/0.7914 23.80/0.6197 25.52/0.6608
BM3D (50) 23.61/0.6947 26.43/0.6614 25.75/0.5873 25.19/0.5574 25.63/0.8098 23.05/0.5705 24.91/0.5622 23.15/0.7419 23.07/0.4050 23.92/0.4606 25.46/0.6232 21.66/0.3188 23.39/0.7153 24.89/0.6969 24.36/0.6064 25.09/0.5461 26.54/0.7296 27.07/0.8022 24.16/0.5328 22.52/0.4178 23.61/0.5155 24.31/0.5637 25.32/0.6187 24.62/0.6807 24.49/0.6008
DIP (50) 26.99/0.7902 30.16/0.7388 31.29/0.8072 30.06/0.7482 27.92/0.8461 27.93/0.7588 30.76/0.8341 26.55/0.8115 30.48/0.7876 30.22/0.7813 28.36/0.7379 30.48/0.7423 25.72/0.7916 28.12/0.7881 30.28/0.7691 30.20/0.7849 30.03/0.8173 28.25/0.7989 28.90/0.7580 29.95/0.8063 28.48/0.7912 28.44/0.7227 31.92/0.8427 27.39/0.7732 29.12/0.7845
DD (50) 25.35/0.7243 29.68/0.7399 29.92/0.7853 28.91/0.7387 25.61/0.7889 26.45/0.7228 29.58/0.8315 24.06/0.7517 29.30/0.7749 29.46/0.7733 27.70/0.7395 29.40/0.7179 23.00/0.6655 27.12/0.7605 29.57/0.7827 28.35/0.7078 30.07/0.8418 25.34/0.6811 27.19/0.7155 28.06/0.8083 27.06/0.7424 28.18/0.7226 30.36/0.8311 25.35/0.7560 27.71/0.7543
ZS-N2N (50) 28.68/0.8398 31.92/0.8194 31.88/0.8066 31.08/0.7930 28.69/0.8626 29.28/0.8065 31.37/0.8279 27.95/0.8509 30.96/0.7565 30.41/0.7473 30.20/0.8052 30.48/0.7186 27.13/0.8442 29.20/0.8243 31.14/0.7981 31.13/0.8030 31.52/0.8435 29.31/0.8338 30.11/0.8114 31.31/0.8228 30.03/0.8245 29.53/0.7656 31.56/0.7839 28.24/0.7923 30.13/0.8076
ZS-N2S (50) 17.77/0.4038 23.17/0.5804 27.74/0.7819 24.06/0.6876 17.83/0.4760 16.78/0.4748 19.91/0.6418 16.97/0.5447 18.36/0.5424 19.84/0.6542 23.42/0.6709 22.99/0.6629 19.60/0.4294 20.94/0.5213 17.81/0.6187 22.65/0.6689 18.52/0.6975 21.88/0.5407 11.63/0.5187 14.93/0.5892 22.59/0.6971 23.64/0.5997 24.69/0.7912 18.31/0.5905 20.25/0.5993
S2S (50) 26.90/0.7813 30.64/0.8216 26.39/0.8623 29.02/0.8317 24.36/0.8434 17.63/0.7547 30.78/0.9235 18.04/0.7447 29.57/0.8712 25.66/0.8504 28.58/0.8014 20.91/0.7533 20.06/0.6555 26.36/0.7763 16.88/0.7924 26.40/0.8163 30.23/0.8731 24.61/0.7950 23.85/0.7780 10.47/0.7112 23.73/0.8345 23.65/0.7547 20.22/0.8513 18.09/0.7545 23.88/0.8014
ZS-NCD (50) 29.06/0.8523 31.29/0.7718 33.17/0.8727 31.40/0.7966 29.92/0.9048 29.23/0.7930 32.64/0.8616 28.61/0.8699 31.50/0.7770 32.10/0.8303 30.70/0.8341 30.38/0.7122 27.07/0.8446 29.97/0.8501 31.45/0.7971 31.81/0.8308 31.42/0.8767 30.02/0.8661 30.74/0.8154 29.37/0.7018 29.76/0.7945 30.20/0.7915 33.16/0.8656 29.52/0.8548 30.60/0.8235

B.4 Microscopy Mouse Nuclei Dataset

For these images, with noise levels $\sigma_{z}=(10,20)$, we set $\lambda=(200,600)$ and train the networks for 20K steps. We report the detailed denoising performance in Tables 12 and 13, respectively.

Table 12: Denoising performance (PSNR/SSIM) under AWGN $\mathcal{N}(0,\sigma_{z}^{2}I)$ on the fluorescence microscopy dataset Mouse Nuclei. Images are cropped to $128\times 128$. Noise level $\sigma_{z}=10$.
# JPEG2K BM3D DIP DD ZS-N2N ZS-N2S S2S ZS-NCD
1 32.90/0.7954 38.88/0.9631 37.31/0.8973 37.73/0.9464 36.37/0.9356 34.70/0.9410 10.88/0.1687 39.03/0.9556
2 32.32/0.8300 37.53/0.9613 35.93/0.8909 36.46/0.9560 35.26/0.9345 28.78/0.8504 13.08/0.4000 36.83/0.9546
3 32.97/0.8584 38.43/0.9690 36.17/0.8482 37.03/0.9631 35.86/0.9405 31.53/0.9307 12.76/0.3374 37.81/0.9634
4 32.57/0.8418 38.05/0.9605 35.82/0.9107 36.70/0.9478 34.86/0.9066 32.13/0.8688 14.42/0.3639 37.51/0.9303
5 34.54/0.7646 41.53/0.9596 38.09/0.8268 40.02/0.9438 39.30/0.9252 29.75/0.7976 10.22/0.1165 40.93/0.9420
6 32.02/0.8860 37.49/0.9703 35.24/0.8997 36.05/0.9628 35.38/0.9491 30.63/0.8989 14.42/0.3931 37.26/0.9588
Average 32.89/0.8294 38.65/0.9640 36.43/0.8789 37.33/0.9533 36.17/0.9319 31.26/0.8812 12.63/0.2966 38.23/0.9508
Table 13: Denoising performance (PSNR/SSIM) under AWGN $\mathcal{N}(0,\sigma_{z}^{2}I)$ on the fluorescence microscopy dataset Mouse Nuclei. Images are cropped to $128\times 128$. Noise level $\sigma_{z}=20$.
# JPEG2K BM3D DIP DD ZS-N2N ZS-N2S S2S ZS-NCD
1 28.37/0.6337 35.10/0.9211 33.09/0.8485 33.70/0.8938 32.32/0.8609 32.59/0.8763 9.30/0.0240 34.98/0.8843
2 27.97/0.7255 33.80/0.9410 31.41/0.7986 32.39/0.9239 31.07/0.8421 28.42/0.8328 10.73/0.2383 33.75/0.9172
3 28.42/0.7121 34.45/0.9352 31.47/0.7642 32.64/0.9096 31.63/0.8660 31.08/0.9003 10.12/0.1807 34.25/0.9216
4 29.31/0.7557 34.30/0.9245 31.02/0.7598 32.71/0.9008 31.12/0.8168 30.60/0.8551 11.33/0.1947 33.87/0.8947
5 29.62/0.5932 38.50/0.9158 35.45/0.7763 37.18/0.9068 35.90/0.8585 32.89/0.8640 8.23/0.0650 37.70/0.9137
6 27.71/0.7713 33.61/0.9399 31.48/0.7860 32.41/0.9206 31.43/0.8750 26.87/0.8312 10.83/0.2328 33.72/0.9245
Average 28.57/0.6986 34.96/0.9296 32.32/0.7889 33.50/0.9092 32.25/0.8532 30.41/0.8600 10.09/0.1559 34.71/0.9093
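
The Average rows in Tables 12 and 13 appear to be plain arithmetic means of the per-image PSNR and SSIM entries; this is an inference from the reported numbers, not something stated in the text. The sketch below checks it for the JPEG2K column of Table 13. The function name `average_cells` is hypothetical.

```python
def average_cells(cells):
    """Average 'PSNR/SSIM' cell strings and format the result like an Average row."""
    pairs = [tuple(float(v) for v in cell.split("/")) for cell in cells]
    mean_psnr = sum(p for p, _ in pairs) / len(pairs)
    mean_ssim = sum(s for _, s in pairs) / len(pairs)
    return f"{mean_psnr:.2f}/{mean_ssim:.4f}"

# JPEG2K column of Table 13 (images 1 to 6), copied from the table.
jpeg2k_sigma20 = [
    "28.37/0.6337", "27.97/0.7255", "28.42/0.7121",
    "29.31/0.7557", "29.62/0.5932", "27.71/0.7713",
]
print(average_cells(jpeg2k_sigma20))  # prints 28.57/0.6986, matching the Average row
```

The same check can be applied to any other column by swapping in its six entries.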

B.5 Real Camera Noise Dataset PolyU

For these images, whose noise model and noise level are unknown, we set $\lambda=25$. For BM3D, the best performance was achieved with $\sigma_{\mathrm{BM3D}}=15$. We report the detailed denoising performance in Table 14.

Table 14: Real camera denoising performance (PSNR/SSIM) on the camera image dataset PolyU. The dataset includes photos taken with 3 brands of cameras. Six randomly selected images are cropped to $512\times 512$.
Models C.plug11 C.bike10 N.flower1 N.plant10 S.plant13 S.door10 Average
JPEG2K 36.26 / 0.9615 34.23 / 0.9371 33.55 / 0.9194 36.74 / 0.9157 30.39 / 0.9001 34.84 / 0.9012 34.33 / 0.9225
BM3D 37.15 / 0.9758 34.85 / 0.9615 35.81 / 0.9504 38.40 / 0.9410 31.65 / 0.9465 36.43 / 0.9285 35.71 / 0.9506
DIP 37.62 / 0.9724 34.85 / 0.9534 34.93 / 0.9396 37.64 / 0.9256 31.50 / 0.9396 36.02 / 0.9145 35.43 / 0.9408
DD 36.79 / 0.9722 34.73 / 0.9566 34.85 / 0.9366 37.84 / 0.9327 30.91 / 0.9305 33.88 / 0.9084 34.83 / 0.9395
ZS-N2N 36.30 / 0.9621 33.18 / 0.8853 33.28 / 0.8974 36.21 / 0.8862 30.57 / 0.9052 34.89 / 0.8804 34.07 / 0.9028
ZS-N2S 22.76 / 0.9119 20.36 / 0.8133 25.20 / 0.8670 33.63 / 0.8920 21.33 / 0.8256 18.39 / 0.6966 23.61 / 0.8344
S2S 37.75 / 0.9765 33.56 / 0.9545 35.78 / 0.9537 38.30 / 0.9398 31.93 / 0.9483 36.65 / 0.9433 35.66 / 0.9527
ZS-NCD 36.99 / 0.9763 34.79 / 0.9586 35.43 / 0.9489 38.65 / 0.9449 31.79 / 0.9464 37.42 / 0.9451 35.84 / 0.9534

Appendix C Visual Comparisons

In this section, we provide additional visual comparisons of the zero-shot denoisers. The reconstruction PSNR and SSIM are shown above each image.

Figure 7: Kodim24 under Gaussian noise ($\sigma_{z}=25$).
Figure 8: Kodim24 under Poisson noise ($\alpha=25$).
Figure 9: Mouse nuclei reconstruction comparison under additive Gaussian noise ($\sigma_{z}=20$).