
Nonparametric Denoising of Signals of Unknown Local Structure, II: Nonparametric Regression Estimation

Anatoli Juditsky (corresponding author), anatoli.juditsky@imag.fr, LJK, B.P. 53, 38041 Grenoble Cedex 9, France; Arkadi Nemirovski, nemirovs@isye.gatech.edu, ISyE, Georgia Institute of Technology, 765 Ferst Drive, Atlanta GA 30332-0205, USA
Abstract

We consider the problem of recovering a continuous multi-dimensional function $f$ from noisy observations over the regular grid $m^{-1}\mathbb{Z}^{d}$, $m\in\mathbb{N}_{*}$. Our focus is on adaptive estimation in the case when the function can be well recovered using a linear filter, which may depend on the unknown function itself. In the companion paper [26] we have shown that when there exists an adapted time-invariant filter which locally recovers the unknown signal "well", there is a numerically efficient construction of an adaptive filter which recovers the signal "almost as well". In the current paper we study the application of the proposed estimation techniques in the nonparametric regression setting. Namely, we propose an adaptive estimation procedure for "locally well-filtered" signals (typical examples being smooth signals, modulated smooth signals and harmonic functions) and show that the rate of recovery of such signals in the $\ell_{p}$-norm on the grid is essentially the same as the rate for regular signals with nonhomogeneous smoothness.

keywords:
Nonparametric denoising, adaptive filtering, minimax estimation, nonparametric regression.


1 Introduction

Let ${\mathbf{F}}=(\Omega,\Sigma,P)$ be a probability space. We consider the problem of recovering an unknown complex-valued random field $(s_{\tau}=s_{\tau}(\xi))_{\tau\in\mathbb{Z}^{d},\,\xi\in\Omega}$ over $\mathbb{Z}^{d}$ from the noisy observations

$$y_{\tau}=s_{\tau}+e_{\tau}. \qquad (1)$$

We assume that the field $(e_{\tau})$ of observation noises is independent of $(s_{\tau})$ and is of the form $e_{\tau}=\sigma\epsilon_{\tau}$, where $(\epsilon_{\tau})$ are mutually independent standard Gaussian complex-valued variables; the adjective "standard" means that $\Re(\epsilon_{\tau})$, $\Im(\epsilon_{\tau})$ are independent ${\mathbf{N}}(0,1)$ random variables.

We suppose that the observations (1) come from a function ("signal") $f$ of continuous argument (which we assume to vary in the $d$-dimensional unit cube $[0,1]^{d}$); this function is observed in noise along an $n$-point equidistant grid in $[0,1]^{d}$, and the problem is to recover $f$ from these observations. This problem fits the framework of nonparametric regression estimation, with the "traditional setting" as follows:

A. The objective is to recover an unknown smooth function $f:[0,1]^{d}\to\mathbb{R}$, which is sampled on the observation grid $\Gamma_{n}=\{x_{\tau}=m^{-1}\tau:\ 0\leq\tau_{1},...,\tau_{d}\leq m\}$ with $(m+1)^{d}=n$, so that $s_{\tau}=f(x_{\tau})$. The error of recovery is measured by some functional norm (or semi-norm) $\|\cdot\|$ on $[0,1]^{d}$, and the risk of a recovery $\widehat{f}$ of $f$ is the expectation $E_{f}\|\widehat{f}-f\|^{2}$;

B. The estimation routines are aimed at recovering smooth signals, and their quality is measured by their maximal risks, the maximum being taken over $f$ running through natural families of smooth signals, e.g., Hölder or Sobolev balls;

C. The focus is on the asymptotic behavior of the estimation routines as the volume of observations $n$ goes to infinity, with emphasis on asymptotically minimax (nearly) optimal estimates – those which attain the (nearly) best possible rates of convergence of the risks to 0 as the observation sample size $n\to\infty$.

Initially, the research was focused on recovering smooth signals with a priori known smoothness parameters, and the estimation routines were tuned to these parameters (see, e.g., [23, 34, 38, 24, 2, 31, 39, 22, 36, 21, 27]). Later on, there was significant research on adaptive estimation. Adaptive estimation methods are free of a priori assumptions on the smoothness parameters of the signal to be recovered, and their primary goal is to develop routines which exhibit asymptotically (nearly) optimal behavior on a wide variety of families of smooth functions (cf. [35, 28, 29, 30, 6, 8, 9, 25, 3, 7, 19]). For a more complete overview of results on smooth nonparametric regression estimation see, for instance, [33].³

³Our "brief outline" of the adaptive approach to nonparametric regression would be severely incomplete without mentioning a novel approach aimed at recovering nonsmooth signals possessing sparse representations in properly constructed functional systems [5, 10, 4, 11, 12, 13, 14, 15, 16, 17, 37, 18]. This promising approach is completely beyond the scope of our paper.
The traditional focus on recovering smooth signals ultimately comes from the fact that such a signal can locally be well approximated by a polynomial of a fixed order $r$, and such a polynomial is an "easy to estimate" entity. Specifically, for every integer $T\geq 0$, the value of a polynomial $p$ at an observation point $x_{t}$ can be recovered via the $(2T+1)^{d}$ neighboring observations $\{x_{\tau}:|\tau_{j}-t_{j}|\leq T,\ 1\leq j\leq d\}$ "at a parametric rate" – with expected squared error $C\sigma^{2}(2T+1)^{-d}$, inversely proportional to the number $(2T+1)^{d}$ of observations used by the estimate. The coefficient $C$ depends solely on the order $r$ of the polynomial and the dimension $d$. The corresponding estimate $\widehat{p}(x_{t})$ of $p(x_{t})$ is pretty simple: it is given by a "time-invariant filter", that is, by convolution of the observations with an appropriate discrete kernel $q^{(T)}=(q^{(T)}_{\tau})_{\tau\in\mathbb{Z}^{d}}$ vanishing outside the box ${\mathbf{O}}_{T}=\{\tau\in\mathbb{Z}^{d}:|\tau_{j}|\leq T,\ 1\leq j\leq d\}$:

$$\widehat{p}(x_{t})=\sum\limits_{\tau\in{\mathbf{O}}_{T}}q^{(T)}_{\tau}\,y_{t-\tau},$$

and the estimate $\widehat{f}$ of $f(x_{t})$ is then taken as $\widehat{f}=\widehat{p}(x_{t})$.
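For illustration, here is a minimal sketch (in our own notation, for $d=1$; the function names and parameters are ours, not the paper's) of how such a kernel $q^{(T)}$ arises from least-squares polynomial fitting and how the resulting time-invariant filter is applied:

```python
import numpy as np

# Minimal sketch (our notation, d = 1): the kernel q^(T) whose convolution
# with the observations returns the least-squares degree-r polynomial fit
# evaluated at the window center.
def lsq_poly_kernel(T, r):
    taus = np.arange(-T, T + 1)                  # offsets in the box O_T
    V = np.vander(taus, r + 1, increasing=True)  # design matrix [tau^0 ... tau^r]
    # weights of the fitted value at tau = 0: row 0 of the pseudo-inverse of V
    return np.linalg.pinv(V)[0]

def filter_estimate(y, q):
    # \hat p(x_t) = sum_{tau in O_T} q_tau y_{t - tau}: a discrete convolution
    return np.convolve(y, q, mode="same")        # boundary values unreliable

# toy check: the filter reproduces a quadratic signal up to the noise level
rng = np.random.default_rng(0)
t = np.arange(200)
s = 1e-4 * (t - 100.0) ** 2
y = s + 0.1 * rng.standard_normal(t.size)
q = lsq_poly_kernel(T=10, r=2)
print(np.max(np.abs(filter_estimate(y, q) - s)[20:-20]))  # small
```

Since the fitted polynomial has degree $r\geq 2$, the kernel reproduces the quadratic signal exactly, and only the stochastic error, of order $\sigma(2T+1)^{-1/2}$ per point, remains.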

Note that the kernel $q^{(T)}$ is readily given by the degree $r$ of the approximating polynomial, the window size $T$ and the dimension $d$. The "classical" adaptation routines take care of choosing "good" values of the approximation parameters (namely, $T$ and $r$). On the other hand, the polynomial approximation "mechanism" is supposed to be fixed once and for all. Thus, in those procedures the "form" of the kernel is considered as given in advance.

In the companion paper [26] (hereafter referred to as Part I) we have introduced the notion of a well-filtered signal. In brief, the signal $(s_{\tau})_{\tau\in\mathbb{Z}^{d}}$ is $T$-well-filtered, for some $T\in\mathbb{N}_{+}$, if there is a filter (kernel) $q=q^{(T)}$ supported on ${\mathbf{O}}_{T}$ which recovers $(s_{\tau})$ in the box $\{u:|u-t|\leq 3T\}$ with mean square error comparable to $\sigma T^{-d/2}$:

$$\max\limits_{u:|u-t|\leq 3T}E\left\{\Big|s_{u}-\sum\limits_{\tau\in{\mathbf{O}}_{T}}q^{(T)}_{\tau}y_{u-\tau}\Big|^{2}\right\}\leq O(\sigma^{2}T^{-d}).$$

The universe of these signals is much wider than that of smooth signals. As we have seen in Part I, it contains, in particular, "modulated smooth signals" – sums of a fixed number of products of smooth functions and multivariate harmonic oscillations of unknown (and arbitrarily high) frequencies. We have shown in Part I that whenever a discrete-time signal (that is, a signal defined on a regular discrete grid) is well-filtered, we can recover this signal at a "nearly parametric" rate without a priori knowledge of the associated filter. In other words, a well-filtered signal can be recovered on the observation grid basically as well as if it were an algebraic polynomial of a given order.

We are about to demonstrate that the results of Part I on recovering well-filtered signals of unknown structure can be applied to recovering nonparametric signals which admit well-filtered local approximations. Such an extension has an unavoidable price – now we cannot hope to recover the signal well outside of the observation grid (a highly oscillating signal can simply vanish on the observation grid and be arbitrarily large outside it). As a result, in what follows we are interested in recovering the signals along the observation grid only and, consequently, replace the error measures based on functional norms on $[0,1]^{d}$ by their grid analogues.

The estimates to be developed will be "doubly adaptive", that is, adaptive both with respect to the structures, unknown in advance, of the well-filtered approximations of our signals and with respect to the unknown in advance "approximation rate" – the dependence between the size of a neighborhood of a point where the signal in question is approximated and the quality of approximation in this neighborhood. Note that in the case of smooth signals, this approximation rate is exactly what is given by the smoothness parameters. The results to follow can be seen as extensions of the results of [32, 20] (see also [33]) dealing with the particular case of univariate signals satisfying differential inequalities with unknown differential operators.

2 Nonparametric regression problem

We start with the formal description of the components of the nonparametric regression problem.

For $\tau\in\mathbb{Z}^{d}$ let $|\tau|=\max\{|\tau_{1}|,...,|\tau_{d}|\}$, and for $m\in\mathbb{N}$ let $\tau\leq m$ stand for $\tau_{i}\leq m$, $i=1,...,d$. Let $m$ be a positive integer, $n=(m+1)^{d}$, and let $\Gamma_{n}=\{x=m^{-1}\alpha:\alpha\in\mathbb{Z}^{d},\ 0\leq\alpha,\ |\alpha|\leq m\}$.

Let $C([0,1]^{d})$ be the linear space of continuous complex-valued functions on $[0,1]^{d}$. We associate with a signal $f\in C([0,1]^{d})$ its observations along $\Gamma_{n}$:

$$y\equiv y^{n}_{f}(\epsilon)=\{y_{\tau}\equiv y_{\tau}^{n}(f,\epsilon)=f(m^{-1}\tau)+e_{\tau},\ e_{\tau}=\sigma\epsilon_{\tau}\}_{0\leq\tau\leq m}, \qquad (2)$$

where $\{\epsilon_{\tau}\}_{\tau\in\mathbb{Z}^{d}}$ are independent standard Gaussian complex-valued random noises. Our goal is to recover $f\big|_{\Gamma_{n}}$ from the observations (2). In what follows, we write

$$f_{\tau}=f(m^{-1}\tau)\qquad[\tau\in\mathbb{Z}^{d},\ m^{-1}\tau\in[0,1]^{d}].$$

Below we use the following notation. For a set $B\subset[0,1]^{d}$, we denote by $\mathbb{Z}(B)$ the set of all $t\in\mathbb{Z}^{d}$ such that $m^{-1}t\in B$. We denote by $\|\cdot\|_{p,B}$ the standard $L_{p}$-norm on $B$:

$$\|g\|_{p,B}=\left(\int_{B}|g(x)|^{p}\,dx\right)^{1/p},$$

and by $|g|_{q,B}$ its discrete analogue, so that

$$|g|_{q,B}=m^{-d/q}\left(\sum\limits_{\tau\in\mathbb{Z}(B)}|g_{\tau}|^{q}\right)^{1/q}.$$

We set

$$\Gamma_{n}^{o}=\Gamma_{n}\cap(0,1)^{d}=\{m^{-1}t:t\in\mathbb{Z}^{d},\ t>0,\ |t|<m\}.$$

Let $x=m^{-1}t\in\Gamma_{n}^{o}$. We say that a nonempty open cube

$$B_{h}(x)=\{u\mid\,|u_{i}-x_{i}|<h/2,\ i=1,...,d\}$$

centered at $x$ is admissible for $x$ if $B_{h}(x)\subset[0,1]^{d}$. For such a cube, $T_{h}(x)$ denotes the largest nonnegative integer $T$ such that

$$\mathbb{Z}(B_{h}(x))\supset\{\tau\in\mathbb{Z}^{d}:|\tau-t|\leq 4T\}.$$

For a cube

$$B=\{x\in\mathbb{R}^{d}:|x_{i}-c_{i}|\leq h/2,\ i=1,...,d\},$$

$D(B)=h$ stands for the edge of $B$. For $\gamma\in(0,1)$ we denote by

$$B_{\gamma}=\{x\in\mathbb{R}^{d}:|x_{i}-c_{i}|\leq\gamma h/2,\ i=1,...,d\}$$

the $\gamma$-shrinkage of $B$ to its center.

2.1 Classes of locally well-filtered signals

Recall that a function on $[0,1]^{d}$ is called smooth if it can locally be well approximated by a polynomial. Informally, the definition below says that a continuous signal $f\in C([0,1]^{d})$ is locally well-filtered if $f$ admits a good local approximation by a well-filtered discrete signal $(\phi_{\tau})$ on $\Gamma_{n}$ (see Definition 1 of Section 2.1 of Part I).

Definition 1

Let $B\subset[0,1]^{d}$ be a cube, $k$ a positive integer, $\rho\geq 1$ and $R\geq 0$ reals, and $p\in(d,\infty]$. The collection $B$, $k$, $\rho$, $R$, $p$ specifies the family ${\mathbf{F}}^{k,\rho,p}(B,R)$ of signals $f$ locally well-filtered on $B$, defined by the following requirements:
(1) $f\in C([0,1]^{d})$;
(2) there exists a nonnegative function $F\in L_{p}(B)$, $\|F\|_{p,B}\leq R$, such that for every $x=m^{-1}t\in\Gamma_{n}\cap\hbox{int}\,B$ and every cube $B_{h}(x)$ admissible for $x$ and contained in $B$ there exists a field $\phi\in C(\mathbb{Z}^{d})$ such that $\phi\in{\mathbf{S}}^{t}_{3T_{h}(x)}(0,\rho,T_{h}(x))$ (where the set ${\mathbf{S}}^{t}_{L}(\theta,\rho,T)$ of $T$-well-filtered signals is defined in Definition 1 of Part I) and

$$\forall\tau\in\mathbb{Z}(B_{h}(x)):\quad|\phi_{\tau}-f_{\tau}|\leq h^{k-d/p}\|F\|_{p,B_{h}(x)}. \qquad (3)$$

In the sequel, we also use for ${\mathbf{F}}^{k,\rho,p}(B;R)$ the shortened notation ${\mathbf{F}}[\psi]$, where $\psi$ stands for the collection of "parameters" $(k,\rho,p,B,R)$.

Remark. The motivating example of locally well-filtered signals is that of modulated smooth signals, as follows. Let a cube $B\subset[0,1]^{d}$, $p\in(d,\infty]$, positive integers $k,\nu$ and a real $R\geq 0$ be given. Consider a collection of $\nu$ functions $g_{1},...,g_{\nu}\in C([0,1]^{d})$ which are $k$ times continuously differentiable and satisfy the constraint

$$\sum\limits_{\ell=1}^{\nu}\|D^{k}g_{\ell}\|_{p,B}\leq R.$$

Let $\omega(\ell)\in\mathbb{R}^{d}$, and let

$$f(x)=\sum\limits_{\ell=1}^{\nu}g_{\ell}(x)\exp\{i\omega^{T}(\ell)x\}.$$

By the standard argument [1], whenever $x=m^{-1}t\in\Gamma_{n}\cap\hbox{int}\,B$ and $B_{h}(x)$ is admissible for $x$, the Taylor polynomial $\Phi_{\ell}^{x}(\cdot)$ of order $k-1$, taken at $x$, of $g_{\ell}$ satisfies the inequality

$$u\in B_{h}(x)\ \Rightarrow\ |\Phi_{\ell}^{x}(u)-g_{\ell}(u)|\leq c_{1}h^{k-d/p}\|F_{\ell}\|_{p,B_{h}(x)},\quad\mbox{where}\ \ F_{\ell}(u)=|D^{k}g_{\ell}(u)|$$

(here and in what follows, $c_{i}$ are positive constants depending solely on $d$, $k$ and $\nu$). It follows that if $\Phi(u)=\sum\limits_{\ell=1}^{\nu}\Phi_{\ell}^{x}(u)\exp\{i\omega^{T}(\ell)u\}$, then

$$u\in B_{h}(x)\ \Rightarrow\ |\Phi(u)-f(u)|\leq h^{k-d/p}\|F\|_{p,B_{h}(x)},\qquad F=c_{2}\sum\limits_{\ell=1}^{\nu}F_{\ell}\ \ [\Rightarrow\ \|F\|_{p,B}\leq c_{3}R]. \qquad (4)$$

Now observe that the exponential polynomial $\phi(\tau)=\Phi(m^{-1}\tau)$ belongs to ${\mathbf{S}}^{t}_{L}(0,c_{4},T)$ for any $0\leq T\leq L\leq\infty$ (Proposition 10 of Part I). Combining this fact with (4), we conclude that $f\in{\mathbf{F}}^{k,\rho(\nu,k,d),p}(B,c(\nu,k,d)R)$.
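As a small illustration (our own toy construction, with arbitrarily chosen envelopes and frequencies), a modulated smooth signal on a one-dimensional grid can be produced as follows:

```python
import numpy as np

# Toy 1-d modulated smooth signal (our choices of g_ell and omega(ell)):
# smooth envelopes times harmonic oscillations of possibly high frequency.
m = 128
x = np.arange(m + 1) / m                        # the grid Gamma_n for d = 1
g = [np.exp(-((x - 0.3) ** 2) / 0.02), x ** 2]  # smooth envelopes g_1, g_2
omega = [5.0, 90.0]                             # frequencies may be large
f = sum(g_l * np.exp(1j * w * x) for g_l, w in zip(g, omega))
# For large omega the derivatives of f blow up, so classical smoothness-based
# rates degrade, while f remains locally well-filtered in the sense above.
```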

2.2 Accuracy measures

Let us fix $\gamma\in(0,1)$ and $q\in[1,\infty]$. Given an estimate $\widehat{f}_{n}$ of the restriction $f\big|_{\Gamma_{n}}$ of $f$ to the grid $\Gamma_{n}$, based on observations (2) (i.e., a Borel function of $x\in\Gamma_{n}$ and $y\in{\mathbb{C}}^{n}$), and $\psi=(k,\rho,p,B,R)$, we characterize the quality of the estimate on the set ${\mathbf{F}}[\psi]$ by the worst-case risk

$$\widehat{{\mathbf{R}}}_{q}\left(\widehat{f}_{n};{\mathbf{F}}[\psi]\right)=\sup_{f\in{\mathbf{F}}[\psi]}\left(E\left\{\left|\widehat{f}_{n}(\cdot;y_{f}(\epsilon))-f\big|_{\Gamma_{n}}(\cdot)\right|_{q,B_{\gamma}}^{2}\right\}\right)^{1/2}.$$

3 Estimator construction

The recovery routine we are about to build is aimed at estimating functions from classes ${\mathbf{F}}^{k,\rho,p}(B,R)$ with parameters $k,\rho,p,B,R$ unknown in advance. The only design parameters of the routine are an a priori upper bound $\mu$ on the parameter $\rho$ and a $\gamma\in(0,1)$.

3.1 Preliminaries

From now on, we denote by $\Theta=\Theta_{(n)}$ the deterministic function of the observation noises defined as follows. For every cube $B\subset[0,1]^{d}$ with vertices in $\Gamma_{n}$, we consider the discrete Fourier transform of the observation noises restricted to $B\cap\Gamma_{n}$, and take the maximum of the moduli of the resulting Fourier coefficients; let this maximum be denoted $\theta_{B}(e)$. By definition,

$$\Theta\equiv\Theta_{(n)}=\sigma^{-1}\max\limits_{B}\theta_{B}(e),$$

where the maximum is taken over all cubes $B$ of the indicated type. By the origin of $\Theta_{(n)}$, due to the classical results on maxima of Gaussian processes (cf. also Lemma 15 of Part I), we have

$$\forall w\geq 1:\quad{\hbox{\rm Prob}}\left\{\Theta_{(n)}>w\sqrt{\ln n}\right\}\leq\exp\left\{-\frac{c\,w^{2}\ln n}{2}\right\}, \qquad (5)$$

where $c>0$ depends solely on $d$.
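The following sketch (our own illustration for $d=2$; for brevity it scans only the cubes anchored at the origin, while the definition takes the maximum over all cubes with vertices in $\Gamma_{n}$) shows how $\theta_{B}(e)$ and $\Theta_{(n)}$ can be computed and compares $\Theta_{(n)}$ with the $\sqrt{\ln n}$ scale appearing in (5):

```python
import numpy as np

# Sketch of the statistic Theta_(n) (our illustration, d = 2): for each cube
# take the normalized DFT of the noise over the cube and record the largest
# modulus of a coefficient; Theta is the maximum over cubes, divided by sigma.
def theta_stat(e, sigma):
    worst = 0.0
    for k in range(2, e.shape[0] + 1):
        coef = np.fft.fft2(e[:k, :k]) / k      # unitary 2-d DFT over the cube
        worst = max(worst, np.abs(coef).max())
    return worst / sigma

rng = np.random.default_rng(1)
m, sigma = 64, 1.0
e = sigma * (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m)))
print(theta_stat(e, sigma), np.sqrt(np.log(m ** 2)))  # Theta vs sqrt(ln n)
```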

3.2 Building blocks: window estimates

To recover a signal $f$ via $n=m^{d}$ observations (2), we use point-wise window estimates of $f$ defined as follows.

Let us fix a point $x=m^{-1}t\in\Gamma_{n}^{o}$; our goal is to build an estimate of $f(x)$. Let $B_{h}(x)$ be an admissible window for $x$. We associate with this window an estimate $\widehat{f}^{h}_{n}=\widehat{f}^{h}_{n}(x;y^{n}_{f}(\epsilon))$ of $f(x)$ defined as follows. If the window is "very small", specifically $h\leq m^{-1}$, so that $x$ is the only point of the observation grid $\Gamma_{n}$ in $B_{h}(x)$, we set $T_{h}(x)=0$ and $\widehat{f}^{h}_{n}=y_{t}$. For a larger window, we choose the largest nonnegative integer $T=T_{h}(x)$ such that

$$\mathbb{Z}(B_{h}(x))\supset\{\tau:|\tau-t|\leq 4T\}$$

and apply Algorithm A of Part I to build the estimate of $f_{t}=f(x)$, the design parameters of the algorithm being $(\mu,T_{h}(x))$. Let the resulting estimate be denoted $\widehat{f}^{h}_{n}=\widehat{f}^{h}_{n}(x;y^{n}_{f}(\epsilon))$.

To characterize the quality of the estimate $\widehat{f}^{h}_{n}=\widehat{f}^{h}_{n}(x;y^{n}_{f}(\epsilon))$, let us set

$$\Phi_{\mu}(f,B_{h}(x))=\min\limits_{p}\left\{\max\limits_{\tau\in\mathbb{Z}(B_{h}(x))}|p_{\tau}-f_{\tau}|:\ p\in{\mathbf{S}}^{t}_{3T_{h}(x)}(0,\mu,T_{h}(x))\right\}.$$
Lemma 2

One has

$$(f_{\tau})\in{\mathbf{S}}^{t}_{3T_{h}(x)}(\theta,\mu,T_{h}(x)),\qquad\theta=\frac{\Phi_{\mu}(f,B_{h}(x))(1+\mu)}{(2T+1)^{d/2}},\quad T=T_{h}(x). \qquad (6)$$

Assuming $h>m^{-1}$ and combining (6) with the result of Theorem 4 of Part I, we arrive at the following upper bound on the error of estimating $f(x)$ by the estimate $\widehat{f}^{h}_{n}(x;\cdot)$:

$$|f(x)-\widehat{f}^{h}_{n}(x;y_{f}(\epsilon))|\leq C_{1}\left[\Phi_{\mu}(f,B_{h}(x))+\frac{\sigma}{\sqrt{nh^{d}}}\Theta_{(n)}\right] \qquad (7)$$

(note that $(2T_{h}(x)+1)^{-d/2}\leq C_{0}(nh^{d})^{-1/2}$). For evident reasons, (7) holds true for "very small" windows (those with $h\leq m^{-1}$) as well.

3.3 The adaptive estimate

We are about to "aggregate" the window estimates $\widehat{f}^{h}_{n}$ into an adaptive estimate, applying Lepskii's adaptation scheme in the same fashion as in [30, 19, 20]. Let us fix a "safety factor" $\omega$ in such a way that the event $\Theta_{(n)}>\omega\sqrt{\ln n}$ is "highly improbable", namely,

$${\hbox{\rm Prob}}\left\{\Theta_{(n)}>\omega\sqrt{\ln n}\right\}\leq n^{-4(\mu+1)}; \qquad (8)$$

by (5), the required $\omega$ may be chosen as a function of $\mu,d$ only. We now describe the basic blocks of the construction of the adaptive estimate.

"Good" realizations of noise. Let us define the set of "good" realizations of noise as

$$\Xi_{n}=\{\epsilon\mid\,\Theta_{(n)}\leq\omega\sqrt{\ln n}\}. \qquad (9)$$

Now (7) implies the "conditional" error bound

$$\epsilon\in\Xi_{n}\ \Rightarrow\ |f(x)-\widehat{f}^{h}_{n}(x;y_{f}(\epsilon))|\leq C_{1}\left[\Phi_{\mu}(f,B_{h}(x))+S_{n}(h)\right],\qquad S_{n}(h)=\frac{\sigma}{\sqrt{nh^{d}}}\,\omega\sqrt{\ln n}. \qquad (10)$$

Observe that as $h$ grows, the "deterministic term" $\Phi_{\mu}(f,B_{h}(x))$ does not decrease, while the "stochastic term" $S_{n}(h)$ decreases.

The "ideal" window. Let us define the ideal window $B_{*}(x)$ as the largest admissible window for which the stochastic term dominates the deterministic one:

$$B_{*}(x)=B_{h_{*}(x)}(x),\qquad h_{*}(x)=\max\{h\mid\,h>0,\ B_{h}(x)\subset[0,1]^{d},\ \Phi_{\mu}(f,B_{h}(x))\leq S_{n}(h)\}. \qquad (11)$$

Note that such a window does exist, since $S_{n}(h)\to\infty$ as $h\to+0$. Besides this, since the cubes $B_{h}(x)$ are open, the quantity $\Phi_{\mu}(f,B_{h}(x))$ is continuous from the left, so that

$$0<h\leq h_{*}(x)\ \Rightarrow\ \Phi_{\mu}(f,B_{h}(x))\leq S_{n}(h). \qquad (12)$$

Thus, the ideal window $B_{*}(x)$ is well-defined for every $x$ possessing admissible windows, i.e., for every $x\in\Gamma_{n}^{o}=\{m^{-1}t:t\in\mathbb{Z}^{d},\ 0<t,\ |t|<m\}$.
Normal windows. Assume that $\epsilon\in\Xi_{n}$. Then the errors of all estimates $\widehat{f}^{h}_{n}(x;y)$ associated with admissible windows smaller than the ideal one are dominated by the corresponding stochastic terms:

$$\epsilon\in\Xi_{n},\ 0<h\leq h_{*}(x)\ \Rightarrow\ |f(x)-\widehat{f}^{h}_{n}(x;y_{f}(\epsilon))|\leq 2C_{1}S_{n}(h) \qquad (13)$$

(by (10) and (12)). Let us fix an $\epsilon\in\Xi_{n}$ (and thus a realization $y$ of the observations), and let us call an admissible for $x$ window $B_{h}(x)$ normal if the associated estimate $\widehat{f}^{h}_{n}(x;y)$ differs from every estimate associated with a smaller window by no more than $4C_{1}$ times the stochastic term of the latter estimate, i.e., if

$$B_{h}(x)\ \hbox{is admissible, and}\quad\forall h',\ 0<h'\leq h:\quad|\widehat{f}^{h'}_{n}(x;y)-\widehat{f}^{h}_{n}(x;y)|\leq 4C_{1}S_{n}(h')\quad[y=y_{f}(\epsilon)]. \qquad (14)$$

Note that if $x\in\Gamma_{n}^{o}$, then $x$ possesses a normal window, specifically the window $B_{m^{-1}}(x)$. Indeed, this window contains a single observation point, namely $x$ itself, so that the corresponding estimate, the same as every estimate corresponding to a smaller window, by construction coincides with the observation at $x$; thus all the estimates $\widehat{f}^{h'}_{n}(x;y)$, $0<h'\leq m^{-1}$, are the same. Note also that (13) implies that

(!) If $\epsilon\in\Xi_{n}$, then the ideal window $B_{*}(x)$ is normal.

The adaptive estimate $\widehat{f}_{n}(x;y)$. The property of an admissible window to be normal is "observable": given the observations $y$, we can say whether a given window is or is not normal. Besides this, it is clear that among all normal windows there exists a largest one, $B^{+}(x)=B_{h^{+}(x)}(x)$. The adaptive estimate $\widehat{f}_{n}(x;y)$ is exactly the window estimate associated with the window $B^{+}(x)$. Note that from (!) it follows that

(!!) If $\epsilon\in\Xi_{n}$, then the largest normal window $B^{+}(x)$ contains the ideal window $B_{*}(x)$.

By the definition of a normal window, under the premise of (!!) we have

$$|\widehat{f}^{h^{+}(x)}_{n}(x;y)-\widehat{f}^{h_{*}(x)}_{n}(x;y)|\leq 4C_{1}S_{n}(h_{*}(x)),$$

and we arrive at the following conclusion:

(*) If $\epsilon\in\Xi_{n}$, then the error of the estimate $\widehat{f}_{n}(x;y)\equiv\widehat{f}^{h^{+}(x)}_{n}(x;y)$ is dominated by the error bound (10) associated with the ideal window:

$$\epsilon\in\Xi_{n}\ \Rightarrow\ |\widehat{f}_{n}(x;y)-f(x)|\leq 5C_{1}\left[\Phi_{\mu}(f,B_{h_{*}(x)}(x))+S_{n}(h_{*}(x))\right]. \qquad (15)$$

Thus, the estimate $\widehat{f}_{n}(\cdot;\cdot)$, which is based solely on observations and does not require any a priori knowledge of the "parameters of well-filterability" of $f$, possesses basically the same accuracy as the "ideal" estimate associated with the ideal window (provided, of course, that the realization of the noises is not "pathological": $\epsilon\in\Xi_{n}$).

Note that the adaptive estimate $\widehat{f}_{n}(x;y)$ we have built depends solely on the "design parameters" $\mu$, $\gamma$ (recall that $C_{1}$ depends on $\mu,\gamma$), the volume of observations $n$ and the dimension $d$. A sketch of the resulting window-selection rule is given after this paragraph.
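Schematically (our own simplification: the window estimates are abstracted as precomputed numbers, while in reality each one is produced by Algorithm A of Part I with parameters $(\mu,T_{h}(x))$), the selection of the largest normal window at a fixed point $x$ looks as follows:

```python
# Sketch of the window-selection rule at a fixed point x (our simplification).
# estimates[j]: window estimate for the j-th admissible window, ordered by
#               increasing h (estimates[0] is the raw observation y_t);
# S[j]:         the matching stochastic terms S_n(h_j), decreasing in j.
def largest_normal_window_estimate(estimates, S, C1=1.0):
    best = estimates[0]          # the window h = 1/m is always normal
    for j in range(1, len(estimates)):
        normal = all(abs(estimates[j] - estimates[i]) <= 4 * C1 * S[i]
                     for i in range(j))
        if normal:               # keep the estimate of the largest normal window
            best = estimates[j]
    return best
```

The rule never compares an estimate with the unknown signal: normality is checked by pairwise comparisons of observable quantities only, which is exactly what makes the procedure adaptive.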

4 Main result

Our main result is as follows:

Theorem 3

Let $\gamma\in(0,1)$, let $\mu\geq 1$ be an integer, and let ${\mathbf{F}}={\mathbf{F}}^{k,\rho,p}(B;R)$ be a family of locally well-filtered signals associated with a cube $B\subset[0,1]^{d}$ with $mD(B)\geq 1$, $\rho\leq\mu$ and $p>d$. For a properly chosen $P\geq 1$, depending solely on $\mu,d,p,\gamma$ and nonincreasing in $p>d$, the following statement holds true:

Suppose that the volume $n=m^{d}$ of observations (2) is large enough, namely,

$$P^{-1}n^{\frac{2kp+d(p-2)}{2dp}}\ \geq\ \frac{R}{\sigma}\sqrt{\frac{n}{\ln n}}\ \geq\ P[D(B)]^{-\frac{2kp+d(p-2)}{2p}}, \qquad (16)$$

where $D(B)$ is the edge of the cube $B$.

Then for every $q\in[1,\infty]$ the worst-case, with respect to ${\mathbf{F}}$, $q$-risk of the adaptive estimate $\widehat{f}_{n}(\cdot,\cdot)$ associated with the parameter $\mu$ can be bounded as follows:

$$\widehat{{\mathbf{R}}}_{q}\left(\widehat{f}_{n};{\mathbf{F}}\right)=\sup\limits_{f\in{\mathbf{F}}}\left(E\left\{\left|\widehat{f}_{n}(\cdot;y_{f}(\epsilon))-f(\cdot)\right|_{q,B_{\gamma}}^{2}\right\}\right)^{1/2}\leq PR\left(\frac{\sigma^{2}\ln n}{R^{2}n}\right)^{\beta(p,k,d,q)}[D(B)]^{d\lambda(p,k,d,q)}, \qquad (17)$$

where

$$\beta(p,k,d,q)=\left\{\begin{array}{ll}\dfrac{k}{2k+d},&q\leq\dfrac{(2k+d)p}{d},\\[10pt]\dfrac{k+d\left(\frac{1}{q}-\frac{1}{p}\right)}{2k+d-\frac{2d}{p}},&q>\dfrac{(2k+d)p}{d},\end{array}\right.\qquad\lambda(p,k,d,q)=\left\{\begin{array}{ll}\dfrac{1}{q}-\dfrac{d}{(2k+d)p},&q\leq\dfrac{(2k+d)p}{d},\\[10pt]0,&q>\dfrac{(2k+d)p}{d},\end{array}\right.$$

(recall that here $B_{\gamma}$ is the cube concentric to $B$ and $\gamma$ times smaller).

Note that the rates of convergence to 0, as $n\to\infty$, of the risks $\widehat{{\mathbf{R}}}_{q}\left(\widehat{f}_{n};{\mathbf{F}}\right)$ of our adaptive estimate on the families ${\mathbf{F}}={\mathbf{F}}^{k,\rho,p}(B;R)$ are exactly the same as those stated by Theorem 3 of [31] (see also [30, 9, 19, 33]) for recovering smooth nonparametric regression functions from Sobolev balls. It is well known that in the smooth case the latter rates are optimal in order, up to factors logarithmic in $n$. Since the families of locally well-filtered signals are much wider than local Sobolev balls (smooth signals are trivial examples of modulated smooth signals!), it follows that the rates of convergence stated by Theorem 3 are also nearly optimal. A worked instance of these exponents is given below.
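To make the exponents concrete, here is a worked instance (our own numerical illustration) of the rate of Theorem 3:

```latex
% Worked instance (our illustration): d = 1, k = 1, p = \infty, q = 2.
% Then (2k+d)p/d = \infty, so q \le (2k+d)p/d and the first branches apply:
%   \beta(p,k,d,q) = k/(2k+d) = 1/3,
%   \lambda(p,k,d,q) = 1/q - d/((2k+d)p) = 1/2 - 0 = 1/2,
% and the bound (17) becomes
\widehat{{\mathbf{R}}}_{2}\left(\widehat{f}_{n};{\mathbf{F}}\right)
  \;\leq\; P\,R\left(\frac{\sigma^{2}\ln n}{R^{2}n}\right)^{1/3}[D(B)]^{1/2},
% i.e., up to the logarithmic factor, the classical rate n^{-1/3} for
% recovering functions with one derivative bounded in L_\infty.
```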

5 Simulation examples

In this section we present the results of a small simulation study of the adaptive filtering algorithm applied to the 2-dimensional de-noising problem. The simulation setting is as follows: we consider real-valued signals

$$y_{\tau}=s_{\tau}+e_{\tau},\qquad\tau=(\tau_{1},\tau_{2})\in\{1,...,m\}^{2},$$

with $e_{(1,1)},...,e_{(m,m)}$ independent standard Gaussian random variables. The problem is to estimate, given the observations $(y_{\tau})$, the values of the signal $f$ on the grid $\Gamma_{m}=\{m^{-1}\tau,\ 1\leq\tau_{1},\tau_{2}\leq m\}$, where $f(x_{\tau})=s_{\tau}$. The value $m=128$ is common to all experiments.

We consider signals which are sums of three harmonic components:

$$s_{\tau}=\alpha\left[\sin(m^{-1}\omega_{1}^{T}\tau+\theta_{1})+\sin(m^{-1}\omega_{2}^{T}\tau+\theta_{2})+\sin(m^{-1}\omega_{3}^{T}\tau+\theta_{3})\right];$$

the frequencies $\omega_{i}$ and the phase shifts $\theta_{i}$, $i=1,...,3$, are drawn randomly from the uniform distributions over, respectively, $[0,\omega_{\max}]^{3}$ and $[0,1]^{3}$, and the coefficient $\alpha$ is chosen so as to make the signal-to-noise ratio equal to one. A sketch of this setup is given below.
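A sketch of this setup (our reconstruction; variable names and the random seed are ours) is:

```python
import numpy as np

# Sketch of the simulation setup: a 128 x 128 field which is a sum of three
# random 2-d harmonics, observed in Gaussian noise at signal-to-noise ratio 1.
rng = np.random.default_rng(0)
m, omega_max = 128, 32.0
t1, t2 = np.meshgrid(np.arange(1, m + 1), np.arange(1, m + 1), indexing="ij")
s = np.zeros((m, m))
for _ in range(3):
    w = rng.uniform(0.0, omega_max, size=2)   # frequency vector omega_i
    th = rng.uniform(0.0, 1.0)                # phase shift theta_i
    s += np.sin((w[0] * t1 + w[1] * t2) / m + th)
s /= np.sqrt((s ** 2).mean())                 # alpha: signal-to-noise ratio 1
y = s + rng.standard_normal((m, m))           # observations y_tau
```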

In the simulations presented here we compare the results of the adaptive recovery with $T=10$ to those of a "standard nonparametric recovery", i.e., recovery by the locally linear estimator with a square window. We have done $k=100$ independent runs for each of eight values of $\omega_{\max}$,

$$\omega_{\max}\in\{1.0,\ 2.0,\ 4.0,\ 8.0,\ 16.0,\ 32.0,\ 64.0,\ 128.0\}.$$

In Table 1 we summarize the results in terms of the (root) mean integrated squared error of the estimation,

$$MISE=\sqrt{\frac{1}{100\,m^{2}}\sum_{j=1}^{100}\sum_{\tau=(1,1)}^{(m,m)}\left(\widehat{s}^{(j)}_{\tau}-s^{(j)}_{\tau}\right)^{2}}.$$

The observed phenomenon is rather expected: for slowly oscillating signals the quality of the adaptive recovery is slightly worse than that of the "standard" recovery, which is tuned for the estimation of regular signals. When we raise the frequency of the signal components, the adaptive recovery stably outperforms the standard one. Finally, the standard recovery is clearly unable to recover highly oscillating signals (cf. Figures 1-4).

Table 1: MISE of the standard and the adaptive recoveries

ω_max     Standard recovery    Adaptive recovery
1.0       0.12                 0.10
2.0       0.20                 0.12
4.0       0.36                 0.18
8.0       0.54                 0.27
16.0      0.79                 0.25
32.0      0.75                 0.29
64.0      0.98                 0.27
128.0     1.00                 0.24

Appendix

We denote by $C(\mathbb{Z}^{d})$ the linear space of complex-valued fields over $\mathbb{Z}^{d}$. A field $r\in C(\mathbb{Z}^{d})$ with finitely many nonzero entries $r_{\tau}$ is called a filter. We use the common notation $\Delta_{j}$, $j=1,...,d$, for the "basic shift operators" on $C(\mathbb{Z}^{d})$:

$$(\Delta_{j}r)_{\tau_{1},...,\tau_{d}}=r_{\tau_{1},...,\tau_{j-1},\tau_{j}-1,\tau_{j+1},...,\tau_{d}},$$

and we denote by $r(\Delta)x$ the output of a filter $r$, the input to the filter being a field $x\in C(\mathbb{Z}^{d})$, so that $(r(\Delta)x)_{t}=\sum\limits_{\tau}r_{\tau}x_{t-\tau}$. In code, applying a filter is a plain discrete convolution; see the sketch below.
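A minimal sketch (our notation, $d=2$, with an averaging filter chosen arbitrarily as $r$):

```python
import numpy as np
from scipy.ndimage import convolve

# Applying a filter r to a field x on Z^2: (r(Delta)x)_t = sum_tau r_tau x_{t-tau},
# i.e. a plain discrete convolution. Here r simply averages over the box O_T, T = 1.
T = 1
r = np.full((2 * T + 1, 2 * T + 1), 1.0 / (2 * T + 1) ** 2)
x = np.random.default_rng(2).standard_normal((16, 16))
y = convolve(x, r, mode="wrap")               # the output field r(Delta)x
```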

5.1 Proof of Lemma 2.

To save notation, let $B=B_{h}(x)$ and $T=T_{h}(x)$. Let $p\in C(\mathbb{Z}^{d})$ be such that $p\in{\mathbf{S}}^{t}_{3T}(0,\mu,T)$ and $|p_{\tau}-f_{\tau}|\leq\Phi_{\mu}(f,B_{h}(x))$ for all $\tau\in\mathbb{Z}(B_{h}(x))$. Since $p\in{\mathbf{S}}^{t}_{3T}(0,\mu,T)$, there exists a filter $q\in C_{T}(\mathbb{Z}^{d})$ such that $|q|_{2}\leq\mu(2T+1)^{-d/2}$ and $(q(\Delta)p)_{\tau}=p_{\tau}$ whenever $|\tau-t|\leq 3T$. Setting $\delta_{\tau}=f_{\tau}-p_{\tau}$, we have for any $\tau$ with $|\tau-t|\leq 3T$:

$$\begin{array}{rcl}|f_{\tau}-(q(\Delta)f)_{\tau}|&\leq&|\delta_{\tau}|+|p_{\tau}-(q(\Delta)p)_{\tau}|+|(q(\Delta)\delta)_{\tau}|\\&\leq&\Phi_{\mu}(f,B_{h}(x))+|q|_{1}\max\{|\delta_{\nu}|:|\nu-\tau|\leq T\}\\&\leq&\Phi_{\mu}(f,B_{h}(x))+|q|_{2}(2T+1)^{d/2}\max\{|\delta_{\nu}|:|\nu-\tau|\leq T\}\end{array}$$

[note that $|\tau-t|\leq 3T$ and $|\nu-\tau|\leq T$ imply $|\nu-t|\leq 4T$, so that $|\delta_{\nu}|\leq\Phi_{\mu}(f,B_{h}(x))$]

$$\leq\Phi_{\mu}(f,B_{h}(x))(1+\mu)=\frac{\Phi_{\mu}(f,B_{h}(x))(1+\mu)}{(2T+1)^{d/2}}\,(2T+1)^{d/2},$$

as required in (6). ∎

5.2 Proof of Theorem 3

In the main body of the proof we focus on the case $p,q<\infty$; the case of infinite $p$ and/or $q$ is considered at the concluding step 5°.

Let us fix a family of well-filtered signals ${\mathbf{F}}={\mathbf{F}}^{k,\rho,p}(B;R)$ with parameters satisfying the premise of Theorem 3, and a function $f$ from this class. Recall that by the definition of ${\mathbf{F}}$ there exists a function $F\geq 0$, $\|F\|_{p,B}\leq R$, such that for all $x=m^{-1}t\in(\hbox{int}\,B)\cap\Gamma_{n}$ and all $h$ with $B_{h}(x)\subset B$:

$$\Phi_{\mu}(f,B_{h}(x))\leq P_{1}h^{k-d/p}\Omega(f,B_{h}(x)),\qquad\Omega(f,B')=\left(\int\limits_{B'}F^{p}(u)\,du\right)^{1/p}. \qquad (20)$$

From now on, $P$ (perhaps with sub- or superscripts) denotes a quantity $\geq 1$ depending on $\mu,d,\gamma,p$ only and nonincreasing in $p>d$.

1°. We need the following auxiliary result:

Lemma 4

Assume that

$$n^{\frac{k-d/p}{d}}\sqrt{\ln n}\geq P_{1}(\mu+3)^{k-d/p+d/2}\,\frac{R}{\sigma\omega}. \qquad (21)$$

Given a point $x\in\Gamma_{n}\cap B_{\gamma}$, let us choose the largest $h=h(x)$ such that

$$(a):\ h\leq(1-\gamma)D(B),\qquad(b):\ P_{1}h^{k-d/p}\Omega(f,B_{h}(x))\leq S_{n}(h). \qquad (22)$$

Then $h(x)$ is well-defined and

$$h(x)\geq m^{-1}. \qquad (23)$$

Besides this, the error at $x$ of the adaptive estimate $\widehat{f}_{n}$ as applied to $f$ can be bounded as follows:

$$|\widehat{f}_{n}(x;y)-f(x)|\leq C_{2}\left[S_{n}(h(x))\mathbf{1}\{\epsilon\in\Xi_{n}\}+\sigma\Theta_{(n)}\mathbf{1}\{\epsilon\not\in\Xi_{n}\}\right]. \qquad (24)$$

Proof: The quantity $h(x)$ is well-defined, since for small positive $h$ the left-hand side of (22.$b$) is close to 0, while the right-hand side is large. From (21) it follows that $h=m^{-1}$ satisfies (22.$a$), so that $B_{m^{-1}}(x)\subset B$. Moreover, (21) implies that

$$P_{1}m^{-k+d/p}R\leq S_{n}(m^{-1});$$

the latter inequality, in view of $\Omega(f,B_{m^{-1}}(x))\leq R$, says that $h=m^{-1}$ satisfies (22.$b$) as well. Thus $h(x)\geq m^{-1}$, as claimed in (23).

Consider the window $B_{h(x)}(x)$. By (22.$a$) it is admissible for $x$, while from (22.$b$) combined with (20) we get $\Phi_{\mu}(f,B_{h(x)}(x))\leq S_{n}(h(x))$. It follows that the ideal window $B_{*}(x)$ of $x$ is not smaller than $B_{h(x)}(x)$.

Assume that $\epsilon\in\Xi_{n}$. Then, according to (15), we have

$$|\widehat{f}_{n}(x;y)-f(x)|\leq 5C_{1}\left[\Phi_{\mu}(f,B_{h_{*}(x)}(x))+S_{n}(h_{*}(x))\right]. \qquad (25)$$

Now, by the definition of the ideal window, $\Phi_{\mu}(f,B_{h_{*}(x)}(x))\leq S_{n}(h_{*}(x))$, so that the right-hand side of (25) does not exceed $10C_{1}S_{n}(h_{*}(x))\leq 10C_{1}S_{n}(h(x))$ (recall that, as we have just seen, $h_{*}(x)\geq h(x)$), as required in (24).

Now let $\epsilon\not\in\Xi_{n}$. Note that $\widehat{f}_{n}(x;y)$ is a certain estimate $\widehat{f}^{h}(x;y)$ associated with a cube $B_{h}(x)$ centered at $x$, admissible for $x$, normal, and such that $h\geq m^{-1}$ (the latter since the window $B_{m^{-1}}(x)$ is always normal, and $B_{h}(x)$ is the largest normal window centered at $x$). Applying (14) with $h'=m^{-1}$ (so that $\widehat{f}^{h'}_{n}(x;y)=f(x)+\sigma\epsilon_{t}$), we get $|(f(x)+\sigma\epsilon_{t})-\widehat{f}_{n}(x;y)|\leq 4C_{1}S_{n}(m^{-1})$, whence

$$|f(x)-\widehat{f}_{n}(x;y)|\leq\sigma|\epsilon_{t}|+4C_{1}S_{n}(m^{-1})\leq\sigma\Theta_{(n)}+4C_{1}\sigma\omega\sqrt{\ln n}\leq C_{2}\sigma\Theta_{(n)}$$

(recall that we are in the situation $\epsilon\not\in\Xi_{n}$, whence $\omega\sqrt{\ln n}\leq\Theta_{(n)}$). We have arrived at (24). ∎

Now we are ready to complete the proof. Assume that (21) takes place, and let us fix $q$ with $\frac{2k+d}{d}p\leq q<\infty$.

2°. Let us set $\widehat{\sigma}_{n}=\sigma\sqrt{\frac{\ln n}{n}}$. Note that for every $x\in\Gamma_{n}\cap B_{\gamma}$ either

$$h(x)=(1-\gamma)D(B),$$

or

$$h(x)=\left(\frac{\widehat{\sigma}_{n}}{P_{1}\Omega(f,B_{h(x)}(x))}\right)^{\frac{2p}{2kp+(p-2)d}},$$

which means that

$$P_{1}h^{k-d/p}(x)\,\Omega(f,B_{h(x)}(x))=S_{n}(h(x)). \qquad (26)$$

Let $U,V$ be the sets of those $x\in B_{\gamma}^{n}\equiv\Gamma_{n}\cap B_{\gamma}$ for which the first or, respectively, the second of these possibilities takes place. If $V$ is nonempty, let us partition it as follows.
1) We choose $x_{1}\in V$ ($V$ is finite!) such that $h(x)\geq h(x_{1})$ for all $x\in V$. After $x_{1}$ is chosen, we set $V_{1}=\{x\in V\mid\,B_{h(x)}(x)\cap B_{h(x_{1})}(x_{1})\neq\emptyset\}$.
2) If the set $V\setminus V_{1}$ is nonempty, we apply the construction from 1) to this set, thus getting $x_{2}\in V\setminus V_{1}$ such that $h(x)\geq h(x_{2})$ for all $x\in V\setminus V_{1}$, and set $V_{2}=\{x\in V\setminus V_{1}\mid\,B_{h(x)}(x)\cap B_{h(x_{2})}(x_{2})\neq\emptyset\}$. If the set $V\setminus(V_{1}\cup V_{2})$ is still nonempty, we apply the same construction to it, thus getting $x_{3}$ and $V_{3}$, and so on.
The outlined process clearly terminates after finitely many steps (since $V$ is finite). Upon termination, we get a collection of $M$ points $x_{1},...,x_{M}\in V$ and a partition $V=V_{1}\cup V_{2}\cup...\cup V_{M}$ with the following properties:

1. The cubes $B_{h(x_{1})}(x_{1}),...,B_{h(x_{M})}(x_{M})$ are mutually disjoint;

2. For every $\ell\leq M$ and every $x\in V_{\ell}$ we have $h(x)\geq h(x_{\ell})$ and $B_{h(x)}(x)\cap B_{h(x_{\ell})}(x_{\ell})\neq\emptyset$.

We claim that also

3. For every $\ell\leq M$ and every $x\in V_{\ell}$ one has

$$h(x)\geq\max\left[h(x_{\ell});\ \|x-x_{\ell}\|_{\infty}\right]. \qquad (27)$$

Indeed, $h(x)\geq h(x_{\ell})$ by property 2, so that it suffices to verify (27) in the case $\|x-x_{\ell}\|_{\infty}\geq h(x_{\ell})$. Since $B_{h(x)}(x)$ intersects $B_{h(x_{\ell})}(x_{\ell})$, we have

$$\|x-x_{\ell}\|_{\infty}\leq\frac{1}{2}\left(h(x)+h(x_{\ell})\right),$$

whence

$$h(x)\geq 2\|x-x_{\ell}\|_{\infty}-h(x_{\ell})\geq\|x-x_{\ell}\|_{\infty},$$

which is what we need.
3°. Recall that $B_{\gamma}^{n}=\Gamma_{n}\cap B_{\gamma}$. Assume that $\epsilon\in\Xi_{n}$. Substituting $h(x)=(1-\gamma)D(B)$ for $x\in U$, we have by (24):

$$\begin{array}{rcl}\left|\widehat{f}_{n}(\cdot;y)-f(\cdot)\right|_{q,B_{\gamma}}^{q}&\leq&C_{2}^{q}m^{-d}\sum\limits_{x\in B_{\gamma}^{n}}S_{n}^{q}(h(x))\\[8pt]&=&C_{2}^{q}m^{-d}\sum\limits_{x\in U}S_{n}^{q}(h(x))+C_{2}^{q}m^{-d}\sum\limits_{\ell=1}^{M}\sum\limits_{x\in V_{\ell}}S_{n}^{q}(h(x))\\[8pt]&=&C_{2}^{q}m^{-d}\sum\limits_{x\in U}\left[\frac{\widehat{\sigma}_{n}}{((1-\gamma)D(B))^{d/2}}\right]^{q}+C_{2}^{q}m^{-d}\sum\limits_{\ell=1}^{M}\sum\limits_{x\in V_{\ell}}S_{n}^{q}(h(x))\\[8pt]\hbox{[by (27)]}&\leq&C_{3}^{q}\widehat{\sigma}_{n}^{q}m^{-d}\sum\limits_{\ell=1}^{M}\sum\limits_{x\in V_{\ell}}\left(\max\left[h(x_{\ell}),\|x-x_{\ell}\|_{\infty}\right]\right)^{-\frac{dq}{2}}+C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}}\\[8pt]&\leq&C_{4}^{q}\widehat{\sigma}_{n}^{q}\sum\limits_{\ell=1}^{M}\int\left(\max\left[h(x_{\ell}),\|x-x_{\ell}\|_{\infty}\right]\right)^{-\frac{dq}{2}}dx+C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}}\\[8pt]&\leq&C_{5}^{q}\widehat{\sigma}_{n}^{q}\sum\limits_{\ell=1}^{M}\int\limits_{0}^{\infty}r^{d-1}\left(\max\left[h(x_{\ell}),r\right]\right)^{-\frac{dq}{2}}dr+C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}},\end{array}$$

due to $h(x_{\ell})\geq m^{-1}$; see (23). Further, note that

$$\frac{dq}{2}-d+1\geq\frac{2k+d}{2}\,p-d+1\geq d^{2}/2+1$$

in view of $q\geq\frac{2k+d}{d}p$, $k\geq 1$ and $p>d$, and therefore

$$\begin{array}{rcl}\left|\widehat{f}_{n}(\cdot;y)-f(\cdot)\right|_{q,B_{\gamma}}^{q}&\leq&C_{6}^{q}\widehat{\sigma}_{n}^{q}\sum\limits_{\ell=1}^{M}\left[h(x_{\ell})\right]^{\frac{d(2-q)}{2}}+C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}}\\[8pt]\hbox{[by (26)]}&=&C_{6}^{q}\widehat{\sigma}_{n}^{q}\sum\limits_{\ell=1}^{M}\left[\frac{\widehat{\sigma}_{n}}{P_{1}\Omega(f,B_{h(x_{\ell})}(x_{\ell}))}\right]^{\frac{d(2-q)}{2k-2d/p+d}}+C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}}\\[8pt]&=&C_{3}^{q}\widehat{\sigma}_{n}^{q}[D(B)]^{\frac{d(2-q)}{2}}+C_{6}^{q}\widehat{\sigma}_{n}^{2\beta(p,k,d,q)q}\sum\limits_{\ell=1}^{M}\left[P_{1}\Omega(f,B_{h(x_{\ell})}(x_{\ell}))\right]^{\frac{d(q-2)}{2k-2d/p+d}}\end{array}$$

by the definition of $\beta(p,k,d,q)$.
Now note that $\frac{d(q-2)}{2k-2d/p+d}\geq p$ in view of $q\geq\frac{2k+d}{d}p$, so that

$$\sum\limits_{\ell=1}^{M}\left[P_{1}\Omega(f,B_{h(x_{\ell})}(x_{\ell}))\right]^{\frac{d(q-2)}{2k-2d/p+d}}\leq\left[\sum\limits_{\ell=1}^{M}\left(P_{1}\Omega(f,B_{h(x_{\ell})}(x_{\ell}))\right)^{p}\right]^{\frac{d(q-2)}{p(2k-2d/p+d)}}\leq\left[P_{1}^{p}R^{p}\right]^{\frac{d(q-2)}{p(2k-2d/p+d)}}$$

(see (20) and take into account that the cubes $B_{h(x_{\ell})}(x_{\ell})$, $\ell=1,...,M$, are mutually disjoint by property 1). We conclude that for $\epsilon\in\Xi_{n}$

$$\left|\widehat{f}_{n}(\cdot;y_{f}(\epsilon))-f(\cdot)\right|_{q,B_{\gamma}}\leq C_{7}\widehat{\sigma}_{n}[D(B)]^{\frac{d(2/q-1)}{2}}+P_{2}\widehat{\sigma}_{n}^{2\beta(p,k,d,q)}R^{\frac{d(1-2/q)}{2k-2d/p+d}}=C_{7}\widehat{\sigma}_{n}[D(B)]^{\frac{d(2/q-1)}{2}}+P_{2}R\left(\frac{\widehat{\sigma}_{n}}{R}\right)^{2\beta(p,k,d,q)}. \qquad (28)$$

4°. Now assume that $\epsilon\not\in\Xi_{n}$. In this case, by (24),

$$|\widehat{f}_{n}(x;y)-f(x)|\leq C_{2}\sigma\Theta_{(n)}\quad\forall x\in B_{\gamma}^{n}.$$

Hence, taking into account that $mD(B)\geq 1$,

$$\left|\widehat{f}_{n}(\cdot;y)-f(\cdot)\right|_{q,B_{\gamma}}\leq C_{2}\sigma\Theta_{(n)}[D(B)]^{\frac{d}{q}}. \qquad (29)$$

5°. Combining (28) and (29), we get

$$\left(E\left\{\left|\widehat{f}_{n}(\cdot;y)-f(\cdot)\right|_{q,B_{\gamma}}^{2}\right\}\right)^{1/2}\leq C_{8}\max\left[\widehat{\sigma}_{n}[D(B)]^{\frac{d(2/q-1)}{2}};\ P_{4}R\left(\frac{\widehat{\sigma}_{n}}{R}\right)^{2\beta(p,k,d,q)};\ J\right],$$

where

$$J^{2}=E\left\{\mathbf{1}\{\epsilon\not\in\Xi_{n}\}\,C_{2}^{2}\sigma^{2}\Theta_{(n)}^{2}\right\}\leq C_{2}^{2}\sigma^{2}\,{\hbox{\rm Prob}}^{1/2}\{\epsilon\not\in\Xi_{n}\}\left(E\left\{\Theta_{(n)}^{4}\right\}\right)^{1/2}\leq C_{9}\sigma^{2}n^{-2(\mu+1)}\ln n$$

(we have used (5) and (8)). Thus, when (21) holds, for all $d<p<\infty$ and all $q$ with $\frac{2k+d}{d}p\leq q<\infty$ we have

$$\left(E\left\{\left|\widehat{f}_{n}(\cdot;y)-f(\cdot)\right|_{q,B_{\gamma}}^{2}\right\}\right)^{1/2}\leq C_{8}\max\left[\widehat{\sigma}_{n}[D(B)]^{\frac{d(2/q-1)}{2}},\ P_{4}R\left(\frac{\widehat{\sigma}_{n}}{R}\right)^{2\beta(p,k,d,q)},\ \frac{C_{9}^{1/2}\sigma\sqrt{\ln n}}{n^{\mu+1}}\right]. \qquad (30)$$

Now it is easily seen that if $P\geq 1$ is a properly chosen function of $\mu,d,\gamma,p$, nonincreasing in $p>d$, and (16) takes place, then

1. assumption (21) holds;

2. the right-hand side of (30) does not exceed the quantity

$$PR\left(\frac{\widehat{\sigma}_{n}}{R}\right)^{2\beta(p,k,d,q)}=PR\left(\frac{\widehat{\sigma}_{n}}{R}\right)^{2\beta(p,k,d,q)}[D(B)]^{d\lambda(p,k,d,q)}$$

(recall that $q\geq\frac{2k+d}{d}p$, so that $\lambda(p,k,d,q)=0$).

This proves the bound (17) for the case $d<p<\infty$, $\infty>q\geq\frac{2k+d}{d}p$. Passing to the limit as $q\to\infty$, we get the desired bound for $q=\infty$ as well.

Now let $d<p<\infty$ and $1\leq q\leq q_{*}\equiv\frac{2k+d}{d}p$. By the Hölder inequality, and in view of $mD(B)\geq 1$, we have

$$\left|g\right|_{q,B_{\gamma}}\leq C_{10}\left|g\right|_{q_{*},B_{\gamma}}|B_{\gamma}|^{\frac{1}{q}-\frac{1}{q_{*}}},$$

and thus

$$\widehat{{\mathbf{R}}}_{q}\left(\widehat{f}_{n};{\mathbf{F}}\right)\leq C_{10}\widehat{{\mathbf{R}}}_{q_{*}}\left(\widehat{f}_{n};{\mathbf{F}}\right)[D(B)]^{d\left(\frac{1}{q}-\frac{1}{q_{*}}\right)}.$$

Combining this observation with the (already proved) bound (17) associated with $q=q_{*}$, we see that (17) is valid for all $q\in[1,\infty]$, provided $d<p<\infty$. Passing in the resulting bound to the limit as $p\to\infty$, we conclude the validity of (17) for all $p\in(d,\infty]$, $q\in[1,\infty]$. ∎

References

  • [1] O.V. Besov, V.P. Il’in, and S.M. Nikol’ski. Integral representations of functions and embedding theorems. Moscow: Nauka Publishers, 1975 (in Russian).
  • [2] L. Birgé. Approximation dans les espaces métriques et théorie de l’estimation. Z. Wahrscheinlichkeitstheorie verw. Geb., 65:181–237, 1983.
  • [3] L. Birgé, P. Massart. From model selection to adaptive estimation. In: D. Pollard, E. Torgersen and G. Yang, Eds., Festschrift for Lucien Le Cam, Springer 1999, 55–89.
  • [4] E. Candès, D. Donoho. Ridgelets: a key to high-dimensional intermittency? Philos. Trans. Roy. Soc. London Ser. A 357:2495-2509, 1999.
  • [5] S. Chen, D.L. Donoho, M.A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1):33-61, 1998.
  • [6] D. Donoho, I. Johnstone. Ideal spatial adaptation via wavelet shrinkage. Biometrika 81(3):425-455, 1994.
  • [7] D. Donoho, I. Johnstone. Minimax risk over p\ell_{p} balls for q\ell_{q} losses. Probab. Theory Related Fields 99:277-303, 1994.
  • [8] D. Donoho, I. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90(432):1200–1224, 1995.
  • [9] D. Donoho, I. Johnstone, G. Kerkyacharian, D. Picard. Wavelet shrinkage: Asymptopia? (with discussion and reply by the authors). J. Royal Statist. Soc. Series B 57(2):301–369, 1995.
  • [10] D. Donoho. Tight frames of kk-plane ridgelets and the problem of representing objects that are smooth away from dd-dimensional singularities in n{\mathbb{R}}^{n}. Proc. Natl. Acad. Sci. USA 96(5):1828-1833, 1999.
  • [11] D. Donoho. Wedgelets: nearly minimax estimation of edges. Ann. Statist. 27:859-897, 1999.
  • [12] D. Donoho. Orthonormal ridgelets and linear singularities. SIAM J. Math. Anal. 31:1062-1099, 2000.
  • [13] D. Donoho. Ridge functions and orthonormal ridgelets. J. Approx. Theory 111(2):143-179, 2001.
  • [14] D. Donoho. Curvelets and curvilinear integrals. J. Approx. Theory 113(1):59-90, 2001.
  • [15] D. Donoho. Sparse components of images and optimal atomic decompositions. Constr. Approx. 17:353-382, 2001.
  • [16] D. Donoho, X. Huo. Uncertainty principle and ideal atomic decomposition. IEEE Trans. on Information Theory 47(7):2845-2862, 2001.
  • [17] D. Donoho, X. Huo. Beamlets and multiscale image analysis. Lect. Comput. Sci. Eng. 20:149-196, Springer, 2002.
  • [18] M. Elad, A. Bruckstein. A generalized uncertainty principle and sparse representation in pairs of bases. IEEE Trans. on Information Theory (to appear)
  • [19] A. Goldenshluger, A. Nemirovski. On spatially adaptive estimation of nonparametric regression. Math. Methods of Statistics 6(2):135–170, 1997.
  • [20] A. Goldenshluger, A. Nemirovski. Adaptive de-noising of signals satisfying differential inequalities. IEEE Trans. on Information Theory 43, 1997.
  • [21] Yu. Golubev. Asymptotic minimax estimation of regression function in additive model. Problemy peredachi informatsii 28(2):3–15, 1992. (English transl. in Problems Inform. Transmission 28, 1992.)
  • [22] W. Härdle. Applied Nonparametric Regression, ES Monograph Series 19, Cambridge, U.K., Cambridge University Press, 1990.
  • [23] I. Ibragimov and R. Khasminskii. On nonparametric estimation of regression. Soviet Math. Dokl. 21:810–814, 1980.
  • [24] I. Ibragimov and R. Khasminskii. Statistical Estimation. Springer-Verlag, New York, 1981.
  • [25] A. Juditsky. Wavelet estimators: Adapting to unknown smoothness. Math. Methods of Statistics 6(1):1–25, 1997.
  • [26] A. Juditsky and A. Nemirovski. Nonparametric Denoising of Signals with Unknown Local Structure, I: Oracle Inequalities. Accepted to Appl. Comp. Harm. Anal.
  • [27] A. Korostelev, A. Tsybakov. Minimax theory of image reconstruction. Lecture Notes in Statistics 82, Springer, New York, 1993.
  • [28] O. Lepskii. On a problem of adaptive estimation in Gaussian white noise. Theory of Probability and Its Applications 35(3):454–466, 1990.
  • [29] O. Lepskii. Asymptotically minimax adaptive estimation I: Upper bounds. Optimally adaptive estimates. Theory of Probability and Its Applications, 36(4):682–697, 1991.
  • [30] O. Lepskii, E. Mammen, V. Spokoiny. Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Statist. 25(3):929–947, 1997.
  • [31] A. Nemirovskii. On nonparametric estimation of smooth regression functions. Sov. J. Comput. Syst. Sci., 23(6):1–11, 1985.
  • [32] A. Nemirovski. On nonparametric estimation of functions satisfying differential inequalities. R. Khasminski, Ed. Advances in Soviet Mathematics 12:7–43, American Mathematical Society, 1992.
  • [33] A. Nemirovski. Topics in Non-parametric Statistics. In: M. Emery, A. Nemirovski, D. Voiculescu, Lectures on Probability Theory and Statistics, École d'Été de Probabilités de Saint-Flour XXVII – 1998, Ed. P. Bernard. Lecture Notes in Mathematics 1738:87–285.
  • [34] M. Pinsker. Optimal filtration of square-integrable signals in Gaussian noise. Problemy peredachi informatsii, 16(2):120–133. 1980. (English transl. in Problems Inform. Transmission 16, 1980.)
  • [35] M. Pinsker, S. Efroimovich. Learning algorithm for nonparametric filtering. Automation and Remote Control 45(11):1434–1440, 1984.
  • [36] M. Rosenblatt. Stochastic curve estimation. Institute of Mathematical Statistics, Hayward, California, 1991.
  • [37] J.-L. Starck, E. Candès, D. Donoho. The curvelet transform for image denoising. IEEE Trans. Image Process. 11(6):670-684, 2002.
  • [38] Ch. Stone. Optimal rates of convergence for nonparametric estimators. Annals of Statistics, 8(3):1348–1360, 1980.
  • [39] G. Wahba. Spline models for observational data. SIAM, Philadelphia, 1990.
[Figure panels: True Image; Observation; Standard recovery; Adaptive recovery]

Figure 1: Recovery for $\omega_{\max}=2.0$
[Figure panels: True Image; Observation; Standard recovery; Adaptive recovery]

Figure 2: Recovery for $\omega_{\max}=8.0$
[Figure panels: True Image; Observation; Standard recovery; Adaptive recovery]

Figure 3: Recovery for $\omega_{\max}=32.0$
[Figure panels: True Image; Observation; Standard recovery; Adaptive recovery]

Figure 4: Recovery for $\omega_{\max}=128.0$