
Universal kernel-type estimation of random fields

Funding: The study was supported by the program for fundamental scientific research of the Siberian Branch of the Russian Academy of Sciences, project FWNF-2022-0015.

Y.Y. Linke, I.S. Borisov, and P.S. Ruzankin
Sobolev Institute of Mathematics, Novosibirsk, 630090 Russia
Novosibirsk State University, Novosibirsk, 630090 Russia
E-mails: linke@math.nsc.ru, sibam@math.nsc.ru, ruzankin@math.nsc.ru
Abstract

Consistent weighted least square estimators are proposed for a wide class of nonparametric regression models with random regression function, where this real-valued random function of $k$ arguments is assumed to be continuous with probability 1. We obtain explicit upper bounds for the rate of uniform convergence in probability of the new estimators to the unobservable random regression function for both fixed and random designs. In contrast to the predecessors' results, the bounds for the convergence are insensitive to the correlation structure of the $k$-variate design points. As an application, we study the problem of estimating the mean and covariance functions of random fields with additive noise under dense data conditions. The theoretical results of the study are illustrated by simulation examples which show that the new estimators are in some cases more accurate than the Nadaraya–Watson ones. An example of processing real data on earthquakes in Japan in 2012–2021 is included.

Key words and phrases: nonparametric regression, uniform consistency, kernel-type estimator.

AMS subject classification: 62G08.

1. Introduction

We study the following regression model:

Y_{i}=f(X_{i})+\xi_{i},\qquad i=1,\ldots,n, \qquad (1)

where $f(t)$, $t:=(t^{(1)},\ldots,t^{(k)})\in\Theta\subset\mathbb{R}_{+}^{k}$, $k\geq 1$, is an unknown real-valued random function (a random field). We assume that $\Theta$ is a compact set, the random field $f(t)$ is continuous on $\Theta$ with probability 1, and the design $\{X_{i};\,i=1,\ldots,n\}$ consists of a collection of observed random vectors with unknown (generally speaking) distributions, not necessarily independent or identically distributed. The random design points $X_{i}$ may depend on $n$, i.e., a triangular array scheme for the design can be considered within this model. In particular, this scheme includes regression models with fixed design. Moreover, we do not require that the random field $f(t)$ be independent of the design $\{X_{i}\}$.

Next, we will assume that the unobservable random errors $\{\xi_{i}\}$ (the noise) form a martingale difference sequence, with

M_{p}:=\sup_{i}{\bf E}|\xi_{i}|^{p}<\infty\quad\mbox{for some}\ p>k\ \mbox{and}\ p\geq 2. \qquad (2)

We also assume that $\{\xi_{i}\}$ are independent of the collection $\{X_{i}\}$ and of the random field $f(\cdot)$. The noise $\{\xi_{i}\}$ may depend on $n$.

Our goal is to construct estimators for the random regression field $f(t)$ that are consistent in $C(\Theta)$, based on the observations $\{(X_{i},Y_{i});\,i\leq n\}$, under minimal restrictions on the design points $\{X_{i}\}$; here $C(\Theta)$ denotes the space of all continuous functions on $\Theta$ with the uniform norm.

In the classical case of nonrandom $f(\cdot)$, the most popular estimation procedures are based on kernel-type estimators. Notable among them are the Nadaraya–Watson, Priestley–Chao, and Gasser–Müller estimators, the local polynomial estimators, and some of their modifications (see Härdle, 1990; Wand and Jones, 1995; Fan and Gijbels, 1996; Fan and Yao, 2003; Loader, 1999; Young, 2017; Müller, 1988). We do not aim to provide a comprehensive review of this actively developing (especially in the last two decades) area of nonparametric estimation and focus only on publications representing certain methodological directions. We are primarily interested in conditions on the design elements. In this regard, the large number of publications in this area may be tentatively divided into two groups: studies dealing with a fixed design $\{X_{i};\,i\leq n\}$ and those dealing with a random one. In papers dealing with a random design, as a rule, the design consists of independent identically distributed random variables or stationary observations satisfying known forms of dependence, e.g., various types of mixing conditions, association, Markov or martingale properties, etc. Not attempting to present a comprehensive review, we may note the papers by Kulik and Lorek (2011), Kulik and Wichelhaus (2011), Roussas (1990, 1991), Györfi et al. (2002), Masry (2005), Hansen (2008), Honda (2010), Laib and Louani (2010), Li et al. (2016), Hong and Linton (2016), Shen and Xie (2013), Jiang and Mack (2002), Linton and Jacho-Chavez (2010), Chu and Deng (2003) (see also the references therein). Besides, in the recent studies by Gao et al. (2015), Wang and Chan (2014), Chan and Wang (2014), Linton and Wang (2016), Wang and Phillips (2009a,b), Karlsen et al. (2007), the authors considered nonstationary design sequences under special forms of dependence (Markov chains, autoregressions, sums of moving averages, and so on).

In the case of fixed design, the vast majority of papers make certain assumptions on the regularity of the design (see Zhou and Zhu, 2020; Benelmadani et al., 2020; Tang et al., 2018; Gu et al., 2007; Benhenni et al., 2010; Müller and Prewitt, 1993; Ahmad and Lin, 1984; Georgiev, 1988, 1990). In univariate models, the nonrandom design points $X_{i}$ are most often restricted by the formula $X_{i}=g(i/n)+o(1/n)$ with a function $g$ of bounded variation, where the error term $o(1/n)$ is uniform in $i=1,\ldots,n$. If $g$ is linear, then the design is equidistant. Another regularity condition in the univariate case is the relation $\max_{i\leq n}(X_{i}-X_{i-1})=O(1/n)$, where the design elements are arranged in increasing order. In a number of recent studies, the more general condition $\max_{i\leq n}(X_{i}-X_{i-1})\to 0$ can be found (e.g., see Yang and Yang, 2016; He, 2019; Wu et al., 2020). In several works, including those dealing with the so-called weighted estimators, certain conditions are imposed on the behavior of functions of the design elements, but meaningful corresponding examples are limited to cases of regular design (e.g., see Zhang et al., 2019; Zhang et al., 2018; Liang and Jing, 2005; Roussas et al., 1992; Georgiev, 1988).

The problem of uniform approximation of the kernel-type estimators has been studied by many authors (e.g., see Einmahl and Mason, 2005; Hansen, 2008; Gu et al., 2007; Shen and Xie, 2013; Li et al., 2016; Liang and Jing, 2005; Wang and Chan, 2014; Chan and Wang, 2014; Gao et al., 2015 and the references therein).

In this paper, we study a class of kernel-type estimators whose asymptotic properties do not depend on the correlation structure of the design. The design may be fixed (and not necessarily regularly spaced) or random (with not necessarily weakly dependent components). We propose weighted least square estimators where the weights are chosen as the Lebesgue measures of the elements of a finite random partition of the regression function domain $\Theta$, with every partition element corresponding to one design point. As a result, the proposed kernel estimators for the regression function are transformations of weighted sums of observations having the structure of multiple Riemann integral sums, so that conceptually our approach is close to the methods of Priestley and Chao (1972) and of Mack and Müller (1988), who considered the cases of univariate fixed design and i.i.d. random design, respectively. Explicit upper bounds are obtained for the rate of uniform convergence of these estimators to the random regression field.

In contrast to the predecessors' results, we do not impose any restrictions on the correlation structure of the design. As the main characteristic of the design, we consider the maximum cell diameter of the above-mentioned partition of $\Theta$ generated by the design elements. Sufficient conditions for the consistency of the new estimators, as well as the window widths, will be derived in terms of that characteristic. The advantage of this characteristic over the classical weak dependence conditions is that it is insensitive to the forms of correlation of the design elements. The main condition will be that the maximum cell diameter tends to zero in probability as the sample size grows. Note that such a requirement is, in fact, necessary: only when the design densely fills the regression function domain is it possible to reconstruct the function more or less precisely.

Univariate versions of this estimation problem were studied in Borisov et al. (2021) and Linke et al. (2022), where the asymptotic analysis and simulations showed that the proposed estimators perform better than the Nadaraya–Watson ones in several cases. Note that the univariate approach of Borisov et al. (2021) does not allow direct generalization to the multivariate case, since the weights were defined there as the spacings of the variational series generated by the design elements. Note also that the estimator in Borisov et al. (2021) is a particular univariate case of the estimators proposed in this paper, but not the only one. One of the univariate estimators studied here may be more accurate than the estimator in Borisov et al. (2021) (see Remark 3 below). Conditions on the design elements similar to those of this paper were used in Linke and Borisov (2022) and in Linke (2023). Those conditions provide uniform consistency of the estimators, but guarantee only pointwise consistency of the Nadaraya–Watson ones. Besides, similar restrictions on the design elements were used before in Linke and Borisov (2017, 2018) and Linke (2019) in estimating the parameters of several nonlinear regression models.

In this paper, we will assume that the unknown random regression function $f(t)$, $t\in\Theta$, is continuous with probability 1. Considering the general case of a random regression function allows us to obtain results on estimating the mean function of a random regression process. In regard to estimating random regression functions, we may note the papers by Li and Hsing (2010), Hall et al. (2006), Zhou et al. (2018), Zhang and Wang (2016, 2018), Yao et al. (2005), Zhang and Chen (2007), Yao (2007), Lin and Wang (2022). In those papers, the mean and covariance functions of the random regression process $f(t)$ were estimated when, for independent noisy copies of the random process, each of the trajectories was observed in a certain subset of design elements (a nonuniform random time grid). Estimation of the mean and covariance of random processes is an actively developing area of nonparametric estimation (especially in the last couple of decades); it is of independent interest and plays an important role in subsequent analyses (e.g., see Hsing and Eubank, 2015; Li and Hsing, 2010; Zhang and Wang, 2016; Wang et al., 2016).

Estimation of random regression functions usually deals with either random or deterministic design. In the case of random design, it is usually assumed that the design elements are independent and identically distributed (e.g., see Hall et al., 2006; Li and Hsing, 2010; Zhou et al., 2018; Yao, 2007; Yao et al., 2005; Zhang and Chen, 2007; Zhang and Wang, 2016, 2018; Lin and Wang, 2022). Some authors emphasized that their results can be extended to weakly dependent designs (e.g., see Hall et al., 2006). For deterministic time grids, regularity conditions are often required (e.g., see Song et al., 2014; Hall et al., 2006). In regard to the denseness of filling the regression function domain, two types of design are distinguished in the literature: either the design is “sparse”, e.g., the number of design elements in each series is uniformly bounded (Hall et al., 2006; Zhou et al., 2018; Li and Hsing, 2010), or the design is “dense” and the number of elements in a series increases with the sequential number of the series (Zhou et al., 2018; Li and Hsing, 2010). Uniform consistency of several estimators of the mean of a random regression function was considered, for example, by Yao et al. (2005), Zhou et al. (2018), Li and Hsing (2010), Hsing and Eubank (2015), Zhang and Wang (2016).

In this paper, we consider one variant of estimating the mean of a random regression function as an application of the main result. In the case of dense design, uniformly consistent estimators are constructed for the mean function when the design, independent across series, is arbitrarily correlated inside each series. We require only that, in each series, the design elements form a refining partition of the domain of the random regression function. Our setting also includes a general deterministic design, but we do not impose the traditionally used regularity conditions. Thus, in the problem of estimating the mean function, as well as in the problem of estimating the regression function in model (1), we weaken the traditional conditions on the design elements. Note that the methodologies used for estimating the mean function for dense and for sparse data usually differ (e.g., see Wang et al., 2016). In the case of a growing number of observations in each series, it is natural to first estimate the trajectory of the random regression function in each series and then average the estimates over all series (e.g., see Hall et al., 2006). That is what we do in this paper, following this generally accepted approach. Universal estimators of both the mean and covariance functions of a random process in the case of sparse data, insensitive to the nature of the dependence of the design elements, are proposed in Linke and Borisov (2024).

This paper has the following structure. Section 2 contains the main results on the rate of uniform convergence of the new estimators to the random regression function. In Section 3, we consider an application of the main results to the problem of estimating the mean and covariance functions of a random regression field. In Section 4, the asymptotic normality of the new estimators is discussed. Section 5 contains several simulation examples. In Section 6, we discuss an example of processing real data on earthquakes in Japan in 2012–2021. In Section 7, we summarize the main results of the paper. The proofs of the theorems and lemmas from Sections 2–4 are contained in Section 8.

2. Main assumptions and results

Without loss of generality we will assume that $d(\Theta\cup 0)\leq 1$, where

d(A):=\sup_{x,y\in A}\|x-y\|

is the diameter of a set $A$ and $\|\cdot\|$ is the sup-norm in $\mathbb{R}^{k}$. In what follows, unless otherwise stated, all the limits will be taken as $n\to\infty$.

Our approach resembles the construction of multivariate Riemann integrals. To this end, we need the following condition on the design $\{X_{i}\}$.

$({\bf D})$ For each $n$, there exists a random partition of the set $\Theta$ into $n$ Borel-measurable subsets $\{\Delta_{i};\;i=1,\ldots,n\}$ such that $\delta_{n}:=\max_{i\leq n}d(\Delta_{i}\cup X_{i})\to 0$ in probability.

Condition (D) means that, for every $n$, the set $\{X_{i};\,i\leq n\}$ forms a $\delta_{n}$-net in the compact set $\Theta$. In particular, Condition (D) is satisfied if the design points $\{X_{i}\}$ are pairwise distinct, $X_{i}\in\Delta_{i}$ for all $i\leq n$, and $\max_{i\leq n}d(\Delta_{i})\to 0$ in probability.

In the case $\Theta=[0,1]^{k}$, a regularly spaced design satisfies Condition (D). Moreover, if $\{X_{i};\,i\geq 1\}$ is a stationary sequence satisfying an $\alpha$-mixing condition and $[0,1]^{k}$ is the support of the distribution of $X_{1}$, then Condition (D) is fulfilled (see Remark 3 in Linke and Borisov, 2017). It is not hard to verify that, for i.i.d. design points with the probability density function of $X_{1}$ bounded away from zero on $[0,1]^{k}$, one can have $\delta_{n}=O\left(\frac{\log n}{n^{1/k}}\right)$ with probability 1. Notice that the dependence of the random variables $\{X_{i}\}$ in Condition (D) may be much stronger than that in these examples (see Linke and Borisov, 2017, 2018, and the example below).

Example. Let a sequence of bivariate random variables $\{X_{i};\,i\geq 1\}$ be defined by the relation

X_{i}=\nu_{i}U_{1i}+(1-\nu_{i})U_{2i}, \qquad (3)

where the random vectors $\{U_{1i}\}$ and $\{U_{2i}\}$ are independent and uniformly distributed on the rectangles $[0,1/2]\times[0,1]$ and $[1/2,1]\times[0,1]$, respectively, while the sequence $\{\nu_{i}\}$ does not depend on $\{U_{1i}\}$ and $\{U_{2i}\}$ and consists of Bernoulli random variables with success probability $1/2$, i.e., the distribution of $X_{i}$ is the equi-weighted mixture of the two above-mentioned uniform distributions. The dependence between the random variables $\{\nu_{i}\}$ is defined by the equalities $\nu_{2i-1}=\nu_{1}$ and $\nu_{2i}=1-\nu_{1}$. In this case, the random variables $\{X_{i};\,i\geq 1\}$ in (3) form a stationary sequence of random variables uniformly distributed on the unit square $[0,1]\times[0,1]$, but none of the standard mixing conditions is satisfied here because, for all natural $m$ and $n$,

{\mathbb{P}}\big{(}X_{2m}\in[0,1/2]\times[0,1],\,X_{2n-1}\in[0,1/2]\times[0,1]\big{)}=0.

On the other hand, it is easy to check that the stationary sequence $\{X_{i}\}$ satisfies the Glivenko–Cantelli theorem. This means that, for any fixed $h>0$,

\#\{i:\,\|t-X_{i}\|\leq h,\,1\leq i\leq n\}\sim 4h^{2}n

almost surely uniformly in $t$, where $\#$ denotes the standard counting measure. In other words, the sequence $\{X_{i}\}$ satisfies Condition (D).

It is clear that, following the scheme of this example, one can construct various sequences of dependent random variables uniformly distributed on $[0,1]\times[0,1]$, based on the choice of different sequences of the Bernoulli switches with the conditions $\nu_{j_{k}}=1$ and $\nu_{l_{k}}=0$ for infinitely many indices $\{j_{k}\}$ and $\{l_{k}\}$, respectively. In this case, Condition (D) will also be satisfied. But the corresponding sequence $\{X_{i}\}$ (not necessarily stationary) may not satisfy the strong law of large numbers. For example, a similar situation occurs when $\nu_{j}=1-\nu_{1}$ for $j=2^{2k-1},\ldots,2^{2k}-1$ and $\nu_{j}=\nu_{1}$ for $j=2^{2k},\ldots,2^{2k+1}-1$, where $k=1,2,\ldots$ (i.e., we randomly choose one of the two rectangles $[0,1/2]\times[0,1]$ and $[1/2,1]\times[0,1]$, into which we randomly throw the first point, and then alternate the selection of one of the two rectangles for the following numbers of elements of the sequence: $1$, $2$, $2^{2}$, $2^{3}$, etc.). Indeed, introduce the notation $n_{k}=2^{2k}-1$, $\tilde{n}_{k}=2^{2k+1}-1$, $S_{m}=\sum_{i=1}^{m}X_{i}^{(1)}$, with $X_{i}=(X_{i}^{(1)},X_{i}^{(2)})$, and note that, for all outcomes in the event $\{\nu_{1}=1\}$, one has

\frac{S_{n_{k}}}{n_{k}}=\frac{1}{n_{k}}\sum\limits_{i\in N_{1,k}}U_{1i}^{(1)}+\frac{1}{n_{k}}\sum\limits_{i\in N_{2,k}}U_{2i}^{(1)}, \qquad (4)

where $U_{ji}=(U_{ji}^{(1)},U_{ji}^{(2)})$, $j=1,2$; $N_{1,k}$ and $N_{2,k}$ are the collections of indices for which the observations $\{X_{i},\,i\leq n_{k}\}$ lie in the rectangles $[0,1/2]\times[0,1]$ or $[1/2,1]\times[0,1]$, respectively. It is easy to see that $\#(N_{1,k})=n_{k}/3$ and $\#(N_{2,k})=2\#(N_{1,k})$. Hence, $S_{n_{k}}/n_{k}\to 7/12$ almost surely as $k\to\infty$ due to the strong law of large numbers for the sequences $\{U_{1i}^{(1)}\}$ and $\{U_{2i}^{(1)}\}$. On the other hand, for all elementary outcomes in the event $\{\nu_{1}=1\}$, as $k\to\infty$, we have with probability 1

\frac{S_{\tilde{n}_{k}}}{\tilde{n}_{k}}=\frac{1}{\tilde{n}_{k}}\sum\limits_{i\in\tilde{N}_{1,k}}U_{1i}^{(1)}+\frac{1}{\tilde{n}_{k}}\sum\limits_{i\in\tilde{N}_{2,k}}U_{2i}^{(1)}\to\frac{5}{12}, \qquad (5)

where $\tilde{N}_{1,k}$ and $\tilde{N}_{2,k}$ are the collections of indices for which the observations $\{X_{i},\,i\leq\tilde{n}_{k}\}$ lie in the rectangles $[0,1/2]\times[0,1]$ or $[1/2,1]\times[0,1]$, respectively. In proving the convergence in (5) we took into account that $\#(\tilde{N}_{1,k})=(2^{2k+2}-1)/3$ and $\#(\tilde{N}_{2,k})=2n_{k}/3$, i.e., $\#(\tilde{N}_{1,k})=2\#(\tilde{N}_{2,k})+1$.

Similar arguments are valid for the elementary outcomes in the event $\{\nu_{1}=0\}$. □

In what follows, by $K(s)$, $s\in\mathbb{R}^{k}$, we will denote the kernel function. We assume that the kernel function vanishes outside $[-1,1]^{k}$ and is a centrally symmetric probability density function, i.e., $K(s)\geq 0$, $K(s)=K(-s)$ for all $s\in[-1,1]^{k}$, and $\int_{[-1,1]^{k}}K(s)\,ds=1$. For example, we may consider product-kernels of the form

K(s)=\prod\limits_{j=1}^{k}K_{o}(s^{(j)}),

where $K_{o}(\cdot)$ is a univariate symmetric probability density function with support $[-1,1]$. We also assume that the function $K(s)$ satisfies the Lipschitz condition with constant $L\geq 1$:

|K(x)-K(y)|\leq L\left(|x^{(1)}-y^{(1)}|+\cdots+|x^{(k)}-y^{(k)}|\right)

for all $x=(x^{(1)},\dots,x^{(k)})$ and $y=(y^{(1)},\dots,y^{(k)})$, and put $K(y)=0$ for all $y$ such that $\|y\|>1$. Notice that, under these restrictions, $\sup_{s}K(s)\leq L$.

Put

K_{\varepsilon}(s):=\varepsilon^{-k}K(\varepsilon^{-1}s).

By $\theta_{\varepsilon}$ we denote a random vector with the density $K_{\varepsilon}(t)$ which is independent of the random variables $\{Y_{i}\}$.

Let $\Lambda(\cdot)$ denote the Lebesgue measure in $\mathbb{R}^{k}$. Introduce the following notation:

f^{*}_{n,\varepsilon}(t):=\frac{\sum_{i=1}^{n}Y_{i}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})}{\sum_{i=1}^{n}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})}, \qquad (6)

where $0/0=0$ by definition;

J_{\varepsilon}(t):=\int\limits_{\Theta}K_{\varepsilon}(t-x)\,\Lambda(dx)\equiv{\bf P}(t-\theta_{\varepsilon}\in\Theta),\quad t\in\Theta; \qquad (7)

\omega_{f}(\varepsilon):=\sup_{x,y\in\Theta:\,\|x-y\|\leq\varepsilon}|f(x)-f(y)|.

Now, notice that

f^{*}_{n,\varepsilon}(t)={\rm arg}\min\limits_{z\in\mathbb{R}}\sum\limits^{n}_{i=1}(Y_{i}-z)^{2}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}),

i.e., the estimators of the form (6) belong to the class of weighted least square estimators. Estimators (6) are also called local constant ones.
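To make the construction concrete, the following minimal Python sketch evaluates the estimator (6) at a point, given the design points, the responses, the cell measures $\Lambda(\Delta_{i})$, and a bandwidth. It is our illustration, not the authors' code: the function names and the choice of a product Epanechnikov kernel are ours.

```python
import numpy as np

def product_epanechnikov(u):
    """Product kernel K(s) = prod_j K_o(s^(j)) with the Epanechnikov
    density K_o(v) = 0.75 * (1 - v^2) on [-1, 1]."""
    return np.prod(np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0), axis=-1)

def local_constant_estimate(t, X, Y, cell_measures, eps):
    """Estimator (6): a kernel-weighted average of the Y_i with weights
    K_eps(t - X_i) * Lambda(Delta_i), where K_eps(s) = eps^{-k} K(s / eps);
    the convention 0/0 = 0 is used.

    t: evaluation point, shape (k,); X: design, shape (n, k);
    Y: responses, shape (n,); cell_measures: Lambda(Delta_i), shape (n,)."""
    k = X.shape[1]
    w = eps**(-k) * product_epanechnikov((t - X) / eps) * cell_measures
    s = w.sum()
    return w @ Y / s if s > 0 else 0.0
```

Any partition satisfying Condition (D) can supply the cell measures; two ways of computing them in the bivariate case are sketched in Section 5.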

Finally, we will assume that there exist constants $\rho>0$ and $\varepsilon_{0}\in(0,1]$ such that

J_{\varepsilon}(t)\geq\rho\ \mbox{ for all }\ t\in\Theta\ \mbox{ and positive }\ \varepsilon\leq\varepsilon_{0}. \qquad (8)

So, some cases (for example, if $\Theta$ contains isolated points) are excluded from the scheme under consideration.

R e m a r k  1.

Notice that if the set $\Theta$ can be represented as a union of hyperrectangles with edges of lengths greater than $\varepsilon_{0}$ and the kernel $K$ is a product-kernel, then we have the lower bound $\rho\geq 2^{-k}$.

The main result of this paper is as follows.

Theorem 1. Let the conditions (D) and (8) hold. Then, for any fixed $\varepsilon\in(0,\varepsilon_{0}]$,

\sup_{t\in\Theta}|f^{*}_{n,\varepsilon}(t)-f(t)|\leq\omega_{f}(\varepsilon)+\zeta_{n}(\varepsilon) \qquad (9)

with probability 1, where $\zeta_{n}(\varepsilon)$ is a positive random variable such that

{\bf P}\left(\zeta_{n}(\varepsilon)>y\right)\leq G(k,p)\,\rho^{-p}\,M_{p}\,L^{p/2}\,y^{-p}\,\varepsilon^{-k(p/2+1)}\,{\bf E}(\delta_{n}^{kp/2})+{\bf P}(\delta_{n}>\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\}), \qquad (10)

where

0<G(k,p)<(p-1)^{p/2}2^{p(k+(3/2))}\left(1+\frac{k}{2^{(p-k)/(p+1)}-1}\right)^{p+1}.

In what follows, we will denote by $O_{p}(\eta_{n})$ some univariate random variables $\zeta_{n}$ such that, for all $y>0$,

\limsup_{n\to\infty}{\bf P}(|\zeta_{n}|/\eta_{n}>y)\leq\beta(y),

where $\{\eta_{n}\}$ is a sequence of positive nonrandom numbers, $\lim_{y\to\infty}\beta(y)=0$, and the function $\beta(y)$ does not depend on $n$.

R e m a r k  2.

For example, let the function $f$ be nonrandom. In (10), put

y=\left(\varepsilon^{-k(p/2+1)}\,{\bf E}(\delta_{n}^{kp/2})\right)^{1/p}.

Applying Markov's inequality with exponent $kp/2$ to the second summand in (10), we obtain that, under the conditions of the theorem,

\zeta_{n}(\varepsilon)=O_{p}\left(\left(\varepsilon^{-k(p/2+1)}\,{\bf E}(\delta_{n}^{kp/2})\right)^{1/p}\right)

and there exists a solution $\varepsilon\equiv\varepsilon(n)$ to the equation

{\bf E}(\delta_{n}^{kp/2})=\varepsilon^{k(p/2+1)}\omega^{p}_{f}(\varepsilon). \qquad (11)

It is clear that this solution vanishes as $n\to\infty$. In fact, the value $\varepsilon(n)$ minimizes in $\varepsilon$ the order of smallness of the right-hand side of (9). Notice that $\delta_{n}/\varepsilon(n)\stackrel{p}{\to}0$ and $(\varepsilon(n))^{-k(p/2+1)}\,{\bf E}(\delta_{n}^{kp/2})\to 0$ in view of (11).

Taking Remark 1 into account, one can obtain the following two assertions as consequences of Theorem 1.

Corollary 1. Let $\cal C$ be a set of nonrandom equicontinuous functions from the function space $C[0,1]^{k}$ (for example, a precompact subset of $C[0,1]^{k}$). Then, under Condition (D),

\gamma_{n}({\cal C}):=\sup_{f\in\cal C}\,\sup_{t\in[0,1]^{k}}|f^{*}_{n,\tilde{\varepsilon}(n)}(t)-f(t)|\stackrel{p}{\to}0,

where $\tilde{\varepsilon}(n)$ is a solution to equation (11) in which the modulus of continuity $\omega_{f}(\varepsilon)$ is replaced with the universal modulus $\omega_{\cal C}(\varepsilon):=\sup_{f\in\cal C}\omega_{f}(\varepsilon)$. In this case, $\gamma_{n}({\cal C})=O_{p}(\omega_{\cal C}(\tilde{\varepsilon}(n)))$.

Corollary 2. If the modulus of continuity of the regression random field $f(t)$, $t\in[0,1]^{k}$, in Model (1) meets the condition $\omega_{f}(\varepsilon)\leq\zeta\tau(\varepsilon)$ a.s., where $\zeta>0$ is a proper random variable and $\tau(\varepsilon)$ is a positive continuous nonrandom function with $\tau(\varepsilon)\to 0$ as $\varepsilon\to 0$, then, under Condition (D),

\sup_{t\in[0,1]^{k}}|f^{*}_{n,\hat{\varepsilon}(n)}(t)-f(t)|\stackrel{p}{\to}0, \qquad (12)

where $\hat{\varepsilon}(n)$ is a solution to equation (11) in which the modulus of continuity $\omega_{f}(\varepsilon)$ is replaced with $\tau(\varepsilon)$.

Example 2. Let $\Theta=[0,1]^{k}$, $\delta_{n}\leq\nu n^{-1/k}$ with ${\bf E}\nu^{kp/2}<\infty$, and $\omega_{f}(\varepsilon)\leq\zeta\varepsilon^{\alpha}$, $\alpha\in(0,1]$, where $\zeta$ is a proper random variable. Then $\varepsilon(n)=O\left(n^{-\frac{1}{2k(1/p+1/2)+\alpha}}\right)$ and

\sup_{t\in[0,1]^{k}}|f^{*}_{n,\varepsilon}(t)-f(t)|=O_{p}\left(n^{-\frac{\alpha}{2k(1/p+1/2)+\alpha}}\right).

In particular, in the one-dimensional case, if $f(\cdot)=W(\cdot)$ is a Wiener process on $[0,1]$ and the i.i.d. random variables $\xi_{i}$ are centered Gaussian, then, by Lévy's modulus of continuity theorem, for any arbitrarily small $\nu>0$ we have

\sup_{t\in[0,1]}|f^{*}_{n,\varepsilon}(t)-f(t)|=O_{p}(n^{-1/3+\nu}).

Here we put $k=1$, $\alpha=1/2-\nu_{1}$, and $1/p<\nu_{2}$, with arbitrarily small positive $\nu_{1}$ and $\nu_{2}$.

R e m a r k  3.

Let $k=1$, $\Theta=[0,1]$. Denote by $X_{n:1}\leq\ldots\leq X_{n:n}$ the ordered sample $\{X_{i};\,i=1,\ldots,n\}$. Put

X_{n:0}:=0,\quad X_{n:n+1}:=1,\quad\Delta_{ni}:=(X_{n:i-1},\,X_{n:i}],\quad i=1,\ldots,n.

Denote by $Y_{ni}$ the response variable $Y$ corresponding to $X_{n:i}$ in (1). Then we can write down estimator (6) for the function $f$ as

f^{*}_{n,\varepsilon}(t)=\frac{\sum_{i=1}^{n}Y_{ni}K_{\varepsilon}(t-X_{n:i})\Delta X_{ni}}{\sum_{i=1}^{n}K_{\varepsilon}(t-X_{n:i})\Delta X_{ni}}, \qquad (13)

where

\Delta X_{ni}:=\Lambda(\Delta_{ni})=X_{n:i}-X_{n:i-1}.

This estimator was proposed and studied in detail in Borisov et al. (2021).

But, instead of $\{\Delta_{ni}\}$, we can consider the Voronoi cells

\widetilde{\Delta}_{ni}:=\left(\frac{X_{n:i-1}+X_{n:i}}{2},\,\frac{X_{n:i}+X_{n:i+1}}{2}\right]

and write down the corresponding estimator:

\widetilde{f}^{*}_{n,\varepsilon}(t)=\frac{\sum_{i=1}^{n}Y_{ni}K_{\varepsilon}(t-X_{n:i})\widetilde{\Delta}X_{ni}}{\sum_{i=1}^{n}K_{\varepsilon}(t-X_{n:i})\widetilde{\Delta}X_{ni}}, \qquad (14)

where

\widetilde{\Delta}X_{ni}:=\Lambda(\widetilde{\Delta}_{ni})=\frac{X_{n:i+1}-X_{n:i-1}}{2}.

Repeating, for the last estimator, the corresponding proofs in Borisov et al. (2021) originally applied to estimator (13), we can easily see that all properties of estimator (13) are retained for (14), except the constant factor in the asymptotic variance. Namely, let the regression function $f(t)$ be twice continuously differentiable and nonrandom, let the errors $\{\xi_{i}\}$ be independent identically distributed, centered with finite second moment $M_{2}$, and independent of the design $\{X_{i}\}$, whose elements are independent and identically distributed. In addition, let $X_{1}$ have a strictly positive continuously differentiable density $p(t)$. Then

{\mathbb{V}ar}\,f_{n,\varepsilon}^{*}(t)\sim\frac{2M_{2}}{\varepsilon np(t)}\int\limits_{-1}^{1}K^{2}(u)\,du,\qquad{\mathbb{V}ar}\,\widetilde{f}_{n,\varepsilon}^{*}(t)\sim\frac{1.5M_{2}}{\varepsilon np(t)}\int\limits_{-1}^{1}K^{2}(u)\,du. \qquad (15)

The former asymptotic relation was established in Lemma 3 of Borisov et al. (2021). The latter relation can be proved by repeating the proof of that lemma with obvious changes. Hence, in the case of independent and identically distributed design points, the asymptotic variance of the estimator can be reduced by choosing an appropriate partition.

Thus, for $k=1$, this paper deals with a more general class of estimators (6) than that of Borisov et al. (2021), where estimator (13) was studied, and representatives of the new class can have certain advantages over estimator (13).
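For $k=1$, the two estimators differ only in the weights: the spacings in (13) versus the Voronoi cell lengths in (14). The following Python sketch (our notation; it assumes $\Theta=[0,1]$ and a sorted sample) computes both weight vectors, which can then be supplied to the local-constant sketch of the previous code fragment as the cell measures.

```python
import numpy as np

def univariate_weights(x_sorted, voronoi=False):
    """Weights for the univariate estimators of Remark 3 on [0, 1]:
    spacings X_{n:i} - X_{n:i-1} for (13), or the Voronoi cell lengths
    (X_{n:i+1} - X_{n:i-1}) / 2 for (14), with X_{n:0} = 0, X_{n:n+1} = 1."""
    ext = np.concatenate(([0.0], x_sorted, [1.0]))
    if voronoi:
        return (ext[2:] - ext[:-2]) / 2.0
    return np.diff(ext)[:-1]
```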

3. Application to estimating the mean and covariance functions of a random regression function

In this section, as an application of Theorem 1, we construct a consistent estimator for the mean function of the random regression function in Model (1). We consider the following multivariate statement of the problem of estimating the mean function of an a.s. continuous random regression process. Consider $N$ independent copies of Model (1):

Y_{i,j}=f_{j}(X_{i,j})+\xi_{i,j},\qquad i=1,\ldots,n,\quad j=1,\ldots,N, \qquad (16)

where $f(t),f_{1}(t),\ldots,f_{N}(t)$, $t\in[0,1]^{k}$, are i.i.d. unknown a.s. continuous stochastic processes and, for every $j$, the collection $\{\xi_{i,j};\,i\leq n\}$ satisfies condition (2). Here and in what follows, the subscript $j$ denotes the sequential number of such a copy. Introduce the notation

\overline{f^{*}}_{N,n,\varepsilon}(t):=\frac{1}{N}\sum\limits_{j=1}^{N}f^{*}_{n,\varepsilon,j}(t).

Theorem 2. Let Condition (D) for Model (1) be fulfilled and

{\bf E}\sup_{t\in[0,1]^{k}}|f(t)|<\infty. \qquad (17)

Besides, let a sequence $\varepsilon\equiv\varepsilon_{n}\to 0$ and a sequence of naturals $N\equiv N_{n}\to\infty$ satisfy the conditions

\varepsilon^{-k(p/2+1)}\,{\bf E}(\delta_{n}^{kp/2})\to 0\quad\mbox{and}\quad N{\bf P}(\delta_{n}>\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\})\to 0. \qquad (18)

Then

\sup\limits_{t\in[0,1]^{k}}|\overline{f^{*}}_{N,n,\varepsilon}(t)-{\bf E}f(t)|\stackrel{p}{\to}0. \qquad (19)
R e m a r k  4.

If condition (17) is replaced with the slightly stronger condition

{\bf E}\sup_{t\in[0,1]^{k}}f^{2}(t)<\infty,

then, under the restrictions (18), one can prove the uniform consistency of the estimator

M_{N,n}^{*}(t_{1},t_{2}):=\frac{1}{N}\sum\limits_{j=1}^{N}f^{*}_{n,\varepsilon,j}(t_{1})f^{*}_{n,\varepsilon,j}(t_{2}),\qquad t_{1},t_{2}\in[0,1]^{k},

for the unknown mixed second-moment function ${\bf E}f(t_{1})f(t_{2})$, where $\varepsilon\equiv\varepsilon_{n}$ and $N\equiv N_{n}$ are defined in (18). The proof is based on the same arguments as those in proving Theorem 2 and is therefore omitted. In other words, under the above-mentioned conditions, the estimator

{\rm Cov}^{*}_{n}(t_{1},t_{2}):=M_{N,n}^{*}(t_{1},t_{2})-\overline{f^{*}}_{N,n,\varepsilon}(t_{1})\overline{f^{*}}_{N,n,\varepsilon}(t_{2})

is uniformly consistent for the covariance function of the random regression field $f(t)$.
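Computationally, the estimators of this section only average per-series local-constant fits over a common grid. A minimal Python sketch (our naming; the rows of `series_fits` are assumed to contain the estimates $f^{*}_{n,\varepsilon,j}$ evaluated on a fixed grid, e.g., by the estimator sketch of Section 2):

```python
import numpy as np

def mean_and_cov_estimates(series_fits):
    """series_fits: array of shape (N, m); row j holds f*_{n,eps,j} on a
    grid of m points. Returns the mean estimate (the average over series)
    and the covariance estimate Cov*_n = M*_{N,n} minus the outer product
    of the means."""
    N = series_fits.shape[0]
    fbar = series_fits.mean(axis=0)               # mean-function estimate
    M = series_fits.T @ series_fits / N           # M*_{N,n}(t_1, t_2) on the grid
    return fbar, M - np.outer(fbar, fbar)         # Cov*_n(t_1, t_2)
```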

4. Asymptotic normality

In this section, we discuss sufficient conditions for the asymptotic normality of the estimators $f^{*}_{n,\varepsilon}(t)$. Denote by ${\cal F}_{0}$ the trivial $\sigma$-field, and by ${\cal F}_{j}$ the $\sigma$-field generated by the collection $\{\xi_{1},\ldots,\xi_{j}\}$, by the design points, and by the regression random field.

Theorem 3. Let the design $\{X_{i}\}$ not depend on $n$. Under Condition (D), assume that, for some $t\in\Theta$ and a sequence $\varepsilon\equiv\varepsilon_{n}$,

h_{n}:=\frac{\max_{j\leq n}\big{(}K_{\varepsilon}(t-X_{j})\Lambda(\Delta_{j})\big{)}^{2}}{\sum_{j=1}^{n}\big{(}K_{\varepsilon}(t-X_{j})\Lambda(\Delta_{j})\big{)}^{2}}\stackrel{p}{\to}0, \qquad (20)

{\bf E}(\xi_{j}^{2}\ |\ {\cal F}_{j-1})=\sigma^{2}\ \mbox{ a.s. for all }j,

\max_{j}{\bf E}\left(\xi_{j}^{2}\,{\bf 1}(\xi_{j}^{2}>a/h_{n})\ |\ {\cal F}_{j-1}\right)\stackrel{p}{\to}0\ \mbox{ for all }a>0.

Then

B^{-1}_{n,\varepsilon}(t)\left(f^{*}_{n,\varepsilon}(t)-f(t)-r_{n,\varepsilon}(t)\right)\stackrel{d}{\to}N(0,\sigma^{2}),

where $N(0,\sigma^{2})$ is a centered Gaussian random variable with variance $\sigma^{2}$,

B^{2}_{n,\varepsilon}(t):=J^{-2}_{n,\varepsilon}(t)\sum\limits_{i=1}^{n}\big{(}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\big{)}^{2},

r_{n,\varepsilon}(t):=J^{-1}_{n,\varepsilon}(t)\sum\limits_{i=1}^{n}(f(X_{i})-f(t))K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}),

J_{n,\varepsilon}(t):=\sum\limits_{i=1}^{n}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}).

The theorem is a direct consequence of Corollary 3.1 in Hall and Heyde (1980).

5. Simulation examples

In this section, we present simulations comparing the estimator $f^{*}_{n,\varepsilon}(t)$ defined in (6) with the Nadaraya–Watson estimator

\hat{f}_{n,\varepsilon}(t):=\frac{\sum_{i=1}^{n}Y_{i}K_{\varepsilon}(t-X_{i})}{\sum_{i=1}^{n}K_{\varepsilon}(t-X_{i})}

in the 2-dimensional case. For this estimator, we also use the convention $0/0=0$, as was done for estimator (6).
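In code, the only difference from the local-constant sketch of Section 2 is that the cell measures are dropped; a hypothetical sketch reusing the kernel defined there:

```python
import numpy as np

def nadaraya_watson(t, X, Y, eps, kernel):
    """Nadaraya-Watson estimate at t: the same kernel-weighted average as
    estimator (6), but with all cell measures Lambda(Delta_i) set to 1."""
    k = X.shape[1]
    w = eps**(-k) * kernel((t - X) / eps)
    s = w.sum()
    return w @ Y / s if s > 0 else 0.0   # convention 0/0 = 0
```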

The elements of the design space $\Theta=[-1,1]\times[-1,1]$ will be denoted by $(x,y)$. The following two algorithms were used to partition the space $\Theta$ into the sets $\Delta_{i}$.

The first algorithm is Voronoi partitioning. For each $i$, the set $\Delta_{i}$ is the Voronoi cell corresponding to $X_{i}$, i.e., the set of all points of $\Theta$ that lie closer to $X_{i}$ than to any other design point. The deldir R package was employed for calculating the areas of the cells.
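If exact polygon clipping is inconvenient, the cell areas can be approximated by assigning the nodes of a fine lattice to their nearest design points, which is exactly the Voronoi rule. A Python sketch of this approximation (our approach, not the deldir computation used in the paper):

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_areas(X, lo=-1.0, hi=1.0, grid=400):
    """Approximate the areas of the Voronoi cells of the rows of X inside
    the square [lo, hi]^2: each node of a grid x grid lattice contributes
    its own small area to the cell of the nearest design point."""
    g = lo + (np.arange(grid) + 0.5) * (hi - lo) / grid
    nodes = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    _, nearest = cKDTree(X).query(nodes)
    counts = np.bincount(nearest, minlength=len(X))
    return counts * ((hi - lo) / grid) ** 2
```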

The second algorithm is recursive partitioning by coordinate-wise medians, sketched in code below. First, we divide $\Theta$ into two rectangles by the line $t^{(1)}={\rm median}\{X_{1}^{(1)},\dots,X_{n}^{(1)}\}$, where the median is the midpoint of the interval $\left(X_{\lfloor n/2\rfloor}^{(1)},\ X_{\lfloor n/2\rfloor+1}^{(1)}\right)$ when all the points are sorted in increasing order with respect to the first coordinate. Then each of the two rectangles is divided recursively. If, at some step, a rectangle contains two or more design points, then it is divided into two parts: if the rectangle's width is greater than its height, then the rectangle is divided by the line $t^{(1)}={\rm median}\{X_{l}^{(1)}:l\in B\}$, where $B$ is the set of indices of the design points falling into the rectangle; otherwise it is divided by the line $t^{(2)}={\rm median}\{X_{l}^{(2)}:l\in B\}$. As soon as there is only one design point $X_{i}$ in a rectangle, the rectangle is taken to be $\Delta_{i}$.
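A Python sketch of this recursion (our implementation of the description above; it assumes pairwise distinct coordinate values, which holds a.s. for continuous designs):

```python
import numpy as np

def median_partition_areas(X, rect=(-1.0, 1.0, -1.0, 1.0)):
    """Recursively split a rectangle by coordinate-wise medians until each
    part contains one design point; return the areas Lambda(Delta_i)."""
    areas = np.empty(len(X))

    def split(idx, x0, x1, y0, y1):
        if len(idx) == 1:
            areas[idx[0]] = (x1 - x0) * (y1 - y0)
            return
        axis = 0 if (x1 - x0) >= (y1 - y0) else 1   # split the wider side
        vals = np.sort(X[idx, axis])
        m = len(idx) // 2
        cut = (vals[m - 1] + vals[m]) / 2.0         # midpoint of the two
        left = idx[X[idx, axis] <= cut]             # middle order statistics
        right = idx[X[idx, axis] > cut]
        if axis == 0:
            split(left, x0, cut, y0, y1)
            split(right, cut, x1, y0, y1)
        else:
            split(left, x0, x1, y0, cut)
            split(right, x0, x1, cut, y1)

    split(np.arange(len(X)), *rect)
    return areas
```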

Results of partitioning $\Theta$ into cells for a collection of 100 points by both algorithms are displayed in Fig. 1.

Figure 1: Partitioning into Voronoi cells (left) and partitioning by coordinate-wise medians (right) for the same collection of points.

In the simulation examples below, we used the tricubic kernel

K(x,y)=\frac{440}{162\pi}\max\left\{0,\left(1-(x^{2}+y^{2})^{3/2}\right)^{3}\right\}.
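In Python, this kernel and a quick numerical check of the normalizing constant $440/(162\pi)$ read as follows (our sketch):

```python
import numpy as np

def tricube2d(x, y):
    """Radial tricube kernel: (440 / (162*pi)) * (1 - r^3)^3 for r <= 1."""
    r3 = (x**2 + y**2) ** 1.5
    return 440.0 / (162.0 * np.pi) * np.maximum(0.0, 1.0 - r3) ** 3

# Sanity check: the kernel should integrate to 1 over [-1, 1]^2.
g = np.linspace(-1.0, 1.0, 1001)
xx, yy = np.meshgrid(g, g)
print(tricube2d(xx, yy).sum() * (g[1] - g[0]) ** 2)   # approximately 1.0
```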

In each example, 1000 simulation runs were performed. In each of the simulation runs, 5000 design points were generated and randomly divided into the training (80%) and validation (20%) sets. For the design points $X_{i}$, the observations $Y_{i}$ were generated with i.i.d. Gaussian noise with standard deviation $\sigma=0.5$. For each of the tested algorithms, the optimal $\varepsilon$ was calculated on the training set by 10-fold cross-validation minimizing the average of mean-square errors. The $\varepsilon$ was selected from 20 values located on the logarithmic grid from 0.01 to 0.5. The random partitioning for the cross-validation was the same for all the tested algorithms.
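A Python sketch of this selection step (our simplification: the cell measures are computed once on the full training set and sliced per fold, whereas, strictly speaking, the partition could be recomputed on each fold):

```python
import numpy as np

def choose_eps_cv(X, Y, cell_measures, estimate, n_folds=10, seed=0):
    """Pick eps from a logarithmic grid by 10-fold cross-validation,
    minimizing the average mean-square prediction error; `estimate` has
    the signature of the local-constant sketch of Section 2."""
    eps_grid = np.geomspace(0.01, 0.5, 20)
    folds = np.random.default_rng(seed).permutation(len(X)) % n_folds
    cv_loss = []
    for eps in eps_grid:
        fold_mse = []
        for f in range(n_folds):
            tr, va = folds != f, folds == f
            pred = np.array([estimate(t, X[tr], Y[tr], cell_measures[tr], eps)
                             for t in X[va]])
            fold_mse.append(np.mean((pred - Y[va]) ** 2))
        cv_loss.append(np.mean(fold_mse))
    return eps_grid[int(np.argmin(cv_loss))]
```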

Then, for each of the algorithms, the model trained on the training set with the chosen $\varepsilon$ was used to compute the mean-square error (MSE) for the observations of the validation set:

{\rm MSE}=\frac{1}{m}\sum_{j}(f^{*}_{\varepsilon}(X_{j})-Y_{j})^{2},

where the sum is taken over the validation set, and $m$ is the size of the set. Besides, that model was employed to compute the maximal absolute error (MaxE) for the true values of the target function $f$ on the $100\times 100$ uniform lattice on $\Theta$:

{\rm MaxE}=\max_{j}|f^{*}_{\varepsilon}(\gamma_{j})-f(\gamma_{j})|,

where the maximum is computed over the elements $\gamma_{j}$ of the $100\times 100$ lattice covering $\Theta$.

The algorithms that were compared will be denoted by NW (Nadaraya-Watson), ULCV (Universal Local Constant estimator (6) with Voronoi partitioning), and ULCM (Universal Local Constant estimator (6) with coordinate-wise Medians partitioning).

The results of the simulation runs are presented as median (1-st quartile, 3-rd quartile) and are compared between the estimators with the paired Wilcoxon test.

In the examples below, we intentionally chose highly nonuniform densities of the design points in order to demonstrate possible advantages of the new estimator.


5.1. Example 1

Figure 2: Design points in an Example 1 experiment (left), the mean-square errors (middle), and maximal absolute errors (right) in Example 1.

In this example, we approximate the nonrandom regression function

f(x,y)=\frac{5}{1+e^{-20x}}-2y^{3}.

The design points were generated in a way similar to that in the Example in Sec. 1. First, we choose the left rectangle $[-1,0)\times[-1,1]$ or the right rectangle $[0,1]\times[-1,1]$ with equal probabilities and draw $X_{1}$ uniformly distributed in the chosen rectangle. Then we draw $10$ design points uniformly distributed in the other rectangle. Then we draw $10^{2}$ design points uniformly distributed in the rectangle where $X_{1}$ lies. Then we draw $10^{3}$ design points uniformly distributed in the rectangle where $X_{2}$ lies, and so on. In other words, we alternate the rectangle after $1$, $10$, $10^{2}$, $10^{3}$, … draws. One draw of the $5000$ design points is depicted in Fig. 2. The estimated function $f(x,y)$ and a computed ULCV estimate are depicted in Fig. 3.
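A Python sketch of this design generator (our implementation; block sizes 1, 10, 10², … drawn in alternating halves of $\Theta$):

```python
import numpy as np

def alternating_design(n=5000, seed=0):
    """Example 1 design: blocks of 1, 10, 100, ... points drawn uniformly,
    alternating between the left and right halves of [-1, 1]^2."""
    rng = np.random.default_rng(seed)
    side = int(rng.integers(2))      # 0: left rectangle, 1: right rectangle
    pts, block = [], 1
    while len(pts) < n:
        m = min(block, n - len(pts))
        x = rng.uniform(-1.0 + side, float(side), m)   # [-1, 0) or [0, 1]
        y = rng.uniform(-1.0, 1.0, m)
        pts.extend(np.column_stack([x, y]))
        side, block = 1 - side, block * 10
    return np.asarray(pts)
```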

Figure 3: The estimated function $z=f(x,y)$ (left) and a ULCV estimate $z=f^{*}(x,y)$ (right) in Example 1.

The results are presented in Fig. 2. The ULCV estimator appeared to perform best among the three considered ones both for MSE and MaxE accuracy measures. In particular, the ULCV estimator was better than the NW one: MSE 0.2661 (0.2584, 0.2742) vs. 0.2734 (0.2650, 0.2819), p<0.0001p<0.0001; MaxE 0.7878 (0.7013, 0.9230) vs. 1.1998 (1.0911, 1.3250), p<0.0001p<0.0001. In this example, the ULCM estimator was better than the NW one as well.


5.2. Example 2

In this example, we approximate the nonrandom regression function

f(x,y)=\sin\left(10\sqrt{x^{2}+y^{2}}\right)\Big/\sqrt{x^{2}+y^{2}}. \qquad (21)

The i.i.d. design points were generated with independent polar coordinates $(\rho,\varphi)$, where $\rho$ was drawn with the density proportional to $r^{2}(2-r)^{1/10}$, $0\leq r\leq 2$, and $\varphi$ was uniformly distributed on $[0,2\pi)$. The distribution of the design points was restricted to $\Theta=[-1,1]\times[-1,1]$, i.e., the design points that did not fall into $\Theta$ were excluded, keeping the total number of collected points equal to 5000, as in the other simulation examples. One draw of the design points is depicted in Fig. 4. The estimated function $f(x,y)$ and a computed ULCV estimate are depicted in Fig. 5.
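A Python sketch of this generator (ours; $\rho$ is drawn by rejection sampling under a crude constant bound on its unnormalized density):

```python
import numpy as np

def polar_design(n=5000, seed=0):
    """Example 2 design: rho has density proportional to r^2 (2-r)^(1/10)
    on [0, 2], phi is uniform on [0, 2*pi); points outside [-1, 1]^2 are
    discarded until n points are collected."""
    rng = np.random.default_rng(seed)
    M = 4.3                     # upper bound for r^2 (2-r)^(1/10) on [0, 2]
    out = []
    while len(out) < n:
        r = rng.uniform(0.0, 2.0, 4 * n)
        u = rng.uniform(0.0, 1.0, 4 * n)
        r = r[u * M < r**2 * (2.0 - r) ** 0.1]          # rejection sampling
        phi = rng.uniform(0.0, 2.0 * np.pi, len(r))
        pts = np.column_stack([r * np.cos(phi), r * np.sin(phi)])
        out.extend(pts[(np.abs(pts) <= 1.0).all(axis=1)])
    return np.asarray(out[:n])
```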

Figure 4: Design points in an Example 2 experiment (left), the mean-square errors (middle), and maximal absolute errors (right) in Example 2.
Figure 5: The estimated function $z=f(x,y)$ (left) and a ULCV estimate $z=f^{*}(x,y)$ (right) in Example 2.

The results are presented in Fig. 4. The ULCV estimator was the best among the three considered ones both for MSE and MaxE accuracy measures. In particular, the ULCV estimator was better than the NW one: MSE 0.2803 (0.2718, 0.2898) vs. 0.2870 (0.2774, 0.2974), p<0.0001p<0.0001; MaxE 2.505 (2.072, 3.140) vs. 2.695 (2.303, 3.361), p<0.0001p<0.0001. In this example, the ULCM estimator had lower MaxE and higher MSE than the NW estimator did.


5.3. Example 3

In this example, we approximate the same nonrandom regression function (21) as in Example 2. The only difference of this example from Example 2 is that here the coordinates of the design points were generated as independent normal random variables with mean 0 and standard deviation 1/2. As above, the distribution of the design points was restricted to $\Theta=[-1,1]\times[-1,1]$. One draw of the design points is depicted in Fig. 6.

Figure 6: Design points in an Example 3 experiment (left), the mean-square errors (middle), and maximal absolute errors (right) in Example 3.

The results are presented in Fig. 6. The ULCV estimator was the best one in terms of MSE, in particular, it was better than NW: MSE 0.2834 (0.2750, 0.2922) vs. 0.2895 (0.2808, 0.2977), p<0.0001p<0.0001. But ULCV was worse than NW in terms of MaxE: 1.507 (1.364, 1.653) vs. 1.488 (1.357, 1.643), p<0.0001p<0.0001. In this example, the ULCM estimator was the worst one both for MSE and MaxE. However, from a practical point of view, the three estimators demonstrated similar accuracy in terms of MaxE.


6. Real data application

Figure 7: Observed earthquake events (left), the mean-square errors (middle), and maximal absolute errors (right).

In this section, we compared the new ULCV and ULCM estimators with the NW one in an application to the data on earthquakes in Japan in 2012–2021 (data retrieved from the ANSS Comprehensive Earthquake Catalog, 2022). Each of the 10184 collected earthquake events was described by its coordinates (longitude and latitude) and its magnitude (ranging from 2.7 to 7.8). The collected events are presented in Fig. 7. The goal was to accurately estimate the mean magnitude as a function of the coordinates. As in the simulation examples above, we did 1000 runs, in each of which the data were randomly divided into the training (80%) and validation (20%) sets. For each of the tested algorithms, the optimal $\varepsilon$ was calculated on the training set by 10-fold cross-validation minimizing the average of mean-square errors. The $\varepsilon$ was selected from 20 values located on the logarithmic grid from 1 to 10. The random partitioning for the cross-validation was the same for all the tested algorithms. The difference of the computations of this section from those of Sec. 5 is that the true value of the estimated function is unknown; therefore, we had to estimate the maximal error (MaxE) on the validation set in each run, not on the true values of the estimated function. Besides, since the domain of the coordinates of the events is nonrectangular while the employed domain-partitioning algorithms (the Voronoi cells algorithm and the coordinate-wise medians algorithm) calculated the cell areas for a rectangular domain, we bounded the cell areas from above by 1 in order to avoid overweighting the corresponding observations. The resulting estimates of the NW and ULCV estimators are depicted in Fig. 8, where, for each estimator, the value of $\varepsilon$ was chosen as the median of those chosen in the 1000 runs.

Figure 8: The estimated mean magnitude for the NW estimator $z=\hat{f}(x,y)$ (left) and the ULCV estimator $z=f^{*}(x,y)$ (right).

The results are presented in Fig. 7. The ULCV estimator was the best among the three considered ones both for MSE and MaxE accuracy measures. In particular, the ULCV estimator was better than the NW one: MSE 0.1296 (0.1245, 0.1348) vs. 0.1297 (0.1239, 0.1364), p<0.0001p<0.0001; MaxE 2.573 (2.464, 2.785) vs. 2.736 (2.442, 3.346), p=0.0005p=0.0005. In this example, the ULCM estimator yielded lower MaxE and higher MSE than the NW estimator did. However, from a practical point of view, the three estimators displayed similar median MSE.

7. Conclusion

In this paper, for a wide class of nonparametric regression models with a multivariate random design, universal uniformly consistent kernel estimators are proposed for the unknown random regression functions (random fields) of the corresponding multivariate argument. These estimators belong to the class of local constant kernel estimators. But in contrast to the vast majority of previously known results, traditional correlation conditions of design elements are not needed for the consistency of the new estimators. The design can be either fixed and not necessarily regular, or random and not necessarily consisting of independent or weakly dependent random variables. With regard to design elements, the only condition that is required is the dense filling of the regression function domain with the design points.

Explicit upper bounds are found for the rate of uniform convergence in probability of the new estimators to the unknown random regression function. The only design characteristic explicitly involved in these bounds is the maximum diameter of the cells of the partition generated by the design elements, and only convergence to zero in probability is required of this characteristic. The advantage of this condition over the classical ones is that it is insensitive to the forms of dependence of the design observations. Note that this condition is, in fact, necessary: only when the design densely fills the regression function domain is it possible to reconstruct the regression function with a certain accuracy. As a corollary of the main result, we obtain a consistent estimator for the mean function of a continuous random process.

In the simulation examples of Section 5, the new estimators were compared with the Nadaraya–Watson estimators. In some of the examples, the new estimators proved to be the most accurate. In Section 6, as an application of the new estimators, we studied real data on the magnitudes of earthquakes in Japan, and the accuracy of the new estimators was comparable to that of the Nadaraya–Watson ones.

8. Proofs

Proof of Theorem 1. Taking relation (1) into account, one can obtain the following identity:

f^{*}_{n,\varepsilon}(t)=R_{n,\varepsilon}(f,t)+\nu_{n,\varepsilon}(t),

where

R_{n,\varepsilon}(f,t):=J^{-1}_{n,\varepsilon}(t)\sum\limits_{i=1}^{n}f(X_{i})K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}),

\nu_{n,\varepsilon}(t):=J^{-1}_{n,\varepsilon}(t)\sum\limits_{i=1}^{n}K_{\varepsilon}(t-X_{i})\xi_{i}\Lambda(\Delta_{i}),

J_{n,\varepsilon}(t):=\sum\limits_{i=1}^{n}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}).

Notice that, by virtue of the properties of $K$, the summation in the three sums above effectively extends over the set $\{i:\,\|t-X_{i}\|\leq\varepsilon\}$ only. This is the principal argument in the calculations below.

Lemma 1. The following estimate is valid:

\sup_{t\in\Theta}|R_{n,\varepsilon}(f,t)-f(t)|\leq\omega_{f}(\varepsilon). \qquad (22)

The proof is immediate from the identity

R_{n,\varepsilon}(f,t)=f(t)+J^{-1}_{n,\varepsilon}(t)\sum\limits_{i:\,\|t-X_{i}\|\leq\varepsilon}(f(X_{i})-f(t))K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i}). □

Lemma 2. If $\delta_{n}\leq\varepsilon\leq\varepsilon_{0}$ then, for any $t\in\Theta$, the following relation is valid:

J_{n,\varepsilon}(t)\geq\rho-k2^{k}L\delta_{n}\varepsilon^{-1}.

Proof. We have

J_{n,\varepsilon}(t)=\int g(y)\,dy,

where

g(y)=\sum\limits_{i=1}^{n}K_{\varepsilon}(t-X_{i})\,I(y\in\Delta_{i});

here $I(\cdot)$ is the indicator of an event. Since

|K_{\varepsilon}(t-x)-K_{\varepsilon}(t-y)|\leq kL\varepsilon^{-k-1}\delta_{n}\ \mbox{ whenever }\ \|x-y\|\leq\delta_{n}, \qquad (23)

we have $g(y)\geq K_{\varepsilon}(t-y)-kL\varepsilon^{-k-1}\delta_{n}$ for all $y\in\Theta$. Hence, by (8),

J_{n,\varepsilon}(t)\geq\int_{y\in\Theta:\ \|t-y\|\leq\varepsilon}\big{(}K_{\varepsilon}(t-y)-kL\varepsilon^{-k-1}\delta_{n}\big{)}\,dy\geq\rho-k2^{k}L\delta_{n}\varepsilon^{-1}.

The lemma is proved. □

Lemma 3. For every $y>0$ and $\varepsilon\in(0,\varepsilon_{0}]$, on the subset of elementary events defined by the relation $\delta_{n}/\varepsilon\leq\min\{1,\,\rho(k2^{k+1}L)^{-1}\}$, the following upper bound is valid:

{\bf P}_{\cal F}\left(\Big{(}\delta_{n}^{k/2}\varepsilon^{-k((1/2)+(1/p))}\Big{)}^{-1}\sup\limits_{t\in\Theta}|\nu_{n,\varepsilon}(t)|>y\right)\leq G(k,p)\,\rho^{-p}\,M_{p}\,L^{p/2}y^{-p}, \qquad (24)

where the symbol ${\bf P}_{\cal F}$ denotes the conditional probability given the $\sigma$-field ${\cal F}$ generated by the design $\{X_{i}\}$ and the paths of the random field $f(\cdot)$; here

G(k,p)<(p-1)^{p/2}2^{p(k+(3/2))}\left(1+\frac{k}{2^{(p-k)/(p+1)}-1}\right)^{p+1}.

Proof. Under the condition $\delta_{n}/\varepsilon\leq\rho(k2^{k+1}L)^{-1}$, by virtue of Lemma 2, the simple inequality $|\nu_{n,\varepsilon}(t)|\leq 2\rho^{-1}|\mu_{n,\varepsilon}(t)|$ is valid, where

\mu_{n,\varepsilon}(t):=\sum\limits_{i=1}^{n}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\xi_{i}.

The distribution tail of $\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|$ will be estimated by Kolmogorov's dyadic chaining, which is commonly used to estimate the tail probability of the sup-norm of stochastic processes having continuous paths with probability 1.

Without loss of generality we will assume that $\Theta\subset[0,1]^{k}$. We first note that the set $\Theta$ under the supremum sign above can be replaced with the subset of dyadic rational points ${\cal R}=\cup_{l\geq 1}{\cal R}_{l}$, where

{\cal R}_{l}=\{(j_{1}/2^{l},\dots,j_{k}/2^{l}):\ j_{1}=1,\dots,2^{l}-1;\ \dots;\ j_{k}=1,\ldots,2^{l}-1\}.

Thus,

\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|\leq\sup_{t\in{\cal R}}|\mu_{n,\varepsilon}(t)|\leq\max_{t\in{\cal R}_{m}}|\mu_{n,\varepsilon}(t)|+\sum_{l=m+1}^{\infty}\sum_{r=1}^{k}\max_{t\in{\cal R}_{l}}\big{|}\mu_{n,\varepsilon}(t+2^{-l}e_{r})-\mu_{n,\varepsilon}(t)\big{|},

where $m$ is a natural number that will be chosen later, and $e_{r}$ is the $k$-dimensional vector with the $r$-th component 1 and the other components 0.

Hence,

{\bf P}_{\cal F}\Big{(}\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|>y\Big{)}\leq{\bf P}_{\cal F}\Big{(}\max_{t\in{\cal R}_{m}}|\mu_{n,\varepsilon}(t)|>a_{m}y\Big{)}+\sum_{l=m+1}^{\infty}\sum_{r=1}^{k}{\bf P}_{\cal F}\Big{(}\max_{t\in{\cal R}_{l}}\big{|}\mu_{n,\varepsilon}(t+2^{-l}e_{r})-\mu_{n,\varepsilon}(t)\big{|}>a_{l}y/k\Big{)}\leq\sum_{t\in{\cal R}_{m}}{\bf P}_{\cal F}(|\mu_{n,\varepsilon}(t)|>a_{m}y)+\sum_{l=m+1}^{\infty}\sum_{r=1}^{k}\sum_{t\in{\cal R}_{l}}{\bf P}_{\cal F}\Big{(}\big{|}\mu_{n,\varepsilon}(t+2^{-l}e_{r})-\mu_{n,\varepsilon}(t)\big{|}>a_{l}y/k\Big{)}, \qquad (25)

where $a_{m},a_{m+1},\dots$ is a sequence of positive numbers such that $a_{m}+a_{m+1}+\cdots=1$.

In order to estimate the probability 𝐏(|μn,ε(t)|>amy){\bf P}_{\cal F}(|\mu_{n,\varepsilon}(t)|>a_{m}y), we use Rio’s martingale inequality (Rio 2009, Theorem 2.1)

{\bf E}\Big|\sum^{n}_{i=1}\eta_{i}\Big|^{p}\leq\left((p-1)\sum^{n}_{i=1}\big({\bf E}|\eta_{i}|^{p}\big)^{2/p}\right)^{p/2}, (26)

where \{\eta_{i}\} is a martingale difference sequence with finite moments of order p\geq 2.
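As a quick sanity check of (26): for i.i.d. centered \eta_{1},\ldots,\eta_{n} with {\bf E}|\eta_{1}|^{p}<\infty, inequality (26) turns into

{\bf E}\Big|\sum^{n}_{i=1}\eta_{i}\Big|^{p}\leq(p-1)^{p/2}\,n^{p/2}\,{\bf E}|\eta_{1}|^{p},

which is the classical Marcinkiewicz--Zygmund-type bound of order n^{p/2}, here with the explicit constant (p-1)^{p/2}.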

Now, put

\eta_{i}:=K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\xi_{i}.

From the simple upper bounds

K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\leq\frac{L}{\varepsilon^{k}}\,\delta^{k}_{n},
\sum_{i}K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\leq\frac{L}{\varepsilon^{k}}\,(2\varepsilon+2\delta_{n})^{k},

we then obtain that, with probability 1,

\sum^{n}_{i=1}\big({\bf E}_{\cal F}|\eta_{i}|^{p}\big)^{2/p}\leq M_{p}^{2/p}\,2^{k}L\,(1+\delta_{n}/\varepsilon)^{k}\,(\delta_{n}/\varepsilon)^{k}.
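For the reader's convenience, we indicate the elementary steps behind this estimate. Since the noise \{\xi_{i}\} is independent of {\cal F}, we have {\bf E}_{\cal F}|\eta_{i}|^{p}=c_{i}^{p}\,{\bf E}|\xi_{i}|^{p}\leq c_{i}^{p}M_{p}, where c_{i}:=K_{\varepsilon}(t-X_{i})\Lambda(\Delta_{i})\geq 0, and hence

\sum^{n}_{i=1}\big({\bf E}_{\cal F}|\eta_{i}|^{p}\big)^{2/p}\leq M_{p}^{2/p}\sum^{n}_{i=1}c_{i}^{2}\leq M_{p}^{2/p}\Big(\max_{i\leq n}c_{i}\Big)\sum^{n}_{i=1}c_{i};

it remains to combine this with the two displayed upper bounds.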

Under the restriction \delta_{n}\leq\varepsilon\leq 1, this bound and (26) imply that

{\bf P}_{\cal F}(|\mu_{n,\varepsilon}(t)|>a_{m}y)\leq\frac{{\bf E}_{\cal F}(|\mu_{n,\varepsilon}(t)|^{p})}{(a_{m}y)^{p}}\leq G_{1}\,\frac{(\delta_{n}/\varepsilon)^{kp/2}}{(a_{m}y)^{p}}\quad\mbox{a.s.}, (27)

where

G_{1}=(p-1)^{p/2}\,2^{kp}\,L^{p/2}\,M_{p}.
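To see where the constant G_{1} comes from, note that, by (26) and the above bound on \sum^{n}_{i=1}\big({\bf E}_{\cal F}|\eta_{i}|^{p}\big)^{2/p}, with probability 1,

{\bf E}_{\cal F}(|\mu_{n,\varepsilon}(t)|^{p})\leq\Big((p-1)\,M_{p}^{2/p}\,2^{k}L\,(1+\delta_{n}/\varepsilon)^{k}(\delta_{n}/\varepsilon)^{k}\Big)^{p/2}\leq(p-1)^{p/2}\,2^{kp}\,L^{p/2}\,M_{p}\,(\delta_{n}/\varepsilon)^{kp/2},

since (1+\delta_{n}/\varepsilon)^{k}\leq 2^{k} for \delta_{n}\leq\varepsilon, which gives the factor 2^{kp/2}\cdot 2^{kp/2}=2^{kp}.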

Now let us estimate {\bf P}_{\cal F}\Big(\big|\mu_{n,\varepsilon}(t+2^{-l}e_{r})-\mu_{n,\varepsilon}(t)\big|>a_{l}y/k\Big). In order to do it, we use (26) with

\eta_{i}:=\Big(K_{\varepsilon}(t-X_{i}+2^{-l}e_{r})-K_{\varepsilon}(t-X_{i})\Big)\Lambda(\Delta_{i})\xi_{i}.

We have

\Big|K_{\varepsilon}(t-X_{i}+2^{-l}e_{r})-K_{\varepsilon}(t-X_{i})\Big|\Lambda(\Delta_{i})\leq\frac{L}{\varepsilon^{k+1}}\,2^{-l}\,\delta_{n}^{k},
\sum_{i}\Big|K_{\varepsilon}(t-X_{i}+2^{-l}e_{r})-K_{\varepsilon}(t-X_{i})\Big|\Lambda(\Delta_{i})\leq\frac{L}{\varepsilon^{k+1}}\,2^{-l+1}\,(2\varepsilon+2\delta_{n})^{k}.

Thus

\sum^{n}_{i=1}\big({\bf E}_{\cal F}|\eta_{i}|^{p}\big)^{2/p}\leq M_{p}^{2/p}\,2^{k}L\,2^{-2l+1}\,\varepsilon^{-2}\,(1+\delta_{n}/\varepsilon)^{k}\,(\delta_{n}/\varepsilon)^{k}.

Again, under the restriction \delta_{n}\leq\varepsilon\leq 1, the last inequality and (26) imply

{\bf P}_{\cal F}\Big(\big|\mu_{n,\varepsilon}(t+2^{-l}e_{r})-\mu_{n,\varepsilon}(t)\big|>a_{l}y/k\Big)\leq\frac{G_{2}}{k}\,\frac{(\delta_{n}/\varepsilon)^{kp/2}\,\varepsilon^{-p}\,2^{-lp}}{(a_{l}y)^{p}}, (28)

where

G_{2}=2^{p/2}\,k^{p+1}\,(p-1)^{p/2}\,2^{kp}\,L^{p/2}\,M_{p}=2^{p/2}\,k^{p+1}\,G_{1}.

Combining (25), (27), and (28), we obtain

{\bf P}_{\cal F}(\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|>y)
<y^{-p}(\delta_{n}/\varepsilon)^{kp/2}\left(G_{1}2^{km}a_{m}^{-p}+G_{2}\,\varepsilon^{-p}\sum_{l=m+1}^{\infty}2^{-(p-k)l}\,a_{l}^{-p}\right).

The optimal sequence a_{l} minimizing the right-hand side of this inequality is as follows: a_{m}=c(G_{1}2^{km})^{1/(p+1)} and a_{l}=c\big(G_{2}\,\varepsilon^{-p}\,2^{-(p-k)l}\big)^{1/(p+1)} for l=m+1,m+2,\ldots, where the coefficient c is defined by the relation a_{m}+a_{m+1}+\cdots=1. For this sequence, we get

{\bf P}_{\cal F}(\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|>y)
<y^{-p}(\delta_{n}/\varepsilon)^{kp/2}\left((G_{1}2^{km})^{1/(p+1)}+\sum_{l=m+1}^{\infty}\Big(G_{2}\,\varepsilon^{-p}2^{-(p-k)l}\Big)^{1/(p+1)}\right)^{p+1}.
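This choice of \{a_{l}\} is the standard solution of the corresponding constrained minimization problem: for any positive weights b_{l} with \sum_{l}b_{l}^{1/(p+1)}<\infty,

\min\Big\{\sum_{l}b_{l}a_{l}^{-p}:\ a_{l}>0,\ \sum_{l}a_{l}=1\Big\}=\Big(\sum_{l}b_{l}^{1/(p+1)}\Big)^{p+1},

with the minimum attained at a_{l}=b_{l}^{1/(p+1)}\big/\sum_{j}b_{j}^{1/(p+1)} (e.g., by the Lagrange multiplier rule). Applying this identity with b_{m}=G_{1}2^{km} and b_{l}=G_{2}\,\varepsilon^{-p}2^{-(p-k)l} for l>m gives the last bound.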

Now, put m=\lceil-\log_{2}\varepsilon\rceil, where \lceil a\rceil is the minimal integer greater than or equal to a. Then

{\bf P}_{\cal F}(\sup_{t\in\Theta}|\mu_{n,\varepsilon}(t)|>y)
<y^{-p}\,\delta_{n}^{kp/2}\,\varepsilon^{-k(1+(p/2))}\left((2G_{1})^{1/(p+1)}+\big(G_{2}2^{-(p-k)}\big)^{1/(p+1)}\sum_{l=0}^{\infty}2^{-(p-k)l/(p+1)}\right)^{p+1}
<y^{-p}\,\delta_{n}^{kp/2}\,\varepsilon^{-k(1+(p/2))}\,2^{p/2}\,G_{1}\left(1+\frac{k}{2^{(p-k)/(p+1)}-1}\right)^{p+1}.
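In the last step we evaluated the geometric series: since p>k,

\big(G_{2}2^{-(p-k)}\big)^{1/(p+1)}\sum_{l=0}^{\infty}2^{-(p-k)l/(p+1)}=\big(2^{p/2}G_{1}\big)^{1/(p+1)}\,\frac{k}{2^{(p-k)/(p+1)}-1},

where we used G_{2}=2^{p/2}k^{p+1}G_{1}, and also (2G_{1})^{1/(p+1)}\leq\big(2^{p/2}G_{1}\big)^{1/(p+1)} because p\geq 2.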

This yields the statement of the lemma with

G(k,p)=2^{p}\,2^{p/2}\,\frac{G_{1}}{L^{p/2}\,M_{p}}\left(1+\frac{k}{2^{(p-k)/(p+1)}-1}\right)^{p+1}.

Lemma 3 is proved.\hfill\Box

The statement of Theorem 1 follows from Lemmas 1–3.

Proof of Theorem 2. First of all, notice that condition (17) and Lebesgue’s dominated convergence theorem imply the relation

\lim_{\nu\to 0}{\bf E}\,\omega_{f}(\nu)=0. (29)

It is clear that relation (29) implies the uniform law of large numbers for independent copies of the a.s. continuous random process f(t), i.e.,

\sup\limits_{t\in[0,1]^{k}}|\overline{f}_{N}(t)-{\bf E}f(t)|\stackrel{p}{\to}0

as N\to\infty, where \overline{f}_{N}(t):=\frac{1}{N}\sum\limits_{j=1}^{N}f_{j}(t).
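For completeness, we recall the standard discretization argument behind this implication. For any \nu>0, choose a finite \nu-net T_{\nu}\subset[0,1]^{k}; then

\sup_{t\in[0,1]^{k}}|\overline{f}_{N}(t)-{\bf E}f(t)|\leq\max_{t\in T_{\nu}}|\overline{f}_{N}(t)-{\bf E}f(t)|+\frac{1}{N}\sum_{j=1}^{N}\omega_{f_{j}}(\nu)+\omega_{{\bf E}f}(\nu).

Under the assumptions of Theorem 2, the first term on the right-hand side vanishes in probability by the ordinary law of large numbers, the second one converges in probability to {\bf E}\omega_{f}(\nu), and \omega_{{\bf E}f}(\nu)\leq{\bf E}\omega_{f}(\nu); it remains to let \nu\to 0 and use (29).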

Put

\Delta_{n,\varepsilon,j}:=\sup_{t\in[0,1]^{k}}|f^{*}_{n,\varepsilon,j}(t)-f_{j}(t)|. (30)

So, to prove (19), we only need to verify the following version of the law of large numbers for independent copies of the residuals defined in (30):

\frac{1}{N}\sum_{j=1}^{N}\Delta_{n,\varepsilon,j}\stackrel{p}{\to}0,

for the sequences \varepsilon and N chosen in (18).

Introduce the following events:

A_{n,\varepsilon,j}:=\{\delta_{n,j}\leq\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\}\},\,\,\,j=1,\ldots,N,

where the sequence \varepsilon\equiv\varepsilon_{n}\to 0 meets (18). (It is evident that such a sequence exists.) For any positive \nu we have

{\bf P}\left\{\frac{1}{N}\sum_{j=1}^{N}\Delta_{n,\varepsilon,j}>\nu\right\}\leq{\bf P}\left\{\frac{1}{N}\sum_{j=1}^{N}\Delta_{n,\varepsilon,j}I(A_{n,\varepsilon,j})>\nu\right\}+N{\bf P}(\overline{A_{n,\varepsilon,1}}). (31)

Next, from Theorem 1 we obtain

{\bf E}\Delta_{n,\varepsilon,j}I(A_{n,\varepsilon,j})\leq{\bf E}\,\omega_{f}(\varepsilon)+\int\limits_{0}^{\infty}{\bf P}\left(\zeta_{n}(\varepsilon)>y,\,\delta_{n}\leq\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\}\right)dy
\leq{\bf E}\,\omega_{f}(\varepsilon)+\gamma_{n}+\int\limits_{\gamma_{n}}^{\infty}{\bf P}\left(\zeta_{n}(\varepsilon)>y,\,\delta_{n}\leq\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\}\right)dy
\leq{\bf E}\,\omega_{f}(\varepsilon)+\tilde{C}\gamma_{n},

where \tilde{C}:=1+G(k,p)\,\rho^{-p}\,M_{p}\,L^{p/2} and \gamma_{n}:=\left(\varepsilon^{-k(p/2+1)}\,{\bf E}\delta_{n}^{kp/2}\right)^{1/p}. Here the last step follows from Lemma 3, which gives {\bf P}\left(\zeta_{n}(\varepsilon)>y,\,\delta_{n}\leq\varepsilon\min\{1,\,\rho(k2^{k+1}L)^{-1}\}\right)\leq(\tilde{C}-1)\gamma_{n}^{p}y^{-p}, so that the last integral does not exceed (\tilde{C}-1)\gamma_{n}^{p}\,\gamma_{n}^{1-p}/(p-1)\leq(\tilde{C}-1)\gamma_{n}. It remains to apply Markov's inequality to the first probability on the right-hand side of (31) and use the limit relations (18) together with the last estimate. Theorem 2 is proved.\hfill\Box


Acknowledgments

The authors are deeply grateful to Professor I.A. Ibragimov for his useful remarks. In addition, the authors thank the anonymous referee whose comments contributed to a better presentation of this study.

Data availability statement. The data required to reproduce the above findings are available to download from https://earthquake.usgs.gov/data/comcat/ (ANSS Comprehensive Earthquake Catalog, 2022).

References

Ahmad, I. A. and Lin, P.-E. (1984), ‘Fitting a multiple regression function’, J. Statist. Plann. Infer. 9, 163–176.

ANSS Comprehensive Earthquake Catalog, 2022. In: U.S. Geological Survey, Earthquake Hazards Program, 2017, Advanced National Seismic System (ANSS) Comprehensive Catalog of Earthquake Events and Products: Various, https://doi.org/10.5066/F7MS3QZH. Data retrieved September 4, 2022 from https://earthquake.usgs.gov/data/comcat/

Benhenni, K., Hedli-Griche, S., and Rachdi, M. (2010), ‘Estimation of the regression operator from functional fixed-design with correlated errors’, J. Multivar. Anal. 101, 476–490.

Benelmadani, D., Benhenni, K., and Louhichi, S. (2020), ‘Trapezoidal rule and sampling designs for the nonparametric estimation of the regression function in models with correlated errors’, Statistics 54, 59–96.

Borisov, I.S., Linke, Yu.Yu., and Ruzankin P.S. (2021), ‘Universal weighted kernel-type estimators for some class of regression models’, Metrika 84, 141–166.

Brown, L.D. and Levine, M. (2007), ‘Variance estimation in nonparametric regression via the difference sequence method’, Ann. Statist. 35, 2219–2232.

Chan, N. and Wang, Q. (2014), ‘Uniform convergence for Nadaraya-Watson estimators with nonstationary data’, Econometric Theory 30, 1110–1133.

Chu, C. K. and Deng, W.-S. (2003), ‘An interpolation method for adapting to sparse design in multivariate nonparametric regression’, J. Statist. Plann. Inference 116, 91–111.

Einmahl, U. and Mason, D.M. (2005), ‘Uniform in bandwidth consistency of kernel-type function estimators’, Ann. Statist. 33, 1380–1403.

Fan, J. and Gijbels, I. (1996), Local Polynomial Modelling and its Applications, London: Chapman and Hall.

Fan, J. and Yao, Q. (2003), Nonlinear Time Series: Nonparametric and Parametric Methods, Springer.

Gao, J., Kanaya, S., Li, D., and Tjostheim, D. (2015), ‘Uniform consistency for nonparametric estimators in null recurrent time series’, Econometric Theory 31, 911–952.

Gasser, T. and Engel, J. (1990), ‘The choice of weights in kernel regression estimation’, Biometrika 77, 277–381.

Georgiev, A. A. (1988), ‘Consistent nonparametric multiple regression: The fixed design case’, J. Multivariate Anal. 25, 100–110.

Georgiev, A. A. (1989), ‘Asymptotic properties of the multivariate Nadaraya-Watson regression function estimate: The fixed design case’, Stat. Probab. Lett. 7, 35–40.

Georgiev, A. A. (1990), ‘Nonparametric multiple function fitting’, Stat. Probab. Lett. 10, 203–211.

Gu, W., Roussas, G. G., and Tran, L. T. (2007), ‘On the convergence rate of fixed design regression estimators for negatively associated random variables’, Stat. Probab. Lett. 77, 1214–1224.

Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002), A Distribution-Free Theory of Nonparametric Regression, New York: Springer.

Hall, P. and Heyde, C.C. (1980), Martingale Limit Theory and Its Application, Academic Press.

Hall, P., Müller, H.-G., and Wang, J.-L. (2006), ‘Properties of principal component methods for functional and longitudinal data analysis’, Ann. Statist. 34, 1493–1517.

Hansen, B.E. (2008), ‘Uniform convergence rates for kernel estimation with dependent data’, Econometric Theory 24, 726–748.

Härdle, W. (1990), Applied Nonparametric Regression, New York: Cambridge University Press.

He, Q. (2019), ‘Consistency of the Priestley–Chao estimator in nonparametric regression model with widely orthant dependent errors’, J. Inequal. Appl. 64, 2–13.

Hsing, T. and Eubank, R. (2015), Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, Wiley.

Honda, T. (2010), ‘Nonparametric regression for dependent data in the errors-in-variables problem’, Global COE Hi-Stat Discussion Paper Series, Institute of Economic Research, Hitotsubashi University.

Hong, S. Y. and Linton, O. B. (2016), ‘Asymptotic properties of a Nadaraya-Watson type estimator for regression functions of infinite order’, SSRN Electronic Journal.

Jennen-Steinmetz, C. and Gasser, T. (1989), ‘A unifying approach for nonparametric regression estimation’, J. Americ. Stat. Assoc. 83, 1084–1089.

Jiang, J. and Mack, Y.P. (2001), ‘Robust local polynomial regression for dependent data’, Statistica Sinica 11, 705–722.

Jones, M.C., Davies, S.J., and Park, B.U. (1994), ‘Versions of kernel-type regression estimators’, J. Americ. Stat. Assoc. 89, 825–832.

Karlsen, H.A., Myklebust, T., and Tjostheim, D. (2007), ‘Nonparametric estimation in a nonlinear cointegration type model’, Ann. Statist. 35, 252–299.

Kulik, R. and Lorek, P. (2011), ‘Some results on random design regression with long memory errors and predictors’, J. Statist. Plann. Infer. 141, 508–523.

Kulik, R. and Wichelhaus, C. (2011), ‘Nonparametric conditional variance and error density estimation in regression models with dependent errors and predictors’, Electr. J. Statist. 5, 856–898.

Laib, N. and Louani, D. (2010), ‘Nonparametric kernel regression estimation for stationary ergodic data: Asymptotic properties’, J. Multivar. Anal. 101, 2266–2281.

Liang, H.-Y. and Jing, B.-Y. (2005), ‘Asymptotic properties for estimates of nonparametric regression models based on negatively associated sequences’, J. Multivariate Anal. 95, 227–245.

Li, Y. and Hsing, T. (2010), ‘Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data’, Ann. Statist. 38, 3321–3351.

Li, X., Yang, W., and Hu, S. (2016), ‘Uniform convergence of estimator for nonparametric regression with dependent data’, J. Inequal. Appl. 142.

Lin, Z. and Wang, J.-L. (2022), ‘Mean and covariance estimation for functional snippets’, J. Amer. Statist. Assoc. 117, 348–360.

Linton, O. and Wang, Q. (2016), ‘Nonparametric transformation regression with nonstationary data’, Econometric Theory 32, 1–29.

Linke, Yu.Yu. and Borisov, I.S. (2017), ‘Constructing initial estimators in one-step estimation procedures of nonlinear regression’, Stat. Probab. Lett. 120, 87–94.

Linke, Yu.Yu. and Borisov, I.S. (2018), ‘Constructing explicit estimators in nonlinear regression problems’, Theory Probab. Appl. 63, 22–44.

Linke, Yu.Yu. (2019), ‘Asymptotic properties of one-step M-estimators’, Commun. Stat. Theory Methods 48, 4096–4118.

Linke, Yu.Yu. (2023), ‘Towards insensitivity of Nadaraya–Watson estimators to design correlation’, Theory Probab. Appl. 68 (to appear).

Linke, Yu.Yu. and Borisov, I.S. (2022), ‘Insensitivity of Nadaraya–Watson estimators to design correlation’, Commun. Stat. Theory Methods 51, 6909–6918.

Linton, O. B. and Jacho-Chavez, D. T. (2010), ‘On internally corrected and symmetrized kernel estimators for nonparametric regression’, TEST 19, 166–186.

Loader, C. (1999), Local Regression and Likelihood, Springer.

Mack, Y.P. and Müller, H.-G. (1988), ‘Convolution type estimators for nonparametric regression’, Stat. Prob. Lett. 7, 229–239.

Masry, E. (2005), ‘Nonparametric regression estimation for dependent functional data’, Stoch. Proc. Their Appl. 115, 155–177.

Müller, H.-G. (1988), Nonparametric Regression Analysis of Longitudinal Data, New York: Springer.

Müller, H. G. and Prewitt, K. A. (1993), ‘Multiparameter bandwidth processes and adaptive surface smoothing’, J. Multivariate Anal. 47, 1–21.

Priestley, M.B. and Chao, M.T. (1972), ‘Non-Parametric Function Fitting’, J. Royal Statist. Soc., Series B, 34, 385–392.

Rio, E. (2009), ‘Moment inequalities for sums of dependent random variables under projective conditions’, J. Theor. Probab. 22, 146–163.

Roussas, G.G. (1990), ‘Nonparametric regression estimation under mixing conditions’, Stoch. Proc. Appl. 36, 107–116.

Roussas, G.G. (1991), ‘Kernel estimates under association: Strong uniform consistency’, Stat. Probab. Lett. 12, 393–403.

Roussas, G. G., Tran, L. T., and Ioannides, D. A. (1992), ‘Fixed design regression for time series: asymptotic normality’, J. Multivariate Anal. 40, 262–291.

Shen, J. and Xie, Y. (2013), ‘Strong consistency of the internal estimator of nonparametric regression with dependent data’, Stat. Probab. Lett. 83, 1915–1925.

Song, Q., Liu, R., Shao, Q., and Yang, L. (2014), ‘A simultaneous confidence band for dense longitudinal regression’, Commun. Stat. Theory Methods 43, 5195–5210.

Tang, X., Xi, M., Wu, Y., and Wang, X. (2018), ‘Asymptotic normality of a wavelet estimator for asymptotically negatively associated errors’, Stat. Probab. Lett. 140, 191–201.

Wand, M.P. and Jones, M.C. (1995), Kernel Smoothing, London: Chapman and Hall.

Wang, J.-L., Chiou, J.-M., and Müller, H.-G. (2016), ‘Review of functional data analysis’, Annu. Rev. Statist. 3, 257–295.

Wang, Q. and Chan, N. (2014), ‘Uniform convergence rates for a class of martingales with application in non-linear cointegrating regression’, Bernoulli 20, 207–230.

Wang, Q.Y. and Phillips, P.C.B. (2009a), ‘Asymptotic theory for local time density estimation and nonparametric cointegrating regression’, Econometric Theory 25, 710–738.

Wang, Q. and Phillips, P.C.B. (2009b), ‘Structural nonparametric cointegrating regression’, Econometrica 77, 1901–1948.

Wu, J.S. and Chu, C.K. (1994), ‘Nonparametric estimation of a regression function with dependent observations’, Stoch. Proc. Their Appl. 50, 149–160.

Wu, Y., Wang, X., and Balakrishnan, N. (2020), ‘On the consistency of the P–C estimator in a nonparametric regression model’, Stat. Papers 61, 899–915.

Yang, X. and Yang, S. (2016), ‘Strong consistency of nonparametric kernel regression estimator for strong mixing samples’, Commun. Stat. Theory Methods 46, 10537–10548.

Yao, F. (2007), ‘Asymptotic distributions of nonparametric regression estimators for longitudinal or functional data’, J. Multivariate Anal. 98, 40–56.

Yao, F., Müller, H.-G., and Wang, J.-L. (2005), ‘Functional data analysis for sparse longitudinal data’, J. Amer. Statist. Assoc. 100, 577–590.

Young, D.S. (2017), Handbook of Regression Methods, Chapman and Hall.

Zhang, X. and Wang, J.-L. (2016), ‘From sparse to dense functional data and beyond’, Ann. Statist. 44, 2281–2321.

Zhang, J.-T. and Chen, J. (2007), ‘Statistical inferences for functional data’, Ann. Statist. 35, 1052–1079.

Zhang, X. and Wang, J.-L. (2018), ‘Optimal weighting schemes for longitudinal and functional data’, Stat. Prob. Lett. 138, 165–170.

Zhang, S., Miao, Y., Xu, X., and Gao, Q. (2018), ‘Limit behaviors of the estimator of nonparametric regression model based on martingale difference errors’, J. Korean Stat. Soc. 47, 537–547.

Zhang, S., Hou, T., and Qu, C. (2019), ‘Complete consistency for the estimator of nonparametric regression model based on martingale difference errors’, Commun. Stat. Theory Methods 50, 358–370.

Zhou, X. and Zhu, F. (2020), ‘Asymptotics for L1-wavelet method for nonparametric regression’, J. Inequal. Appl. 216.