
Change-point Detection for Sparse and Dense Functional Data in General Dimensions

Carlos Misael Madrid Padilla1    Daren Wang2    Zifeng Zhao3    Yi Yu4
(1Department of Mathematics, University of Notre Dame
2Department of ACMS, University of Notre Dame
3Mendoza College of Business, University of Notre Dame
4Department of Statistics, University of Warwick
)
Abstract

We study the problem of change-point detection and localisation for functional data sequentially observed on a general $d$-dimensional space, where we allow the functional curves to be either sparsely or densely sampled. Data of this form naturally arise in a wide range of applications such as biology, neuroscience, climatology and finance. To achieve such a task, we propose a kernel-based algorithm named functional seeded binary segmentation (FSBS). FSBS is computationally efficient, can handle discretely observed functional data, and is theoretically sound for heavy-tailed and temporally-dependent observations. Moreover, FSBS works for a general $d$-dimensional domain, which is the first in the literature of change-point estimation for functional data. We show the consistency of FSBS for multiple change-point estimation and further provide a sharp localisation error rate, which reveals an interesting phase transition phenomenon depending on the number of functional curves observed and the sampling frequency of each curve. Extensive numerical experiments illustrate the effectiveness of FSBS and its advantages over existing methods in the literature under various settings. A real data application is further conducted, where FSBS localises change-points of sea surface temperature patterns in the south Pacific attributed to El Niño.

1 Introduction

Recent technological advancement has boosted the emergence of functional data in various application areas, including neuroscience (e.g. Dai et al., 2019; Petersen et al., 2019), finance (e.g. Fan et al., 2014), transportation (e.g. Chiou et al., 2014), climatology (e.g. Bonner et al., 2014; Fraiman et al., 2014) and others. We refer the readers to Wang et al. (2016) for a comprehensive review of recent developments in statistical research on functional data analysis.

In this paper, we study the problem of change-point detection and localisation for functional data, where the data are observed sequentially as a time series and the mean functions are piecewise stationary, with abrupt changes occurring at unknown time points. To be specific, denote by $\mathcal{D}$ a general $d$-dimensional space that is homeomorphic to $[0,1]^{d}$, where $d\in\mathbb{N}^{+}$ is arbitrary but fixed. We assume that the observations $\{(x_{t,i},y_{t,i})\}_{t=1,i=1}^{T,n}\subseteq\mathcal{D}\times\mathbb{R}$ are generated based on

$y_{t,i}=f^{*}_{t}(x_{t,i})+\xi_{t}(x_{t,i})+\delta_{t,i}$, for $t=1,\ldots,T$ and $i=1,\ldots,n$. (1)

In this model, $\{x_{t,i}\}_{t=1,i=1}^{T,n}\subseteq\mathcal{D}$ denotes the discrete grids where the (noisy) functional data $\{y_{t,i}\}_{t=1,i=1}^{T,n}\subseteq\mathbb{R}$ are observed, $\{f_{t}^{*}:\mathcal{D}\to\mathbb{R}\}_{t=1}^{T}$ denotes the deterministic mean functions, $\{\xi_{t}:\mathcal{D}\to\mathbb{R}\}_{t=1}^{T}$ denotes the functional noise and $\{\delta_{t,i}\}_{t=1,i=1}^{T,n}\subseteq\mathbb{R}$ denotes the measurement errors. We refer to Assumption 1 below for detailed technical conditions on the model.

To model the nonstationarity of sequentially observed functional data, which commonly arises in real-world applications, we assume that there exist $K\in\mathbb{N}$ change-points, namely $0=\eta_{0}<\eta_{1}<\cdots<\eta_{K}<\eta_{K+1}=T$, satisfying that $f^{*}_{t}\neq f^{*}_{t+1}$ if and only if $t\in\{\eta_{k}\}_{k=1}^{K}$. Our primary interest is to accurately estimate $\{\eta_{k}\}_{k=1}^{K}$.

Due to the importance of modelling nonstationary functional data in various scientific fields, this problem has received extensive attention in the statistical change-point literature, see e.g. Aue et al. (2009), Berkes et al. (2009), Hörmann and Kokoszka (2010), Zhang et al. (2011), Aue et al. (2018) and Dette et al. (2020). Despite their popularity, we identify a few limitations in the existing works. Firstly, both the methodological validity and the theoretical guarantees of all these papers require fully observed functional data without measurement error, which may not be realistic in practice. Secondly, most existing works focus on the single change-point setting and, to the best of our knowledge, there is no consistency result for multiple change-point estimation in functional data. Lastly and most importantly, existing algorithms only consider functional data supported on $[0,1]$ and are thus not applicable to functional data with a multi-dimensional domain, a type of data frequently encountered in neuroscience and climatology.

In view of the aforementioned three limitations, in this paper, we make several theoretical and methodological contributions, summarized below.

\bullet In terms of methodology, our proposed kernel-based change-point detection algorithm, functional seeded binary segmentation (FSBS), is computationally efficient, can handle discretely observed functional data contaminated with measurement error, and allows for temporally-dependent and heavy-tailed data. In particular, FSBS works for a general $d$-dimensional domain with arbitrary but fixed $d\in\mathbb{N}^{+}$, a level of generality seen for the first time in the literature.

\bullet In terms of theory, we show that under standard regularity conditions, FSBS is consistent in detecting and localising multiple change-points. We also provide a sharp localisation error rate, which reveals an interesting phase transition phenomenon depending on the number of functional curves observed $T$ and the sampling frequency $n$ of each curve. To the best of our knowledge, the theoretical results we provide in this paper are the sharpest in the existing literature.

\bullet A striking case we handle in this paper is that each curve is only sampled at one point, i.e. $n=1$. To the best of our knowledge, all existing functional data change-point analysis papers assume that full curves are observed. We not only allow for discrete observations, but carefully study this most extreme sparse case $n=1$ and provide consistent localisation of the change-points.

\bullet We conduct extensive numerical experiments on simulated and real data. The results further support our theoretical findings, showcase the advantages of FSBS over existing methods and illustrate the practicality of FSBS.

\bullet A byproduct of our theoretical analysis is a set of new theoretical results on kernel estimation for functional data under temporal dependence and heavy-tailedness. These results are novel in their own right, enlarging the toolbox of functional data analysis.

Notation and definition. For any function $f:[0,1]^{d}\to\mathbb{R}$ and $1\leq p<\infty$, define $\|f\|_{p}=(\int_{[0,1]^{d}}|f(x)|^{p}\,\mathrm{d}x)^{1/p}$, and for $p=\infty$, define $\|f\|_{\infty}=\sup_{x\in[0,1]^{d}}|f(x)|$. Define $\mathcal{L}_{p}=\{f:[0,1]^{d}\to\mathbb{R},\,\|f\|_{p}<\infty\}$. For any vector $s=(s_{1},\ldots,s_{d})^{\top}\in\mathbb{N}^{d}$, define $|s|=\sum_{i=1}^{d}s_{i}$, $s!=s_{1}!\cdots s_{d}!$ and the associated partial differential operator $D^{s}=\frac{\partial^{|s|}}{\partial x_{1}^{s_{1}}\cdots\partial x_{d}^{s_{d}}}$. For $\alpha>0$, denote by $\lfloor\alpha\rfloor$ the largest integer smaller than $\alpha$. For any function $f:[0,1]^{d}\to\mathbb{R}$ that is $\lfloor\alpha\rfloor$-times continuously differentiable at a point $x_{0}$, denote by $f_{x_{0}}^{\alpha}$ its Taylor polynomial of degree $\lfloor\alpha\rfloor$ at $x_{0}$, defined as $f_{x_{0}}^{\alpha}(x)=\sum_{|s|\leq\lfloor\alpha\rfloor}\frac{(x-x_{0})^{s}}{s!}D^{s}f(x_{0})$. For a constant $L>0$, let $\mathcal{H}^{\alpha}(L)$ be the set of functions $f:[0,1]^{d}\to\mathbb{R}$ such that $f$ is $\lfloor\alpha\rfloor$-times differentiable for all $x\in[0,1]^{d}$ and satisfies $|f(x)-f_{x_{0}}^{\alpha}(x)|\leq L|x-x_{0}|^{\alpha}$ for all $x,x_{0}\in[0,1]^{d}$, where $|x-x_{0}|$ is the Euclidean distance between $x,x_{0}\in\mathbb{R}^{d}$. In the nonparametric statistics literature, $\mathcal{H}^{\alpha}(L)$ is often referred to as the class of Hölder smooth functions. We refer the interested readers to Rigollet and Vert (2009) for a more detailed discussion of Hölder smooth functions.

For two positive sequences $\{a_{n}\}_{n\in\mathbb{N}^{+}}$ and $\{b_{n}\}_{n\in\mathbb{N}^{+}}$, we write $a_{n}=O(b_{n})$ or $a_{n}\lesssim b_{n}$ if $a_{n}\leq Cb_{n}$ for some constant $C>0$ that does not depend on $n$, and $a_{n}=\Theta(b_{n})$ or $a_{n}\asymp b_{n}$ if $a_{n}=O(b_{n})$ and $b_{n}=O(a_{n})$.

2 Functional seeded binary segmentation

2.1 Problem formulation

Detailed model assumptions imposed on model (1) are collected in Assumption 1. For notational simplicity and without loss of generality, we set the general $d$-dimensional domain $\mathcal{D}$ to be $[0,1]^{d}$, as the results apply to any $\mathcal{D}$ that is homeomorphic to $[0,1]^{d}$.

Assumption 1.

The data $\{(x_{t,i},y_{t,i})\}_{t=1,i=1}^{T,n}\subseteq[0,1]^{d}\times\mathbb{R}$ are generated based on model (1).

a. (Discrete grids) The grids $\{x_{t,i}\}_{t=1,i=1}^{T,n}\subseteq[0,1]^{d}$ are independently sampled from a common density function $u:[0,1]^{d}\to\mathbb{R}$. In addition, there exist constants $r>0$ and $L>0$ such that $u\in\mathcal{H}^{r}(L)$ and $\inf_{x\in[0,1]^{d}}u(x)\geq\tilde{c}$ for an absolute constant $\tilde{c}>0$.

b. (Mean functions) For $r>0$ and $L>0$, we have $f^{*}_{t}\in\mathcal{H}^{r}(L)$. The minimal spacing between two consecutive change-points $\Delta=\min_{k=1}^{K+1}(\eta_{k}-\eta_{k-1})$ satisfies $\Delta=\Theta(T)$.

c. (Functional noise) Let $\{\varepsilon_{i},\varepsilon^{\prime}_{0}\}_{i\in\mathbb{Z}}$ be i.i.d. random elements taking values in a measurable space $S_{\xi}$ and let $g:S_{\xi}^{\infty}\to\mathcal{L}_{2}$ be a measurable function. The functional noise $\{\xi_{t}\}_{t=1}^{T}\subseteq\mathcal{L}_{2}$ takes the form

$\xi_{t}=g(\mathcal{G}_{t})$, with $\mathcal{G}_{t}=(\ldots,\varepsilon_{-1},\varepsilon_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t})$.

There exists an absolute constant $q\geq 3$ such that $\mathbb{E}(\|\xi_{t}\|_{\infty}^{q})<C_{\xi,1}$ for some absolute constant $C_{\xi,1}$. Define a coupled process

$\xi^{*}_{t}=g(\mathcal{G}^{*}_{t})$, with $\mathcal{G}^{*}_{t}=(\ldots,\varepsilon_{-1},\varepsilon^{\prime}_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t})$.

We have $\sum_{t=1}^{\infty}t^{1/2-1/q}\{\mathbb{E}\|\xi_{t}-\xi_{t}^{*}\|_{\infty}^{q}\}^{1/q}<C_{\xi,2}$ for some absolute constant $C_{\xi,2}>0$.

d. (Measurement error) Let $\{\epsilon_{i},\epsilon^{\prime}_{0}\}_{i\in\mathbb{Z}}$ be i.i.d. random elements taking values in a measurable space $S_{\delta}$ and let $\tilde{g}_{n}:S_{\delta}^{\infty}\to\mathbb{R}^{n}$ be a measurable function. The measurement errors $\{\delta_{t}\}_{t=1}^{T}\subseteq\mathbb{R}^{n}$ take the form

$\delta_{t}=\tilde{g}_{n}(\mathcal{F}_{t})$, with $\mathcal{F}_{t}=(\ldots,\epsilon_{-1},\epsilon_{0},\epsilon_{1},\ldots,\epsilon_{t-1},\epsilon_{t})$.

There exists an absolute constant $q\geq 3$ such that $\max_{i=1,\ldots,n}\mathbb{E}(|\delta_{t,i}|^{q})<C_{\delta,1}$ for some absolute constant $C_{\delta,1}$. Define a coupled process

$\delta_{t}^{*}=\tilde{g}_{n}(\mathcal{F}_{t}^{*})$, with $\mathcal{F}_{t}^{*}=(\ldots,\epsilon_{-1},\epsilon_{0}^{\prime},\epsilon_{1},\ldots,\epsilon_{t-1},\epsilon_{t})$.

We have $\max_{i=1,\ldots,n}\sum_{t=1}^{\infty}t^{1/2-1/q}\{\mathbb{E}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\}^{1/q}<C_{\delta,2}$ for some absolute constant $C_{\delta,2}>0$.

Assumption 1a allows the functional data to be observed on discrete grids and, moreover, we allow for different grids at different time points. The sampling density $u$ is required to be bounded away from zero on the support $[0,1]^{d}$, which is a standard assumption widely used in the nonparametric literature (e.g. Tsybakov, 2009). Here, different functional curves are assumed to have the same number of grid points $n$. We remark that this is made for presentational simplicity only. It can indeed be further relaxed, in which case the main results below will depend on both the minimum and maximum numbers of grid points.

Note that Assumption 1a does not impose any restriction between the sampling frequency $n$ and the number of functional curves $T$; indeed, our method can handle both the dense case where $n\gg T$ and the sparse case where $n$ is upper bounded by a constant. Besides the random sampling scheme studied here, another commonly studied scenario is the fixed design, where it is usually assumed that the sampling locations $\{x_{i}\}_{i=1}^{n}$ are common to all functional curves across time. While we focus on the random design here, our proposed algorithm can be directly applied to the fixed design case without any modification. Furthermore, its theoretical justification under the fixed design can be established similarly with minor modifications, which is omitted.

The observed functional data have mean functions $\{f^{*}_{t}\}_{t=1}^{T}$, which are assumed to be Hölder continuous in Assumption 1b. Note that the Hölder parameters in Assumptions 1a and 1b are both denoted by $r$. We remark that different smoothness levels are allowed and we use the same $r$ here for notational simplicity. This sequence of mean functions is our primary interest and is assumed to possess a piecewise constant pattern, with the minimal spacing $\Delta$ being of the same order as $T$. This assumption essentially requires that the number of change-points is upper bounded. It can be further relaxed and we provide a more elaborate discussion on this matter in Section 5.

Our model allows for two sources of noise, functional noise and measurement error, which are detailed in Assumptions 1c and 1d, respectively. Both the functional noise and the measurement error are allowed to possess temporal dependence and heavy-tailedness. For temporal dependence, we adopt the physical dependence framework of Wu (2005), which covers a wide range of time series models, such as ARMA and vector AR models. It further covers popular functional time series models such as functional AR and MA models (Hörmann and Kokoszka, 2010). We also remark that Assumptions 1c and 1d impose short range dependence, characterised by the absolute upper bounds $C_{\xi,2}$ and $C_{\delta,2}$. Further relaxation is possible by allowing the upper bounds $C_{\xi,2}$ and $C_{\delta,2}$ to vary with the sample size $T$.

The heavy-tail behaviour is encoded in the parameter $q$. In Assumptions 1c and 1d, we adopt the same quantity $q$ for presentational simplicity and remark that different heavy-tailedness levels are allowed. An extreme example is that when $q=\infty$, the noise is essentially sub-Gaussian. Importantly, Assumption 1d does not impose any restriction on the cross-sectional dependence among measurement errors observed at the same time $t$, which can even be perfectly correlated.

2.2 Kernel-based change-point detection

To estimate the change-points $\{\eta_{k}\}_{k=1}^{K}$ in the mean functions $\{f^{*}_{t}\}_{t=1}^{T}$, we propose a kernel-based cumulative sum (CUSUM) statistic, which is simple, intuitive and computationally efficient. The key idea is to recover the unobserved $\{f^{*}_{t}\}_{t=1}^{T}$ from the observations $\{(x_{t,i},y_{t,i})\}_{t=1,i=1}^{T,n}$ based on kernel estimation.

Given a kernel function $K(\cdot):\mathbb{R}^{d}\to\mathbb{R}^{+}$ and a bandwidth parameter $h>0$, we define $K_{h}(x)=h^{-d}K(x/h)$ for $x\in\mathbb{R}^{d}$. Given the random grids $\{x_{t,i}\}_{t=1,i=1}^{T,n}$ and a bandwidth parameter $\bar{h}$, we define the density estimator of the sampling distribution $u(x)$ as

$\hat{p}(x)=\hat{p}_{\bar{h}}(x)=\frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n}K_{\bar{h}}(x-x_{t,i})$, for $x\in[0,1]^{d}$.

Given $\hat{p}(x)$ and a bandwidth parameter $h>0$, for any time $t=1,\ldots,T$, we define the kernel-based estimator of $f_{t}^{*}(x)$ as

$F_{t,h}(x)=\frac{\sum_{i=1}^{n}y_{t,i}K_{h}(x-x_{t,i})}{n\hat{p}(x)}$, for $x\in[0,1]^{d}$. (2)

Based on the kernel estimator $F_{t,h}(x)$, for any integer pair $0\leq s<e\leq T$, we define the CUSUM statistic as

$\widetilde{F}_{t,h}^{(s,e]}(x)=\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{l=s+1}^{t}F_{l,h}(x)-\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{l=t+1}^{e}F_{l,h}(x)$, for $x\in[0,1]^{d}$. (3)

The CUSUM statistic defined in (3) is the cornerstone of our algorithm and is based on the two kernel estimators $\hat{p}(\cdot)$ and $F_{t,h}(\cdot)$. At a high level, the CUSUM statistic $\widetilde{F}_{t,h}^{(s,e]}(\cdot)$ estimates the difference in mean between the functional data in the time intervals $(s,t]$ and $(t,e]$. In the functional data analysis literature, other popular approaches to mean function estimation are reproducing kernel Hilbert space based methods and local polynomial regression. However, to the best of our knowledge, existing works based on these two approaches typically require the functional data to be temporally independent, and it is not obvious how to extend their theoretical guarantees to the temporally dependent case. We therefore choose the kernel estimation method owing to its flexibility in terms of both methodology and theory, and we derive new theoretical results on kernel estimation for functional data under temporal dependence and heavy-tailedness.
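As an illustration, the two estimators and the CUSUM statistic above can be sketched in a few lines. The following is a minimal Python sketch, not the authors' released implementation: it assumes a Gaussian kernel and pre-flattened grid arrays, and the names `density_estimate`, `F_th` and `cusum` are ours.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel on R^d, evaluated row-wise on an (N, d) array.
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2 * np.pi) ** (u.shape[-1] / 2)

def density_estimate(x_all, query, h_bar):
    # \hat p(x) = (nT)^{-1} sum_{t,i} K_{h_bar}(x - x_{t,i}); x_all has shape (T*n, d).
    d = x_all.shape[1]
    vals = gaussian_kernel((query[None, :] - x_all) / h_bar)
    return vals.sum() / (len(x_all) * h_bar ** d)

def F_th(x_t, y_t, query, h, p_hat):
    # Kernel estimate (2) of f_t at `query`: sum_i y_{t,i} K_h(x - x_{t,i}) / (n \hat p(x)).
    d = x_t.shape[1]
    w = gaussian_kernel((query[None, :] - x_t) / h) / h ** d
    return (y_t * w).sum() / (len(y_t) * p_hat)

def cusum(F_vals, s, t, e):
    # CUSUM statistic (3) on the interval (s, e] at split t, where F_vals[l]
    # stores F_{l+1,h}(x) for l = 0, ..., T-1 (0-based time index).
    left = np.sqrt((e - t) / ((e - s) * (t - s))) * F_vals[s:t].sum()
    right = np.sqrt((t - s) / ((e - s) * (e - t))) * F_vals[t:e].sum()
    return left - right
```

For a constant sequence of estimates the CUSUM value is zero at every split, and it is maximised (in absolute value) near a mean change, which is what the algorithm exploits.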

For multiple change-point estimation, a key ingredient is to isolate each single change-point within well-designed intervals in $[0,T]$. To achieve this, we combine the CUSUM statistic in (3) with a modified version of the seeded binary segmentation (SBS) proposed in Kovács et al. (2020). SBS is based on a collection of deterministic intervals defined in Definition 1.

Definition 1 (Seeded intervals).

Let $\mathcal{K}=\lceil C_{\mathcal{K}}\log\log(T)\rceil$ for some sufficiently large absolute constant $C_{\mathcal{K}}>0$. For $k\in\{1,\ldots,\mathcal{K}\}$, let $\mathcal{J}_{k}$ be the collection of $2^{k}-1$ intervals of length $l_{k}=T2^{-k+1}$ that are evenly shifted by $l_{k}/2=T2^{-k}$, i.e.

$\mathcal{J}_{k}=\{(\lfloor(i-1)T2^{-k}\rfloor,\,\lceil(i-1)T2^{-k}+T2^{-k+1}\rceil],\ i=1,\ldots,2^{k}-1\}.$

The overall collection of seeded intervals is denoted as $\mathcal{J}=\cup_{k=1}^{\mathcal{K}}\mathcal{J}_{k}$.

The essential idea of the seeded intervals defined in Definition 1 is to provide a multi-scale system of search regions for multiple change-points. SBS is computationally efficient, with a computational cost of order $O\{T\log(T)\}$ (Kovács et al., 2020).
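To make Definition 1 concrete, the following Python sketch (our own illustration, with a hypothetical default $C_{\mathcal{K}}=2$) enumerates the seeded intervals:

```python
import math

def seeded_intervals(T, C_K=2.0):
    # Seeded intervals of Definition 1: K = ceil(C_K * log log T) scales; at
    # scale k there are 2^k - 1 intervals of length T * 2^{-k+1}, evenly
    # shifted by T * 2^{-k}, with endpoints rounded as in the definition.
    K = math.ceil(C_K * math.log(math.log(T)))
    J = []
    for k in range(1, K + 1):
        shift = T * 2.0 ** (-k)
        length = T * 2.0 ** (-k + 1)
        for i in range(1, 2 ** k):
            left = math.floor((i - 1) * shift)
            right = math.ceil((i - 1) * shift + length)
            J.append((left, min(right, T)))
    return J
```

For $T=100$ this yields four scales and 26 intervals in total, the coarsest being $(0,T]$.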

Based on the CUSUM statistic and the seeded intervals, Algorithm 1 summarises the proposed functional seeded binary segmentation (FSBS) algorithm for multiple change-point estimation in sequentially observed functional data. There are three main tuning parameters involved in Algorithm 1: the kernel bandwidth $\bar{h}$ in the estimation of the sampling distribution, the kernel bandwidth $h$ in the estimation of the mean functions and the threshold parameter $\tau$ for declaring change-points. Theoretical and numerical guidance on their choices is presented in Sections 3.1 and 4, respectively.

Data $\{x_{t,i},y_{t,i}\}_{t=1,i=1}^{T,n}$, seeded intervals $\mathcal{J}$, tuning parameters $\bar{h},h,\tau>0$.
Initialisation: if $(s,e]=(0,T]$, set $\mathbf{S}\leftarrow\varnothing$ and set $\rho\leftarrow\log(T)n^{-1}h^{-d}$. Furthermore, sample $\lceil\log(T)\rceil$ points from $\{x_{t,i}\}_{t=1,i=1}^{T,n}$ uniformly at random without replacement, denote them by $\{u_{m}\}_{m=1}^{\lceil\log(T)\rceil}$ and compute the density estimates $\{\hat{p}_{\bar{h}}(u_{m})\}_{m=1}^{\lceil\log(T)\rceil}$.
for $\mathcal{I}=(\alpha,\beta]\in\mathcal{J}$ and $m\in\{1,\ldots,\lceil\log(T)\rceil\}$ do
     if $\mathcal{I}=(\alpha,\beta]\subseteq(s,e]$ and $\beta-\alpha>2\rho$ then
         $A_{m}^{\mathcal{I}}\leftarrow\max_{\alpha+\rho\leq t\leq\beta-\rho}|\widetilde{F}_{t,h}^{(\alpha,\beta]}(u_{m})|$
         $D_{m}^{\mathcal{I}}\leftarrow\operatorname{arg\,max}_{\alpha+\rho\leq t\leq\beta-\rho}|\widetilde{F}_{t,h}^{(\alpha,\beta]}(u_{m})|$
     else
         $(A_{m}^{\mathcal{I}},D_{m}^{\mathcal{I}})\leftarrow(-1,0)$
     end if
end for
$(m^{*},\mathcal{I}^{*})\leftarrow\operatorname{arg\,max}_{m=1,\ldots,\lceil\log(T)\rceil,\,\mathcal{I}\in\mathcal{J}}A^{\mathcal{I}}_{m}$
if $A_{m^{*}}^{\mathcal{I}^{*}}>\tau$ then
     $\mathbf{S}\leftarrow\mathbf{S}\cup\{D_{m^{*}}^{\mathcal{I}^{*}}\}$
     FSBS$((s,D_{m^{*}}^{\mathcal{I}^{*}}],\bar{h},h,\tau)$
     FSBS$((D_{m^{*}}^{\mathcal{I}^{*}},e],\bar{h},h,\tau)$
end if
Output: the set of estimated change-points $\mathbf{S}$.
Algorithm 1 Functional Seeded Binary Segmentation. FSBS$((s,e],\bar{h},h,\tau)$

Algorithm 1 proceeds iteratively, starting with the whole time course and using the multi-scale seeded intervals to search for the candidate point with the largest CUSUM value. A change-point is declared if the corresponding maximum CUSUM value exceeds a pre-specified threshold $\tau$; the whole sequence is then split into two and the procedure is carried out recursively on the sub-intervals.

Algorithm 1 utilises a collection of random grid points $\{u_{m}\}_{m=1}^{\lceil\log(T)\rceil}\subseteq\{x_{t,i}\}_{t=1,i=1}^{T,n}$ to detect changes in the functional data. For a change in the mean functions at the time point $\eta$ with $\|f^{*}_{\eta+1}-f^{*}_{\eta}\|_{\infty}>0$, we show in the appendix that, as long as $\lceil\log(T)\rceil$ grid points are sampled, with high probability there is at least one point $u_{m^{\prime}}\in\{u_{m}\}_{m=1}^{\lceil\log(T)\rceil}$ such that $|f^{*}_{\eta+1}(u_{m^{\prime}})-f^{*}_{\eta}(u_{m^{\prime}})|\asymp\|f^{*}_{\eta+1}-f^{*}_{\eta}\|_{\infty}$. This procedure allows FSBS to detect changes in the mean functions without evaluating functions on a dense lattice grid, improving computational efficiency.
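The recursion of Algorithm 1 can be sketched as follows. This is a deliberately simplified, hypothetical Python version that tracks a single evaluation point, so `F_vals[t]` plays the role of $F_{t+1,h}(u_m)$; the real algorithm maximises over $\lceil\log T\rceil$ random grid points and uses the data-driven $\rho$ and $\tau$ described above.

```python
import math
import numpy as np

def fsbs(F_vals, intervals, tau, rho, s=None, e=None, found=None):
    # Simplified FSBS recursion: within (s, e], scan all seeded intervals
    # (a, b] that fit, take the largest CUSUM value over admissible splits t,
    # declare a change-point if it exceeds tau, and recurse on both sides.
    T = len(F_vals)
    s = 0 if s is None else s
    e = T if e is None else e
    found = [] if found is None else found
    best_val, best_t = -1.0, None
    for (a, b) in intervals:
        if not (s <= a and b <= e and b - a > 2 * rho):
            continue
        for t in range(int(a + rho), int(b - rho) + 1):
            if t <= a or t >= b:
                continue
            left = math.sqrt((b - t) / ((b - a) * (t - a))) * F_vals[a:t].sum()
            right = math.sqrt((t - a) / ((b - a) * (b - t))) * F_vals[t:b].sum()
            v = abs(left - right)
            if v > best_val:
                best_val, best_t = v, t
    if best_t is not None and best_val > tau:
        found.append(best_t)
        fsbs(F_vals, intervals, tau, rho, s, best_t, found)
        fsbs(F_vals, intervals, tau, rho, best_t, e, found)
    return sorted(found)
```

On a noiseless sequence with a single jump in the middle, the recursion declares exactly that split and then stops, since the CUSUM values within each resulting constant segment fall below the threshold.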

3 Main Results

3.1 Assumptions and theory

We begin by imposing assumptions on the kernel function $K(\cdot)$ used in FSBS.

Assumption 2 (Kernel function).

Let $K(\cdot):\mathbb{R}^{d}\to\mathbb{R}^{+}$ be compactly supported and satisfy the following conditions.

a. The kernel function $K(\cdot)$ is adaptive to the Hölder class $\mathcal{H}^{r}(L)$, i.e. for any $f\in\mathcal{H}^{r}(L)$, it holds that $\sup_{x\in[0,1]^{d}}\big|\int_{[0,1]^{d}}K_{h}(x-z)f(z)\,\mathrm{d}z-f(x)\big|\leq\tilde{C}h^{r}$, where $\tilde{C}>0$ is a constant that only depends on $L$.

b. The class of functions $\mathcal{F}_{K}=\{K((x-\cdot)/h):\mathbb{R}^{d}\to\mathbb{R}^{+},\,h>0\}$ is separable in $\mathcal{L}_{\infty}(\mathbb{R}^{d})$ and is a uniformly bounded VC-class. This means that there exist constants $A,\nu>0$ such that for every probability measure $Q$ on $\mathbb{R}^{d}$ and every $u\in(0,\|K\|_{\infty})$, it holds that $\mathcal{N}(\mathcal{F}_{K},\mathcal{L}_{2}(Q),u)\leq(A\|K\|_{\infty}/u)^{\nu}$, where $\mathcal{N}(\mathcal{F}_{K},\mathcal{L}_{2}(Q),u)$ denotes the $u$-covering number of the metric space $(\mathcal{F}_{K},\mathcal{L}_{2}(Q))$.

Assumption 2 is a standard assumption in the nonparametric literature; see Giné and Guillou (1999, 2001), Kim et al. (2019) and Wang et al. (2019), among many others. These conditions hold for most commonly used kernels, including the uniform, polynomial and Gaussian kernels.

Recall the minimal spacing $\Delta=\min_{k=1}^{K+1}(\eta_{k}-\eta_{k-1})$ defined in Assumption 1b. We further define the jump size at the $k$th change-point as $\kappa_{k}=\|f^{*}_{\eta_{k}+1}-f^{*}_{\eta_{k}}\|_{\infty}$ and the minimal jump size as $\kappa=\min_{k=1}^{K}\kappa_{k}$. Assumption 3 below details how strong the signal needs to be in terms of $\kappa$ and $\Delta$, given the grid size $n$, the number of functional curves $T$, the smoothness parameter $r$, the dimensionality $d$ and the moment parameter $q$.

Assumption 3 (Signal-to-noise ratio, SNR).

There exists an arbitrarily slowly diverging sequence $C_{\mathrm{SNR}}=C_{\mathrm{SNR}}(T)$ such that

$\kappa\sqrt{\Delta}>C_{\mathrm{SNR}}\log^{\max\{1/2,5/q\}}(T)\big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\big)^{1/2}.$

We are now ready to present the main theorem, showing the consistency of FSBS.

Theorem 1.

Under Assumptions 1, 2 and 3, let $\{\widehat{\eta}_{k}\}_{k=1}^{\widehat{K}}$ be the change-points estimated by FSBS, detailed in Algorithm 1, with data $\{x_{t,i},y_{t,i}\}_{t=1,i=1}^{T,n}$, bandwidth parameters $\bar{h}=C_{\bar{h}}(Tn)^{-\frac{1}{2r+d}}$ and $h=C_{h}(Tn)^{-\frac{1}{2r+d}}$, and threshold parameter $\tau=C_{\tau}\log^{\max\{1/2,5/q\}}(T)\big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\big)^{1/2}$, for some absolute constants $C_{\bar{h}},C_{h},C_{\tau}>0$. It holds that

$\mathbb{P}\big\{\widehat{K}=K;\,|\widehat{\eta}_{k}-\eta_{k}|\leq C_{\mathrm{FSBS}}\log^{\max\{1,10/q\}}(T)\big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\big)\kappa_{k}^{-2},\,\forall k=1,\ldots,K\big\}\geq 1-3\log^{-1}(T),$

where $C_{\mathrm{FSBS}}>0$ is an absolute constant.

In view of Assumption 3 and Theorem 1, we see that with properly chosen tuning parameters and with probability tending to one as the sample size $T$ grows, FSBS estimates the correct number of change-points and

$\max_{k=1}^{K}|\widehat{\eta}_{k}-\eta_{k}|/\Delta\lesssim\big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\big)\log^{\max\{1,10/q\}}(T)/(\kappa^{2}\Delta)=o(1),$

where the last equality follows from Assumption 3. The above inequality shows that there exists a one-to-one mapping from $\{\widehat{\eta}_{k}\}_{k=1}^{K}$ to $\{\eta_{k}\}_{k=1}^{K}$, obtained by matching each estimator to its closest true change-point.

3.2 Discussions on functional seeded binary segmentation (FSBS)

From sparse to dense regimes. In our setup, each curve is only observed at $n$ discrete points and we allow the full range of choices of $n$, representing scenarios from sparse to dense, all accompanied by consistency results. In the sparsest case $n=1$, Assumption 3 reads as $\kappa\sqrt{\Delta}\gtrsim T^{d/(4r+2d)}$ times a logarithmic factor, under which the localisation error is upper bounded by $T^{d/(2r+d)}\kappa^{-2}$, up to a logarithmic factor. To the best of our knowledge, this challenging case has not been dealt with in the existing change-point detection literature for functional data. In the densest case, we can heuristically let $n=\infty$ and, for simplicity, let $q=\infty$, representing the sub-Gaussian noise case. Assumption 3 then reads as $\kappa\sqrt{\Delta}\gtrsim\log^{1/2}(T)$ and the localisation error is upper bounded by $\kappa^{-2}\log(T)$. Both the SNR requirement and the localisation error match the optimal rates in the univariate mean change-point localisation problem (Wang et al., 2020), implying the optimality of FSBS in the dense situation.

Tuning parameters. There are three tuning parameters involved. In the CUSUM statistic (3), the density estimator of the sampling distribution is a kernel estimator with bandwidth $\bar{h}\asymp(Tn)^{-1/(2r+d)}$. Due to the independence of the observation grids, such a choice of bandwidth follows from the classical nonparametric literature (e.g. Tsybakov, 2009) and is minimax-rate optimal in terms of the estimation error. For completeness, we include the study of the theoretical properties of $\hat{p}(\cdot)$ in Appendix B. In practice, there exist various default methods for the selection of $\bar{h}$; see, for example, the function Hpi from the R package ks (Chacón and Duong, 2018).

The other bandwidth tuning parameter $h$ is also required to satisfy $h\asymp(Tn)^{-1/(2r+d)}$. Although we allow for physical dependence in both the functional noise and the measurement error, we show that the same order of bandwidth as $\bar{h}$ is required under Assumption 1. This is an interesting, if not surprising, finding. This particular choice of $h$ is due to the fact that the physical dependence framework put forward by Wu (2005) is a short range dependence condition and does not change the effective sample size rate.

The threshold tuning parameter $\tau$ is set to be a high-probability upper bound on the CUSUM statistics when there is no change-point and is of the form

$\tau=C_{\tau}\log^{\max\{1/2,5/q\}}(T)\sqrt{n^{-1}h^{-d}+1}.$

This also reflects the requirement on the SNR detailed in Assumption 3, namely $\kappa\sqrt{\Delta}\gtrsim\tau$.

Phase transition. Recall that the number of curves is $T$ and the number of observations on each curve is $n$. The asymptotic regime we discuss lets $T$ diverge, while allowing all other parameters, including $n$, to be functions of $T$. In Theorem 1, we allow a full range of cases in terms of the relationship between $n$ and $T$. As a concrete example, when the smoothness parameter $r=2$, the jump size $\kappa\asymp 1$ and the dimension $d=1$, with high probability (ignoring logarithmic factors for simplicity),

$\max_{k=1}^{K}|\widehat{\eta}_{k}-\eta_{k}|=O_{p}(T^{\frac{1}{5}}n^{-\frac{4}{5}}+1)=\begin{cases}O_{p}(1),&n\geq T^{1/4};\\ O_{p}(T^{\frac{1}{5}}n^{-\frac{4}{5}}),&n\leq T^{1/4}.\end{cases}$

This relationship between $n$ and $T$ was previously demonstrated in the mean function estimation literature (e.g. Cai and Yuan, 2011; Zhang and Wang, 2016), where the observations are discretely sampled from independently and identically distributed functional data. There, the minimax estimation error rate possesses the same phase transition between $n$ and $T$, with transition boundary $n\asymp T^{1/4}$, which agrees with our finding under the change-point setting.
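The boundary can be checked numerically. The small Python sketch below (our own illustration, with constants and logarithmic factors dropped) evaluates the localisation rate from Theorem 1 on both sides of the boundary:

```python
def localisation_rate(T, n, r=2, d=1):
    # Localisation error rate T^{d/(2r+d)} * n^{-2r/(2r+d)} from Theorem 1,
    # with kappa of constant order and logarithmic factors ignored.
    return T ** (d / (2 * r + d)) * n ** (-2 * r / (2 * r + d))

T = 10 ** 6
# Dense side of the boundary: n = T^{1/4} gives T^{1/5} * T^{-1/5} = 1, i.e. O_p(1).
print(localisation_rate(T, T ** 0.25))  # ~1.0
# Sparse side: n = 1 leaves the full rate T^{1/5}.
print(localisation_rate(T, 1))          # ~15.85
```

With $r=2$ and $d=1$, the exponents are $1/5$ and $-4/5$, so the rate is exactly $1$ at $n=T^{1/4}$ and grows as $n$ shrinks below the boundary.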

Physical dependence and heavy-tailedness. In Assumptions 1c and 1d, we allow for temporal dependence of the physical dependence type and for heavy-tailed additive noise. As discussed, since physical dependence is a short range dependence, all the rates involved are the same as those in the independent case, up to logarithmic factors. Having said this, the technical details required to deal with this short range dependence are fundamentally different from those in the independent case. It is perhaps more interesting to discuss the effect of the heavy-tail behaviour, characterised by the parameter $q$. It can be seen from the rates in Assumption 3 and Theorem 1 that the effect of $q$ disappears, and the noise behaves as if it were sub-Gaussian, when $q\geq 10$.

4 Numerical Experiments

4.1 Simulated data analysis

We compare the proposed FSBS with state-of-the-art methods for change-point detection in functional data across a wide range of simulation settings. The implementation of our approach can be found at https://github.com/cmadridp/FSBS. We compare with three competitors: BGHK in Berkes et al. (2009), HK in Hörmann and Kokoszka (2010) and SN in Zhang et al. (2011). All three methods estimate change-points by examining mean changes in the leading functional principal components of the observed functional data. BGHK is designed for temporally independent data, while HK and SN can handle temporal dependence via the estimation of the long-run variance and the self-normalization principle, respectively. All three methods require fully observed functional data; in practice, they convert discrete data to functional observations using B-splines with 20 basis functions.

For the implementation of FSBS, we adopt the Gaussian kernel. Following the standard practice in kernel density estimation, the bandwidth $\bar{h}$ is selected by the function Hpi in the R package ks (Chacón and Duong, 2018). The tuning parameter $\tau$ and the bandwidth $h$ are chosen by cross-validation, with the evenly-indexed data serving as the training set and the oddly-indexed data as the validation set. For each candidate pair $(h,\tau)$, we obtain change-point estimators $\{\widehat{\eta}_{k}\}_{k=1}^{\widehat{K}}$ on the training set and compute the validation loss $\sum_{k=1}^{\widehat{K}}\sum_{t\in[\widehat{\eta}_{k},\widehat{\eta}_{k+1})}\sum_{i=1}^{n}\{(\widehat{\eta}_{k+1}-\widehat{\eta}_{k})^{-1}\sum_{s=\widehat{\eta}_{k}+1}^{\widehat{\eta}_{k+1}}F_{s,h}(x_{t,i})-y_{t,i}\}^{2}$. The pair $(h,\tau)$ is then chosen as the one with the lowest validation loss.
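For concreteness, the validation loss can be sketched as follows; here `F_hat[t, i]` stands in for $F_{t,h}(x_{t,i})$ evaluated on the validation set, and the boundary convention $\widehat{\eta}_{0}=0$, $\widehat{\eta}_{\widehat{K}+1}=T$ is an assumption of ours.

```python
import numpy as np

def validation_loss(y, F_hat, boundaries):
    """Sum over segments of {segment mean of F_hat - y}^2, mirroring the
    validation loss above; boundaries = [0, eta_hat_1, ..., T]."""
    loss = 0.0
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        seg_mean = F_hat[a:b].mean(axis=0)          # (b - a)^{-1} sum_s F_hat[s]
        loss += ((seg_mean[None, :] - y[a:b]) ** 2).sum()
    return loss

rng = np.random.default_rng(0)
# toy data with a single change-point at t = 5 (hypothetical numbers)
y = np.vstack([np.zeros((5, 3)), np.ones((5, 3))]) + 0.01 * rng.standard_normal((10, 3))
F_hat = y.copy()                                     # pretend the fit is perfect
print(validation_loss(y, F_hat, [0, 5, 10]))         # small: boundaries match the truth
print(validation_loss(y, F_hat, [0, 10]))            # larger: the change-point is missed
```

The segmentation matching the true change-point yields the smaller loss, which is the property the cross-validation scheme exploits.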

We consider five different scenarios for the observations $\{x_{ti},y_{ti}\}_{t=1,i=1}^{T,n}$. For all Scenarios 1-5, we set $T=200$. Given the dimension $d$, denote a generic grid point by $x=(x^{(1)},\ldots,x^{(d)})$. Scenarios 1 to 4 are generated based on model (1). The basic setting is as follows.

\bullet Scenario 1 (S1) Let $(n,d)=(1,1)$, the unevenly-spaced change-points be $(\eta_{1},\eta_{2})=(30,130)$ and the three distinct mean functions be $6\cos(\cdot)$, $6\sin(\cdot)$ and $6\cos(\cdot)$.

\bullet Scenario 2 (S2) Let $(n,d)=(10,1)$, the unevenly-spaced change-points be $(\eta_{1},\eta_{2})=(30,130)$ and the three distinct mean functions be $2\cos(\cdot)$, $2\sin(\cdot)$ and $2\cos(\cdot)$.

\bullet Scenario 3 (S3) Let $(n,d)=(50,1)$, the unevenly-spaced change-points be $(\eta_{1},\eta_{2})=(30,130)$ and the three distinct mean functions be $\cos(\cdot)$, $\sin(\cdot)$ and $\cos(\cdot)$.

\bullet Scenario 4 (S4) Let $(n,d)=(10,2)$, the unevenly-spaced change-points be $(\eta_{1},\eta_{2})=(100,150)$ and the three distinct mean functions be $0$, $3x^{(1)}x^{(2)}$ and $0$.

For S1-S4, the functional noise is generated as $\xi_{t}(x)=0.5\xi_{t-1}(x)+\sum_{i=1}^{50}i^{-1}b_{t,i}h_{i}(x)$, where $\{h_{i}(x)=\prod_{j=1}^{d}(1/\sqrt{2})\pi\sin(ix^{(j)})\}_{i=1}^{50}$ are basis functions and $\{b_{t,i}\}_{t=1,i=1}^{T,50}$ are i.i.d. standard normal random variables. The measurement error is generated as $\delta_{t}=0.3\delta_{t-1}+\epsilon_{t}$, where $\{\epsilon_{t}\}_{t=1}^{T}$ are i.i.d. $\mathcal{N}(0,0.5I_{n})$. We observe the noisy functional data $\{y_{ti}\}_{t=1,i=1}^{T,n}$ at grid points $\{x_{ti}\}_{t=1,i=1}^{T,n}$ independently sampled from $\mathrm{Unif}([0,1]^{d})$.
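The generating mechanism of S1-S4 can be sketched as follows (Scenario 2 shown; we assume, as the recursion suggests, that the AR(1) dynamics act on the basis coefficients of $\xi_{t}$, so the noise function can be evaluated at fresh design points at each time):

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, d = 200, 10, 1                       # Scenario 2 configuration
cps = (30, 130)                            # change-points eta_1, eta_2

def basis(x, i):
    # h_i(x) = prod_j (1 / sqrt(2)) * pi * sin(i * x^{(j)})
    return np.prod(np.pi / np.sqrt(2) * np.sin(i * x), axis=-1)

def mean_fn(t, x):
    # piecewise mean 2cos -> 2sin -> 2cos, with changes at t = 30 and t = 130
    f = np.sin if cps[0] <= t < cps[1] else np.cos
    return 2 * np.squeeze(f(x), axis=-1)

coef = np.zeros(50)                        # basis coefficients of xi_t
delta = np.zeros(n)                        # measurement error, AR(1) in t
x = rng.uniform(size=(T, n, d))            # design points, Unif([0,1]^d)
y = np.empty((T, n))
for t in range(T):
    # xi_t = 0.5 xi_{t-1} + sum_i i^{-1} b_{t,i} h_i, via the coefficients
    coef = 0.5 * coef + rng.standard_normal(50) / np.arange(1, 51)
    xi = sum(coef[i - 1] * basis(x[t], i) for i in range(1, 51))
    delta = 0.3 * delta + rng.normal(scale=np.sqrt(0.5), size=n)
    y[t] = mean_fn(t, x[t]) + xi + delta
```

Scenarios 1, 3 and 4 follow by changing $(n,d)$, the change-point locations and the mean functions accordingly.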

Scenario 5 is adopted from Zhang et al. (2011) for densely-sampled functional data without measurement error.

\bullet Scenario 5 (S5) Let $(n,d)=(50,1)$, the evenly-spaced change-points be $(\eta_{1},\eta_{2})=(68,134)$ and the three distinct mean functions be $0$, $\sin(\cdot)$ and $2\sin(\cdot)$.

The grid points $\{x_{ti}\}_{i=1}^{50}$ are $50$ evenly-spaced points in $[0,1]$ for all $t=1,\ldots,T$. The functional noise is generated as $\xi_{t}(\cdot)=\int_{[0,1]}\psi(\cdot,u)\xi_{t-1}(u)\,\mathrm{d}u+\epsilon_{t}(\cdot)$, where $\{\epsilon_{t}(\cdot)\}_{t=1}^{T}$ are independent standard Brownian motions and $\psi(v,u)=(1/3)\exp((v^{2}+u^{2})/2)$ is a bivariate Gaussian kernel.
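A minimal sketch of the S5 mechanism, with the integral operator discretised on the design grid (the Riemann-sum approximation and the Brownian-motion construction are our own simplifications):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 200, 50
grid = np.linspace(0, 1, n)                          # 50 evenly-spaced design points
psi = np.exp((grid[:, None] ** 2 + grid[None, :] ** 2) / 2) / 3

def brownian_motion(rng, grid):
    # standard Brownian motion on [0, 1] via cumulative Gaussian increments
    inc = rng.standard_normal(len(grid)) * np.sqrt(np.diff(grid, prepend=0.0))
    return np.cumsum(inc)

xi = np.zeros(n)
Xi = np.empty((T, n))                                # functional noise on the grid
for t in range(T):
    # xi_t(v) = int_0^1 psi(v, u) xi_{t-1}(u) du + eps_t(v), Riemann sum over the grid
    xi = psi @ xi / n + brownian_motion(rng, grid)
    Xi[t] = xi
```

The mean functions of S5 are then added to `Xi` segment by segment, exactly as in S1-S4.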

S1-S5 cover a wide range of settings: the extremely sparse case S1, the sparse case S2, the two-dimensional domain case S4, and the densely sampled cases S3 and S5. Note that S1 and S4 can only be handled by FSBS: in S1 it is impossible to estimate a function via B-splines based on a single observation per curve, and in S4 the domain is two-dimensional.

Evaluation metrics. For a given set of true change-points $\mathcal{C}=\{\eta_{k}\}_{k=1}^{K}$, we evaluate the accuracy of the estimator $\{\widehat{\eta}_{k}\}_{k=1}^{\widehat{K}}$ by the difference $|\widehat{K}-K|$ and the Hausdorff distance $d(\hat{\mathcal{C}},\mathcal{C})$, defined by $d(\hat{\mathcal{C}},\mathcal{C})=\max\{\max_{x\in\hat{\mathcal{C}}}\min_{y\in\mathcal{C}}|x-y|,\,\max_{y\in\mathcal{C}}\min_{x\in\hat{\mathcal{C}}}|x-y|\}$. For $\hat{\mathcal{C}}=\varnothing$, we use the convention that $|\widehat{K}-K|=K$ and $d(\hat{\mathcal{C}},\mathcal{C})=T$.
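The two metrics are straightforward to compute; a small sketch of the Hausdorff distance with the convention above for $\hat{\mathcal{C}}=\varnothing$:

```python
def hausdorff(est, truth, T):
    """Hausdorff distance between the estimated and true change-point sets;
    returns T when the estimated set is empty, per the convention above."""
    if not est:
        return T
    d1 = max(min(abs(x - y) for y in truth) for x in est)   # est -> truth
    d2 = max(min(abs(x - y) for x in est) for y in truth)   # truth -> est
    return max(d1, d2)

print(hausdorff([32, 128], [30, 130], T=200))   # 2
print(hausdorff([], [30, 130], T=200))          # 200
```

Note that the distance is symmetric in the two directions, so both over- and under-estimation of $K$ are penalised.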

For each scenario, we repeat the experiments 100 times and Figure 1 summarizes the performance of FSBS, BGHK, HK and SN. Tabulated results can be found in Appendix A. As can be seen, FSBS consistently outperforms the competing methods by a wide margin and demonstrates robust behaviour across the board for both sparsely and densely sampled functional data.

Figure 1: Bar plots for simulation results of S1-S5. Each bar reports the mean and standard deviation computed over 100 experiments. From left to right, the first two plots show the Hausdorff distance and $|K-\hat{K}|$ for S2, S3 and S5; the last two plots correspond to S1 and S4.

4.2 Real data application

We consider the COBE SST2 dataset (Physical Sciences Laboratory, 2020), which consists of monthly average sea surface temperatures (SST) from 1940 to 2019, on a $1$-degree-latitude by $1$-degree-longitude grid ($48\times 30$) covering Australia. The specific coordinates are latitude $10^{\circ}$S-$39^{\circ}$S and longitude $110^{\circ}$E-$157^{\circ}$E.

We apply FSBS to detect potential change-points in the two-dimensional SST. The implementation of FSBS is the same as that described in Section 4.1. To avoid seasonal effects, we apply FSBS to the SST of the month of June from 1940 to 2019. As a robustness check, we further conduct the same analysis separately for the month of July.

For both the June and July data, FSBS identifies two change-points, Years 1981 and 1996, suggesting the robustness of the finding. The two change-points might be associated with years when both the Indian Ocean Dipole and the Oceanic Niño Index had extreme events (Ashok et al., 2003). The El Niño/Southern Oscillation has been recognized as an important manifestation of the tropical ocean-atmosphere-land coupled system: it is an irregular periodic variation in winds and sea surface temperatures over the tropical eastern Pacific Ocean, and much of the variability in the climate of Australia is connected with this phenomenon (Australian Government, 2014).

To visualize the estimated changes, Figure 2 depicts the average SST before the first change-point (Year 1981), between the two change-points, and after the second change-point (Year 1996). The two rows correspond to the June and July data, respectively. As can be seen, the top left corners exhibit different patterns across the three periods, supporting the existence of change-points.

Figure 2: Average SST. From left to right: average SST from 1940 to 1981, average SST from 1982 to 1996, and average SST from 1997 to 2019. The top and bottom rows correspond to the June and July data respectively.

5 Conclusion

In this paper, we study change-point detection for sparse and dense functional data in general dimensions. We show that our algorithm FSBS can consistently estimate the change-points even in the extremely sparse setting with $n=1$. Our theoretical analysis reveals an interesting phase transition between $n$ and $T$, which had not been discovered in the existing literature on functional change-point detection. The consistency of FSBS relies on the assumption that the minimal spacing $\Delta\asymp T$. To relax this assumption, we may increase $\mathcal{K}$ in Definition 1 to enlarge the coverage of the seeded intervals in FSBS and apply the narrowest-over-threshold selection method (Theorem 3 in Kovács et al., 2020). With minor modifications of the current theoretical analysis, the consistency of FSBS can then be established for the case of $\Delta\ll T$. Since such a relaxation does not add much methodological insight, we omit this additional technical discussion for conciseness.

References

  • Ashok et al. (2003) Karumuri Ashok, Zhaoyong Guan, and Toshio Yamagata. A look at the relationship between the ENSO and the Indian Ocean Dipole. Journal of the Meteorological Society of Japan. Ser. II, 81(1):41–56, 2003. doi: 10.2151/jmsj.81.41.
  • Aue et al. (2009) Alexander Aue, Robertas Gabrys, Lajos Horváth, and Piotr Kokoszka. Estimation of a change-point in the mean function of functional data. Journal of Multivariate Analysis, 100(10):2254–2269, 2009.
  • Aue et al. (2018) Alexander Aue, Gregory Rice, and Ozan Sönmez. Detecting and dating structural breaks in functional data without dimension reduction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(3):509–529, 2018.
  • Australian Government (2014) Australian Government. What is El Niño and how does it impact Australia? http://www.bom.gov.au/climate/updates/articles/a008-el-nino-and-australia.shtml, June 2014.
  • Berkes et al. (2009) István Berkes, Robertas Gabrys, Lajos Horváth, and Piotr Kokoszka. Detecting changes in the mean of functional observations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):927–946, 2009.
  • Bonner et al. (2014) Simon J Bonner, Nathaniel K Newlands, and Nancy E Heckman. Modeling regional impacts of climate teleconnections using functional data analysis. Environmental and Ecological Statistics, 21(1):1–26, 2014.
  • Cai and Yuan (2011) T Tony Cai and Ming Yuan. Optimal estimation of the mean function based on discretely sampled functional data: Phase transition. The Annals of Statistics, 39(5):2330–2355, 2011.
  • Chacón and Duong (2018) José E Chacón and Tarn Duong. Multivariate kernel smoothing and its applications. Chapman and Hall/CRC, 2018.
  • Chiou et al. (2014) Jeng-Min Chiou, Yi-Chen Zhang, Wan-Hui Chen, and Chiung-Wen Chang. A functional data approach to missing value imputation and outlier detection for traffic flow data. Transportmetrica B: Transport Dynamics, 2(2):106–129, 2014.
  • Dai et al. (2019) Xiongtao Dai, Hans-Georg Müller, Jane-Ling Wang, and Sean CL Deoni. Age-dynamic networks and functional correlation for early white matter myelination. Brain Structure and Function, 224(2):535–551, 2019.
  • Dette et al. (2020) Holger Dette, Kevin Kokot, and Stanislav Volgushev. Testing relevant hypotheses in functional time series via self-normalization. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(3):629–660, 2020.
  • Fan et al. (2014) Yingying Fan, Natasha Foutz, Gareth M James, and Wolfgang Jank. Functional response additive model estimation with online virtual stock markets. The Annals of Applied Statistics, 8(4):2435–2460, 2014.
  • Fraiman et al. (2014) Ricardo Fraiman, Ana Justel, Regina Liu, and Pamela Llop. Detecting trends in time series of functional data: A study of antarctic climate change. Canadian Journal of Statistics, 42(4):597–609, 2014.
  • Giné and Guillou (1999) Evarist Giné and Armelle Guillou. Laws of the iterated logarithm for censored data. The Annals of Probability, 27(4):2042–2067, 1999.
  • Giné and Guillou (2001) Evarist Giné and Armelle Guillou. On consistency of kernel density estimators for randomly censored data: rates holding uniformly over adaptive intervals. Annales de l’IHP Probabilités et statistiques, 37(4):503–522, 2001.
  • Giné and Guillou (2002) Evarist Giné and Armelle Guillou. Rates of strong uniform consistency for multivariate kernel density estimators. Annales de l’Institut Henri Poincare (B) Probability and Statistics, 38(6):907–921, 2002. ISSN 0246-0203. doi: https://doi.org/10.1016/S0246-0203(02)01128-7.
  • Hörmann and Kokoszka (2010) Siegfried Hörmann and Piotr Kokoszka. Weakly dependent functional data. The Annals of Statistics, 38(3):1845–1884, 2010.
  • Jiang (2017) Heinrich Jiang. Uniform convergence rates for kernel density estimation. In International Conference on Machine Learning, pages 1694–1703. PMLR, 2017.
  • Kim et al. (2019) Jisu Kim, Jaehyeok Shin, Alessandro Rinaldo, and Larry Wasserman. Uniform convergence rate of the kernel density estimator adaptive to intrinsic volume dimension. In International Conference on Machine Learning, pages 3398–3407. PMLR, 2019.
  • Kirch (2006) Claudia Kirch. Resampling methods for the change analysis of dependent data. PhD thesis, Universität zu Köln, 2006.
  • Kovács et al. (2020) Solt Kovács, Housen Li, Peter Bühlmann, and Axel Munk. Seeded binary segmentation: A general methodology for fast and optimal change point detection. arXiv preprint arXiv:2002.06633, 2020.
  • Liu et al. (2013) Weidong Liu, Han Xiao, and Wei Biao Wu. Probability and moment inequalities under dependence. Statistica Sinica, pages 1257–1272, 2013.
  • Petersen et al. (2019) Alexander Petersen, Sean Deoni, and Hans-Georg Müller. Fréchet estimation of time-varying covariance matrices from sparse data, with application to the regional co-evolution of myelination in the developing brain. The Annals of Applied Statistics, 13(1):393–419, 2019.
  • Physical Sciences Laboratory (2020) Physical Sciences Laboratory. COBE SST2 and Sea-Ice. https://psl.noaa.gov/data/gridded/data.cobe2.html, April 2020.
  • Rigollet and Vert (2009) Philippe Rigollet and Régis Vert. Optimal rates for plug-in estimators of density level sets. Bernoulli, 15(4):1154–1178, 2009.
  • Rinaldo and Wasserman (2010) Alessandro Rinaldo and Larry Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678–2722, 2010.
  • Sriperumbudur and Steinwart (2012) Bharath Sriperumbudur and Ingo Steinwart. Consistency and rates for clustering with dbscan. In Artificial Intelligence and Statistics, pages 1090–1098. PMLR, 2012.
  • Tsybakov (2009) Alexandre B Tsybakov. Introduction to Nonparametric Estimation. Springer series in statistics. Springer, Dordrecht, 2009. doi: 10.1007/b13794.
  • Vershynin (2018) Roman Vershynin. High-dimensional probability: An introduction with applications in data science, volume 47. Cambridge university press, 2018.
  • Wang et al. (2019) Daren Wang, Xinyang Lu, and Alessandro Rinaldo. Dbscan: Optimal rates for density-based cluster estimation. Journal of Machine Learning Research, 2019.
  • Wang et al. (2020) Daren Wang, Yi Yu, and Alessandro Rinaldo. Univariate mean change point detection: Penalization, cusum and optimality. Electronic Journal of Statistics, 14(1):1917–1961, 2020.
  • Wang et al. (2016) Jane-Ling Wang, Jeng-Min Chiou, and Hans-Georg Müller. Functional data analysis. Annual Review of Statistics and Its Application, 3:257–295, 2016.
  • Wu (2005) Wei Biao Wu. Nonlinear system theory: Another look at dependence. Proceedings of the National Academy of Sciences, 102(40):14150–14154, 2005.
  • Zhang et al. (2011) Xianyang Zhang, Xiaofeng Shao, Katharine Hayhoe, and Donald J Wuebbles. Testing the structural stability of temporally dependent functional observations and application to climate projections. Electronic Journal of Statistics, 5:1765–1796, 2011.
  • Zhang and Wang (2016) Xiaoke Zhang and Jane-Ling Wang. From sparse to dense functional data and beyond. The Annals of Statistics, 44(5):2281–2321, 2016.

Appendices

Appendix A Detailed simulation results

We present the tables containing the results of the simulation study in Section 4.1 of the main text.

Table 1: Scenario 1 ($n=1$, $d=1$, changes from $6\cos$ to $6\sin$ to $6\cos$)
Model  $K-\hat{K}<0$  $K-\hat{K}=0$  $K-\hat{K}>0$  $|\hat{K}-K|$  $d(\hat{\mathcal{C}},\mathcal{C})$
FSBS  0.05  0.86  0.09  0.17  16.15

Changes occur at times $30$ and $130$.

Table 2: Scenario 2 ($n=10$, $d=1$, changes from $2\cos$ to $2\sin$ to $2\cos$)
Model  $K-\hat{K}<0$  $K-\hat{K}=0$  $K-\hat{K}>0$  $|\hat{K}-K|$  $d(\hat{\mathcal{C}},\mathcal{C})$
FSBS  0.05  0.95  0  0.05  3.32
BGHK  0.58  0.42  0  1.12  20.11
HK  0.16  0.47  0.37  0.78  66.45
SN  0.04  0.03  0.93  1.83  181.11

Changes occur at times $30$ and $130$.

Table 3: Scenario 3 ($n=50$, $d=1$, changes from $\cos$ to $\sin$ to $\cos$)
Model  $K-\hat{K}<0$  $K-\hat{K}=0$  $K-\hat{K}>0$  $|\hat{K}-K|$  $d(\hat{\mathcal{C}},\mathcal{C})$
FSBS  0  0.93  0.07  0.07  7.35
BGHK  0.85  0.15  0  2.97  32.88
HK  0  0.08  0.92  1.71  172.52
SN  0.02  0.04  0.94  1.85  183.63

Changes occur at times $30$ and $130$.

Table 4: Scenario 4 ($n=10$, $d=2$, changes from $0$ to $3x^{(1)}x^{(2)}$ to $0$)
Model  $K-\hat{K}<0$  $K-\hat{K}=0$  $K-\hat{K}>0$  $|\hat{K}-K|$  $d(\hat{\mathcal{C}},\mathcal{C})$
FSBS  0  0.92  0.08  0.08  5.02

Changes occur at times $100$ and $150$.

Table 5: Scenario 5 ($n=50$, $d=1$, changes from $0$ to $\sin$ to $2\sin$)
Model  $K-\hat{K}<0$  $K-\hat{K}=0$  $K-\hat{K}>0$  $|\hat{K}-K|$  $d(\hat{\mathcal{C}},\mathcal{C})$
FSBS  0.02  0.98  0  0.02  16.9
BGHK  0.48  0.30  0.22  1.09  34.36
HK  0  0.19  0.81  0.81  48.24
SN  0.08  0.33  0.59  0.85  65.15

Changes occur at times $68$ and $134$.

Appendix B Proof of Theorem 1

In this section, we present the proof of Theorem 1. To this end, we will invoke the following well-known $\ell_{\infty}$ bounds for kernel density estimation.

Lemma 1.

Let $\{x_{t,i}\}_{i=1,t=1}^{n,T}$ be random grid points independently sampled from a common density function $u:[0,1]^{d}\to\mathbb{R}$. Under Assumption 2-b, the kernel density estimator of the sampling density $u$,

\hat{p}(x)=\frac{1}{nT}\sum_{t=1}^{T}\sum_{i=1}^{n}K_{\bar{h}}(x-x_{t,i}),\quad x\in[0,1]^{d},

satisfies,

||\hat{p}-\mathbb{E}(\hat{p})||_{\infty}\leq C\sqrt{\frac{\log(nT)+\log(1/\bar{h})}{nT\bar{h}^{d}}} (4)

with probability at least $1-\frac{1}{nT}$. Moreover, under Assumption 2-a, the bias term satisfies

||\mathbb{E}(\hat{p})-u||_{\infty}\leq C_{2}\bar{h}^{r}. (5)

Therefore,

||\hat{p}-u||_{\infty}=O\Big(\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\Big) (6)

with probability at least $1-\frac{1}{nT}$.

The verification of these bounds can be found in many places in the literature. For equation (4) see for example Giné and Guillou (2002), Rinaldo and Wasserman (2010), Sriperumbudur and Steinwart (2012) and Jiang (2017). For equation (5), Tsybakov (2009) is a common reference.
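As a quick numerical illustration of Lemma 1 (a sketch of ours with a Gaussian kernel, $d=1$ and a uniform sampling density, i.e. $u\equiv 1$), the sup-norm error of $\hat{p}$ over an interior grid shrinks as the effective sample size $nT$ grows:

```python
import numpy as np

def kde(samples, h, grid):
    # p_hat(x) = (1 / (N h)) sum_j K((x - x_j) / h) with a Gaussian kernel K
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-z ** 2 / 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
grid = np.linspace(0.3, 0.7, 41)         # interior grid, away from boundary bias
errs = []
for N in (500, 50_000):                  # N plays the role of nT
    samples = rng.uniform(size=N)        # sampling density u = 1 on [0, 1]
    h = N ** -0.2                        # h ~ N^{-1/(2r+d)} with r = 2, d = 1
    errs.append(np.abs(kde(samples, h, grid) - 1.0).max())
print(errs)                              # sup-norm error decreases with N
```

The bandwidth choice mirrors the bias-variance balance behind equations (4) and (5).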

Proof of Theorem 1.

For any $(s,e]\subseteq(0,T]$, let

\widetilde{f}^{(s,e]}_{t}(x)=\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{l=s+1}^{t}f_{l}^{*}(x)-\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{l=t+1}^{e}f_{l}^{*}(x),\quad x\in[0,1]^{d}.

For any $\tilde{r}\in(\rho,T-\rho]$ and $x\in[0,1]^{d}$, we consider

\mathcal{A}_{x}((s,e],\rho,\lambda)=\bigg\{\max_{t=s+\rho+1}^{e-\rho}\big|\widetilde{F}_{t,h}^{(s,e]}(x)-\widetilde{f}_{t}^{(s,e]}(x)\big|\leq\lambda\bigg\};
\mathcal{B}_{x}(\tilde{r},\rho,\lambda)=\bigg\{\max_{N=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{N}}\sum_{t=\tilde{r}+1}^{\tilde{r}+N}F_{t,h}(x)-\frac{1}{\sqrt{N}}\sum_{t=\tilde{r}+1}^{\tilde{r}+N}f_{t}(x)\bigg|\leq\lambda\bigg\}\bigcup
\bigg\{\max_{N=\rho}^{\tilde{r}}\bigg|\frac{1}{\sqrt{N}}\sum_{t=\tilde{r}-N+1}^{\tilde{r}}F_{t,h}(x)-\frac{1}{\sqrt{N}}\sum_{t=\tilde{r}-N+1}^{\tilde{r}}f_{t}(x)\bigg|\leq\lambda\bigg\}.


From Algorithm 1, we have that

\rho=\frac{\log(T)}{nh^{d}}.

We observe that $\rho nh^{d}=\log(T)$ and, for $T\geq 3$,

\rho^{1/2-1/q}\geq(nh^{d})^{1/2-(q-1)/q}.

Therefore, Proposition 1 and Corollary 1 imply that, with

\lambda=C_{\lambda}\bigg(\log^{5/q}(T)\sqrt{\frac{1}{nh^{d}}+1}+\sqrt{\frac{\log(T)}{nh^{d}}}+\sqrt{T}h^{r}+\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\bigg), (7)

for some diverging sequence $C_{\lambda}$, it holds that

P\big\{\mathcal{A}_{x}^{c}((s,e],\rho,\lambda)\big\}\leq 4C_{1}\frac{\log(T)}{(\log^{5/q}(T))^{q}}+\frac{2}{T^{5}}+\frac{10}{Tn}

and

P\big\{\mathcal{B}_{x}^{c}(\tilde{r},\rho,\lambda)\big\}\leq 2C_{1}\frac{\log(T)}{(\log^{5/q}(T))^{q}}+\frac{1}{T^{5}}+\frac{5}{Tn}.

Then, using that $\log^{4}(T)=O(T)$, it follows from the above that

P\big\{\mathcal{A}_{x}^{c}((s,e],\rho,\lambda)\big\}=O(\log^{-4}(T))\quad\text{and}\quad P\big\{\mathcal{B}_{x}^{c}(\tilde{r},\rho,\lambda)\big\}=O(\log^{-4}(T)).

Now, we notice that

\sum_{k=1}^{\mathcal{K}}\tilde{n}_{k}=\sum_{k=1}^{\mathcal{K}}(2^{k}-1)\leq\sum_{k=1}^{\mathcal{K}}2^{k}\leq 2\big(2^{\lceil\log(2)C_{\mathcal{K}}\log(\log(T))/\log 2\rceil}-1\big)\leq 4\big(2^{\log(\log(T))/\log 2}\big)^{\log(2)C_{\mathcal{K}}}=O\big(\log^{\log(2)C_{\mathcal{K}}}(T)\big).

In addition, there are $K=O(1)$ change-points. In consequence, it follows that

P\bigg\{\mathcal{A}_{u}(\mathcal{I},\rho,\lambda)\text{ for all }\mathcal{I}\in\mathcal{J}\text{ and all }u\in\{u_{m}\}_{m=1}^{\log(T)}\bigg\}\geq 1-\frac{1}{\log^{2}(T)}, (8)
P\bigg\{\mathcal{B}_{u}(s,\rho,\lambda)\cup\mathcal{B}_{u}(e,\rho,\lambda)\text{ for all }(s,e]=\mathcal{I}\in\mathcal{J}\text{ and all }u\in\{u_{m}\}_{m=1}^{\log(T)}\bigg\}\geq 1-\frac{1}{\log(T)}, (9)
P\bigg\{\mathcal{B}_{u}(\eta_{k},\rho,\lambda)\text{ for all }1\leq k\leq K\text{ and all }u\in\{u_{m}\}_{m=1}^{\log(T)}\bigg\}\geq 1-\frac{1}{\log^{3}(T)}. (10)

The rest of the argument proceeds on the assumption that the events in equations (8), (9) and (10) hold.

Denote

\Upsilon_{k}=C\log^{\max\{1,10/q\}}(T)\Big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\Big)\kappa_{k}^{-2}\quad\text{and}\quad\Upsilon_{\max}=C\log^{\max\{1,10/q\}}(T)\Big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\Big)\kappa^{-2},

where $\kappa=\min\{\kappa_{1},\ldots,\kappa_{K}\}$. Since $\Upsilon_{k}$ is the desired localisation rate, by induction, it suffices to consider any generic interval $(s,e]\subseteq(0,T]$ that satisfies the following three conditions:

\eta_{m-1}\leq s\leq\eta_{m}\leq\ldots\leq\eta_{m+q}\leq e\leq\eta_{m+q+1},\quad q\geq-1;
\text{either }\eta_{m}-s\leq\Upsilon_{m}\quad\text{or}\quad s-\eta_{m-1}\leq\Upsilon_{m-1};
\text{either }\eta_{m+q+1}-e\leq\Upsilon_{m+q+1}\quad\text{or}\quad e-\eta_{m+q}\leq\Upsilon_{m+q}.

Here $q=-1$ indicates that there is no change-point contained in $(s,e]$.

Denote

\Delta_{k}=\eta_{k}-\eta_{k-1}\text{ for }k=1,\ldots,K+1\quad\text{and}\quad\Delta=\min\{\Delta_{1},\ldots,\Delta_{K+1}\}.

Observe that since $\kappa_{k}>0$ for all $1\leq k\leq K$ and $\Delta_{k}=\Theta(T)$, it holds that $\Upsilon_{\max}=o(\Delta)$. Therefore, for any true change-point $\eta_{m}\in(0,T]$, either $|\eta_{m}-s|\leq\Upsilon_{m}$ or $|\eta_{m}-s|\geq\Delta-\Upsilon_{\max}=\Theta(T)$. This means that $\min\{|\eta_{m}-e|,|\eta_{m}-s|\}\leq\Upsilon_{m}$ indicates that $\eta_{m}$ is a change-point detected in a previous induction step, even if $\eta_{m}\in(s,e]$. We refer to $\eta_{m}\in(s,e]$ as an undetected change-point if $\min\{\eta_{m}-s,e-\eta_{m}\}=\Theta(T)$. To complete the induction step, it suffices to show that FSBS$((s,e],h,\tau)$
(i) will not detect any new change-point in $(s,e]$ if all the change-points in that interval have been previously detected, and
(ii) will find a point $D_{m^{*}}^{\mathcal{I}^{*}}$ in $(s,e]$ such that $|\eta_{m}-D_{m^{*}}^{\mathcal{I}^{*}}|\leq\Upsilon_{m}$ if there exists at least one undetected change-point in $(s,e]$.

In order to accomplish this, we need the following series of steps.

Step 1. We first observe that if $\eta_{k}\in\{\eta_{k}\}_{k=1}^{K}$ is any change-point of the functional time series, then by Lemma 8 there exists a seeded interval $\mathcal{I}_{k}=(s_{k},e_{k}]$ containing exactly one change-point $\eta_{k}$ such that

\min\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\geq\frac{1}{16}\zeta_{k}\quad\text{and}\quad\max\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\leq\zeta_{k}

where,

\zeta_{k}=\frac{9}{10}\min\{\eta_{k+1}-\eta_{k},\eta_{k}-\eta_{k-1}\}.

Moreover, if $\eta_{k}\in(s,e]$ is any undetected change-point in $(s,e]$, then it must hold that

s-\eta_{k-1}\leq\Upsilon_{\max}.

Since $\Upsilon_{\max}=O(\log^{\max\{1,10/q\}}(T)T^{\frac{d}{2r+d}})$ and $O(\log^{a}(T))=o(T^{b})$ for any positive numbers $a$ and $b$, we have that $\Upsilon_{\max}=o(T)$. Moreover, $\eta_{k}-s_{k}\leq\zeta_{k}\leq\frac{9}{10}(\eta_{k}-\eta_{k-1})$, so that

s_{k}-\eta_{k-1}\geq\frac{1}{10}(\eta_{k}-\eta_{k-1})>\Upsilon_{\max}\geq s-\eta_{k-1}

and in consequence $s_{k}\geq s$. Similarly, $e_{k}\leq e$. Therefore,

\mathcal{I}_{k}=(s_{k},e_{k}]\subseteq(s,e].

Step 2. Consider the collection of intervals $\{\mathcal{I}_{k}=(s_{k},e_{k}]\}_{k=1}^{K}$ from Step 1. In this step, we show that for each $k\in\{1,\ldots,K\}$, it holds that

\max_{t=s_{k}+\rho}^{e_{k}-\rho}\max_{m=1}^{\log(T)}|\widetilde{F}_{t,h}^{(s_{k},e_{k}]}(u_{m})|\geq c_{1}\sqrt{T}\kappa_{k}, (11)

for some sufficiently small constant $c_{1}$.

Let $k\in\{1,\ldots,K\}$. By Step 1, $\mathcal{I}_{k}$ contains exactly one change-point $\eta_{k}$. Since, for every $u_{m}$, $f^{*}_{t}(u_{m})$ is a one-dimensional population time series and there is only one change-point in $\mathcal{I}_{k}=(s_{k},e_{k}]$, it holds that

f^{*}_{s_{k}+1}(u_{m})=\ldots=f^{*}_{\eta_{k}}(u_{m})\neq f^{*}_{\eta_{k}+1}(u_{m})=\ldots=f^{*}_{e_{k}}(u_{m}),

which implies, for $s_{k}<t<\eta_{k}$,

\begin{align*}
\widetilde{f}^{(s_{k},e_{k}]}_{t}(u_{m})={}&\sqrt{\frac{e_{k}-t}{(e_{k}-s_{k})(t-s_{k})}}\sum_{l=s_{k}+1}^{t}f_{\eta_{k}}^{*}(u_{m})-\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}\sum_{l=t+1}^{\eta_{k}}f_{\eta_{k}}^{*}(u_{m})\\
&-\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}\sum_{l=\eta_{k}+1}^{e_{k}}f_{\eta_{k}+1}^{*}(u_{m})\\
={}&(t-s_{k})\sqrt{\frac{e_{k}-t}{(e_{k}-s_{k})(t-s_{k})}}f_{\eta_{k}}^{*}(u_{m})-(\eta_{k}-t)\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}f_{\eta_{k}}^{*}(u_{m})\\
&-(e_{k}-\eta_{k})\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}f_{\eta_{k}+1}^{*}(u_{m})\\
={}&(e_{k}-t)\sqrt{\frac{t-s_{k}}{(e_{k}-t)(e_{k}-s_{k})}}f_{\eta_{k}}^{*}(u_{m})-(\eta_{k}-t)\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}f_{\eta_{k}}^{*}(u_{m})\\
&-(e_{k}-\eta_{k})\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}f_{\eta_{k}+1}^{*}(u_{m})\\
={}&(e_{k}-\eta_{k})\sqrt{\frac{t-s_{k}}{(e_{k}-t)(e_{k}-s_{k})}}\big(f_{\eta_{k}}^{*}(u_{m})-f_{\eta_{k}+1}^{*}(u_{m})\big).
\end{align*}

Similarly, for $\eta_{k}\leq t\leq e_{k}$,

\widetilde{f}^{(s_{k},e_{k}]}_{t}(u_{m})=\sqrt{\frac{e_{k}-t}{(e_{k}-s_{k})(t-s_{k})}}(\eta_{k}-s_{k})(f^{*}_{\eta_{k}}(u_{m})-f^{*}_{\eta_{k}+1}(u_{m})).

Therefore,

\widetilde{f}^{(s_{k},e_{k}]}_{t}(u_{m})=\begin{cases}\sqrt{\frac{t-s_{k}}{(e_{k}-s_{k})(e_{k}-t)}}(e_{k}-\eta_{k})(f^{*}_{\eta_{k}}(u_{m})-f^{*}_{\eta_{k}+1}(u_{m})),&s_{k}<t<\eta_{k};\\ \sqrt{\frac{e_{k}-t}{(e_{k}-s_{k})(t-s_{k})}}(\eta_{k}-s_{k})(f^{*}_{\eta_{k}}(u_{m})-f^{*}_{\eta_{k}+1}(u_{m})),&\eta_{k}\leq t\leq e_{k}.\end{cases} (12)

By Lemma 7, with probability at least $1-o(1)$, there exists $u_{\tilde{k}}\in\{u_{m}\}_{m=1}^{\log(T)}$ such that

|f^{*}_{\eta_{k}}(u_{\tilde{k}})-f^{*}_{\eta_{k}+1}(u_{\tilde{k}})|\geq\frac{3}{4}\kappa_{k}.

Since $\Delta=\Theta(T)$, $\rho=O(\log(T)T^{\frac{d}{2r+d}})$ and $\log^{a}(T)=o(T^{b})$ for any positive numbers $a$ and $b$, we have that

\min\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\geq\frac{1}{16}\zeta_{k}\geq c_{2}T>\rho, (13)

so that $\eta_{k}\in[s_{k}+\rho,e_{k}-\rho]$. Then, from (12), (13) and the facts that $|e_{k}-s_{k}|<T$ and $|\eta_{k}-s_{k}|<T$,

|\widetilde{f}^{(s_{k},e_{k}]}_{\eta_{k}}(u_{\tilde{k}})|=\sqrt{\frac{e_{k}-\eta_{k}}{(e_{k}-s_{k})(\eta_{k}-s_{k})}}(\eta_{k}-s_{k})|f^{*}_{\eta_{k}}(u_{\tilde{k}})-f^{*}_{\eta_{k}+1}(u_{\tilde{k}})|\geq c_{2}\sqrt{T}\frac{3}{4}\kappa_{k}. (14)

Therefore, it holds that

\max_{t=s_{k}+\rho}^{e_{k}-\rho}\max_{m=1}^{\log(T)}|\widetilde{F}^{(s_{k},e_{k}]}_{t,h}(u_{m})|\geq|\widetilde{F}^{(s_{k},e_{k}]}_{\eta_{k},h}(u_{\tilde{k}})|\geq|\widetilde{f}^{(s_{k},e_{k}]}_{\eta_{k}}(u_{\tilde{k}})|-\lambda\geq\frac{3}{4}c_{2}\sqrt{T}\kappa_{k}-\lambda,

where the first inequality follows from the fact that $\eta_{k}\in[s_{k}+\rho,e_{k}-\rho]$, the second inequality follows from the good event in (8), and the last inequality follows from (14).
Next, we observe that $\log^{\frac{5}{q}}(T)\sqrt{\frac{1}{nh^{d}}+1}=o(\sqrt{T^{\frac{2r}{2r+d}}})O(\sqrt{T^{\frac{d}{2r+d}}})=o(\sqrt{T})$, $\rho<c_{2}T$, $h^{r}=o(1)$ and $\big(\frac{\log nT}{nT}\big)^{\frac{2r}{2r+d}}=o(1)$. In consequence, since $\kappa_{k}$ is a positive constant, by the upper bound on $\lambda$ in Equation 7, for sufficiently large $T$ it holds that

\frac{c_{2}}{4}\sqrt{T}\kappa_{k}\geq\lambda.

Therefore,

\max_{t=s_{k}+\rho}^{e_{k}-\rho}\max_{m=1}^{\log(T)}|\widetilde{F}^{(s_{k},e_{k}]}_{t,h}(u_{m})|\geq\frac{c_{2}}{2}\sqrt{T}\kappa_{k}.

Therefore Equation 11 holds with $c_{1}=\frac{c_{2}}{2}$.

Step 3. In this step, it is shown that FSBS$((s,e],h,\tau)$ can consistently detect or reject the existence of undetected change-points within $(s,e]$.

Suppose $\eta_{k}\in(s,e]$ is any undetected change-point. Then by the second half of Step 1, $\mathcal{I}_{k}\subseteq(s,e]$. Therefore

A^{\mathcal{I}^{*}}_{m^{*}}\geq\max_{t=s_{k}+\rho}^{e_{k}-\rho}\max_{m=1}^{\log(T)}|\widetilde{F}_{t,h}^{(s_{k},e_{k}]}(u_{m})|\geq c_{1}\sqrt{T}\kappa_{k}>\tau,

where the second inequality follows from Equation 11, and the last inequality follows from the fact that $\log^{a}(T)=o(T^{b})$ for any positive numbers $a$ and $b$ implies

\tau=C_{\tau}\bigg(\log^{\max\{1,10/q\}}(T)\sqrt{\frac{1}{nh^{d}}+1}\bigg)=o(\sqrt{T}).

Suppose there does not exist any undetected change-point in $(s,e]$. Then for any $\mathcal{I}=(\alpha,\beta]\subseteq(s,e]$, one of the following situations must hold:

  • (a) there is no change-point within $(\alpha,\beta]$;
  • (b) there exists only one change-point $\eta_{k}$ within $(\alpha,\beta]$ and $\min\{\eta_{k}-\alpha,\beta-\eta_{k}\}\leq\Upsilon_{k}$;
  • (c) there exist two change-points $\eta_{k},\eta_{k+1}$ within $(\alpha,\beta]$ and $\eta_{k}-\alpha\leq\Upsilon_{k}$ and $\beta-\eta_{k+1}\leq\Upsilon_{k+1}$.

We only provide the calculations for case (c), as the other two cases are similar and simpler. Note that for any $x\in[0,1]^{d}$, it holds that

|f_{\eta_{k+1}}^{*}(x)-f_{\eta_{k+1}+1}^{*}(x)|\leq\|f_{\eta_{k+1}}^{*}-f_{\eta_{k+1}+1}^{*}\|_{\infty}=\kappa_{k+1}

and similarly

|f_{\eta_{k}}^{*}(x)-f_{\eta_{k}+1}^{*}(x)|\leq\kappa_{k}.

By Lemma 10 and the assumption that $(\alpha,\beta]$ contains only two change-points, it holds that for all $x\in[0,1]^{d}$,

\max_{t=\alpha}^{\beta}|\widetilde{f}^{(\alpha,\beta]}_{t}(x)|\leq\sqrt{\beta-\eta_{k+1}}|f^{*}_{\eta_{k+1}}(x)-f^{*}_{\eta_{k+1}+1}(x)|+\sqrt{\eta_{k}-\alpha}|f_{\eta_{k}}^{*}(x)-f_{\eta_{k}+1}^{*}(x)|
\leq\sqrt{\Upsilon_{k+1}}\kappa_{k+1}+\sqrt{\Upsilon_{k}}\kappa_{k}\leq 2\sqrt{C}\log^{\max\{1/2,5/q\}}(T)\sqrt{1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}}.

Thus

\max_{t=\alpha}^{\beta}\|\widetilde{f}^{(\alpha,\beta]}_{t}\|_{\infty}\leq 2\sqrt{C}\log^{\max\{1/2,5/q\}}(T)\sqrt{1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}}. \quad (15)

Therefore, on the good event in Equation 8, for any $1\leq m\leq\log(T)$ and any $\mathcal{I}=(\alpha,\beta]\subseteq(s,e]$, it holds that

A_{m}^{\mathcal{I}}=\max_{t=\alpha+\rho}^{\beta-\rho}|\widetilde{F}_{t,h}^{(\alpha,\beta]}(u_{m})|\leq\max_{t=\alpha+\rho}^{\beta-\rho}\|\widetilde{f}^{(\alpha,\beta]}_{t}\|_{\infty}+\lambda\leq 2\sqrt{C}\log^{\max\{1/2,5/q\}}(T)\sqrt{1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}}+\lambda,

where the first inequality follows from Equation 8, and the last inequality follows from Equation 15. Then,

2\sqrt{C}\log^{\max\{1/2,5/q\}}(T)\sqrt{1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}}+\lambda
=2\sqrt{C}\log^{\max\{1/2,5/q\}}(T)\sqrt{\frac{1}{nh^{d}}+1}+C_{\lambda}\log^{5/q}(T)\sqrt{\frac{1}{nh^{d}}+1}+C_{\lambda}\sqrt{\frac{\log(T)}{nh^{d}}}+C_{\lambda}\sqrt{T}h^{r}+C_{\lambda}\sqrt{T}\Big(\frac{\log nT}{nT}\Big)^{\frac{2r}{2r+d}}.

We observe that $\sqrt{\frac{\log(T)}{nh^{d}}}=O\Big(\log^{1/2}(T)\sqrt{\frac{1}{nh^{d}}+1}\Big)$. Moreover,

\sqrt{T}h^{r}=\sqrt{T}\Big(\frac{1}{nT}\Big)^{\frac{r}{2r+d}}\leq T^{\frac{1}{2}-\frac{r}{2r+d}}\frac{1}{n^{\frac{r}{2r+d}}},

and, given that

\frac{1}{2}-\frac{r}{2r+d}=\frac{d}{2(2r+d)},

we get,

\sqrt{T}h^{r}=o\Big(\log^{\max\{1/2,5/q\}}(T)\sqrt{\frac{1}{nh^{d}}+1}\Big).

Following the same line of arguments, we have that

\sqrt{T}\Big(\frac{\log nT}{nT}\Big)^{\frac{2r}{2r+d}}=T^{\frac{1}{2}-\frac{2r}{2r+d}}\log^{\frac{2r}{2r+d}}(T)=o\Big(\log(T)\sqrt{\frac{1}{nh^{d}}+1}\Big).

Thus, by the choice of $\tau$, it holds that, with a sufficiently large constant $C_{\tau}$,

A_{m}^{\mathcal{I}}\leq\tau\quad\text{for all }1\leq m\leq\log(T)\text{ and all }\mathcal{I}\subseteq(s,e]. \quad (16)

As a result, FSBS$((s,e],h,\tau)$ will correctly reject if $(s,e]$ contains no undetected change-points.
 
Step 4. Assume that there exists an undetected change-point $\eta_{\tilde{k}}\in(s,e]$ such that

\min\{\eta_{\tilde{k}}-s,e-\eta_{\tilde{k}}\}=\Theta(T).

Let $m^{*}$ and $\mathcal{I}^{*}$ be defined as in FSBS$((s,e],h,\tau)$ with

\mathcal{I}^{*}=(\alpha^{*},\beta^{*}].

To complete the induction, it suffices to show that there exists a change-point $\eta_{k}\in(s,e]$ such that $\min\{\eta_{k}-s,e-\eta_{k}\}=\Theta(T)$ and $|D_{m^{*}}^{\mathcal{I}^{*}}-\eta_{k}|\leq\Upsilon_{k}$.

Consider the univariate time series

F_{t,h}(u_{m^{*}})=\frac{1}{n}\sum_{i=1}^{n}y_{t,i}K_{h}(u_{m^{*}}-x_{t,i})\quad\text{and}\quad f^{*}_{t}(u_{m^{*}})\quad\text{for all }1\leq t\leq T.

Since the collection of the change-points of the time series $\{f_{t}^{*}(u_{m^{*}})\}_{t\in\mathcal{I}^{*}}$ is a subset of $\{\eta_{k}\}_{k=0}^{K+1}\cap(s,e]$, we may apply Lemma 9 by setting

\mu_{t}=F_{t,h}(u_{m^{*}})\quad\text{and}\quad\omega_{t}=f^{*}_{t}(u_{m^{*}})

on the interval $\mathcal{I}^{*}$. Therefore, it suffices to justify that all the assumptions of Lemma 9 hold.

In the following, $\lambda$ is used as in Lemma 9. Then Equations 36 and 37 are direct consequences of Equations 8, 9 and 10.
We observe that, for any $\mathcal{I}=(\alpha,\beta]\subseteq(s,e]$,

\max_{t=\alpha^{*}+\rho}^{\beta^{*}-\rho}|\widetilde{F}_{t,h}^{(\alpha^{*},\beta^{*}]}(u_{m^{*}})|=A^{\mathcal{I}^{*}}_{m^{*}}\geq A^{\mathcal{I}}_{m}=\max_{t=\alpha+\rho}^{\beta-\rho}|\widetilde{F}_{t,h}^{(\alpha,\beta]}(u_{m})|

for all $m$. By Step 1 with $\mathcal{I}_{k}=(s_{k},e_{k}]$, it holds that

\min\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\geq\frac{1}{16}\zeta_{k}\geq c_{2}T.

Therefore, for all $k\in\{\tilde{k}:\min\{\eta_{\tilde{k}}-s,e-\eta_{\tilde{k}}\}\geq c_{2}T\}$,

\max_{t=\alpha^{*}+\rho}^{\beta^{*}-\rho}|\widetilde{F}_{t,h}^{(\alpha^{*},\beta^{*}]}(u_{m^{*}})|\geq\max_{t=s_{k}+\rho,\,m=1}^{t=e_{k}-\rho,\,m=\log(T)}|\widetilde{F}_{t,h}^{(s_{k},e_{k}]}(u_{m})|\geq c_{1}\sqrt{T}\kappa_{k},

where the last inequality follows from Equation 11. Therefore Equation 38 holds in Lemma 9. Finally, Equation 39 is a direct consequence of the choices that

h=C_{h}(Tn)^{\frac{-1}{2r+d}}\quad\text{and}\quad\rho=\frac{\log(T)}{nh^{d}}.

Thus, all the conditions in Lemma 9 are met. Therefore, there exists a change-point $\eta_{k}$ of $\{f^{*}_{t}(u_{m^{*}})\}_{t\in\mathcal{I}^{*}}$ satisfying

\min\{\beta^{*}-\eta_{k},\eta_{k}-\alpha^{*}\}>cT, \quad (17)

and

|D^{\mathcal{I}^{*}}_{m^{*}}-\eta_{k}|\leq\max\{C_{3}\lambda^{2}\kappa_{k}^{-2},\rho\}\leq C_{4}\log^{\max\{10/q,1\}}(T)\bigg(1+\frac{1}{nh^{d}}+Th^{2r}+T\Big(\frac{\log(nT)}{nT}\Big)^{\frac{4r}{2r+d}}\bigg)\kappa_{k}^{-2}
\leq C\log^{\max\{10/q,1\}}(T)\Big(1+T^{\frac{d}{2r+d}}n^{\frac{-2r}{2r+d}}\Big)\kappa_{k}^{-2}

for a sufficiently large constant $C$, where we have followed the same line of arguments as for the conclusion of (16). Observe that:
i) the change-points of $\{f_{t}^{*}(u_{m^{*}})\}_{t\in\mathcal{I}^{*}}$ belong to $(s,e]\cap\{\eta_{k}\}_{k=1}^{K}$; and
ii) Equation 17 and $(\alpha^{*},\beta^{*}]\subseteq(s,e]$ imply that

\min\{e-\eta_{k},\eta_{k}-s\}>cT\geq\Upsilon_{\max}.

As discussed in the argument before Step 1, this implies that $\eta_{k}$ must be an undetected change-point of $\{f_{t}^{*}(u_{m^{*}})\}_{t\in\mathcal{I}^{*}}$. ∎

Appendix C Deviation bounds related to kernels

In this section, we deal with all the large-probability events appearing in the proof of Theorem 1. Recall that $F_{t,h}(x)=\frac{\frac{1}{n}\sum_{i=1}^{n}y_{t,i}K_{h}(x-x_{t,i})}{\hat{p}(x)}$, and

\widetilde{F}_{t,h}^{(s,e]}(x)=\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{l=s+1}^{t}F_{l,h}(x)-\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{l=t+1}^{e}F_{l,h}(x).
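To make these definitions concrete, here is a minimal one-dimensional ($d=1$) sketch, not part of the proof: it simulates sparsely observed curves with a single mean shift, forms the kernel estimates $F_{t,h}(x)$ at one evaluation point, and maximises the CUSUM $\widetilde{F}^{(s,e]}_{t,h}(x)$ over $t$. The Epanechnikov kernel, the sample sizes and the change-point location are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, h, x = 60, 40, 0.3, 0.5                     # illustrative sizes and bandwidth

def K(u):
    return 0.75 * np.maximum(1.0 - u**2, 0.0)     # Epanechnikov kernel (illustrative)

# curves observed at n random design points each; mean shift of size 1 after t = 30
x_obs = rng.uniform(0.0, 1.0, size=(T, n))
mean = np.where(np.arange(1, T + 1) <= 30, 0.0, 1.0)[:, None]
y_obs = mean + 0.1 * rng.standard_normal((T, n))

p_hat = np.mean(K((x - x_obs) / h) / h)                        # pooled density estimate
F = np.mean(y_obs * K((x - x_obs) / h) / h, axis=1) / p_hat    # F_{t,h}(x), t = 1..T

def cusum(F, s, e, t):
    # \widetilde{F}_{t,h}^{(s,e]}(x) as defined above (F[l-1] holds F_{l,h}(x))
    return (np.sqrt((e - t) / ((e - s) * (t - s))) * F[s:t].sum()
            - np.sqrt((t - s) / ((e - s) * (e - t))) * F[t:e].sum())

t_hat = max(range(1, T), key=lambda t: abs(cusum(F, 0, T, t)))
# t_hat lands close to the true change-point at t = 30
```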

By Assumption 2, we have $\max_{l=1}^{q}\|K^{l}\|_{\infty}=\max_{l=1}^{q}\|K\|_{\infty}^{l}<C_{K}$, where $C_{K}>0$ is an absolute constant. Moreover, Assumption 1b implies $|f^{*}_{t}(x)|<C_{f}$ for any $x\in[0,1]^{d}$ and $t\in\{1,\ldots,T\}$.

Proposition 1.

Suppose that Assumptions 1 and 2 hold, that $\rho nh^{d}\geq\log(T)$ and that $T\geq 3$. Then for any $x\in[0,1]^{d}$,

\mathbb{P}\bigg(\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\big(F_{t,h}(x)-f_{t}^{*}(x)\big)\bigg|\geq\frac{2}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}+\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}+\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}+\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\bigg)\leq 2C_{1}\frac{\log(T)}{z^{q}}+T^{-5}+\frac{5}{Tn}; \quad (18)
\mathbb{P}\bigg(\max_{k=\rho}^{\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}-k+1}^{\tilde{r}}\big(F_{t,h}(x)-f_{t}^{*}(x)\big)\bigg|\geq\frac{2}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}+\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}+\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}+\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\bigg)\leq 2C_{1}\frac{\log(T)}{z^{q}}+T^{-5}+\frac{5}{Tn}. \quad (19)
Proof.

The proofs of (18) and (19) are the same, so only the proof of (18) is presented. We define the events $E_{1}=\big\{\|\hat{p}-u\|_{\infty}\leq\bar{C}\big(\frac{\log(Tn)}{Tn}\big)^{\frac{2r}{2r+d}}\big\}$ and $E_{2}=\big\{\hat{p}\geq\bar{c}\big\}$, where $\bar{c}=\inf_{x}u(x)-\bar{C}\big(\frac{\log(Tn)}{Tn}\big)^{\frac{2r}{2r+d}}$. Using Lemma 1, specifically equation (6), we have that $P(E_{1})\geq 1-\frac{1}{nT}$. Then, we observe that on the event $E_{1}$, for $x\in[0,1]^{d}$,

\inf_{s}u(s)-\hat{p}(x)\leq u(x)-\hat{p}(x)\leq|u(x)-\hat{p}(x)|\leq\bar{C}\Big(\frac{\log(Tn)}{Tn}\Big)^{\frac{2r}{2r+d}},

which implies $E_{1}\subseteq E_{2}$. Therefore, $P(E_{2}^{c})\leq\frac{1}{nT}$.
Now, for any $x$, observe that, by the definition of $F_{t,h}$ and the triangle inequality,

I=\max_{k=\rho}^{T-\tilde{r}}\frac{1}{\sqrt{k}}\bigg|\sum_{t=\tilde{r}+1}^{\tilde{r}+k}F_{t,h}(x)-\sum_{t=\tilde{r}+1}^{\tilde{r}+k}f_{t}^{*}(x)\bigg|
\leq\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|
+\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\frac{\xi_{t}(x_{t,i})K_{h}(x-x_{t,i})}{\hat{p}(x)}\bigg| \quad (20)
+\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\frac{\delta_{t,i}K_{h}(x-x_{t,i})}{\hat{p}(x)}\bigg|
=I_{1}+I_{2}+I_{3}.

In the following, we will show that $I_{1}\leq I_{1,1}+I_{1,2}+I_{1,3}$, and that

  1. $\mathbb{P}\Big(I_{1,1}\geq\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}\Big)\leq\frac{1}{T^{5}}+\frac{1}{Tn}$,
  2. $\mathbb{P}\Big(I_{1,2}\geq\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}\Big)\leq\frac{1}{Tn}$,
  3. $\mathbb{P}\Big(I_{1,3}\geq\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\Big)\leq\frac{1}{Tn}$,
  4. $\mathbb{P}\Big(I_{2}\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\Big)\leq\frac{C_{1}\log T}{z^{q}}+\frac{1}{Tn}$,
  5. $\mathbb{P}\Big(I_{3}\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\Big)\leq\frac{C_{1}\log T}{z^{q}}+\frac{1}{Tn}$,

in order to conclude that,

\mathbb{P}\bigg(I\geq\frac{2}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}+\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}+\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}+\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\bigg)
\leq\mathbb{P}\Big(I_{1,1}\geq\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}\Big)+\mathbb{P}\Big(I_{1,2}\geq\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}\Big)+\mathbb{P}\Big(I_{1,3}\geq\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\Big)
+\mathbb{P}\Big(I_{2}\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\Big)+\mathbb{P}\Big(I_{3}\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\Big)
\leq 2C_{1}\frac{\log(T)}{z^{q}}+T^{-5}+\frac{5}{Tn}.

Step 1. In this step, we analyse $I_{1}$. We observe that

\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|
\leq\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})}{\hat{p}(x)}-\frac{\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)}{\hat{p}(x)}\bigg)\bigg|
+\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|=I_{1,1}+\tilde{I}_{1}.

Step 1.1. In this step, we analyse $I_{1,1}$. We note that the random variables $\{f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})\}_{1\leq i\leq n,\,1\leq t\leq T}$ are independently distributed with mean $\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)$ and

\mathrm{Var}\big(f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})\big)\leq E\big\{(f^{*}_{t})^{2}(x_{t,i})K_{h}^{2}(x-x_{t,i})\big\}=\int_{[0,1]^{d}}(f^{*}_{t})^{2}(z)\frac{1}{h^{2d}}K^{2}\Big(\frac{x-z}{h}\Big)d\mu(z)
\leq\frac{C_{f}^{2}}{h^{d}}\int_{[0,1]^{d}}\frac{1}{h^{d}}K^{2}\Big(\frac{x-z}{h}\Big)d\mu(z)=\frac{C_{f}^{2}}{h^{d}}\int_{[0,1]^{d}}K^{2}(u)d\mu(u)<\frac{C_{f}^{2}C_{K}^{2}}{h^{d}}.
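The $h^{-d}$ scaling of this variance bound can be illustrated by simulation; the following sketch (not part of the proof) uses $d=1$, a uniform design and a boxcar kernel, all of which are illustrative assumptions, and checks that $h^{d}\cdot\mathrm{Var}$ stays bounded:

```python
import random
random.seed(0)

def Kh(u, h):
    # boxcar kernel K(v) = 0.5 on [-1, 1], rescaled to K_h(u) = K(u / h) / h
    return 0.5 / h if abs(u) <= h else 0.0

x = 0.5
f_star = lambda z: 1.0 + z            # a bounded mean function (illustrative)
for h in (0.2, 0.1, 0.05):
    vals = []
    for _ in range(200_000):
        X = random.random()           # uniform design point on [0, 1]
        vals.append(f_star(X) * Kh(x - X, h))
    m = sum(vals) / len(vals)
    var = sum((v - m) ** 2 for v in vals) / len(vals)
    # Var{f*(X) K_h(x - X)} grows like 1/h^d, so h^d * Var stays bounded (d = 1)
    assert h * var <= 4.0
```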

Since $|f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})|\leq C_{f}C_{K}h^{-d}$, by Bernstein's inequality (Vershynin, 2018), we have that

\mathbb{P}\bigg(\bigg|\frac{1}{kn}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\sum_{i=1}^{n}f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})-\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)\bigg|\geq\tilde{C}_{1}\bigg\{\sqrt{\frac{\log(T)}{knh^{d}}}+\frac{\log(T)}{knh^{d}}\bigg\}\bigg)\leq T^{-6}.

Since $knh^{d}\geq\log(T)$ if $k\geq\rho$, with probability at most $T^{-5}$, it holds that

\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}n}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\sum_{i=1}^{n}\bigg(f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})-\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)\bigg)\bigg|\geq\tilde{C}_{1}\sqrt{\frac{\log(T)}{nh^{d}}}.

Therefore, using that $P(E^{c}_{2})\leq\frac{1}{Tn}$, we conclude that

\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f^{*}_{t}(x_{t,i})K_{h}(x-x_{t,i})}{\hat{p}(x)}-\frac{\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)}{\hat{p}(x)}\bigg)\bigg|\geq\frac{\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}

with probability at most $T^{-5}+\frac{1}{nT}$.

Step 1.2. In this step, we analyse $I_{1,2}$ and $I_{1,3}$. We observe that

\tilde{I}_{1}=\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|
\leq\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)}{\hat{p}(x)}-\frac{f_{t}^{*}(x)u(x)}{\hat{p}(x)}\bigg)\bigg| \quad (21)
+\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f_{t}^{*}(x)u(x)}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|=I_{1,2}+I_{1,3}. \quad (22)

Then, for each $\rho\leq k\leq T-\tilde{r}$, we observe that

\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)-f_{t}^{*}(x)u(x)\bigg)\bigg|
\leq\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg|\int f_{t}^{*}(z)K_{h}(x-z)d\mu(z)-f_{t}^{*}(x)u(x)\bigg|
\leq\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\tilde{C}h^{r}=\sqrt{k}\tilde{C}h^{r},

where the second inequality follows from Assumption 2. Therefore, using the event $E_{2}$, we can bound (21) by $\frac{\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}$ with probability at least $1-\frac{1}{nT}$. Meanwhile, for (22) we have that

I_{1,3}=\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\bigg(\frac{f_{t}^{*}(x)u(x)}{\hat{p}(x)}-f_{t}^{*}(x)\bigg)\bigg|\leq\max_{k=\rho}^{T-\tilde{r}}\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}|f_{t}^{*}(x)|\bigg|\frac{u(x)-\hat{p}(x)}{\hat{p}(x)}\bigg|. \quad (23)

Then, since on the event $E_{1}$ it holds that

\|\hat{p}-u\|_{\infty}\leq\bar{C}\Big(\frac{\log(Tn)}{Tn}\Big)^{\frac{2r}{2r+d}}\quad\text{and}\quad\hat{p}\geq\bar{c},

we have that equation (23) is bounded by

\max_{k=\rho}^{T-\tilde{r}}\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{1}{n}\sum_{i=1}^{n}\frac{\bar{C}C_{f}}{\tilde{c}}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\leq\frac{\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}

with probability at least $1-\frac{1}{nT}$.

Step 2. In this step, we analyse $I_{2}$ and $I_{3}$. For $1\leq t\leq T$, let

Z_{t}=\frac{1}{n}\sum_{i=1}^{n}\xi_{t}(x_{t,i})K_{h}(x-x_{t,i})\quad\text{and}\quad W_{t}=\frac{1}{n}\sum_{i=1}^{n}\delta_{t,i}K_{h}(x-x_{t,i}).

By Lemma 2 and the event $E_{2}$, it holds that

\mathbb{P}\bigg\{\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{Z_{t}}{\hat{p}(x)}\bigg|\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\bigg\}\leq\frac{C_{1}\log(T)}{z^{q}}+\frac{1}{nT}

and

\mathbb{P}\bigg\{\max_{k=\rho}^{T-\tilde{r}}\bigg|\frac{1}{\sqrt{k}}\sum_{t=\tilde{r}+1}^{\tilde{r}+k}\frac{W_{t}}{\hat{p}(x)}\bigg|\geq\frac{1}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}\bigg\}\leq\frac{C_{1}\log(T)}{z^{q}}+\frac{1}{nT}.

The desired result follows from putting the previous steps together. ∎

Corollary 1.

Suppose that $\rho nh^{d}\geq\log(T)$ and that $T\geq 3$. Then for any $z>0$,

\mathbb{P}\bigg\{\max_{t=s+\rho+1}^{e-\rho}\Big|\widetilde{F}_{t,h}^{(s,e]}(x)-\widetilde{f}_{t}^{(s,e]}(x)\Big|\geq\frac{4}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}+\frac{2\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}+\frac{2\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}+\frac{2\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\Big(\frac{\log(nT)}{nT}\Big)^{\frac{2r}{2r+d}}\bigg\}\leq 2T^{-5}+\frac{4C_{1}\log(T)}{z^{q}}+\frac{10}{Tn}.
Proof.

By the definitions of $\widetilde{F}_{t,h}^{(s,e]}$ and $\widetilde{f}_{t}^{(s,e]}$, we have that

\Big|\widetilde{F}_{t,h}^{(s,e]}(x)-\widetilde{f}_{t}^{(s,e]}(x)\Big|\leq\bigg|\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{l=s+1}^{t}\big(F_{l,h}(x)-f^{*}_{l}(x)\big)\bigg|+\bigg|\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{l=t+1}^{e}\big(F_{l,h}(x)-f^{*}_{l}(x)\big)\bigg|.

Then, we observe that,

\sqrt{\frac{e-t}{(e-s)(t-s)}}\leq\sqrt{\frac{1}{t-s}}\ \text{if}\ s\leq t,\quad\text{and}\quad\sqrt{\frac{t-s}{(e-s)(e-t)}}\leq\sqrt{\frac{1}{e-t}}\ \text{if}\ t\leq e.

Therefore,

X=\max_{t=s+\rho+1}^{e-\rho}\Big|\widetilde{F}_{t,h}^{(s,e]}(x)-\widetilde{f}_{t}^{(s,e]}(x)\Big|\leq\max_{t=s+\rho+1}^{e-\rho}\bigg|\sqrt{\frac{1}{t-s}}\sum_{l=s+1}^{t}\big(F_{l,h}(x)-f_{l}^{*}(x)\big)\bigg|
+\max_{t=s+\rho+1}^{e-\rho}\bigg|\sqrt{\frac{1}{e-t}}\sum_{l=t+1}^{e}\big(F_{l,h}(x)-f_{l}^{*}(x)\big)\bigg|=X_{1}+X_{2}.

Finally, letting $\lambda=\frac{4}{\tilde{c}}z\sqrt{\frac{1}{nh^{d}}+1}+\frac{2\tilde{C}_{1}}{\tilde{c}}\sqrt{\frac{\log(T)}{nh^{d}}}+\frac{2\tilde{C}}{\tilde{c}}\sqrt{T}h^{r}+\frac{2\bar{C}C_{f}}{\tilde{c}}\sqrt{T}\big(\frac{\log(nT)}{nT}\big)^{\frac{2r}{2r+d}}$, we get that

\mathbb{P}(X\geq\lambda)\leq\mathbb{P}\Big(X_{1}+X_{2}\geq\frac{\lambda}{2}+\frac{\lambda}{2}\Big)\leq\mathbb{P}\Big(X_{1}\geq\frac{\lambda}{2}\Big)+\mathbb{P}\Big(X_{2}\geq\frac{\lambda}{2}\Big)\leq 2T^{-5}+\frac{4C_{1}\log(T)}{z^{q}}+\frac{10}{Tn},

where the last inequality follows from Proposition 1. ∎

C.1 Additional Technical Results

The following lemmas provide deviation bounds for

Z_{t}=\frac{1}{n}\sum_{i=1}^{n}\xi_{t}(x_{t,i})K_{h}(x-x_{t,i})\quad\text{and}\quad W_{t}=\frac{1}{n}\sum_{i=1}^{n}\delta_{t,i}K_{h}(x-x_{t,i}).

They are a direct consequence of the temporal dependence and heavy-tailedness of the data considered in Assumption 1.

Lemma 2.

Let $\rho\leq T$ be such that $\rho nh^{d}\geq\log(T)$ and $T\geq 3$. Let $N\in\mathbb{Z}^{+}$ be such that $N\geq\rho$.
a. Suppose that for any $q\geq 3$ it holds that

\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}\big\{\|\xi_{t}-\xi_{t}^{*}\|_{\infty}^{q}\big\}^{1/q}=\mathrm{O}(1). \quad (24)

Then for any $z>0$,

\mathbb{P}\bigg\{\max_{k=\rho}^{N}\bigg|\Big\{\frac{1}{nh^{d}}+1\Big\}^{-1/2}\frac{1}{\sqrt{k}}\sum_{t=1}^{k}Z_{t}\bigg|\geq z\bigg\}\leq\frac{C_{1}\log(T)}{z^{q}}.

b. Suppose that for some $q\geq 3$,

\sum_{t=1}^{\infty}t^{1/2-1/q}\max_{i=1}^{n}\big\{\mathbb{E}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big\}^{1/q}=\mathrm{O}(1). \quad (25)

Then for any $w>0$,

\mathbb{P}\bigg\{\max_{k=\rho}^{N}\bigg|\Big\{\frac{1}{nh^{d}}+1\Big\}^{-1/2}\frac{1}{\sqrt{k}}\sum_{t=1}^{k}W_{t}\bigg|\geq w\bigg\}\leq\frac{C_{1}\log(T)}{w^{q}}.
Proof.

The proof of part b is similar to, and simpler than, that of part a. For conciseness, only the proof of part a is presented.

By Lemma 4 and Equation 24, for all $J\in\mathbb{Z}^{+}$, it holds that

\mathbb{E}\bigg\{\max_{k=1}^{J}\Big|\sum_{t=1}^{k}Z_{t}\Big|^{q}\bigg\}^{1/q}\leq J^{1/2}C\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}+J^{1/q}C^{\prime\prime}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\bigg\}.

As a result, there exists a constant $C_{1}$ such that

\mathbb{E}\bigg\{\max_{k=1}^{J}\Big|\sum_{t=1}^{k}Z_{t}\Big|^{q}\bigg\}\leq C_{1}J^{q/2}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}+C_{1}J\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\bigg\}^{q}.

We observe that

J^{q/2}=\frac{q}{2}\int_{0}^{J}x^{q/2-1}dx \quad (26)
=\frac{q}{2}\Big(\int_{0}^{1}x^{q/2-1}dx+\int_{1}^{J}x^{q/2-1}dx\Big) \quad (27)
\leq\frac{q}{2}\Big(1+\int_{1}^{J}x^{q/2-1}dx\Big) \quad (28)
=\frac{q}{2}\Big(1+\int_{1}^{2}x^{q/2-1}dx+\cdots+\int_{J-1}^{J}x^{q/2-1}dx\Big) \quad (29)
\leq\frac{q}{2}\Big(1+\int_{1}^{2}2^{q/2-1}dx+\cdots+\int_{J-1}^{J}J^{q/2-1}dx\Big) \quad (30)
=\frac{q}{2}\sum_{k=1}^{J}k^{q/2-1}, \quad (31)

which implies that there is a constant $C_{2}$ such that

C_{1}J^{q/2}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}+C_{1}J\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\bigg\}^{q}\leq C_{2}\sum_{k=1}^{J}\alpha_{k},

where

\alpha_{k}=k^{q/2-1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}+\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\bigg\}^{q}.
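The elementary comparison $J^{q/2}\leq\frac{q}{2}\sum_{k=1}^{J}k^{q/2-1}$ used above is easy to confirm numerically; a small sketch with illustrative values of $q$ and $J$:

```python
# Riemann-sum comparison: J^{q/2} = (q/2) * int_0^J x^{q/2 - 1} dx,
# and each integral over [k-1, k] is at most k^{q/2 - 1} when q >= 2.
for q in (3, 4, 7):
    for J in (1, 2, 10, 100):
        lhs = J ** (q / 2)
        rhs = (q / 2) * sum(k ** (q / 2 - 1) for k in range(1, J + 1))
        assert lhs <= rhs + 1e-9
```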

By Theorem B.2 of Kirch (2006),

\mathbb{E}\bigg\{\max_{k=1}^{N}\Big|\frac{1}{\sqrt{k}}\sum_{t=1}^{k}Z_{t}\Big|\bigg\}^{q}\leq 4C_{2}\sum_{l=1}^{N}l^{-q/2}\alpha_{l}=4C_{2}\sum_{l=1}^{N}\bigg(l^{-1}\Big\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\Big\}^{q}+l^{-q/2}\Big\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\Big\}^{q}\bigg)
\leq C_{3}\log(N)\Big\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\Big\}^{q}+C_{3}N^{-q/2+1}\Big\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\Big\}^{q},

where the last inequality follows from the facts that $\int_{1}^{N}\frac{1}{x}dx=\log(N)$ and $\int_{1}^{N}x^{-q/2}dx=O(N^{-q/2+1})$. Since

N^{1/2-1/q}\geq\rho^{1/2-1/q}\geq(nh^{d})^{1/2-(q-1)/q},

it holds that $\frac{1}{nh^{d}}\leq N$. Moreover,

N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q}+1\bigg\}^{q}=N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q-1/2}\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}
\leq N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-1)/q-1/2}+1\bigg\}^{q}
=N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{(q-2)/(2q)}+1\bigg\}^{q}
\leq C^{\prime}_{3}N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}\Big(\frac{1}{nh^{d}}\Big)^{(q-2)/2}
=C^{\prime}_{3}N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}\Big(\frac{1}{nh^{d}}\Big)^{q/2-1}
\leq C^{\prime}_{3}N^{-q/2+1}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}N^{q/2-1}
=C^{\prime}_{3}\bigg\{\Big(\frac{1}{nh^{d}}\Big)^{1/2}+1\bigg\}^{q}.

It follows that,

𝔼{maxk=1N|1kt=1kZt|}qC4log(N){(1nhd)1/2+1}q.\mathbb{E}\bigg{\{}\max_{k=1}^{N}\bigg{|}\frac{1}{\sqrt{k}}\sum_{t=1}^{k}Z_{t}\bigg{|}\bigg{\}}^{q}\leq C_{4}\log(N)\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}^{q}.

By Markov’s inequality, for any $z>0$, and using the assumption that $T\geq N$,

{maxk=1N{1nhd+1}1/2|1kt=1kZt|z}C1log(T)zq.\mathbb{P}\bigg{\{}\max_{k=1}^{N}\bigg{\{}\frac{1}{nh^{d}}+1\bigg{\}}^{-1/2}\bigg{|}\frac{1}{\sqrt{k}}\sum_{t=1}^{k}Z_{t}\bigg{|}\geq z\bigg{\}}\leq\frac{C_{1}\log(T)}{z^{q}}.

Since NρN\geq\rho, this directly implies that

{maxk=ρN{1nhd+1}1/2|1kt=1kZt|z}C1log(T)zq.\displaystyle\mathbb{P}\bigg{\{}\max_{k=\rho}^{N}\bigg{\{}\frac{1}{nh^{d}}+1\bigg{\}}^{-1/2}\bigg{|}\frac{1}{\sqrt{k}}\sum_{t=1}^{k}Z_{t}\bigg{|}\geq z\bigg{\}}\leq\frac{C_{1}\log(T)}{z^{q}}.

Lemma 3.

Suppose Assumption 1(c) holds and $q\geq 2$. Then there exists an absolute constant $C>0$ such that

𝔼|ZtZt|qC𝔼{ξtξtq}{(1nhd)q1+1}.\displaystyle\mathbb{E}|Z_{t}-Z_{t}^{*}|^{q}\leq C\mathbb{E}\big{\{}\|\xi_{t}-\xi_{t}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}. (32)

If in addition $\mathbb{E}\big\{\|\xi_{t}\|_{\infty}^{q}\big\}=O(1)$, then there exists an absolute constant $C^{\prime}$ such that

𝔼|Zt|qC{(1nhd)q1+1}.\displaystyle\mathbb{E}|Z_{t}|^{q}\leq C^{\prime}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}. (33)
Proof.

The proof of Equation (33) is similar to, and simpler than, that of Equation (32), so only the proof of Equation (32) is presented. Since $\{x_{t}\}_{t=1}^{T}$ and $\{\xi_{t}\}_{t=1}^{T}$ are independent, and $\{x_{t}\}_{t=1}^{T}$ are independent and identically distributed, we have

Zt=1ni=1nξt(xt,i)Kh(xxt,i).Z_{t}^{*}=\frac{1}{n}\sum_{i=1}^{n}\xi_{t}^{*}(x_{t,i})K_{h}(x-x_{t,i}).

Step 1. By the multinomial theorem,

\mathbb{E}|Z_{t}-Z_{t}^{*}|^{q}=\mathbb{E}\bigg\{\bigg|\frac{1}{n}\sum_{i=1}^{n}\{\xi_{t}^{*}-\xi_{t}\}(x_{t,i})K_{h}(x-x_{t,i})\bigg|^{q}\bigg\}
\leq\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|\{\xi_{t}^{*}-\xi_{t}\}(x_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}
=\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{k=1}^{q}\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \|\beta\|_{0}=k,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|\{\xi_{t}^{*}-\xi_{t}\}(x_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}.

Step 2. For a fixed β=(β1,,βn)\beta=(\beta_{1},\ldots,\beta_{n}) such that β1++βn=q\beta_{1}+\ldots+\beta_{n}=q and that β0=k\|\beta\|_{0}=k, consider

\mathbb{E}\bigg\{\prod_{j=1}^{n}\big|\{\xi_{t}^{*}-\xi_{t}\}(x_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}.

Without loss of generality, assume that β1,,βk\beta_{1},\ldots,\beta_{k} are non-zero. Then it holds that

𝔼{|(ξtξt)(xt,1)|β1|Kh(xxt,1)|β1|(ξtξt)(xt,k)|βk|Kh(xxt,k)|βk}\displaystyle\mathbb{E}\bigg{\{}\big{|}(\xi_{t}^{*}-\xi_{t})(x_{t,{1}})\big{|}^{\beta_{1}}\big{|}K_{h}(x-x_{t,{1}})\big{|}^{\beta_{1}}\cdots\big{|}(\xi_{t}^{*}-\xi_{t})(x_{t,{k}})\big{|}^{\beta_{k}}\big{|}K_{h}(x-x_{t,{k}})\big{|}^{\beta_{k}}\bigg{\}}
=\displaystyle= 𝔼ξ{|(ξtξt)(r)|β1|Kh(xr)|β1𝑑μ(r)|(ξtξt)(r)|βk|Kh(xr)|βk𝑑μ(r)}\displaystyle\mathbb{E}_{\xi}\bigg{\{}\int\big{|}(\xi_{t}^{*}-\xi_{t})(r)\big{|}^{\beta_{1}}\big{|}K_{h}(x-r)\big{|}^{\beta_{1}}d\mu(r)\cdots\int\big{|}(\xi_{t}^{*}-\xi_{t})(r)\big{|}^{\beta_{k}}\big{|}K_{h}(x-r)\big{|}^{\beta_{k}}d\mu(r)\bigg{\}}
=\displaystyle= 𝔼ξ{|(ξtξt)(xsh)|β1|K(s)|β1hd(β11)𝑑μ(s)|(ξtξt)(xsh)|βk|K(s)|βkhd(βk1)𝑑μ(s)}\displaystyle\mathbb{E}_{\xi}\bigg{\{}\int\big{|}(\xi_{t}^{*}-\xi_{t})(x-sh)\big{|}^{\beta_{1}}\frac{\big{|}K(s)\big{|}^{\beta_{1}}}{h^{d(\beta_{1}-1)}}d\mu(s)\cdots\int\big{|}(\xi_{t}^{*}-\xi_{t})(x-sh)\big{|}^{\beta_{k}}\frac{\big{|}K(s)\big{|}^{\beta_{k}}}{h^{d(\beta_{k}-1)}}d\mu(s)\bigg{\}}
\displaystyle\leq hdj=1k(βj1)𝔼ξ{ξtξtβ1CKβ1ξtξtβkCKβk}\displaystyle h^{-d\sum_{j=1}^{k}(\beta_{j}-1)}\mathbb{E}_{\xi}\bigg{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{\beta_{1}}C_{K}^{\beta_{1}}\cdots\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{\beta_{k}}C_{K}^{\beta_{k}}\bigg{\}}
\leq h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\xi}\bigg\{\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{\sum_{j=1}^{k}\beta_{j}}\bigg\}
\displaystyle\leq hd(qk)CKq𝔼ξ{ξtξtq}\displaystyle h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\xi}\big{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{q}\big{\}}

where the second equality follows from the change of variables $s=\frac{x-r}{h}$, and the first inequality from Assumption 2.

Step 3. Let $k\in\{1,\ldots,q\}$ be fixed. Note that $\binom{q}{\beta_{1},\ldots,\beta_{n}}\leq q!$. Consider the set

k={βn:β0,β1++βn=q,|β|0=k}.\mathcal{B}_{k}=\bigg{\{}\beta\in\mathbb{N}^{n}:\beta\geq 0,\beta_{1}+\ldots+\beta_{n}=q,|\beta|_{0}=k\bigg{\}}.

To bound the cardinality of the set $\mathcal{B}_{k}$, first note that since $\|\beta\|_{0}=k$, there are $\binom{n}{k}$ ways to choose the indices of the non-zero entries of $\beta$.
Suppose $\{i_{1},\ldots,i_{k}\}$ are the chosen indices, so that $\beta_{i_{1}}\neq 0,\ldots,\beta_{i_{k}}\neq 0$. Then the constraints $\beta_{i_{1}}>0,\ldots,\beta_{i_{k}}>0$ and $\beta_{i_{1}}+\ldots+\beta_{i_{k}}=q$ are equivalent to dividing $q$ indistinguishable balls into $k$ non-empty groups. As a result, there are $\binom{q-1}{k-1}$ ways to choose $\{\beta_{i_{1}},\ldots,\beta_{i_{k}}\}$ once the indices $\{i_{1},\ldots,i_{k}\}$ are chosen. Hence $|\mathcal{B}_{k}|\leq\binom{n}{k}\binom{q-1}{k-1}$.
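This stars-and-bars count can be verified by direct enumeration for small $n$ and $q$. The following is an illustrative sketch only (not part of the proof), where `count_by_support` is a hypothetical helper:

```python
from itertools import product
from math import comb

def count_by_support(n, q):
    """Count vectors beta in N^n with entries summing to q,
    grouped by support size k = #{j : beta_j > 0}."""
    counts = {}
    for beta in product(range(q + 1), repeat=n):
        if sum(beta) == q:
            k = sum(b > 0 for b in beta)
            counts[k] = counts.get(k, 0) + 1
    return counts

n, q = 4, 5
counts = count_by_support(n, q)
for k in range(1, min(n, q) + 1):
    # Stars and bars: choose the k non-zero coordinates,
    # then split q into k positive parts.
    assert counts[k] == comb(n, k) * comb(q - 1, k - 1)
```

The enumeration confirms $|\mathcal{B}_{k}|=\binom{n}{k}\binom{q-1}{k-1}$ exactly in this toy setting.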

Step 4. Combining the previous three steps, it follows that for some constants Cq,C1>0C_{q},C_{1}>0 only depending on qq,

\mathbb{E}|Z_{t}-Z_{t}^{*}|^{q}\leq\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{k=1}^{q}\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \|\beta\|_{0}=k,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|(\xi_{t}^{*}-\xi_{t})(x_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}
\displaystyle\leq 1nqk=1q(nk)(q1k1)q!hd(qk)CKq𝔼ξ{ξtξtq}\displaystyle\frac{1}{n^{q}}\sum_{k=1}^{q}{n\choose k}{q-1\choose k-1}q!h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\xi}\big{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{q}\big{\}}
\displaystyle\leq 1nqk=1qnkCqCKqhd(qk)𝔼ξ{ξtξtq}\displaystyle\frac{1}{n^{q}}\sum_{k=1}^{q}n^{k}C_{q}C_{K}^{q}h^{-d(q-k)}\mathbb{E}_{\xi}\big{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{q}\big{\}}
\displaystyle\leq C1𝔼ξ{ξtξtq}{(1nhd)q1+(1nhd)q2++(1nhd)+1}\displaystyle C_{1}\mathbb{E}_{\xi}\big{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-2}+\ldots+\bigg{(}\frac{1}{nh^{d}}\bigg{)}+1\bigg{\}}
\displaystyle\leq C1𝔼ξ{ξtξtq}q{(1nhd)q1+1},\displaystyle C_{1}\mathbb{E}_{\xi}\big{\{}\|\xi_{t}^{*}-\xi_{t}\|_{\infty}^{q}\big{\}}q\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}},

where the second inequality follows from Step 3 and the fact that $\binom{q}{\beta_{1},\ldots,\beta_{n}}\leq q!$, and the third inequality from $\binom{n}{k}\binom{q-1}{k-1}q!\leq\binom{n}{k}C_{q}\leq n^{k}C_{q}$. The fourth inequality follows since $\frac{1}{n^{q}}n^{k}h^{-d(q-k)}=\big(\frac{1}{nh^{d}}\big)^{q-k}$. The last inequality holds because if $\frac{1}{nh^{d}}\leq 1$, then $\big(\frac{1}{nh^{d}}\big)^{q-1}+\ldots+\frac{1}{nh^{d}}+1\leq q$, and if $\frac{1}{nh^{d}}\geq 1$, then $\big(\frac{1}{nh^{d}}\big)^{q-1}+\ldots+\frac{1}{nh^{d}}+1\leq q\big(\frac{1}{nh^{d}}\big)^{q-1}$. ∎
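The final inequality bounds the geometric sum $1+a+\ldots+a^{q-1}$, with $a=\frac{1}{nh^{d}}$, by $q(a^{q-1}+1)$. A quick numerical sanity check of this elementary bound (illustrative only; `geom_sum_bound_holds` is a hypothetical helper):

```python
def geom_sum_bound_holds(a, q):
    """Check sum_{j=0}^{q-1} a^j <= q * (a^(q-1) + 1) for a > 0."""
    s = sum(a ** j for j in range(q))
    return s <= q * (a ** (q - 1) + 1)

# The bound holds in both regimes: a <= 1 (each term <= 1, so sum <= q)
# and a >= 1 (each term <= a^(q-1), so sum <= q * a^(q-1)).
assert all(geom_sum_bound_holds(a, q)
           for a in (0.01, 0.5, 1.0, 2.0, 50.0)
           for q in range(2, 10))
```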

Lemma 4.

Suppose Assumption 1(c) holds. Let $\rho\leq T$ be such that $\rho nh^{d}\geq\log(T)$ and $T\geq 3$. Let $N\in\mathbb{Z}^{+}$ be such that $N\geq\rho$. Then, it holds that

{𝔼maxk=1N|t=1kZt|q}1/qN1/2C{(1nhd)1/2+1}+N1/qC{(1nhd)(q1)/q+1}.\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N^{1/2}C\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+N^{1/q}C^{\prime}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{(q-1)/q}+1\bigg{\}}.
Proof.

Since $q>2$ and $\mathbb{E}|Z_{1}|^{q}<\infty$ by Lemma 3, we may apply Theorem 1 of Liu et al. (2013) to obtain

{𝔼maxk=1N|t=1kZt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1{j=1NΘj,2+j=N+1Θj,q+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C_{1}\bigg{\{}\sum_{j=1}^{N}\Theta_{j,2}+\sum_{j=N+1}^{\infty}\Theta_{j,q}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1Nj1/21/qΘj,q+{𝔼|Z1|q}1/q},\displaystyle N^{1/q}C_{2}\bigg{\{}\sum_{j=1}^{N}j^{1/2-1/q}\Theta_{j,q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}},

where $\Theta_{j,q}=\{\mathbb{E}(|Z_{j}^{*}-Z_{j}|^{q})\}^{1/q}$. Moreover, since $\Theta_{j,2}\leq\Theta_{j,q}$ for any $q\geq 2$, it follows that

{𝔼maxk=1N|t=1kZt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1{j=1Θj,q+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C_{1}\bigg{\{}\sum_{j=1}^{\infty}\Theta_{j,q}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qΘj,q+{𝔼|Z1|q}1/q},\displaystyle N^{1/q}C_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}\Theta_{j,q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}},

Next, by the first part of Lemma 3,

Θj,qqC𝔼{ξjξjq}{(1nhd)q1+1}.\Theta_{j,q}^{q}\leq C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}.

Moreover, since $N\geq\frac{1}{nh^{d}}$, the above inequality further implies that

{𝔼maxk=1N|t=1kZt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1{j=1C𝔼{ξjξjq}{(1nhd)q1}1/q+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}\bigg{\}}^{1/q}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{ξjξjq}{(1nhd)q1+1}1/q+{𝔼|Z1|q}1/q}\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}}
\displaystyle\leq N1/2C1′′{j=1C𝔼{ξjξjq}{(1nhd)1/21/q}{(1nhd)1/2+1}+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2-1/q}\bigg{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{ξjξjq}{(1nhd)q1+1}1/q+{𝔼|Z1|q}1/q}\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}}
\displaystyle\leq N1/2C1′′{j=1C𝔼{ξjξjq}{(N)1/21/q}{(1nhd)1/2+1}+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}N\bigg{)}^{1/2-1/q}\bigg{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{ξjξjq}{(1nhd)q1+1}1/q+{𝔼|Z1|q}1/q}.\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\|\xi_{j}-\xi_{j}^{*}\|_{\infty}^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}}.

From Assumption 1(c),

{𝔼maxk=1N|t=1kZt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1′′′{1+{(1nhd)1/2+1}+{𝔼|Z1|2}1/2}\displaystyle N^{1/2}C_{1}^{\prime\prime\prime}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|Z_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2′′{1+{(1nhd)q1+1}1/q+{𝔼|Z1|q}1/q}.\displaystyle N^{1/q}C^{{}^{\prime\prime}}_{2}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|Z_{1}|^{q}\}^{1/q}\bigg{\}}.

By the second part of Lemma 3, it holds that

{𝔼maxk=1N|t=1kZt|q}1/qN1/2C1′′′′{1+{(1nhd)+1}1/2}+N1/qC2′′′{1+{(1nhd)q1+1}1/q}.\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}Z_{t}|^{q}\bigg{\}}^{1/q}\leq N^{1/2}C_{1}^{\prime\prime\prime\prime}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}+1\bigg{\}}^{1/2}\bigg{\}}+N^{1/q}C_{2}^{\prime\prime\prime}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}\bigg{\}}.

This immediately implies the desired result. ∎

Lemma 5.

Suppose Assumption 1 holds. Then there exists an absolute constant $C_{1}$ such that

𝔼|WtWt|qC1maxi=1n𝔼{|δt,iδt,i|q}{(1nhd)q1+1}.\displaystyle\mathbb{E}|W_{t}-W_{t}^{*}|^{q}\leq C_{1}\max_{i=1}^{n}\mathbb{E}\big{\{}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}. (34)

If in addition $\mathbb{E}\big\{|\delta_{t,i}|^{q}\big\}=O(1)$ for all $1\leq i\leq n$, then there exists an absolute constant $C^{\prime}$ such that

𝔼(|Wt|q)1/qC{(1nhd)q1+1}1/q.\displaystyle\mathbb{E}(|W_{t}|^{q})^{1/q}\leq C^{\prime}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}. (35)
Proof.

The proof is similar to that of Lemma 3. The proof of Equation (35) is similar to, and simpler than, that of Equation (34), so only the proof of Equation (34) is presented. Since $\{x_{t}\}_{t=1}^{T}$ and $\{\delta_{t}\}_{t=1}^{T}$ are independent, and $\{x_{t}\}_{t=1}^{T}$ are independent and identically distributed, we have

W_{t}^{*}=\frac{1}{n}\sum_{i=1}^{n}\delta_{t,i}^{*}K_{h}(x-x_{t,i}).

Step 1. By the multinomial theorem,

\mathbb{E}|W_{t}-W_{t}^{*}|^{q}=\mathbb{E}\bigg\{\bigg|\frac{1}{n}\sum_{i=1}^{n}(\delta_{t,i}^{*}-\delta_{t,i})K_{h}(x-x_{t,i})\bigg|^{q}\bigg\}
\leq\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|(\delta_{t,j}^{*}-\delta_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}
=\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{k=1}^{q}\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \|\beta\|_{0}=k,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|(\delta_{t,j}^{*}-\delta_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}.

Step 2. For a fixed β=(β1,,βn)\beta=(\beta_{1},\ldots,\beta_{n}) such that β1++βn=q\beta_{1}+\ldots+\beta_{n}=q and that |β|0=k|\beta|_{0}=k, consider

\mathbb{E}\bigg\{\prod_{j=1}^{n}\big|(\delta_{t,j}^{*}-\delta_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}.

Without loss of generality, assume that β1,,βk\beta_{1},\ldots,\beta_{k} are non-zero. Then it holds that

𝔼{|(δt,1δt,1)|β1|Kh(xxt,1)|β1|(δt,kδt,k)|βk|Kh(xxt,k)|βk}\displaystyle\mathbb{E}\bigg{\{}\big{|}(\delta_{t,1}^{*}-\delta_{t,1})\big{|}^{\beta_{1}}\big{|}K_{h}(x-x_{t,{1}})\big{|}^{\beta_{1}}\cdots\big{|}(\delta_{t,k}^{*}-\delta_{t,k})\big{|}^{\beta_{k}}\big{|}K_{h}(x-x_{t,{k}})\big{|}^{\beta_{k}}\bigg{\}}
=\mathbb{E}_{\delta}\bigg\{\int\big|\delta_{t,1}^{*}-\delta_{t,1}\big|^{\beta_{1}}\big|K_{h}(x-r)\big|^{\beta_{1}}\,d\mu(r)\cdots\int\big|\delta_{t,k}^{*}-\delta_{t,k}\big|^{\beta_{k}}\big|K_{h}(x-r)\big|^{\beta_{k}}\,d\mu(r)\bigg\}
=\mathbb{E}_{\delta}\bigg\{\int\big|\delta_{t,1}^{*}-\delta_{t,1}\big|^{\beta_{1}}\frac{\big|K(s)\big|^{\beta_{1}}}{h^{d(\beta_{1}-1)}}\,d\mu(s)\cdots\int\big|\delta_{t,k}^{*}-\delta_{t,k}\big|^{\beta_{k}}\frac{\big|K(s)\big|^{\beta_{k}}}{h^{d(\beta_{k}-1)}}\,d\mu(s)\bigg\}
\displaystyle\leq hdj=1k(βj1)𝔼δ{|(δt,1δt,1)|β1CKβ1|(δt,kδt,k)|βkCKβk}\displaystyle h^{-d\sum_{j=1}^{k}(\beta_{j}-1)}\mathbb{E}_{\delta}\bigg{\{}\big{|}(\delta_{t,1}^{*}-\delta_{t,1})\big{|}^{\beta_{1}}C_{K}^{\beta_{1}}\cdots\big{|}(\delta_{t,k}^{*}-\delta_{t,k})\big{|}^{\beta_{k}}C_{K}^{\beta_{k}}\bigg{\}}
\leq h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\delta}\bigg\{\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{\sum_{j=1}^{k}\beta_{j}}\bigg\}
\displaystyle\leq hd(qk)CKq𝔼δ{maxi=1n|δt,iδt,i|q}\displaystyle h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\delta}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}

where the second equality follows from the change of variables $s=\frac{x-r}{h}$, and the first inequality from Assumption 2.
Step 3. Let $k\in\{1,\ldots,q\}$ be fixed. Note that $\binom{q}{\beta_{1},\ldots,\beta_{n}}\leq q!$. Consider the set

k={βn:β0,β1++βn=q,|β|0=k}.\mathcal{B}_{k}=\bigg{\{}\beta\in\mathbb{N}^{n}:\beta\geq 0,\beta_{1}+\ldots+\beta_{n}=q,|\beta|_{0}=k\bigg{\}}.

To bound the cardinality of the set $\mathcal{B}_{k}$, first note that since $|\beta|_{0}=k$, there are $\binom{n}{k}$ ways to choose the indices of the non-zero entries of $\beta$.
Suppose $\{i_{1},\ldots,i_{k}\}$ are the chosen indices, so that $\beta_{i_{1}}\neq 0,\ldots,\beta_{i_{k}}\neq 0$. Then the constraints $\beta_{i_{1}}>0,\ldots,\beta_{i_{k}}>0$ and $\beta_{i_{1}}+\ldots+\beta_{i_{k}}=q$ are equivalent to dividing $q$ indistinguishable balls into $k$ non-empty groups. As a result, there are $\binom{q-1}{k-1}$ ways to choose $\{\beta_{i_{1}},\ldots,\beta_{i_{k}}\}$ once the indices $\{i_{1},\ldots,i_{k}\}$ are chosen. Hence $|\mathcal{B}_{k}|\leq\binom{n}{k}\binom{q-1}{k-1}$.

Step 4. Combining the previous three steps, it follows that for some constants Cq,C1>0C_{q},C_{1}>0 only depending on qq,

\mathbb{E}|W_{t}-W_{t}^{*}|^{q}\leq\frac{1}{n^{q}}\mathbb{E}\bigg\{\sum_{k=1}^{q}\sum_{\beta_{1}+\ldots+\beta_{n}=q,\ \|\beta\|_{0}=k,\ \beta\geq 0}\binom{q}{\beta_{1},\ldots,\beta_{n}}\prod_{j=1}^{n}\big|(\delta_{t,j}^{*}-\delta_{t,j})K_{h}(x-x_{t,j})\big|^{\beta_{j}}\bigg\}
\displaystyle\leq 1nqk=1q(nk)(q1k1)q!hd(qk)CKq𝔼δ{maxi=1n|δt,iδt,i|q}\displaystyle\frac{1}{n^{q}}\sum_{k=1}^{q}{n\choose k}{q-1\choose k-1}q!h^{-d(q-k)}C_{K}^{q}\mathbb{E}_{\delta}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}
\displaystyle\leq 1nqk=1qnkCqCKqhd(qk)𝔼δ{maxi=1n|δt,iδt,i|q}\displaystyle\frac{1}{n^{q}}\sum_{k=1}^{q}n^{k}C_{q}C_{K}^{q}h^{-d(q-k)}\mathbb{E}_{\delta}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}
\displaystyle\leq C1𝔼δ{maxi=1n|δt,iδt,i|q}{(1nhd)q1+(1nhd)q2++(1nhd)+1}\displaystyle C_{1}\mathbb{E}_{\delta}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-2}+\ldots+\bigg{(}\frac{1}{nh^{d}}\bigg{)}+1\bigg{\}}
\displaystyle\leq C1𝔼δ{maxi=1n|δt,iδt,i|q}q{(1nhd)q1+1},\displaystyle C_{1}\mathbb{E}_{\delta}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}q\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}},

where the second inequality follows from Step 3 and the fact that $\binom{q}{\beta_{1},\ldots,\beta_{n}}\leq q!$, and the third inequality from $\binom{n}{k}\binom{q-1}{k-1}q!\leq\binom{n}{k}C_{q}\leq n^{k}C_{q}$. The fourth inequality follows since $\frac{1}{n^{q}}n^{k}h^{-d(q-k)}=\big(\frac{1}{nh^{d}}\big)^{q-k}$. The last inequality holds because if $\frac{1}{nh^{d}}\leq 1$, then $\big(\frac{1}{nh^{d}}\big)^{q-1}+\ldots+\frac{1}{nh^{d}}+1\leq q$, and if $\frac{1}{nh^{d}}\geq 1$, then $\big(\frac{1}{nh^{d}}\big)^{q-1}+\ldots+\frac{1}{nh^{d}}+1\leq q\big(\frac{1}{nh^{d}}\big)^{q-1}$. ∎

Lemma 6.

Suppose Assumption 1(d) holds. Let $\rho\leq T$ be such that $\rho nh^{d}\geq\log(T)$ and $T\geq 3$. Let $N\in\mathbb{Z}^{+}$ be such that $N\geq\rho$. Then, it holds that

{𝔼maxk=1N|t=1kWt|q}1/qN1/2C{(1nhd)1/2+1}+N1/qC{(1nhd)(q1)/q+1}.\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}W_{t}|^{q}\bigg{\}}^{1/q}\leq N^{1/2}C\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+N^{1/q}C^{\prime}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{(q-1)/q}+1\bigg{\}}.
Proof.

Since $q>2$ and $\mathbb{E}|W_{1}|^{q}<\infty$ by Lemma 5, we may apply Theorem 1 of Liu et al. (2013) to obtain

\bigg\{\mathbb{E}\max_{k=1}^{N}\bigg|\sum_{t=1}^{k}W_{t}\bigg|^{q}\bigg\}^{1/q}\leq N^{1/2}C_{1}\bigg\{\sum_{j=1}^{N}\Theta_{j,2}+\sum_{j=N+1}^{\infty}\Theta_{j,q}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg\}
+\displaystyle+ N1/qC2{j=1Nj1/21/qΘj,q+{𝔼|W1|q}1/q},\displaystyle N^{1/q}C_{2}\bigg{\{}\sum_{j=1}^{N}j^{1/2-1/q}\Theta_{j,q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}},

where $\Theta_{j,q}=\{\mathbb{E}(|W_{j}^{*}-W_{j}|^{q})\}^{1/q}$. Moreover, since $\Theta_{j,2}\leq\Theta_{j,q}$ for any $q\geq 2$, it follows that

{𝔼maxk=1N|t=1kWt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}W_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1{j=1Θj,q+{𝔼|W1|2}1/2}\displaystyle N^{1/2}C_{1}\bigg{\{}\sum_{j=1}^{\infty}\Theta_{j,q}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qΘj,q+{𝔼|W1|q}1/q}.\displaystyle N^{1/q}C_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}\Theta_{j,q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}}.

Next, by the first part of Lemma 5,

\Theta_{j,q}^{q}\leq C\,\mathbb{E}\big\{\max_{i=1}^{n}|\delta_{j,i}-\delta_{j,i}^{*}|^{q}\big\}\bigg\{\bigg(\frac{1}{nh^{d}}\bigg)^{q-1}+1\bigg\}.

Since we have that N1nhdN\geq\frac{1}{nh^{d}}, the above inequality further implies that

{𝔼maxk=1N|t=1kWt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}W_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1{j=1C𝔼{maxi=1n|δt,iδt,i|q}{(1nhd)q1}1/q+{𝔼|W1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}\bigg{\}}^{1/q}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{maxi=1n|δt,iδt,i|q}{(1nhd)q1+1}1/q+{𝔼|W1|q}1/q}\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}}
\displaystyle\leq N1/2C1′′{j=1C𝔼{maxi=1n|δt,iδt,i|q}{(1nhd)1/21/q}{(1nhd)1/2+1}+{𝔼|W1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2-1/q}\bigg{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{maxi=1n|δt,iδt,i|q}{(1nhd)q1+1}1/q+{𝔼|W1|q}1/q}\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}}
\displaystyle\leq N1/2C1′′{j=1C𝔼{maxi=1n|δt,iδt,i|q}{(N)1/21/q}{(1nhd)1/2+1}+{𝔼|W1|2}1/2}\displaystyle N^{1/2}C^{{}^{\prime\prime}}_{1}\bigg{\{}\sum_{j=1}^{\infty}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}N\bigg{)}^{1/2-1/q}\bigg{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2{j=1j1/21/qC𝔼{maxi=1n|δt,iδt,i|q}{(1nhd)q1+1}1/q+{𝔼|W1|q}1/q}.\displaystyle N^{1/q}C^{{}^{\prime}}_{2}\bigg{\{}\sum_{j=1}^{\infty}j^{1/2-1/q}C\mathbb{E}\big{\{}\max_{i=1}^{n}|\delta_{t,i}-\delta_{t,i}^{*}|^{q}\big{\}}\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}}.

From Assumption 1(d), the above inequality further implies that

{𝔼maxk=1N|t=1kWt|q}1/q\displaystyle\bigg{\{}\mathbb{E}\max_{k=1}^{N}|\sum_{t=1}^{k}W_{t}|^{q}\bigg{\}}^{1/q}\leq N1/2C1′′′{1+{(1nhd)1/2+1}+{𝔼|W1|2}1/2}\displaystyle N^{1/2}C_{1}^{\prime\prime\prime}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{1/2}+1\bigg{\}}+\{\mathbb{E}|W_{1}|^{2}\}^{1/2}\bigg{\}}
+\displaystyle+ N1/qC2′′{1+{(1nhd)q1+1}1/q+{𝔼|W1|q}1/q}.\displaystyle N^{1/q}C^{{}^{\prime\prime}}_{2}\bigg{\{}1+\bigg{\{}\bigg{(}\frac{1}{nh^{d}}\bigg{)}^{q-1}+1\bigg{\}}^{1/q}+\{\mathbb{E}|W_{1}|^{q}\}^{1/q}\bigg{\}}.

By the second part of Lemma 5, it holds that

\bigg\{\mathbb{E}\max_{k=1}^{N}\bigg|\sum_{t=1}^{k}W_{t}\bigg|^{q}\bigg\}^{1/q}\leq N^{1/2}C_{1}^{\prime\prime\prime\prime}\bigg\{1+\bigg\{\frac{1}{nh^{d}}+1\bigg\}^{1/2}\bigg\}+N^{1/q}C_{2}^{\prime\prime\prime}\bigg\{1+\bigg\{\bigg(\frac{1}{nh^{d}}\bigg)^{q-1}+1\bigg\}^{1/q}\bigg\}.

This immediately implies the desired result. ∎

Appendix D Additional Technical Results

Lemma 7.

Suppose that $f,g:[0,1]^{d}\to\mathbb{R}$ are such that $f,g\in\mathcal{H}^{r}(L)$ for some $r\geq 1$ and $L>0$. Suppose in addition that $\{x_{m}\}_{m=1}^{M}$ is a collection of grid points randomly sampled from a density $u:[0,1]^{d}\to\mathbb{R}$ such that $\inf_{x\in[0,1]^{d}}u(x)\geq c_{u}>0$. If $\|f-g\|_{\infty}\geq\kappa$ for some $\kappa>0$, then

{maxm=1M|f(xm)g(xm)|34κ}1exp(cMκd),\mathbb{P}\bigg{\{}\max_{m=1}^{M}|f(x_{m})-g(x_{m})|\geq\frac{3}{4}\kappa\bigg{\}}\geq 1-\exp\big{(}-cM\kappa^{d}\big{)},

where cc is a constant only depending on dd.

Proof.

Let h=fgh=f-g. Since f,gr(L)f,g\in\mathcal{H}^{r}(L), hr(L)h\in\mathcal{H}^{r}(L). Since r1r\geq 1, we have that

|h(x)h(x)|L|xx|for allx,x[0,1]d.|h(x)-h(x^{\prime})|\leq L|x-x^{\prime}|\quad\text{for all}\quad x,x^{\prime}\in[0,1]^{d}.

Let $x_{0}\in[0,1]^{d}$ be such that

|h(x0)|=h.|h(x_{0})|=\|h\|_{\infty}.

Then for all xB(x0,κ4L)[0,1]dx^{\prime}\in B(x_{0},\frac{\kappa}{4L})\cap[0,1]^{d},

|h(x)||h(x0)|L|x0x|34κ.|h(x^{\prime})|\geq|h(x_{0})|-L|x_{0}-x^{\prime}|\geq\frac{3}{4}\kappa.

Therefore

{maxm=1M|f(xm)g(xm)|<34κ}P({xm}m=1MB(x0,κ4L)).\mathbb{P}\bigg{\{}\max_{m=1}^{M}|f(x_{m})-g(x_{m})|<\frac{3}{4}\kappa\bigg{\}}\leq P\bigg{(}\{x_{m}\}_{m=1}^{M}\not\in B\big{(}x_{0},\frac{\kappa}{4L}\big{)}\bigg{)}.

Since

P({xm}m=1MB(x0,κ4L))={1P(x1B(x0,κ4L))}M(1{cuκ4L}d)Mexp(Mcκd),\displaystyle P\bigg{(}\{x_{m}\}_{m=1}^{M}\not\in B(x_{0},\frac{\kappa}{4L})\bigg{)}=\bigg{\{}1-P\bigg{(}x_{1}\in B(x_{0},\frac{\kappa}{4L})\bigg{)}\bigg{\}}^{M}\leq\bigg{(}1-\big{\{}\frac{c_{u}\kappa}{4L}\big{\}}^{d}\bigg{)}^{M}\leq\exp\big{(}-Mc\kappa^{d}\big{)},

the desired result follows. ∎
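The last display uses the elementary bound $(1-p)^{M}\leq\exp(-Mp)$, with $p$ standing for the probability $\{c_{u}\kappa/(4L)\}^{d}$ that a single grid point lands in the ball. A small numerical check of this bound (illustrative values only; `miss_prob_bound_holds` is a hypothetical helper):

```python
import math

def miss_prob_bound_holds(p, M):
    """Check (1 - p)^M <= exp(-M * p): the bound used for the
    probability that all M sampled grid points miss the ball."""
    return (1 - p) ** M <= math.exp(-M * p)

# Holds for any hitting probability p in [0, 1] and any sample
# size M, since 1 - p <= exp(-p) pointwise.
assert all(miss_prob_bound_holds(p, M)
           for p in (0.0, 0.01, 0.1, 0.5, 0.9)
           for M in (1, 10, 100))
```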

Lemma 8.

Let $\mathcal{J}$ be defined as in Definition 1 and suppose Assumption 1(e) holds. Denote

\zeta_{k}=\frac{9}{10}\min\{\eta_{k+1}-\eta_{k},\eta_{k}-\eta_{k-1}\},\quad k\in\{1,\ldots,K\}.

Then for each change-point $\eta_{k}$ there exists a seeded interval $\mathcal{I}_{k}=(s_{k},e_{k}]$ such that
a. $\mathcal{I}_{k}$ contains exactly one change-point, namely $\eta_{k}$;
b. $\min\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\geq\frac{1}{16}\zeta_{k}$; and
c. $\max\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\leq\zeta_{k}$.

Proof.

These are the desired properties of the seeded intervals by construction. The proof is the same as that of Theorem 3 of Kovács et al. (2020) and is provided here for completeness.

Since ζk=Θ(T)\zeta_{k}=\Theta(T), by construction of seeded intervals, one can find a seeded interval (sk,ek]=(ckrk,ck+rk](s_{k},e_{k}]=(c_{k}-r_{k},c_{k}+r_{k}] such that (ckrk,ck+rk](ηkζk,ηk+ζk](c_{k}-r_{k},c_{k}+r_{k}]\subseteq(\eta_{k}-\zeta_{k},\eta_{k}+\zeta_{k}], rkζk4r_{k}\geq\frac{\zeta_{k}}{4} and |ckηk|5rk8|c_{k}-\eta_{k}|\leq\frac{5r_{k}}{8}. So (ckrk,ck+rk](c_{k}-r_{k},c_{k}+r_{k}] contains only one change-point ηk\eta_{k}. In addition,

ekηk=ck+rkηkrk|ckηk|3rk83ζk32,e_{k}-\eta_{k}=c_{k}+r_{k}-\eta_{k}\geq r_{k}-|c_{k}-\eta_{k}|\geq\frac{3r_{k}}{8}\geq\frac{3\zeta_{k}}{32},

and similarly ηksk3ζk32\eta_{k}-s_{k}\geq\frac{3\zeta_{k}}{32}, so b holds. Finally, since (ckrk,ck+rk](ηkζk,ηk+ζk](c_{k}-r_{k},c_{k}+r_{k}]\subseteq(\eta_{k}-\zeta_{k},\eta_{k}+\zeta_{k}], it holds that ck+rkηk+ζkc_{k}+r_{k}\leq\eta_{k}+\zeta_{k} and so

e_{k}-\eta_{k}=c_{k}+r_{k}-\eta_{k}\leq\zeta_{k}. ∎
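The interval arithmetic in this proof can be checked numerically; the sketch below uses hypothetical values of $\zeta_{k}$, $\eta_{k}$, $c_{k}$ and $r_{k}$ satisfying $r_{k}\geq\zeta_{k}/4$ and $|c_{k}-\eta_{k}|\leq 5r_{k}/8$, and verifies the conclusions $\min\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\geq 3\zeta_{k}/32$ and $\max\{\eta_{k}-s_{k},e_{k}-\eta_{k}\}\leq\zeta_{k}$:

```python
def seeded_interval_gaps(eta, c, r):
    """Distances from the change-point eta to the endpoints
    of the seeded interval (c - r, c + r]."""
    return eta - (c - r), (c + r) - eta

# Hypothetical values: r >= zeta / 4 and |c - eta| <= 5 * r / 8.
zeta, eta = 320.0, 1000.0
for r in (zeta / 4, zeta / 2):
    for c in (eta - 5 * r / 8, eta, eta + 5 * r / 8):
        left, right = seeded_interval_gaps(eta, c, r)
        # min gap >= r - |c - eta| >= 3r/8 >= 3*zeta/32;
        # max gap <= r + |c - eta| <= 13r/8 <= zeta.
        assert min(left, right) >= 3 * zeta / 32
        assert max(left, right) <= zeta
```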

D.1 Univariate CUSUM

We introduce some notation for one-dimensional change-point detection and the corresponding CUSUM statistics. Let $\{\mu_{i}\}_{i=1}^{T},\{\omega_{i}\}_{i=1}^{T}\subseteq\mathbb{R}$ be two univariate sequences. We make the following assumptions.

Assumption 4 (Univariate mean change-points).

Let \{\eta_{k}\}_{k=0}^{K+1}\subseteq\{0,\ldots,T\}, where \eta_{0}=0 and \eta_{K+1}=T, and

\omega_{t}\neq\omega_{t+1}\ \ \text{if and only if}\ \ t\in\{\eta_{1},\ldots,\eta_{K}\}.

Assume

mink=1K+1(ηkηk1)Δ>0,\displaystyle\min_{k=1}^{K+1}(\eta_{k}-\eta_{k-1})\geq\Delta>0,
0<|ωηk+1ωηk|=κk for all k=1,,K.\displaystyle 0<|\omega_{\eta_{k+1}}-\omega_{\eta_{k}}|=\kappa_{k}\text{ for all }k=1,\ldots,K.

For any generic interval [s,e]\subseteq[1,T], define the corresponding CUSUM statistics as

μ~ts,e\displaystyle\widetilde{\mu}_{t}^{s,e} =et(es)(ts)i=s+1tμits(es)(et)i=t+1eμi,\displaystyle=\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{i=s+1}^{t}\mu_{i}-\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{i=t+1}^{e}\mu_{i},
ω~ts,e\displaystyle\widetilde{\omega}_{t}^{s,e} =et(es)(ts)i=s+1tωits(es)(et)i=t+1eωi.\displaystyle=\sqrt{\frac{e-t}{(e-s)(t-s)}}\sum_{i=s+1}^{t}\omega_{i}-\sqrt{\frac{t-s}{(e-s)(e-t)}}\sum_{i=t+1}^{e}\omega_{i}.
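The CUSUM statistic above can be computed directly. The following minimal sketch (hypothetical function name; Python lists are 0-indexed, so x[i-1] stores the i-th term of the sequence) illustrates a standard fact used repeatedly below: for a piecewise-constant sequence with a single change-point \eta in (s,e], |\widetilde{\omega}_{t}^{s,e}| is maximised exactly at t=\eta, with maximum \kappa\sqrt{(\eta-s)(e-\eta)/(e-s)}.

```python
import math

def cusum(x, s, e, t):
    """CUSUM statistic of x over (s, e] evaluated at s < t < e;
    x is 0-indexed, so x[i - 1] is the i-th element of the sequence."""
    left = sum(x[s:t])    # corresponds to sum_{i=s+1}^{t} x_i
    right = sum(x[t:e])   # corresponds to sum_{i=t+1}^{e} x_i
    return (math.sqrt((e - t) / ((e - s) * (t - s))) * left
            - math.sqrt((t - s) / ((e - s) * (e - t))) * right)
```

For example, with \omega = (0,0,0,1,1,1), s=0 and e=6, the maximiser of |\widetilde{\omega}_{t}^{0,6}| is t=3 and the maximum equals \sqrt{3\cdot 3/6}=\sqrt{1.5}.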

Throughout this section, all of our results are proven by regarding \{\mu_{i}\}_{i=1}^{T} and \{\omega_{i}\}_{i=1}^{T} as two deterministic sequences, and we will assume that \widetilde{\mu}_{t}^{s,e} is a good approximation of \widetilde{\omega}_{t}^{s,e} in the sense specified by the events below.

Consider the following events

𝒜((s,e],ρ,γ)={maxt=s+ρ+1eρ|μ~ts,eω~ts,e|γ};\displaystyle\mathcal{A}((s,e],\rho,\gamma)=\bigg{\{}\max_{t=s+\rho+1}^{e-\rho}|\widetilde{\mu}_{t}^{s,e}-\widetilde{\omega}^{s,e}_{t}|\leq\gamma\bigg{\}};
(r,ρ,γ)={maxN=ρTr|1Nt=r+1r+N(μtωt)|γ}{maxN=ρr|1Nt=rN+1r(μtωt)|γ}.\displaystyle\mathcal{B}(r,\rho,\gamma)=\bigg{\{}\max_{N=\rho}^{T-r}\bigg{|}\frac{1}{\sqrt{N}}\sum_{t=r+1}^{r+N}(\mu_{t}-\omega_{t})\bigg{|}\leq\gamma\bigg{\}}\bigcup\bigg{\{}\max_{N=\rho}^{r}\bigg{|}\frac{1}{\sqrt{N}}\sum_{t=r-N+1}^{r}(\mu_{t}-\omega_{t})\bigg{|}\leq\gamma\bigg{\}}.
Lemma 9.

Suppose Assumption 4 holds. Let [s,e]\subseteq[1,T] be a subinterval containing at least one change-point \eta_{r} with \min\{\eta_{r}-s,e-\eta_{r}\}\geq cT for some constant c>0. Let \kappa_{\max}^{s,e}=\max\{\kappa_{p}:\min\{\eta_{p}-s,e-\eta_{p}\}\geq cT\}. Let

bargmaxt=s+ρeρ|μ~ts,e|.b\in\arg\max_{t=s+\rho}^{e-\rho}|\widetilde{\mu}_{t}^{s,e}|.

For some c1>0c_{1}>0, λ>0\lambda>0 and δ>0\delta>0, suppose that the following events hold

𝒜((s,e],ρ,γ),\displaystyle\mathcal{A}((s,e],\rho,\gamma), (36)
(s,ρ,γ)(e,ρ,γ)η{ηk}k=1K(η,ρ,γ)\displaystyle\mathcal{B}(s,\rho,\gamma)\cup\mathcal{B}(e,\rho,\gamma)\cup\bigcup_{\eta\in\{\eta_{k}\}_{k=1}^{K}}\mathcal{B}(\eta,\rho,\gamma) (37)

and that

\max_{t=s+\rho}^{e-\rho}|\widetilde{\mu}_{t}^{s,e}|=|\widetilde{\mu}_{b}^{s,e}|\geq c_{1}\kappa_{\max}^{s,e}\sqrt{T}. (38)

If there exists a sufficiently small c2>0c_{2}>0 such that

\gamma\leq c_{2}\kappa_{\max}^{s,e}\sqrt{T}\quad\text{and}\quad\rho\leq c_{2}T, (39)

then there exists a change-point ηk(s,e)\eta_{k}\in(s,e) such that

min{eηk,ηks}>c3Tand|ηkb|C3max{γ2κk2,ρ},\min\{e-\eta_{k},\eta_{k}-s\}>c_{3}T\quad\text{and}\quad|\eta_{k}-b|\leq C_{3}\max\{\gamma^{2}\kappa_{k}^{-2},\rho\},

where c_{3}>0 is a sufficiently small constant and C_{3}>0 is a sufficiently large constant, both independent of T.

Proof.

The proof is the same as that for Lemma 22 in Wang et al. (2020). ∎
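A small simulation conveys the content of Lemma 9 (hypothetical names; the scalar Gaussian noise, sample size and jump size are assumptions made purely for illustration): when \mu is a good approximation of a piecewise-constant \omega, the maximiser b of the CUSUM statistic lands close to a true change-point.

```python
import math
import random

random.seed(0)
T, eta, kappa, sd = 200, 120, 1.0, 0.05
omega = [0.0] * eta + [kappa] * (T - eta)        # one change-point at eta
mu = [w + random.gauss(0.0, sd) for w in omega]  # noisy observation of omega

def cusum(x, s, e, t):
    """CUSUM statistic of x over (s, e] evaluated at s < t < e (0-indexed x)."""
    left, right = sum(x[s:t]), sum(x[t:e])
    return (math.sqrt((e - t) / ((e - s) * (t - s))) * left
            - math.sqrt((t - s) / ((e - s) * (e - t))) * right)

rho = 5
b = max(range(rho + 1, T - rho), key=lambda t: abs(cusum(mu, 0, T, t)))
```

With a noise level this small relative to the jump, the localisation error |b-\eta| is at most a few time points, consistent with the bound C_{3}\max\{\gamma^{2}\kappa_{k}^{-2},\rho\}.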

Lemma 10.

If [s,e] contains exactly two change-points \eta_{r} and \eta_{r+1}, then

maxt=se|ω~ts,e|eηr+1κr+1+ηrsκr.\max_{t=s}^{e}\left|\widetilde{\omega}^{s,e}_{t}\right|\leq\sqrt{e-\eta_{r+1}}\kappa_{r+1}+\sqrt{\eta_{r}-s}\kappa_{r}.
Proof.

This is Lemma 15 in Wang et al. (2020). ∎
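The bound in Lemma 10 can be checked numerically. The sketch below (hypothetical names; the specific interval and jump sizes are arbitrary choices for illustration) builds a piecewise-constant sequence with exactly two change-points in (s,e] and verifies that the maximal CUSUM value does not exceed \sqrt{e-\eta_{r+1}}\kappa_{r+1}+\sqrt{\eta_{r}-s}\kappa_{r}.

```python
import math

def cusum(x, s, e, t):
    """CUSUM statistic of x over (s, e] evaluated at s < t < e (0-indexed x)."""
    left, right = sum(x[s:t]), sum(x[t:e])
    return (math.sqrt((e - t) / ((e - s) * (t - s))) * left
            - math.sqrt((t - s) / ((e - s) * (e - t))) * right)

s, e = 0, 100
eta_r, eta_r1 = 30, 70            # the two change-points in (s, e]
kappa_r, kappa_r1 = 1.0, 1.5      # jump sizes at eta_r and eta_{r+1}
omega = ([0.0] * eta_r
         + [kappa_r] * (eta_r1 - eta_r)
         + [kappa_r - kappa_r1] * (e - eta_r1))

max_cusum = max(abs(cusum(omega, s, e, t)) for t in range(s + 1, e))
bound = math.sqrt(e - eta_r1) * kappa_r1 + math.sqrt(eta_r - s) * kappa_r
```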

Appendix E Common Stationary Processes

Basic time series models that are widely used in practice can be accommodated by Assumptions 1b and 1c. The functional autoregressive (FAR) model and the functional moving average (FMA) model are presented in Example 1 below. The vector autoregressive (VAR) model and the vector moving average (VMA) model can be defined in a similar and simpler fashion.

Example 1 (FMA and FAR).

Let \mathcal{L}=\mathcal{L}(H,H) be the set of bounded linear operators from H to H, where H=\mathcal{L}_{\infty}. For A\in\mathcal{L}, we define the operator norm \|A\|_{\mathcal{L}}=\sup_{\|\varepsilon\|_{H}\leq 1}\|A\varepsilon\|_{H}. Suppose \theta_{1},\Psi\in\mathcal{L} with \|\Psi\|_{\mathcal{L}}<1 and \|\theta_{1}\|_{\mathcal{L}}<\infty.

a) For the FMA model, let (\varepsilon_{t}:t\in\mathbb{Z}) be a sequence of independent and identically distributed random elements of \mathcal{L}_{\infty} with mean zero. The FMA time series (\xi_{j}:j\in\mathbb{Z}) of order 1 is given by the equation

ξt=θ1(εt1)+εt=g(,ε1,ε0,ε1,,εt1,εt).\xi_{t}=\theta_{1}(\varepsilon_{t-1})+\varepsilon_{t}=g(\ldots,\varepsilon_{-1},\varepsilon_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t}). (40)

For any t2t\geq 2, by (40) we have that

ξtξt=0\xi_{t}-\xi^{*}_{t}=0

and ξ1ξ1=θ1(ε0)θ1(ε0).\xi_{1}-\xi^{*}_{1}=\theta_{1}(\varepsilon_{0})-\theta_{1}(\varepsilon_{0}^{{}^{\prime}}). As a result

t=1t1/21/q𝔼(ξtξtq)1/q=𝔼(ξ1ξ1q)1/q=𝔼(θ1(ε0)θ1(ε0)q)1/q<.\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(||\xi_{t}-\xi^{*}_{t}||_{\infty}^{q})^{1/q}=\mathbb{E}(||\xi_{1}-\xi^{*}_{1}||_{\infty}^{q})^{1/q}=\mathbb{E}(\|\theta_{1}(\varepsilon_{0})-\theta_{1}(\varepsilon_{0}^{{}^{\prime}})\|_{\infty}^{q})^{1/q}<\infty.

Therefore Assumption 1b is satisfied by FMA models.

b) We can define a FAR time series as

ξt=Ψ(ξt1)+εt.\xi_{t}=\Psi(\xi_{t-1})+\varepsilon_{t}. (41)

It admits the expansion,

\xi_{t}=\sum_{j=0}^{\infty}\Psi^{j}(\varepsilon_{t-j})=\varepsilon_{t}+\Psi(\varepsilon_{t-1})+\cdots+\Psi^{t}(\varepsilon_{0})+\Psi^{t+1}(\varepsilon_{-1})+\cdots=g(\ldots,\varepsilon_{-1},\varepsilon_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t}).

Then for any t1,t\geq 1, we have that ξtξt=Ψt(ε0)Ψt(ε0).\xi_{t}-\xi^{*}_{t}=\Psi^{t}(\varepsilon_{0})-\Psi^{t}(\varepsilon_{0}^{{}^{\prime}}). Thus,

t=1t1/21/q𝔼(ξtξtq)1/q=\displaystyle\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(||\xi_{t}-\xi^{*}_{t}||_{\infty}^{q})^{1/q}= t=1t1/21/q𝔼(Ψt(ε0)Ψt(ε0)q)1/q\displaystyle\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(||\Psi^{t}(\varepsilon_{0})-\Psi^{t}(\varepsilon_{0}^{\prime})||_{\infty}^{q})^{1/q}
\displaystyle\leq t=1t1/21/qΨt𝔼(ε0ε0q)1/q<.\displaystyle\sum_{t=1}^{\infty}t^{1/2-1/q}||\Psi||_{\mathcal{L}}^{t}\mathbb{E}(||\varepsilon_{0}-\varepsilon_{0}^{\prime}||_{\infty}^{q})^{1/q}<\infty.

Therefore Assumption 1b incorporates FAR time series.

Example 2 (MA and AR).

Suppose \theta_{1} and \psi are constants with |\psi|<1. Let (\varepsilon_{t}:t\in\mathbb{Z}) be a sequence of independent and identically distributed mean-zero random vectors in \mathbb{R}^{n}. The moving average (MA) time series (\delta_{j}:j\in\mathbb{Z}) of order 1 is given by the equation

\delta_{t}=\theta_{1}\varepsilon_{t-1}+\varepsilon_{t}=\tilde{g}_{n}(\ldots,\varepsilon_{-1},\varepsilon_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t}). (42)

For any t2t\geq 2, we have that

δtδt=0\delta_{t}-\delta^{*}_{t}=0

and \delta_{1}-\delta^{*}_{1}=\theta_{1}(\varepsilon_{0}-\varepsilon_{0}^{\prime}). Then \max_{i=1}^{n}\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(|\delta_{t,i}-\delta^{*}_{t,i}|^{q})^{1/q}<\infty. Therefore Assumption 1c is satisfied by MA models. The autoregressive (AR) time series is defined as

δt=ψδt1+εt.\delta_{t}=\psi\delta_{t-1}+\varepsilon_{t}. (43)

It admits the expansion,

\delta_{t}=\sum_{j=0}^{\infty}\psi^{j}\varepsilon_{t-j}=\varepsilon_{t}+\psi\varepsilon_{t-1}+\cdots+\psi^{t}\varepsilon_{0}+\psi^{t+1}\varepsilon_{-1}+\cdots=\tilde{g}_{n}(\ldots,\varepsilon_{-1},\varepsilon_{0},\varepsilon_{1},\ldots,\varepsilon_{t-1},\varepsilon_{t}).

Then, for any t1,t\geq 1, δtδt=ψtε0ψtε0.\delta_{t}-\delta^{*}_{t}=\psi^{t}\varepsilon_{0}-\psi^{t}\varepsilon_{0}^{{}^{\prime}}. Thus,

maxi=1nt=1t1/21/q𝔼(|δt,iδt,i|q)1/q=\displaystyle\max_{i=1}^{n}\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(|\delta_{t,i}-\delta^{*}_{t,i}|^{q})^{1/q}= maxi=1nt=1t1/21/q𝔼(|ψtε0,iψtε0,i|q)1/q\displaystyle\max_{i=1}^{n}\sum_{t=1}^{\infty}t^{1/2-1/q}\mathbb{E}(|\psi^{t}\varepsilon_{0,i}-\psi^{t}\varepsilon_{0,i}^{{}^{\prime}}|^{q})^{1/q}
\displaystyle\leq maxi=1nt=1t1/21/q|ψ|t𝔼(|ε0,iε0,i|q)1/q<.\displaystyle\max_{i=1}^{n}\sum_{t=1}^{\infty}t^{1/2-1/q}|\psi|^{t}\mathbb{E}(|\varepsilon_{0,i}-\varepsilon_{0,i}^{{}^{\prime}}|^{q})^{1/q}<\infty.

Therefore Assumption 1c incorporates AR time series.
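The coupling identities used in Example 2 are exact and can be checked directly. The sketch below (hypothetical names; scalar sequences for simplicity, and the AR recursion is initialised at \delta_{0}=\varepsilon_{0} so that replacing \varepsilon_{0} changes only the initial condition) substitutes an independent copy \varepsilon_{0}^{\prime} for \varepsilon_{0} and verifies that \delta_{t}-\delta^{*}_{t}=0 for t\geq 2 in the MA(1) model, while \delta_{t}-\delta^{*}_{t}=\psi^{t}(\varepsilon_{0}-\varepsilon_{0}^{\prime}) in the AR(1) model.

```python
import random

random.seed(1)
theta1, psi, T = 0.7, 0.6, 30
eps = [random.gauss(0.0, 1.0) for _ in range(T + 1)]  # eps[t] plays the role of epsilon_t
eps0_prime = random.gauss(0.0, 1.0)                   # independent copy of epsilon_0

def ma_path(e0):
    """MA(1): delta_t = theta1 * eps_{t-1} + eps_t, with eps_0 replaced by e0."""
    x = [e0] + eps[1:]
    return [theta1 * x[t - 1] + x[t] for t in range(1, T + 1)]

def ar_path(e0):
    """AR(1) initialised at delta_0 = e0; replacing eps_0 only changes
    the initial condition of the linear recursion."""
    delta = [e0]
    for t in range(1, T + 1):
        delta.append(psi * delta[-1] + eps[t])
    return delta

ma_diff = [a - b for a, b in zip(ma_path(eps[0]), ma_path(eps0_prime))]
ar_diff = [a - b for a, b in zip(ar_path(eps[0]), ar_path(eps0_prime))]
```

Because both recursions are linear, the differences are deterministic functions of \varepsilon_{0}-\varepsilon_{0}^{\prime}, which is exactly what makes the geometric series in the displayed bounds summable.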