This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Multiscale Inference for High-Frequency Data

Adam Sykulski, Sofia C. Olhede and Grigorios A. Pavliotis A. Sykulski and G. A. Pavliotis are with the Department of Mathematics, Imperial College London, South Kensington Campus, SW7 2AZ London, UK (adam.sykulski03@imperial.ac.uk, g.pavliotis@imperial.ac.uk), tel: +44 20 7594 8564, fax +44 20 7594 8517.S. C. Olhede is with the Department of Statistical Science, University College London, Gower Street, London WC1 E6BT, UK (s.olhede@ucl.ac.uk), tel: +44 20 7679 8321, fax: +44 20 7383 4703.
Abstract

This paper proposes a novel multiscale estimator for the integrated volatility of an Itô process, in the presence of market microstructure noise (observation error). The multiscale structure of the observed process is represented frequency-by-frequency and the concept of the multiscale ratio is introduced to quantify the bias in the realized integrated volatility due to the observation error. The multiscale ratio is estimated from a single sample path, and a frequency-by-frequency bias correction procedure is proposed, which simultaneously reduces variance. We extend the method to include correlated observation errors and provide the implied time domain form of the estimation procedure. The new method is implemented to estimate the integrated volatility for the Heston and other models, and the improved performance of our method over existing methods is illustrated by simulation studies.

Index Terms:
Bias correction; market microstructure noise; realized volatility; multiscale inference; Whittle likelihood.

I Introduction

Over the last few decades there has been an explosion of available data in diverse areas such as econometrics, atmosphere/ocean science and molecular biology. It is essential to use this available data when developing and testing mathematical models in physics, finance, biology and other disciplines. It is imperative, therefore, to develop accurate and efficient methods for making statistical inference in a parametric as well as a non-parametric setting.

Many interesting phenomena in the sciences are inherently multiscale in the sense that there is an abundance of characteristic temporal and spatial scales. It is quite often the case that a simplified, coarse-grained model is used to describe the essential features of the problem under investigation. Available data is then used to estimate parameters in this reduced model [12, 17, 18]. This renders the problem of statistical inference quite subtle, since the simplified models that are being used are compatible with the data only at sufficiently large scales. In particular, it is not clear how and if the high frequency data that is available should be used in the statistical inference procedure.

On the other hand in many applications such as econometrics [32] and oceanography [13] the observed data is contaminated by high frequency observation error. Statistical inference for data with a multiscale structure and for data contaminated by high frequency noise share common features. In particular the main difficulty in both problems is that the model that we wish to fit the data to is not compatible with the data at all scales. This is an example of a model misspecification problem [20, p. 192].

Parametric and non-parametric estimation for systems with multiple scales and/or the usage of high frequency data has been studied quite extensively in the last few years for the two different types of models. First, the problem of estimating the integrated stochastic volatility in the presence of high frequency observation noise has been considered by various authors [1, 36]. Similar models and inference problems have also been studied in the context of oceanic transport [13]. It was assumed in [1, 36] that the observed process consists of two parts, an Itô process XtX_{t} (i.e. the solution of an SDE, which is a semimartingale) whose integrated stochastic volatility (quadratic variation Xt,Xt\langle X_{t},X_{t}\rangle) we want to estimate, and a high frequency noise component εtj\varepsilon_{t_{j}}

Ytj=Xtj+εtj.Y_{t_{j}}=X_{t_{j}}+\varepsilon_{t_{j}}. (1)

{Ytj}j=1N+1\{Y_{t_{j}}\}_{j=1}^{N+1} are the sampled observations. The additional noise {εtj}j=1N+1\{\varepsilon_{t_{j}}\}_{j=1}^{N+1} was used to model market microstructure.

It was shown for the model of (1) that using high frequency data leads to asymptotically biased estimators. In particular if all available data is used for the estimation of the quadratic variation of XtX_{t} then the realized integrated volatility [Yt,Yt]\left[Y_{t},Y_{t}\right] converges to the aggregated variance of the differenced observation noise. Subsampling is therefore necessary for the accurate estimation of the integrated volatility. An algorithm for estimating the integrated volatility which consists of subsampling at an optimal sampling rate combined with averaging and an appropriate debiasing step was proposed in [1, 36]. Various other estimators were suggested in [36, 32, 11, 15] for processes contaminated by high frequency nuisance structure.

Secondly, parameter estimation for fast/slow systems of SDEs for which a limiting SDE for the slow variable can be rigorously shown to exist was studied in [25, 24, 26]. In these papers the problem of making inferences for the parameters of the limiting (coarse-grained) SDE for the slow variable from observed data generated by the fast/slow system was examined. It was shown that the maximum likelihood estimator is asymptotically biased. In order to correctly estimate the parameters in the drift and the diffusion coefficient of the coarse-grained model from observations of the slow/fast system using maximum likelihood, subsampling at an appropriate rate is necessary. The subsampling rate depends on the ratio between the characteristic time scales of the fast and slow variables. A similar problem, with no explicit scale separation, was studied in [7].

All of the papers mentioned above propose inference methods in the time domain. Yet, it would seem natural to analyse multiscale and high frequency properties of the data in the frequency domain. Most of the time domain methods can be put in a unified framework as linear filtering techniques, i.e. as a convolution with a linear kernel, of some time-domain quadratic function of the data. The understanding of these methods is enhanced by studying them directly in the frequency domain, as convolutions in time are multiplications in frequency. Fourier domain estimators of the integrated volatility have been proposed for observations devoid of microstructure features, see [14, 2, 21]. Fourier domain estimators have also been used for estimating noisy Itô processes (i.e. processes of the form 1), see [22, 29, 30], based on smoothing the time domain quantities by using only a limited number of frequencies in the reconstruction.

The bias in the realized integrated volatility of the observed process YtjY_{t_{j}} due to the observation noise εtj\varepsilon_{t_{j}} can be understood directly in the frequency domain, since the energy associated with each frequency is contaminated by the microstructure noise process. This bias is particularly damaging at high frequencies. In this article we propose a frequency-by-frequency de-biasing procedure to improve the accuracy of the estimation of the integrated volatility. The proposed estimation method can also be viewed in the time domain as smoothing the estimated autocovariance of the increments of the process, but where the implied time domain smoothing kernel is itself estimated from the observed process.

In this paper we will consider a regularly sampled Itô process with additive white noise εtj\varepsilon_{t_{j}} superimposed upon it at each observation point tjt_{j}, cf (1). The Itô process satisfies an SDE of the form

dXt=μtdt+σtdBt,X0=x0.dX_{t}=\mu_{t}dt+\sigma_{t}dB_{t},\quad X_{0}=x_{0}. (2)

BtB_{t} denotes a standard one dimensional Brownian motion and μt\mu_{t}, σt\sigma_{t} are (in general) Itô processes, see for example the Heston model which is studied in Section III. The Brownian motions driving the three Itô processes can be correlated. The observations and the process are related through

Ytj=Xtj+εtj,j=1,2,,N+1,tj:=j1NT=(j1)Δt.Y_{t_{j}}=X_{t_{j}}+\varepsilon_{t_{j}},\quad j=1,2,\dots,N+1,\quad t_{j}:=\frac{j-1}{N}T=(j-1)\Delta t. (3)

We assume the data is regularly spaced. The length of the path TT is fixed. The additive noise {εtj}j=1N+1\{\varepsilon_{t_{j}}\}_{j=1}^{N+1} is initially taken to be a white noise process with variance σε2\sigma^{2}_{\varepsilon}, and it is assumed to be independent of the noise that drives the Itô process XtX_{t}. Our main objective is to estimate the integrated volatility, X,XT=0Tσt2𝑑t\langle X,X\rangle_{T}=\int_{0}^{T}\sigma^{2}_{t}\,dt of the Itô process {Xt}\left\{X_{t}\right\}, from the set of observations {Ytj}j=1N+1\left\{Y_{t_{j}}\right\}_{j=1}^{N+1}. In the absence of market microstructure noise (i.e., when Ytj=Xtj,j=1,,N+1Y_{t_{j}}=X_{t_{j}},\;j=1,\dots,N+1) the integrated volatility can be estimated from the realized integrated volatility of the process {Yt}\{Y_{t}\} [32]. In the presence of market microstructure noise this is no longer true, see also [36], and a different estimation procedure is necessary.

The proposed estimator can be described roughly as follows. Let {Jk(X)}\{J_{k}^{(X)}\} denote the Discrete Fourier Transform (DFT) of the differenced sampled XtX_{t} process, and similarly for YtjY_{t_{j}} and εtj\varepsilon_{t_{j}}. The integrated volatility can be written in terms of the inverse DFT of the variance of Jk(X)J_{k}^{(X)}. We calculate the bias in the variance of Jk(Y)J_{k}^{(Y)}, when using its sample estimator to estimate the variance of Jk(X).J_{k}^{(X)}. The high frequency coefficients are heavily contaminated by the microstructure noise. With a formula for the bias it is possible to debias the estimated variance of the Fourier transform at every frequency, with the unknown parameters of the bias estimated using the Whittle likelihood [34, 35]. This produces a debiased estimator of the integrated volatility via an aggregation of the estimated variance, and we show also that the variance of the proposed estimator is reduced by the debiasing.

Our estimator shows highly competitive mean square error performance; it also has several advantages over existing estimators. First, it is robust with respect to the signal to noise ratio; furthermore, it is easy to formulate and to implement; in addition, it readily generalizes to the case of correlated observation errors (in time). Finally, the properties of our estimator are transparent using frequency domain analysis.

The rest of the paper is organized as follows. In Section II we introduce our estimator and present some of its properties, stated in Theorems 1 and 2. We also discuss the time-domain understanding of the proposed method and the extension of the method to the case where the observation noise is correlated. In Section III we present the results of Monte Carlo simulations for our estimator. Section IV is reserved for conclusions. Various technical results are included in the appendices.

II Estimation Methods

Let {Ytj}\left\{Y_{t_{j}}\right\} be given by (3), where the noise {εtj}\left\{\varepsilon_{t_{j}}\right\} is independent of {Xtj}\left\{X_{t_{j}}\right\}, is zero-mean and its variance at any time is equal to σε2\sigma^{2}_{\varepsilon}. The simplest estimator of the integrated volatility of XtX_{t} would ignore the high frequency component of the data and use the realized integrated volatility of the observed process. The realized integrated volatility is given by

X,X^T(b)=[Y,Y]Tj=1N(Ytj+1Ytj)2=𝒪(1Δt).\widehat{\langle X,X\rangle}_{T}^{(b)}=\left[Y,Y\right]_{T}\equiv\sum_{j=1}^{N}\left(Y_{t_{j+1}}-Y_{t_{j}}\right)^{2}={\mathcal{O}}\left(\frac{1}{\Delta t}\right). (4)

This estimator is both inconsistent and biased, see [15]. For comparative purposes, we define also the realized integrated volatility of the sampled process {Xtj}\{X_{t_{j}}\}:

X,X^T(u)=[X,X]Tj=1N(Xtj+1Xtj)2=𝒪(NΔt)=𝒪(1).\widehat{\langle X,X\rangle}_{T}^{(u)}=\left[X,X\right]_{T}\equiv\sum_{j=1}^{N}\left(X_{t_{j+1}}-X_{t_{j}}\right)^{2}={\mathcal{O}}\left(N\Delta t\right)={\mathcal{O}}\left(1\right). (5)

This cannot be used in practice as XtjX_{t_{j}} is not directly observed. Both these are estimators of the integrated volatility (quadratic variation) of XX.

II-A Fourier Domain Properties

We shall start by deriving an alternative representation of (4) to motivate further development. Firstly we define the increment process of a sample from a generic time series Utj,j=1,N+1U_{t_{j}},\;j=1,\dots N+1 by ΔUtj=Utj+1Utj,j=1,N,\Delta U_{t_{j}}=U_{t_{j+1}}-U_{t_{j}},\;j=1,\dots N, and then the discrete Fourier Transform of ΔUtj\Delta U_{t_{j}} by Jk(U)J_{k}^{(U)} as by [27][p. 206]

Jk(U)=1Nj=1NΔUtje2πitjfk,fk=kT,U=X,Y,ε.\displaystyle J^{(U)}_{k}=\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\Delta U_{t_{j}}e^{-2\pi it_{j}f_{k}},\quad f_{k}=\frac{k}{T},\quad U=X,\;Y,\;\varepsilon. (6)

Our proposed estimator will be based on examining the second order properties of {Jk(Y)}\{J^{(Y)}_{k}\}. |Jk(Y)|2\left|J^{(Y)}_{k}\right|^{2} is the periodogram [5] defined for a time series and is an inefficient estimator of var{Jk(Y)}=𝒮k,k(X)\mathrm{var}\{J^{(Y)}_{k}\}={\mathcal{S}}^{(X)}_{k,k}. Firstly we examine the properties of {Jk(X)}\{J^{(X)}_{k}\}. We have, with μ¯j=1Δt(j1)ΔtjΔtμs𝑑s\overline{\mu}_{j}=\frac{1}{\Delta t}\int_{(j-1)\Delta t}^{j\Delta t}\mu_{s}\,ds denoting the local average of μt\mu_{t},

ΔXtj\displaystyle\Delta X_{t_{j}} =\displaystyle= (j1)ΔtjΔt[μsds+σsdWs]=μ¯jΔt+(j1)ΔtjΔtσs𝑑Ws,\displaystyle\int_{(j-1)\Delta t}^{j\Delta t}[\mu_{s}ds+\sigma_{s}dW_{s}]=\overline{\mu}_{j}\Delta t+\int_{(j-1)\Delta t}^{j\Delta t}\sigma_{s}dW_{s},
Jk(X)\displaystyle J^{(X)}_{k} =\displaystyle= 1Nj=1NΔXtje2iπkjN=1Nj=1N[Δtμ¯j+(j1)ΔtjΔtσs𝑑Ws]e2iπkjN\displaystyle\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\Delta X_{t_{j}}e^{-2i\pi\frac{kj}{N}}=\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\left[\Delta t\overline{\mu}_{j}+\int_{(j-1)\Delta t}^{j\Delta t}\sigma_{s}dW_{s}\right]e^{-2i\pi\frac{kj}{N}} (7)
=\displaystyle= 𝒪(Δt1/2Tk)+1Nj=1N(j1)ΔtjΔtσs𝑑Wse2iπkjN.\displaystyle{\mathcal{O}}\left(\Delta t^{1/2}\frac{T}{k}\right)+\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\int_{(j-1)\Delta t}^{j\Delta t}\sigma_{s}dW_{s}e^{-2i\pi\frac{kj}{N}}.

We define

J~k(X)=1Nj=1N(j1)ΔtjΔtσs𝑑Wse2iπkjN,\tilde{J}^{(X)}_{k}=\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\int_{(j-1)\Delta t}^{j\Delta t}\sigma_{s}dW_{s}e^{-2i\pi\frac{kj}{N}},

and this to leading order approximates Jk(X)J^{(X)}_{k} as Δt0\Delta t\rightarrow 0 for all but a few frequencies. We can also note that, since μs\mu_{s} is an Itô process, it has almost surely continuous paths, which implies that

Δt2k=0N1|1Nj=1Nμ¯je2iπkjN|2k|1NΔtN/k|2=𝒪(Δt)\displaystyle\Delta t^{2}\sum_{k=0}^{N-1}\left|\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\overline{\mu}_{j}e^{-2i\pi\frac{kj}{N}}\right|^{2}\sim\sum_{k}\left|\frac{1}{\sqrt{N}}\Delta tN/k\right|^{2}={\mathcal{O}}\left(\Delta t\right) (8)
Δtk=0N1|1Nj=1Nμ¯je2iπkjN|k|1NΔtN/k|=𝒪(log(Δt)Δt),\displaystyle\Delta t\sum_{k=0}^{N-1}\left|\sqrt{\frac{1}{N}}\sum_{j=1}^{N}\overline{\mu}_{j}e^{-2i\pi\frac{kj}{N}}\right|\sim\sum_{k}\left|\frac{1}{\sqrt{N}}\Delta tN/k\right|={\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right), (9)

as Δtjμ¯je2iπkjN=𝒪(Tk).\Delta t\sum_{j}\overline{\mu}_{j}e^{-2i\pi\frac{kj}{N}}={\mathcal{O}}\left(\frac{T}{k}\right). So we only need, to leading order, calculate k=0N1|J~k(X)|2=𝒪(1)\sum_{k=0}^{N-1}\left|\tilde{J}_{k}^{(X)}\right|^{2}={\mathcal{O}}(1) when calculating the properties of k=0N1|Jk(X)|2\sum_{k=0}^{N-1}\left|{J}_{k}^{(X)}\right|^{2} from (8) and (9). More formally we note that

k=0N1|Jk(X)|2=k=0N1|J~k(X)|2+𝒪(log(Δt)Δt).\sum_{k=0}^{N-1}\left|{J}_{k}^{(X)}\right|^{2}=\sum_{k=0}^{N-1}\left|\tilde{J}_{k}^{(X)}\right|^{2}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right).

We need to determine the first and second order structure of {J~k(X)}k.\{\tilde{J}^{(X)}_{k}\}_{k}. In general {J~k(X)}k\{\tilde{J}^{(X)}_{k}\}_{k} is a complex-valued random vector, which may not be a sample from a multivariate Gaussian distribution. The covariance matrix of a complex random vector 𝐙\mathbf{Z} is given by cov{𝐙,𝐙}=E{𝐙𝐙H}E{𝐙}E{𝐙}H{\mathrm{cov}}\left\{\mathbf{Z},\mathbf{Z}\right\}=\mathrm{E}\left\{\mathbf{Z}\mathbf{Z}^{H}\right\}-\mathrm{E}\left\{\mathbf{Z}\right\}\mathrm{E}\left\{\mathbf{Z}\right\}^{H} [23, 28]. We have

E{J~k(X)}\displaystyle\mathrm{E}\left\{\tilde{J}^{(X)}_{k}\right\} =\displaystyle= 0,k=1,,N1.\displaystyle 0,\;k=1,\dots,N-1.

Furthermore, with 𝒮~k1,k2(X)=E{J~k1(X)J~k2(X)}\tilde{\mathcal{S}}^{(X)}_{k_{1},k_{2}}=\mathrm{E}\left\{\tilde{J}^{(X)}_{k_{1}}\tilde{J}^{(X)*}_{k_{2}}\right\},

𝒮~k1,k2(X)\displaystyle\tilde{\mathcal{S}}^{(X)}_{k_{1},k_{2}} =\displaystyle= 1NE{n=1N(n1)ΔtnΔtl=1N(l1)ΔtlΔtσs𝑑Wsσt𝑑Wte2iπ(k1nNk2lN)}\displaystyle\frac{1}{N}\mathrm{E}\left\{\sum_{n=1}^{N}\int_{(n-1)\Delta t}^{n\Delta t}\sum_{l=1}^{N}\int_{(l-1)\Delta t}^{l\Delta t}\sigma_{s}dW_{s}\sigma_{t}dW_{t}e^{-2i\pi\left(\frac{k_{1}n}{N}-\frac{k_{2}l}{N}\right)}\right\}
=\displaystyle= 1Nn=1N(n1)ΔtnΔtl=1N(l1)ΔtlΔtE{σsdWsσtdWt}e2iπ(k1nNk2lN).\displaystyle\frac{1}{N}\sum_{n=1}^{N}\int_{(n-1)\Delta t}^{n\Delta t}\sum_{l=1}^{N}\int_{(l-1)\Delta t}^{l\Delta t}\mathrm{E}\left\{\sigma_{s}dW_{s}\sigma_{t}dW_{t}\right\}e^{-2i\pi\left(\frac{k_{1}n}{N}-\frac{k_{2}l}{N}\right)}.
=\displaystyle= 1Nn=1N(n1)ΔtnΔtl=1N(l1)ΔtlΔtE{σsσt}δnlδ(ts)e2iπ(k1nNk2lN)𝑑s𝑑t.\displaystyle\frac{1}{N}\sum_{n=1}^{N}\int_{(n-1)\Delta t}^{n\Delta t}\sum_{l=1}^{N}\int_{(l-1)\Delta t}^{l\Delta t}\mathrm{E}\left\{\sigma_{s}\sigma_{t}\right\}\delta_{nl}\delta(t-s)e^{-2i\pi\left(\frac{k_{1}n}{N}-\frac{k_{2}l}{N}\right)}dsdt.

In particular we have that

𝒮~k,k(X)\displaystyle\tilde{\mathcal{S}}^{(X)}_{k,k} =\displaystyle= 1N0TE{σs2}𝑑s+𝒪(Δt2):=σ¯X2+𝒪(Δt2)\displaystyle\frac{1}{N}\int_{0}^{T}\mathrm{E}\left\{\sigma_{s}^{2}\right\}ds+{\mathcal{O}}\left(\Delta t^{2}\right):=\overline{\sigma}^{2}_{X}+{\mathcal{O}}\left(\Delta t^{2}\right) (10)
=\displaystyle= X,XTN+𝒪(Δt2),\displaystyle\frac{\langle X,X\rangle_{T}}{N}+{\mathcal{O}}\left(\Delta t^{2}\right),

where the error terms are due to the Riemann approximation to an integral and thus it follows that

k=0N1𝒮~k,k(X)\displaystyle\sum_{k=0}^{N-1}\tilde{\mathcal{S}}^{(X)}_{k,k} =\displaystyle= 0TE{σs2}𝑑s+𝒪(Δt).\displaystyle\int_{0}^{T}\mathrm{E}\left\{\sigma_{s}^{2}\right\}ds+{\mathcal{O}}(\Delta t). (11)

σ¯X2\overline{\sigma}^{2}_{X} does not depend on the value of kk but is constant irrespectively of the value of kk. Malliavin and Mancino [21] in contrast under very light assumptions show how the Fourier coefficients of {σt2}\{\sigma^{2}_{t}\} can be calculated from the Fourier coefficients of dXtdX_{t}, using a Parseval-Rayleigh relationship, see also [30, 22]. We can from (10) make a stronger link from the Fourier transform to the integrated volatility than that of the Parseval-Rayleigh relationship, and shall use this ‘uniformity of energy’ to estimate the microstructure bias.

We note that the covariance between different frequencies is given by:

𝒮~k1,k2(X)\displaystyle\tilde{\mathcal{S}}^{(X)}_{k_{1},k_{2}} =\displaystyle= 1Nn=1N(n1)ΔtnΔtE{σt2}𝑑te2iπn(k1Nk2N)\displaystyle\frac{1}{N}\sum_{n=1}^{N}\int_{(n-1)\Delta t}^{n\Delta t}\mathrm{E}\left\{\sigma_{t}^{2}\right\}\;dte^{-2i\pi n\left(\frac{k_{1}}{N}-\frac{k_{2}}{N}\right)}
=\displaystyle= 1N0TE{σt2}𝑑te2iπt(k1Tk2T)+𝒪(Δt2).\displaystyle\frac{1}{N}\int_{0}^{T}\mathrm{E}\left\{\sigma_{t}^{2}\right\}\;dte^{-2i\pi t\left(\frac{k_{1}}{T}-\frac{k_{2}}{T}\right)}+{\mathcal{O}}\left(\Delta t^{2}\right).

Let Σ(f):=0TEσt2e2iπft𝑑t\Sigma(f):=\int_{0}^{T}\mathrm{E}\sigma_{t}^{2}e^{-2i\pi ft}\;dt. We can bound the size of Σ(k1Tk2T)\Sigma(\frac{k_{1}}{T}-\frac{k_{2}}{T}) as |k1k2||k_{1}-k_{2}| increases. As E{σt2}\mathrm{E}\left\{\sigma_{t}^{2}\right\} is smooth in tt the modulus of the covariance can be bounded for increasing |k1k2||k_{1}-k_{2}|, as the Fourier transform Σ(k1Tk2T)\Sigma(\frac{k_{1}}{T}-\frac{k_{2}}{T}) decays proportionally to |k1k2|α1|k_{1}-k_{2}|^{-\alpha-1} where α\alpha is the number of smooth derivatives of E{σt2}\mathrm{E}\left\{\sigma_{t}^{2}\right\}. We can also directly note that the variance of the discrete Fourier transform of the noise is precisely (this is not a large sample result)

𝒮k1,k2(ε)\displaystyle{\mathcal{S}}^{(\varepsilon)}_{k_{1},k_{2}} =\displaystyle= σε2|2sin(πfk1Δt)|2δk1,k2,\displaystyle\sigma^{2}_{\varepsilon}\left|2\sin\left(\pi f_{k_{1}}\Delta t\right)\right|^{2}\delta_{k_{1},k_{2}}, (12)

by virtue of being the first difference of white noise (see also [4]). The naive estimator can therefore be rewritten as, with 𝒮k1,k2(Y)=cov{Jk1(Y),Jk2(Y)}{\mathcal{S}}_{k_{1},k_{2}}^{(Y)}={\mathrm{cov}}\{J_{k_{1}}^{(Y)},J_{k_{2}}^{(Y)}\}:

X,X^T(b)\displaystyle\widehat{\langle X,X\rangle}_{T}^{(b)} =\displaystyle= j=1NΔYtj2=k=0N1|Jk(Y)|2,\displaystyle\sum_{j=1}^{N}\Delta Y_{t_{j}}^{2}=\sum_{k=0}^{N-1}\left|J^{(Y)}_{k}\right|^{2}, (13a)
𝒮^k,k(Y)\displaystyle\widehat{\mathcal{S}}^{(Y)}_{k,k} =\displaystyle= |Jk(Y)|2,\displaystyle\left|J^{(Y)}_{k}\right|^{2}, (13b)
E{X,X^T(u)}\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(u)}\right\} =\displaystyle= k=0N1𝒮~k,k(X)+𝒪(log(Δt)Δt)+𝒪(Δt)\displaystyle\sum_{k=0}^{N-1}\tilde{\mathcal{S}}_{k,k}^{(X)}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right)+{\mathcal{O}}\left(\Delta t\right) (13c)
\displaystyle\equiv k=0N1𝒮k,k(X).\displaystyle\sum_{k=0}^{N-1}{\mathcal{S}}_{k,k}^{(X)}. (13d)

The Parseval-Rayleigh relationship in (13a) is discussed in [22], and is used in [21]. We shall now develop a frequency domain specification of the bias of the naive estimator.

Lemma 1

(Frequency Domain Bias of the Naive Estimator) Let XtX_{t} be an Itô process and assume that the covariance of Jk1(X)J_{k_{1}}^{(X)} and Jk2(X)J_{k_{2}}^{(X)} to be 𝒮k1,k2(X){\mathcal{S}}^{(X)}_{k_{1},k_{2}} with the chosen sampling. Then the naive estimator of the integrated volatility given by (13) has an expectation given by:

E{X,X^T(b)}\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(b)}\right\} =\displaystyle= k=0N1(𝒮~k,k(X)+σε2|2sin(πfkΔt)|2)+𝒪(log(Δt)Δt)\displaystyle\sum_{k=0}^{N-1}\left(\tilde{\mathcal{S}}^{(X)}_{k,k}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right)
=\displaystyle= E{X,X^T(u)}+k=0N1σε2|2sin(πfkΔt)|2+𝒪(log(Δt)Δt)\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(u)}\right\}+\sum_{k=0}^{N-1}\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right)
=\displaystyle= 𝒪(1)+𝒪(Δt1)+𝒪(log(Δt)Δt).\displaystyle{\mathcal{O}}(1)+{\mathcal{O}}(\Delta t^{-1})+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right).
Proof:

This result follows from the independence of {εt}\{\varepsilon_{t}\} and {Xt}\{X_{t}\}, combined with (11) and (12). ∎

We notice directly from (1) that the relative frequency contribution of ΔXt\Delta X_{t} and εt\varepsilon_{t}, i.e. 𝒮k,k(X){\mathcal{S}}_{k,k}^{(X)} compared to the noise contribution σε2|2sin(πfkΔt)|2\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2} determines the inherent bias of X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)}. Estimator (13) is inconsistent and biased since it is equivalent to estimator (4), and such a procedure would give an unbiased estimator of the integrated volatility only when σε2=0\sigma^{2}_{\varepsilon}=0. When the estimator is expressed in the time domain the microstructure cannot be disentangled from the Itô process. On the other hand in the frequency domain, from the very nature of a multiscale process, the contributions to 𝒮^k,k(Y)\widehat{\mathcal{S}}^{(Y)}_{k,k} can be disentangled.

II-B Multiscale Modelling

To correct the biased estimator we need to correct the usage of the biased estimator of 𝒮k,k(X),{\mathcal{S}}_{k,k}^{(X)}, 𝒮^k,k(Y),\widehat{\mathcal{S}}_{k,k}^{(Y)}, at each frequency. We therefore define a new shrinkage estimator [33, p. 155] of 𝒮k,k(X){\mathcal{S}}_{k,k}^{(X)} by

𝒮^k,k(X)(Lk)=Lk𝒮^k,k(Y).\widehat{{\mathcal{S}}}_{k,k}^{(X)}(L_{k})=L_{k}\widehat{\mathcal{S}}^{(Y)}_{k,k}. (15)

0Lk10\leq L_{k}\leq 1 is referred to as the multiscale ratio and its optimal form for perfect bias correction is for an arbitrary Itô process given by

Lk=𝒮k,k(X)𝒮k,k(X)+σε2|2sin(πfkΔt)|2.L_{k}=\frac{{\mathcal{S}}^{(X)}_{k,k}}{{\mathcal{S}}^{(X)}_{k,k}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}. (16)

This quantity cannot be calculated without explicit knowledge of 𝒮k,k(X){\mathcal{S}}^{(X)}_{k,k} and σε2\sigma^{2}_{\varepsilon}. We can however use (10) to simplify (16) to obtain

Lk=σ¯X2σ¯X2+σε2|2sin(πfkΔt)|2.L_{k}=\frac{\overline{\sigma}^{2}_{X}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}. (17)

For a fixed 0Lk10\leq L_{k}\leq 1

E{𝒮^kk(X)(Lk)}\displaystyle\mathrm{E}\left\{\widehat{\mathcal{S}}^{(X)}_{kk}(L_{k})\right\} =\displaystyle= LkE{|Jk(Y)|2}\displaystyle L_{k}\mathrm{E}\left\{\left|J^{(Y)}_{k}\right|^{2}\right\}
=\displaystyle= σ¯X2+𝒪(Δt3/2k),\displaystyle\overline{\sigma}_{X}^{2}+{\mathcal{O}}\left(\frac{\Delta t^{3/2}}{k}\right),

where the order terms follow from the continuity of μs\mu_{s}. We can define a new estimator for the true LkL_{k} via:

X,X^T(m3)\displaystyle\widehat{\langle X,X\rangle}_{T}^{(m_{3})} =\displaystyle= k=0N1𝒮^kk(X)(Lk),\displaystyle\sum_{k=0}^{N-1}\widehat{\mathcal{S}}^{(X)}_{kk}(L_{k}),

where

E{X,X^T(m3)}\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{3})}\right\} =\displaystyle= X,XT+𝒪(log(Δt)Δt).\displaystyle\langle X,X\rangle_{T}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right).

Recall that X,XT=O(T)=O(1)\langle X,X\rangle_{T}=O(T)=O(1). Consequently, to leading order we can remove the bias from the naive estimator if we know the multiscale ratio. We shall now develop a multiscale understanding of the process under observation and use this to construct an estimator for the multiscale ratio.

II-C Estimation of the Multiscale Ratio

We have a two-parameter description on how the energy should be adjusted at each frequency. We only need to determine estimators of 𝝈=(σ¯X2,σε2)\bm{\sigma}=\left(\overline{\sigma}^{2}_{X},\,\sigma^{2}_{\varepsilon}\right). We propose to implement the estimation using the Whittle likelihood methods (see [3] or [34, 35]). For a time-domain sample 𝚫𝐘=(ΔYt1,,ΔYtN)\bm{\Delta}{\mathbf{Y}}=\left(\Delta Y_{t_{1}},\dots,\Delta Y_{t_{N}}\right) that is stationary, if suitable conditions are satisfied, see for example [8], then the Whittle likelihood approximates the time domain likelihood, with improving approximation as the sample size increases. It is possible to show a number of suitable properties of estimators based on the Whittle likelihood, see [34, 35]. For processes that are not stationary, such conditions are in general not met, and so the function can be used as an objective function to construct estimators, but not as a true likelihood. The Whittle log-likelihood is defined [34, 35] by

l(𝒮)\displaystyle l({\bf{\mathcal{S}}}) log[k=1N/211𝒮kk(Y)e𝒮^kk(Y)𝒮kk(Y)]\displaystyle\equiv\log\left[\prod_{k=1}^{N/2-1}\frac{1}{{\mathcal{S}}^{(Y)}_{kk}}e^{-\frac{\widehat{\mathcal{S}}^{(Y)}_{kk}}{{\mathcal{S}}^{(Y)}_{kk}}}\right]
=k=1N/21log(𝒮kk(X)+σε2|2sin(πfkΔt)|2)k=1N/21𝒮^kk(Y)𝒮kk(X)+σε2|2sin(πfkΔt)|2.\displaystyle=-\sum_{k=1}^{N/2-1}\log\left({\mathcal{S}}^{(X)}_{kk}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)-\sum_{k=1}^{N/2-1}\frac{\widehat{{\mathcal{S}}}^{(Y)}_{kk}}{{\mathcal{S}}^{(X)}_{kk}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}.

If {ΔXt}\left\{\Delta X_{t}\right\} is not stationary, then as long as the total contributions of the covariance of the incremental process can be bounded, using this likelihood will asymptotically (in Δt1\Delta t^{-1}) produce suitable estimators, as we shall discuss further.

Definition II.1

(Multiscale Energy Likelihood)
The multiscale energy log-likelihood is then defined using (10) as:

l(𝝈)\displaystyle l(\bm{\sigma}) =k=1N/21log(σ¯X2+σε2|2sin(πfkΔt)|2)k=1N/21𝒮^kk(Y)σ¯X2+σε2|2sin(πfkΔt)|2.\displaystyle=-\sum_{k=1}^{N/2-1}\log\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)-\sum_{k=1}^{N/2-1}\frac{\widehat{\mathcal{S}}^{(Y)}_{kk}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}. (18)

We stress that strictly speaking this may not be a (log-)likelihood, but merely a device for determining the multiscale ratio. We maximise this function in 𝝈\bm{\sigma} to obtain a set of estimators 𝝈^\widehat{\bm{\sigma}}.

Theorem 1

(The Estimated Multiscale Ratio)
The estimated multiscale ratio is given by

L^k\displaystyle\widehat{L}_{k} =\displaystyle= σ^X2σ^X2+σ^ε2|2sin(πfkΔt)|2,\displaystyle\frac{\widehat{\sigma}^{2}_{X}}{\widehat{\sigma}^{2}_{X}+\widehat{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}, (19)

where σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2\widehat{\sigma}^{2}_{\varepsilon} maximise (𝛔)\ell(\bm{\sigma}) given in (18). L^k\widehat{L}_{k} satisfies

L^kLk\displaystyle\frac{\widehat{L}_{k}}{L_{k}} =\displaystyle= 1+𝒪(Δt1/4).\displaystyle 1+{\mathcal{O}}\left(\Delta t^{1/4}\right). (20)
Proof:

See Appendix A. ∎

Combining (15) with (19) the proposed estimator of the spectral density of {ΔXt}\left\{\Delta X_{t}\right\} is:

𝒮^kk(X)(L^k)=L^k𝒮^kk(Y),\widehat{\mathcal{S}}^{(X)}_{kk}(\widehat{L}_{k})=\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk}, (21)

where L^k\widehat{L}_{k} is given by (19).

Theorem 2

(The Multiscale Estimator of the Integrated Volatility)
Assume that ΔXtj\Delta X_{t_{j}} satisfies the conditions of Lemma 1 and Theorem 1. The multiscale estimator of the integrated volatility defined by

X,X^T(m1)=k=0N1𝒮^kk(X)(L^k),\widehat{\langle X,X\rangle}_{T}^{(m_{1})}=\sum_{k=0}^{N-1}\widehat{\mathcal{S}}^{(X)}_{kk}(\widehat{L}_{k}), (22)

where 𝒮^kk(X)(L^k)\widehat{\mathcal{S}}^{(X)}_{kk}(\widehat{L}_{k}) is defined by (21) has a mean and variance given by:

E{X,X^T(m1)}\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= k=0N1𝒮kk(X)+𝒪(log(Δt)Δt)+𝒪(Δt1/4)\displaystyle\sum_{k=0}^{N-1}{\mathcal{S}}^{(X)}_{kk}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right)+{\mathcal{O}}\left(\Delta t^{1/4}\right)
=\displaystyle= 0TE{σt2}+𝒪(log(Δt)Δt)+𝒪(Δt1/4)\displaystyle\int_{0}^{T}\mathrm{E}\left\{\sigma^{2}_{t}\right\}+{\mathcal{O}}\left(\log(\Delta t)\sqrt{\Delta t}\right)+{\mathcal{O}}\left(\Delta t^{1/4}\right)

and

var{X,X^T(m1)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= k=0N1Lk2|𝒮k,k(Y)|2+𝒪(Δt1/2)=𝒪(Δt1/2).\displaystyle\sum_{k=0}^{N-1}{L}_{k}^{2}\left|{\mathcal{S}}^{(Y)}_{k,k}\right|^{2}+{\mathcal{O}}(\Delta t^{1/2})={\mathcal{O}}(\Delta t^{1/2}).
Proof:

See Appendix B. ∎

We also note that

var{X,X^T(m1)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= k=0N1Lk2|𝒮k,k(Y)|2+𝒪(Δt1/2)\displaystyle\sum_{k=0}^{N-1}{L}_{k}^{2}\left|{\mathcal{S}}^{(Y)}_{k,k}\right|^{2}+{\mathcal{O}}(\Delta t^{1/2}) (23)
<\displaystyle< 𝒪(1Δt)=var{X,X^T(b)},\displaystyle{\mathcal{O}}\left(\frac{1}{\Delta t}\right)=\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(b)}\right\},

unless σε=0\sigma_{\varepsilon}=0. We note that the multiscale estimator has lower variance than the naive method of moments estimator X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} due to the fact that 0Lk10\leq{L}_{k}\leq 1. We have thus removed bias and simultaneously decreased the variance, the latter effect usually being the main purpose of shrinkage estimators. Note that if we knew the true multiscale ratio LkL_{k} and used it rather than L^k\widehat{L}_{k} (i.e. used X,X^T(m3)\widehat{\langle X,X\rangle}_{T}^{(m_{3})}) then we would expect an estimator from this quantity to recover the same variance as the estimator based on the noise-free observations. This loss of efficiency is inevitable, as we have to estimate LkL_{k}. Finally we can also construct a Whittle estimator for the integrated volatility by starting from (10) and taking

X,X^T(w)=Nσ^X2.\widehat{\langle X,X\rangle}_{T}^{(w)}=N\widehat{\sigma}^{2}_{X}. (24)

The sampling properties of X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)} are found in Appendix A, and σ^X2\widehat{\sigma}_{X}^{2} is asymptotically unbiased. The results in Appendix A imply that

var{X,X^T(w)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(w)}\right\} =\displaystyle= TσετX1/216τX2Δt.\displaystyle T\frac{\sigma_{\varepsilon}}{\tau_{X}^{1/2}}16\tau_{X}^{2}\sqrt{\Delta t}. (25)

We see that the variance depends on the length of the time course, the inverse of the signal to noise ratio, the square root of the sampling period and the fourth power of the “average standard deviation” of the XtX_{t} process., We may compare the variance of (23) with the variance of (25), to determine which estimator of X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)} and X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} is preferable. We shall return to this question of relative performance in the examples section, but intuitively argue that X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)} and X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} are more or less the same estimator, with the latter estimator being more intuitive to explain.

II-D Time Domain Understanding of the Method

We may write the frequency domain estimator of the spectral density of ΔXt\Delta X_{t} in the time domain to clarify some of its properties. We define

s^τ(X)=1Nk=0N1𝒮^kk(X)e2iπkτN,τ,\widehat{s}^{(X)}_{\tau}=\frac{1}{N}\sum_{k=0}^{N-1}\widehat{\mathcal{S}}^{(X)}_{kk}e^{2i\pi\frac{k\tau}{N}},\quad\tau\in{\mathbb{N}},

which when ΔXt\Delta X_{t} is a stationary process corresponds to the estimated autocovariance sequence of ΔXt\Delta X_{t} using the method of moments estimator [5, Ch. 5]. We then have

𝒮^kk(X)\displaystyle\widehat{\mathcal{S}}^{(X)}_{kk} =\displaystyle= Lk𝒮^kk(Y),s^τ(X)=uτus^u(Y),\displaystyle L_{k}\widehat{\mathcal{S}}^{(Y)}_{kk},\quad\widehat{s}^{(X)}_{\tau}=\sum_{u}\ell_{\tau-u}\widehat{s}^{(Y)}_{u}, (26)

and so the estimated autocovariance of the ΔXt\Delta X_{t} process, namely s^τ(X)\widehat{s}^{(X)}_{\tau}, is a smoothed version of s^τ(Y)\widehat{s}^{(Y)}_{\tau}. We can therefore view 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} as the Fourier transform of a smoothed version of the autocovariance sequence of ΔYt\Delta Y_{t}. We let

L(f)=σ¯X2σ¯X2+σε2|2sin(πfΔt)|2,L(f)=\frac{\overline{\sigma}^{2}_{X}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f\Delta t)\right|^{2}}, (27)

be the continuous analogue of LkL_{k}. To find the smoothing kernel we are using we need to calculate

τ\displaystyle\ell_{\tau} =\displaystyle= 1Nk=0N1Lke2iπkτN\displaystyle\frac{1}{N}\sum_{k=0}^{N-1}L_{k}e^{2i\pi\frac{k\tau}{N}} (28)
=\displaystyle= 1212σ¯X2σ¯X2+4σε2sin2(πf)e2iπfτ𝑑f+𝒪(Δt).\displaystyle\int_{-\frac{1}{2}}^{\frac{1}{2}}\frac{\overline{\sigma}^{2}_{X}}{\overline{\sigma}^{2}_{X}+4\sigma^{2}_{\varepsilon}\sin^{2}(\pi f)}\;e^{2i\pi f\tau}\;df+{\mathcal{O}}\left(\Delta t\right).

Thus utilizing integration in the complex plane (see Appendix C) we obtain that

τ={(σε2σ¯X2)τ+𝒪((σεσ¯X)2τ+2)ifσε2<σ¯X2σ¯X2σε(1σ¯Xσε)τ+𝒪(σ¯X22σε2(1σ¯Xσε)τ)ifσε2>σ¯X2\displaystyle\ell_{\tau}=\left\{\begin{array}[]{lcr}\left(\frac{\sigma^{2}_{\varepsilon}}{\overline{\sigma}^{2}_{X}}\right)^{\tau}+{\mathcal{O}}\left(\left(\frac{\sigma_{\varepsilon}}{\overline{\sigma}_{X}}\right)^{2\tau+2}\right)&{\mathrm{if}}&\sigma^{2}_{\varepsilon}<\overline{\sigma}^{2}_{X}\\ \frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}\left(1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\right)^{\tau}+{\mathcal{O}}\left(\frac{\overline{\sigma}_{X}^{2}}{2\sigma_{\varepsilon}^{2}}\left(1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\right)^{\tau}\right)&{\mathrm{if}}&\sigma^{2}_{\varepsilon}>\overline{\sigma}_{X}^{2}\end{array}\right. (31)

These are both decreasing sequences in τ\tau. We write rτ=σ¯X2σε(1σ¯Xσε)τ.r_{\tau}=\frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}\left(1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\right)^{\tau}. If we can additionally assume that L(f)L(f) decreases sufficiently rapidly to be near zero by f=1πf=\frac{1}{\pi} then we find that

τqτ=σ¯X2σεeσ¯Xσε|τ|.\displaystyle\ell_{\tau}\approx q_{\tau}=\frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}e^{-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}|\tau|}.

In the limit of no observation noise (σ¯Xσε\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\rightarrow\infty) then this sequence becomes a delta function centered at τ=0\tau=0. Let us plot these functions, i.e. τ\ell_{\tau}, rτr_{\tau} and qτq_{\tau} for a chosen case of σ¯X2/σε20.0331\overline{\sigma}_{X}^{2}/\sigma^{2}_{\varepsilon}\approx 0.0331 (the approximate SNR used in a later example), in Figure 1 (left). We see that theory coincides very well with practise, and almost perfect agreement between the three functions. τ\ell_{\tau} is however a strange choice of kernel, if dictated by the statistical inference problem: it has heavier tails than the common choice of the Gaussian kernel, and is extremely peaked around zero (a Gaussian kernel with the same variance has been overlaid in Figure 1). This is not strange, as we are trying to filter out correlations due to non-Itô behaviour, but counter to our intuition about suitable kernel functions, as the differenced Itô process exhibits very little covariance at any lag but zero, the sharp peak at zero is necessary.

Refer to caption Refer to caption

Figure 1: τ\ell_{\tau} as well as rτr_{\tau} and qτq_{\tau} for a chosen value of the SNR (left). The approximate weighting functions perfectly mirror the exact calculation. We overlay a Gaussian kernel with the same spread for comparison. τ\ell_{\tau} estimated for the MA(6) case (right).

II-E Correlated Errors

In many applications we need to consider correlated observation noise. We assume that despite being dependent the εtj\varepsilon_{t_{j}} is a stationary time series. Stationary processes can be conveniently represented in terms of aggregations of uncorrelated white noise processes, using the Wold decomposition theorem [6][p. 187]. We may therefore write the zero-mean observation εtj\varepsilon_{t_{j}} as

εtj=k=0θtkηtjtk,\varepsilon_{t_{j}}=\sum_{k=0}^{\infty}\theta_{t_{k}}\eta_{t_{j}-t_{k}}, (32)

where θt01,\theta_{t_{0}}\equiv 1, jθtj2<,\sum_{j}\theta_{t_{j}}^{2}<\infty, and {ηtn}\left\{\eta_{t_{n}}\right\} satisfies E{ηtn}=0\mathrm{E}\left\{\eta_{t_{n}}\right\}=0 and E{ηtnηtm}=ση2δn,m\mathrm{E}\left\{\eta_{t_{n}}\eta_{t_{m}}\right\}=\sigma^{2}_{\eta}\delta_{n,m}, a model also used in [31]. Common practise would involve approximating the variable by a finite number of elements in the sum, and thus we truncate (32) to some qq\in{\mathbb{Z}}. We therefore model the noise as a Moving Average (MA) process specified by

εtj=ηtj+k=1qθtkηtjk,\varepsilon_{t_{j}}=\eta_{t_{j}}+\sum_{k=1}^{q}\theta_{t_{k}}\eta_{t_{j-k}}, (33)

and the covariance of the DFT of the differenced εtj\varepsilon_{t_{j}} process takes the form:

𝒮k,k(ε)=ση2|1+k=1qθke2iπfk|2|2sin(πfΔt)|2.{\mathcal{S}}^{(\varepsilon)}_{k,k}=\sigma^{2}_{\eta}\left|1+\sum_{k=1}^{q}\theta_{k}e^{2i\pi fk}\right|^{2}|2\sin\left(\pi f\Delta t\right)|^{2}. (34)

This leads to defining a new multiscale ratio replacing σε2|2sin(πfΔt)|2\sigma^{2}_{\varepsilon}|2\sin\left(\pi f\Delta t\right)|^{2} of (17) with ση2|1+k=1qθke2iπfk|2|2sin(πfΔt)|2\sigma^{2}_{\eta}\left|1+\sum_{k=1}^{q}\theta_{k}e^{2i\pi fk}\right|^{2}|2\sin\left(\pi f\Delta t\right)|^{2}. We then obtain a new estimator of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk}. In general the value of qq is not known. To simultaneously implement model choice, we need to penalize the likelihood. We define the corrected Aikake information criterion (AICC) by [6, p. 303] (refer to (18) for l(𝝈,𝜽)l\left(\bm{\sigma},\bm{\theta}\right) with σε2|2sin(πfΔt)|2\sigma^{2}_{\varepsilon}|2\sin\left(\pi f\Delta t\right)|^{2} replaced by ση2|1+k=1qθke2iπfk|2|2sin(πfΔt)|2\sigma^{2}_{\eta}\left|1+\sum_{k=1}^{q}\theta_{k}e^{2i\pi fk}\right|^{2}|2\sin\left(\pi f\Delta t\right)|^{2})

AICC(𝜽)\displaystyle{\mathrm{AICC}}(\bm{\theta}) =\displaystyle= 2l(𝝈,𝜽)+2(p+2)nnp3.\displaystyle-2l\left(\bm{\sigma},\bm{\theta}\right)+2\frac{(p+2)n}{n-p-3}. (35)

By minimizing this function, in 𝝈\bm{\sigma}, 𝜽\bm{\theta} and qq, we obtain the best fitting model for the noise accounting for overfitting by using the penalty term. With this method we retrieve a new multiplier that is applied in the Fourier domain, which corresponds to a new smoother in the Fourier domain, where the smoothing window (and its smoothing width) have been automatically chosen by the data. See an example of such a smoothing window τ\ell_{\tau} in in Figure 1 (right). Here LkL_{k} has been estimated from an Itô process immersed in an MA noise process. The spectrum of the MA has a trough at frequency 0.42. We therefore expect to reinforce oscillations at period 1/0.422.51/0.42\approx 2.5, which is evident from the oscillations of the estimated kernel. For more details of this process see section III-E.

III Monte Carlo Studies

In this section we demonstrate the performance of the multiscale estimator through Monte Carlo simulations. We first describe the de-biasing procedure of the estimator for the Heston Model using Fourier domain graphs. We then present bias, variance and mean square error results of various estimators (including the multiscale estimator, the naive estimator and the first-best estimator developed in [36]), for the Heston Model as well as Brownian and Ornstein Uhlenbeck processes. We then consider the case where the sample path in a Heston Model is much shorter and another case where the microstructure noise is greatly reduced. Finally, we consider the case of correlated errors and show how a stationary noise process can be captured using model choice methods and then the integrated volatility can be estimated using the adjusted multiscale estimator.

III-A The Heston Model

The Heston model is specified in [16]:

dXt=(μνt/2)dt+σtdBt,dνt=κ(ανt)dt+γνt1/2dWt,dX_{t}=\left(\mu-\nu_{t}/2\right)dt+\sigma_{t}dB_{t},\quad d\nu_{t}=\kappa\left(\alpha-\nu_{t}\right)dt+\gamma\nu_{t}^{1/2}dW_{t}, (36)

where νt=σt2\nu_{t}=\sigma_{t}^{2}, and BtB_{t} and WtW_{t} are correlated 1-D Brownian motions. We will use the same parameter values to the ones that were used in [36], namely μ=.05,κ=5,α=.04,γ=.5\mu=.05,\,\kappa=5,\,\alpha=.04,\,\gamma=.5 and the correlation coefficient between the two Brownian motions B and W is ρ=.5\rho=-.5. We set X0=0X_{0}=0 and ν0=0.04\nu_{0}=0.04, which is the long time limit of the expectation of the process νt\nu_{t}.111limt+𝔼νt=α.\lim_{t\rightarrow+\infty}\mathbb{E}\nu_{t}=\alpha.

We calculate 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} directly from simulated data and average across realizations, producing Figure 2, where kk is indicated by its frequency fk=k/Nf_{k}=k/N, and only plotted for k=0,,N/21k=0,\dots,N/2-1, as the spectrum (or 𝒮kk(X){\mathcal{S}}_{kk}^{(X)}) is symmetric. We see directly from these plots that (on average as we showed) 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} is constant whilst 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} is strongly increasing with kk, completely dwarfing the other spectrum at large kk. (11) implies that an equal weighting is given to all frequencies for the differenced Itô process. The noise process will in contrast have a spectrum that is far from flat, and a suitable bias correction would shrink the estimator of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk} at higher frequencies.

Refer to caption Refer to caption

Figure 2: 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} (left) and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} (right) averaged over 100,000 realizations. Note the different scaling of the yy axis in the two figures.

We also calculate 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} for one simulated path, displayed in Figure 3. Here we have used the same sample length TT and noise intensity σε2\sigma^{2}_{\varepsilon} as in [36]: T=1T=1 day and σε2=0.00052\sigma^{2}_{\varepsilon}=0.0005^{2}. The length of the sample path, T=1T=1 day or 23,400s23,400s with Δt=1s\Delta t=1s, corresponds to one trading day, since we take one trading day to be 6.5h6.5h long. Notice the different shape of the two periodograms. 𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk} will not be distinguishable from 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} at higher frequencies, despite the moderate to low intensity of the market microstructure noise. If we observed the two components XtX_{t} and εt\varepsilon_{t} separately, then the multiscale ratio LkL_{k} could be estimated from 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} using the method of moments formula. In this case, we would estimate LkL_{k} by the sample Fourier Transform variances

L~k=𝒮^kk(X)𝒮^kk(X)+𝒮^kk(ε).\widetilde{L}_{k}=\frac{\widehat{\mathcal{S}}^{(X)}_{kk}}{\widehat{\mathcal{S}}^{(X)}_{kk}+\widehat{\mathcal{S}}^{(\varepsilon)}_{kk}}. (37)

The corresponding estimator of the integrated volatility becomes

X,X^T(m2)=k=0N1L~k𝒮^kk(Y).\widehat{\langle X,X\rangle}_{T}^{(m_{2})}=\sum_{k=0}^{N-1}\widetilde{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk}. (38)

The estimated multiscale ratio L~k\widetilde{L}_{k}, for the Heston model with the specified parameters, is plotted in Figure 4.

Refer to caption Refer to caption Refer to caption Refer to caption

Figure 3: A realisation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} (top left), a realisation of 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} (top right) with the Whittle estimates superimposed and of two biased corrected estimators of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk}, using L~k𝒮^kk(X)\widetilde{L}_{k}\widehat{\mathcal{S}}^{(X)}_{kk} (bottom left) and L^k𝒮^kk(Y)\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk} (bottom right). Notice the different scales in the four figures. Estimated spectra are plotted on a linear scale for ease of comparison to the effect of applying L^k\widehat{L}_{k}.

Refer to caption


Figure 4: The method of moments estimate L~k\widetilde{L}_{k} from a single realisation, with the Whittle estimate (white line) of LkL_{k} superimposed.

The multiscale ratio cannot be estimated using the method of moments in realistic scenarios, as we only observe the aggregated process YtY_{t} and not the two processes XtX_{t} and εt\varepsilon_{t} separately. Figure 3 displays the estimated multiscale ratio L~k\widetilde{L}_{k} applied to 𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk} over one path realisation. This plot suggests that the energy over the high frequencies has been shrunk and that L~k𝒮^kk(Y)\widetilde{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk} is a good approximation to 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk}. It therefore seems not unreasonable that the summation of this function across frequencies should make a good approximation to the integrated volatility.

The parameters (σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2\widehat{\sigma}^{2}_{\varepsilon}) are found separately for each path using the MATLAB function fmincon on (18). Figure 3 shows σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2|2sin(πfkΔt)|2\widehat{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2} (in white) plotted over the periodograms 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} for one simulated path. The approximated values of σ¯X2\overline{\sigma}^{2}_{X} and σε2\sigma^{2}_{\varepsilon} are quite similar to the averaged periodograms of Figure 2; in fact the accuracy of the new estimator depends on how consistently these parameters are estimated in the presence of limited information from the sampled process YtY_{t}. Figure 4 shows the corresponding estimated multiscale ratio L^k\widehat{L}_{k} (in white) from this simulated path, as defined in (19). The function decays, as expected, so that it will remove the high-frequency microstructure noise in the spectrum of YtY_{t}; the ratio is also a good approximation of L~k\widetilde{L}_{k}. Figure 3 shows L^k𝒮^kk(Y)\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk}, which is again similar to 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk}. It would appear that the new estimator has successfully removed the microstructure effect from each frequency.

It is worth noting that the ratios LkL_{k} and L^k\widehat{L}_{k} quantify the effect of the multiscale structure of the process. If σε2\sigma_{\varepsilon}^{2} is zero (ie. there is no microstructure noise), then no correction will be made to the spectral density function (the ratio will equal 1 at all frequencies). So in the case of zero microstructure noise, the estimate would recover 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and from (13) the estimate of the integrated volatility would simply be the realized integrated volatility of the observable process.

We investigate the performance of the multiscale estimator using Monte Carlo simulations. In this study 50,000 simulated paths are generated. Table I displays the results of our simulation, where biases, variances and errors are calculated using a Riemann sum approximation of the integral

TNi=1Nσi2=0Tσt2𝑑t.\frac{T}{N}\sum_{i=1}^{N}\sigma^{2}_{i}=\int_{0}^{T}\sigma^{2}_{t}\,dt. (39)

The two estimators X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} and X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} (see (5) and (38) respectively) are both included for comparison, even though these require use of the unobservable XtX_{t} process. The performance of the first-best estimator in [36] (denoted by X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}) is also included as a well-performing and tested estimator using only the YtY_{t} process, as is the naive estimator of the realized volatility on YtY_{t} at the highest frequency, X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)}, given in (4) (the fifth-best estimator in [36]). We also include the performance of X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)}, defined in (24).

Table I shows that the new estimator, X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})}, is competitive with the first-best approach in [36], X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}, as an estimator of the integrated volatility for the Heston model with the stated parameters. For this simulation the new method performed marginally better. The similar performance of the two estimators is quite remarkable, given their different approach; both estimators involve a bias-correction, [36] perform this globally by weighting different sampling frequencies, whilst we correct locally at each frequency. The realized integrated volatility of YtY_{t} at the highest frequency, X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)}, produces disastrous results, as expected.

Sample bias Sample variance Sample RMSE
X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} 1.17×1021.17\times 10^{-2} 1.80×1081.80\times 10^{-8} 1.17×1021.17\times 10^{-2}
X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})} 6.44×1076.44\times 10^{-7} 2.76×10102.76\times 10^{-10} 1.66×1051.66\times 10^{-5}
X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} 2.90×1072.90\times 10^{-7} 2.59×10102.59\times 10^{-10} 1.61×1051.61\times 10^{-5}
X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)} 2.63×1072.63\times 10^{-7} 2.59×10102.59\times 10^{-10} 1.61×1051.61\times 10^{-5}
X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} 1.39×1081.39\times 10^{-8} 2.07×10102.07\times 10^{-10} 1.44×1051.44\times 10^{-5}
X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} 1.20×1081.20\times 10^{-8} 2.06×10102.06\times 10^{-10} 1.44×1051.44\times 10^{-5}
TABLE I: Simulation study comparing the new estimator with the best estimator of [36].

We also note that X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} performs more or less identically to X,X^T(w).\widehat{\langle X,X\rangle}_{T}^{(w)}. These two estimators can almost be used interchangeably due to the invariance property of a maximum likelihood estimator. This observation is born out by our simulation studies, and we henceforth only report results for X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})}. Note that the variance of X,X^T(w)\widehat{\langle X,X\rangle}_{T}^{(w)} can be found from (25). To compare theory with simulations we note that the average estimated standard deviation is 1.6093×1051.6093\times 10^{-5} whilst the expression for the variance to leading order gives an expression for the standard deviation of [var{X,X^T(w)}]1/2=1.0246×105\left[\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(w)}\right\}\right]^{1/2}=1.0246\times 10^{-5}, using the parameter values of σ¯X26.8×109\overline{\sigma}^{2}_{X}\approx 6.8\times 10^{-9} and σε22.5×107\sigma^{2}_{\varepsilon}\approx 2.5\times 10^{-7}.

A histogram of the observed bias of the new estimator is plotted in Figure 5 along with a histogram of the observed bias of the first-best estimator in [36]. The observed bias of our estimator follows a Gaussian distribution centred at zero, suggesting that this estimator is unbiased, as out results claim to be true. Comparing our estimator to the first-best estimator, it can be seen that the new estimator has similar magnitudes of error also (hence the similar Root Mean Square Error (RMSE)).

Refer to caption Refer to caption

Figure 5: The histograms of the observed bias of the proposed estimator (a), and the first-best estimator (b), over 100,000 sample paths.

The new estimator requires calculation of σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2\widehat{\sigma}^{2}_{\varepsilon} which will vary over each process due to the limited information given from the YtY_{t} process. The stability of this estimation is of great importance if the estimator is to perform well. Figure 6 shows the distribution of the parameters σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2\widehat{\sigma}^{2}_{\varepsilon} over the simulated paths. The parameter estimation is quite consistent, with all values estimated within a narrow range. Figure 2 suggests that these estimates are roughly unbiased; as σ¯X26.8×109\overline{\sigma}^{2}_{X}\approx 6.8\times 10^{-9} and σε22.5×107\sigma^{2}_{\varepsilon}\approx 2.5\times 10^{-7} (as σε2|2sin(πfk)|21×106\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k})\right|^{2}\approx 1\times 10^{-6}, at fk=0.5f_{k}=0.5).

Refer to caption Refer to caption

Figure 6: The histograms of the estimated σ¯X2\overline{\sigma}_{X}^{2} (a) and σε2\sigma^{2}_{\varepsilon} (b).

III-B Brownian Process and Ornstein Uhlenbeck Process

We repeated our simulations for a Brownian Process given by:

dXt=2σt2dBt,dX_{t}=\sqrt{2\sigma^{2}_{t}}dB_{t}, (40)

where σt2=0.01\sigma_{t}^{2}=0.01. We otherwise keep the same simulation setup as before with 50,000 simulated paths of length 23,400. The results are displayed in Table II. The new estimator, X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})}, again delivers a marked improvement on the naive estimator, X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)}, and performs marginally better than the first-best estimator in [36], X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}.

Sample bias Sample variance Sample RMSE
X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} 1.17×1021.17\times 10^{-2} 1.77×1081.77\times 10^{-8} 1.17×1021.17\times 10^{-2}
X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})} 6.52×1076.52\times 10^{-7} 2.68×10112.68\times 10^{-11} 5.22×1065.22\times 10^{-6}
X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} 3.02×1073.02\times 10^{-7} 1.98×10111.98\times 10^{-11} 4.46×1064.46\times 10^{-6}
X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} 1.96×1091.96\times 10^{-9} 6.93×10136.93\times 10^{-13} 8.32×1078.32\times 10^{-7}
X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} 3.79×1093.79\times 10^{-9} 5.44×10135.44\times 10^{-13} 7.38×1077.38\times 10^{-7}
TABLE II: Simulation study for the Brownian process.

We also performed a Monte Carlo simulation for the Ornstein Uhlenbeck process given by:

dXt=Xtdt+2σtdBt,dX_{t}=X_{t}dt+\sqrt{2\sigma_{t}}dB_{t}, (41)

where also σt2=0.01\sigma_{t}^{2}=0.01. Again we retain the same simulation setup and the results are displayed in Table III. The results are almost identical to that of the Brownian process, with the new estimator again outperforming other time-domain estimators.

Sample bias Sample variance Sample RMSE
X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} 1.17×1021.17\times 10^{-2} 1.78×1081.78\times 10^{-8} 1.17×1021.17\times 10^{-2}
X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})} 6.69×1076.69\times 10^{-7} 2.66×10112.66\times 10^{-11} 5.20×1065.20\times 10^{-6}
X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} 2.95×1072.95\times 10^{-7} 1.97×10111.97\times 10^{-11} 4.44×1064.44\times 10^{-6}
X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} 5.09×1095.09\times 10^{-9} 6.76×10136.76\times 10^{-13} 8.22×1078.22\times 10^{-7}
X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} 6.29×1096.29\times 10^{-9} 5.33×10135.33\times 10^{-13} 7.30×1077.30\times 10^{-7}
TABLE III: Simulation study for the Ornstein Uhlenbeck process.

III-C Comparing estimators over shorter sample lengths

This section compares estimators for a shorter sample length which will reduce the benefit of subsampling due to the variance issues of small-length data but will also affect the variance of the multiscale ratio (cf Theorem 1).

The simulation setup is exactly the same as before (using the Heston model with the same parameters) except that TT, the simulation length, is reduced by a factor of 10 to 0.1 days or 2340s2340s. Before the results of the simulation are reported, it is of interest to see whether the frequency domain methods developed still model each process accurately. Figure 7 shows the calculated σ^X2\widehat{\sigma}^{2}_{X} and σ^ε2|sin(πΔtfk)|2\widehat{\sigma}^{2}_{\varepsilon}\left|\sin(\pi\Delta tf_{k})\right|^{2} (in white) together with the periodograms 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} for one simulated path. The estimator still approximates the energy structure of the processes accurately. Figure 7 also shows the corresponding estimate of the multiscale ratio L^k\widehat{L}_{k} (in white) from this simulated path (together with L~k\widetilde{L}_{k}) and the corresponding plot of L^k\widehat{L}_{k}𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk}. The new estimator has removed the microstructure noise effect and has formed a good approximation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk}. The approximation of the periodograms is still accurate despite the shortening of available data.

Refer to caption Refer to caption Refer to caption Refer to caption

Figure 7: A realisation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} (top left), a realisation of 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} (top right) with the Whittle estimates superimposed, the estimate of LkL_{k} (bottom left) with the Whittle estimate of LkL_{k} superimposed and the biased corrected estimator of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk} using L^k𝒮^kk(Y)\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk} (bottom right). Notice the different scales in the four figures.

Table IV displays the accuracy of the estimators over the 50,000 simulated paths. The first-best estimator in [36], X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}, and the new estimator, X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})}, are once again comparable in performance and both estimates are close to the best attainable RMSE given by, X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)}, the realized integrated volatility on XtX_{t}.

Sample bias Sample variance Sample RMSE
X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} 1.17×1031.17\times 10^{-3} 2.29×1092.29\times 10^{-9} 1.17×1031.17\times 10^{-3}
X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})} 1.00×1061.00\times 10^{-6} 4.51×10104.51\times 10^{-10} 2.13×1052.13\times 10^{-5}
X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} 1.84×1071.84\times 10^{-7} 4.23×10104.23\times 10^{-10} 2.06×1052.06\times 10^{-5}
X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} 4.80×1084.80\times 10^{-8} 2.42×10102.42\times 10^{-10} 1.55×1051.55\times 10^{-5}
X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} 5.27×1085.27\times 10^{-8} 2.28×10102.28\times 10^{-10} 1.51×1051.51\times 10^{-5}
TABLE IV: Simulation study for shorter sampler length.

III-D Comparing estimators with a low-noise process

This section compares estimators for smaller levels of microstructure noise. Reducing the microstructure noise will reduce the need to subsample. The first-best estimator in [36], X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}, will have a higher sampling frequency and the new estimator will reduce its estimate of σ^ε2\widehat{\sigma}^{2}_{\varepsilon} accordingly. For very small levels of noise, however, the first-best estimator will become zero, as the optimal number of samples becomes nn (the highest available). This possibility is now examined, using the Heston model as before, with all parameters unchanged except the noise is reduced by a factor of 10, ie. σε2=0.000052\sigma^{2}_{\varepsilon}=0.00005^{2}. Note that the path length is kept at its original length of T=1T=1 day.

Figure 8 shows the estimates of σ^X2\widehat{\sigma}_{X}^{2} and σ^ε2|2sin(πΔtfk)|2\widehat{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi\Delta tf_{k})\right|^{2} (in white) along with the periodograms 𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} for one simulated path along with the corresponding estimate of the multiscale ratio L^k\widehat{L}_{k} (in white) (plotted over the approximated L~k\widetilde{L}_{k}) and the corresponding plot of L^k\widehat{L}_{k}𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk}. The estimation method works well again; notice how the magnitude of the microstructure noise has been greatly reduced (the scale is now of order 10810^{-8} rather than 10610^{-6}) causing the multiscale ratio LkL_{k} to be more tempered across the high frequencies than it was before, due to the smaller microstructure noise. Nonetheless, the new estimator has still detected the smaller levels of noise in the data.

Refer to caption Refer to caption Refer to caption Refer to caption

Figure 8: A realisation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} (top left), a realisation of 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} (top right) with the Whittle estimates superimposed, the estimate of LkL_{k} (bottom left) with the Whittle estimate of LkL_{k} superimposed and the biased corrected estimator of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk} using L^k𝒮^kk(Y)\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk} (bottom right). Notice the different scales in the four figures.

Table V reports on the results of 50,000 simulations performed as before. The first-best estimator of [36], X,X^T(s1)\widehat{\langle X,X\rangle}_{T}^{(s_{1})}, categorically failed for this model. This is due to the fact that the optimal number of samples was always equal to nn, the total number of samples available. Therefore, the first-best estimator was always zero. The second-best estimator in [36], denoted by X,X^T(s2)\widehat{\langle X,X\rangle}_{T}^{(s_{2})}, was reasonably effective. This is simply an estimator that averages estimates calculated from sub-sampled paths at different starting points and is therefore asymptotically biased. The new estimator, X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})}, was remarkably robust, with RMSE very close to the RMSE of estimators based on the XtX_{t} process. The difference in performance between estimators using YtY_{t} and estimators using XtX_{t} is expected to become smaller with less microstructure noise and this can be seen by the similar order RMSE errors between all estimators. Nevertheless, the new estimator was much closer in performance to the realized integrated volatility on XtX_{t} than it was to any other estimator on YtY_{t}, a result that demonstrates the precision and robustness of this new estimator of integrated volatility.

Sample bias Sample variance Sample RMSE
X,X^T(b)\widehat{\langle X,X\rangle}_{T}^{(b)} 1.17×1041.17\times 10^{-4} 2.11×10102.11\times 10^{-10} 1.18×1041.18\times 10^{-4}
X,X^T(s2)\widehat{\langle X,X\rangle}_{T}^{(s_{2})} 3.53×1063.53\times 10^{-6} 1.00×1091.00\times 10^{-9} 3.19×1053.19\times 10^{-5}
X,X^T(m1)\widehat{\langle X,X\rangle}_{T}^{(m_{1})} 7.63×1097.63\times 10^{-9} 2.12×10102.12\times 10^{-10} 1.46×1051.46\times 10^{-5}
X,X^T(m2)\widehat{\langle X,X\rangle}_{T}^{(m_{2})} 7.91×1097.91\times 10^{-9} 2.06×10102.06\times 10^{-10} 1.44×1051.44\times 10^{-5}
X,X^T(u)\widehat{\langle X,X\rangle}_{T}^{(u)} 9.83×1099.83\times 10^{-9} 2.05×10102.05\times 10^{-10} 1.43×1051.43\times 10^{-5}
TABLE V: Simulation study for lower market microstructure noise.

III-E Correlated Noise

In this section we consider microstructure noise that is correlated. If this process is stationary, the noise process can be modelled as an MA process (as described in Section II-E), and the corresponding parameters can be estimated by maximising the multiscale Whittle likelihood using (17) and (34). Figure 9 shows the multiscale estimator applied to the Heston Model (with the same parameters as before) with a microstructure noise that follows an MA(6) process (parameters given in the caption). The Whittle estimates (in white) form a good approximation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} and 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} despite the more complicated nuisance structure. The corresponding estimate of the multiscale ratio L^k\widehat{L}_{k} (in white) therefore removes energy from the correct frequencies and the corresponding plot of L^k\widehat{L}_{k}𝒮^kk(Y)\widehat{\mathcal{S}}^{(Y)}_{kk} is a good approximation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk}. This is the same noise process and Itô process for which we calculated the optimal smoothing window in section II-E, and the trough in the noise at about f=0.42f=0.42 corresponds to the oscillations in the kernel plotted in Figure 1.

Refer to caption Refer to caption Refer to caption Refer to caption

Figure 9: A realisation of 𝒮^kk(X)\widehat{\mathcal{S}}^{(X)}_{kk} (top left), a realisation of 𝒮^kk(ε)\widehat{\mathcal{S}}^{(\varepsilon)}_{kk} (top right) with the Whittle estimates superimposed, the estimate of LkL_{k} (bottom left) with the Whittle estimate of LkL_{k} superimposed and the biased corrected estimator of 𝒮kk(X){\mathcal{S}}^{(X)}_{kk} using L^k𝒮^kk(Y)\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk} (bottom right). In this example we use an MA(6) with θ1=0.5\theta_{1}=0.5, θ2=0.1\theta_{2}=-0.1, θ3=0.1\theta_{3}=-0.1, θ4=0.2\theta_{4}=0.2, θ5=0\theta_{5}=0 and θ6=0.4\theta_{6}=0.4. Notice the different scales in the four figures.

If the length of the MA(p)(p) process is unknown, then pp can be determined using (35). In Table VI we show an example with p=4p=4 with paramaters θ1=0.8\theta_{1}=0.8, θ2=0.6\theta_{2}=-0.6, θ3=0.1\theta_{3}=0.1, θ4=0.4\theta_{4}=0.4, Clearly p=4p=4 is identified as the best fitting model yielding near to perfect estimates of the noise parameters. The estimator is therefore robust to removing the effect of microstructure noise when this process is correlated (and stationary), even if the length of the MA(p)(p) process is not explicitly known.

MA(pp) θ1\theta_{1} θ2\theta_{2} θ3\theta_{3} θ4\theta_{4} θ5\theta_{5} θ6\theta_{6} θ7\theta_{7} θ8\theta_{8} AICC
p=1p=1 0.935 3.208490×105-3.208490\times 10^{5}
p=2p=2 0.624 -0.445 3.239947×105-3.239947\times 10^{5}
p=3p=3 0.658 -0.459 -0.046 3.240000×105-3.240000\times 10^{5}
p=4p=4 0.806 -0.603 -0.101 0.410 3.262427×105-3.262427\times 10^{5}
p=5p=5 0.813 -0.606 -0.101 0.411 -0.008 3.262416×105-3.262416\times 10^{5}
p=6p=6 0.815 -0.604 -0.097 0.420 -0.003 0.000 3.262409×105-3.262409\times 10^{5}
p=7p=7 0.807 -0.613 -0.114 0.413 0.002 -0.002 -0.005 3.262402×105-3.262402\times 10^{5}
p=8p=8 0.817 -0.614 -0.128 0.427 0.005 0.011 -0.009 -0.017 3.262384×105-3.262384\times 10^{5}
TABLE VI: Values of θ\theta found by modelling the noise process as an MA(p) process for p=1,,8p=1,\ldots,8. Model choice methods (AICC) are used to select which process to model the noise by, in this case the AICC is minimised by selecting an MA(4) with the given parameters. The true noise is indeed an MA(4) process (with paramaters θ1=0.8\theta_{1}=0.8, θ2=0.6\theta_{2}=-0.6, θ3=0.1\theta_{3}=0.1, θ4=0.4\theta_{4}=0.4).

We also tested our estimator using Monte Carlo simulations in [31] for a variety of MA(1) processes and the results showed a significant reduction in error compared with not only the naive estimator, but also the estimators based on a white-noise assumption. Furthermore, the adjusted multiscale estimator performed almost identically to our multiscale estimator when we set θ1=0\theta_{1}=0 and recovered a white-noise process, meaning the loss in precision from searching for a parameter unnecessarily was negligible (as to be expected for qNq\ll N). Notice also that in Table VI there appears to be little loss in precision from estimating more parameters in the MA(4)(4) process then is required as θp\theta_{p} for p>4p>4 is always estimated to be very close to zero. This further demonstrates the robustness and precision of our estimation technique.

IV Conclusions

The problem of estimating the integrated stochastic volatility of an Itô process from noisy observations was studied in this paper. Unlike most previous works on this problem, see [36, 26], the method for estimating the integrated volatility developed in this paper is based on the frequency domain representation of both the Itô process and the noisy observations. The integrated volatility can be represented as a summation of variation in the process of interest over all frequencies (or scales). In our estimator we adjust the raw sample variance at each frequency. Such an estimator is truly multiscale, as it corrects the estimated energy directly at every scale. In other words, the estimator is debiased locally at each frequency, rather than globally.

To estimate the degree of scale separation in the data we used the Whittle likelihood, and quantified the noise contribution by the multiscale ratio. Various properties of the multiscale estimator were determined, see Theorems 1 and 2. As was illustrated by the set of examples, our estimator performs extremely well on data simulated from the Heston model, and is competitive with the methods proposed by [36], under varying signal-to-noise and sampling scenarios. The proposed estimator is truly multiscale in nature and adapts automatically to the degree of noise contamination of the data, a clear strength. It is also easily implemented and computationally efficient.

The new estimator for the integrated stochastic volatility can be written as

X,X^\displaystyle\widehat{\langle X,X\rangle} =\displaystyle= uuk(XtkuXtku1)(XtkXtk1),\displaystyle\sum_{u}\ell_{-u}\sum_{k}\left(X_{t_{k-u}}-X_{t_{k-u-1}}\right)\left(X_{t_{k}}-X_{t_{k-1}}\right),

where the kernel u\ell_{u} is given by (28). We can compare this estimator with kernel estimators, see [10]. There the estimated increment square ΔXt2\Delta X_{t}^{2} is locally smoothed to estimate the diffusion coefficient using a kernel function, K()K(\cdot). Contrary to this approach we estimate the integrated volatility by smoothing the estimated autocovariance of ΔXtj\Delta X_{t_{j}}. In particular, we use a data-dependent choice of smoothing window. We show that, from a minimum bias perspective, using a Laplace window to smooth is optimal. This data-dependent choice of smoothing window becomes more interesting after relaxing the assumptions on the noise process, and treating correlated observation error.

Inference procedures implemented in the frequency domain are still very underdeveloped for problems with a multiscale structure. The modern data deluge has caused an excess of high frequency observations in a number of application areas, for example finance and molecular dynamics. More flexible models could also be used for the high frequency nuisance structure. In this paper we have introduced a new frequency domain based estimator and applied it to a relatively simple problem, namely the estimation of the integrated stochastic volatility, for data contaminated by high frequency noise. There are many extensions and potential applications of the new estimator. Here we list a few which seem interesting to us and which are currently under investigation.

  • Study parameter estimation for noisily observed SDEs which are driven by more general noise processes, for example Lévy processes.

  • Application of the new estimator to the problem of statistical inference for fast/slow systems of SDEs, of the type studied in [26, 24].

  • Study the combined effects of high-frequency and multiscale structure in the data. A first step in this direction was taken in [7].

References

  • [1] Y. Ait-Sahalia, P. A. Mykland, and L Zhang, How often to sample a continuous-time process in the presence of market microstructure noise, Rev. Financ. Studies, 18 (2005), pp. 351–416.
  • [2] E. Barucci and R. Reno, On measuring volatility of diffusion processes with high frequency data, Economics Letters, 74 (2002), pp. 371–378.
  • [3] J. Beran, Statistics for long-memory Processes, Chapman and Hall, London, 1994.
  • [4] Peter Bloomfield, Fourier Analysis of Time Series, Wiley-IEEE, New York, 2004.
  • [5] D. Brillinger, Time series: data analysis and theory, Society for Industrial and Applied Mathematics, Philadelphia, USA, 2001.
  • [6] P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer-Verlag, New York, 1991.
  • [7] C. J. Cotter and G. A. Pavliotis, Estimating Eddy diffusivities from Lagrangian Observations, Preprint, 2009.
  • [8] K. O Dzhamparidze and A. M. Yaglom, Spectrum parameter estimation in time series analysis, in Developments in Statistics, PR Krishnaiah, ed., vol. 4, New York: Academic Press., 1983, pp. 1–181.
  • [9] J. Fan and Y. Wang, Multi-scale jump and volatility analysis for high-frequency financial data, J. of the American Statistical Association, 102 (2007), pp. 1349–1362.
  • [10] D. Florens-Zmirou, On estimating the diffusion coefficient from discrete observations, Journal of Applied Probability, 30 (1993), pp. 790–804.
  • [11] J.-P. Fouque, G. Papanicolaou, K. R. Sircar, and K. Solna, Short time-scale in SP-500 volatility, J. Comp. Finance, 6 (2003), pp. 1–23.
  • [12] D. Givon, R. Kupferman, and A. M. Stuart, Extracting macroscopic dynamics: model problems and algorithms, Nonlinearity, 17 (2004), pp. R55–R127.
  • [13] A. Griffa, K. Owensa, L. Piterbarg, and B. Rozovskii, Estimates on turbulence parameters from lagrangian data using a stochastic particle model, J. Marine Research, 53 (1995), pp. 371–401.
  • [14] P. R. Hansen and A. Lunde, A forecast comparison of volatility models: does anything beat a GARCH(1,1)?, Journal of Applied Econometrics, 20 (2005), pp. 873–889.
  • [15]  , Realized variance and market microstructure noise, J. Business & Economic Statistics, 24 (2006), pp. 127–161.
  • [16] S. L. Heston, A closed form solution for options with stochastic volatility with applications to bond and currency options, Review of Financial Studies, 6 (1993), pp. 327–343.
  • [17] I. Horenko, C. Hartmann, C. Schütte and F. Noe, Data-based parameter estimation of generalized multidimensional Langevin processes, Physical Review E, 76 (2007), no. 016706.
  • [18] I. Horenko and C. Schütte, Likelihood-based estimation of multidimensional Langevin Models and its Application to biomolecular dynamics, Multiscale Modeling and Simulation, 7 (2008), 731–773.
  • [19] I. Karatzas and S. E. Shreve, Brownian Motion and Stochastic Calculus, Springer-Verlag, New York, 1991.
  • [20] Y. A. Kutoyants, Statistical inference for ergodic diffusion processes, Springer Series in Statistics, London, 2004.
  • [21] P. Malliavin and M. E. Mancino, Fourier series for measurements of multivariate volatilities, Finance and Stochastics, 6 (2002), pp. 49–61.
  • [22] M. E. Mancino and S. Sanfelici, Robustness of Fourier estimators of integrated volatility in the presence of microstructure noise, tech. report, University of Firenze, 2006.
  • [23] F. Neeser and J. Massey, Proper complex random processes with applications to information theory, IEEE Trans. Info. Theory, 39 (1993), 1293–1302.
  • [24] T. Papavasiliou, G. A. Pavliotis, and A.M. Stuart, Maximum likelihood estimation for multiscale diffusions, 2008.
  • [25] G. A. Pavliotis, Y. Pokern, and A.M. Stuart, Parameter estimation for multiscale diffusions: an overview, 2008.
  • [26] G. A. Pavliotis and A. M. Stuart, Parameter estimation for multiscale diffusions, J. Stat. Phys., 127 (2007), pp. 741–781.
  • [27] D. B. Percival and A. T. Walden, Spectral Analysis for Physical Applications, Cambridge University Press, Cambridge, UK, 1993.
  • [28] B. Picinbono, Second-Order Complex Random Vectors and Normal Distributions, IEEE Trans. Signal Proc., 44 (1996), pp. 2637–2640.
  • [29] R. Renò, Volatility estimate via Fourier analysis, PhD thesis, Scuola Normale Superiore, 2005.
  • [30]  , Nonparametric estimation of the diffusion coefficient of stochastic volatility models, Econometric Theory, 24 (2008), pp. 1174–1206.
  • [31] A. Sykulski, S. Olhede, and G. A. Pavliotis, High frequency variability and microstructure bias, in Inference and Estimation in Probabilistic Time-Series Models, D. Barber, A. T. Cemgil, and S. Chiappa, eds., Cambridge, UK, 2008, Issac Newton Institute for Mathematical Sciences. Proceedings available at http://www.newton.ac.uk/programmes/SCH/schw05-papers.pdf.
  • [32] R. S. Tsay, Analysis of Financial Time Series, John Wiley & Sons, New York, USA, 2005.
  • [33] L. Wasserman, All of Nonparametric Statistics, Springer, Berlin, 2007.
  • [34] P. Whittle, Estimation and information in stationary time series, Arkiv för Matematik, 2 (1953), pp. 423–434.
  • [35] P. Whittle, Gaussian estimation in stationary time series, Bull. Inst. Statist. Inst., 39 (1962), pp. 105–129.
  • [36] L. Zhang, P. A. Mykland, and Y. Ait-Sahalia, A tale of two time scales: Determining integrated volatility with noisy high-frequency data, J. Am. Stat. Assoc., 100 (2005), pp. 1394–1411.

A Proof of Theorem 1

Let the true value of 𝝈\bm{\sigma} be denoted 𝝈.\bm{\sigma}^{\star}. We differentiate the multiscale energy likelihood function (18) with respect to 𝝈\bm{\sigma} to obtain

˙X(𝝈)\displaystyle\dot{\ell}_{X}(\bm{\sigma}) =(𝝈)σ¯X2=k=1N/211σ¯X2+σε2|2sin(πfkΔt)|2+k=1N/21𝒮^kk(Y)(σ¯X2+σε2|2sin(πfkΔt)|2)2\displaystyle=\frac{\partial\ell(\bm{\sigma})}{\partial\overline{\sigma}^{2}_{X}}=-\sum_{k=1}^{N/2-1}\frac{1}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}+\sum_{k=1}^{N/2-1}\frac{\widehat{\mathcal{S}}^{(Y)}_{kk}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}
˙ε(𝝈)\displaystyle\dot{\ell}_{\varepsilon}(\bm{\sigma}) =(𝝈)σε2=k=1N/21|2sin(πfkΔt)|2σ¯X2+σε2|2sin(πfkΔt)|2+k=1N/21|2sin(πfkΔt)|2𝒮^kk(Y)(σ¯X2+σε2|2sin(πfkΔt)|2)2.\displaystyle=\frac{\partial\ell(\bm{\sigma})}{\partial\sigma^{2}_{\varepsilon}}=-\sum_{k=1}^{N/2-1}\frac{\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}+\sum_{k=1}^{N/2-1}\frac{\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\widehat{\mathcal{S}}^{(Y)}_{kk}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}.

To remove implicit Δt\Delta t dependence we let τX=σ¯X2/Δt,\tau_{X}=\overline{\sigma}^{2}_{X}/\Delta t, and denote derivatives with respect to τX\tau_{X} by subscript τ\tau. Then ˙τ(𝝈^)=Δt˙X(𝝈^)\dot{\ell}_{\tau}(\widehat{\bm{\sigma}})=\Delta t\dot{\ell}_{X}(\widehat{\bm{\sigma}}), and so on. We calculate the expectation and variance of the score functions evaluated at 𝝈,\bm{\sigma}^{\star}, and find that the bias of τ^X\widehat{\tau}_{X} is order 𝒪(Δt1/2log(Δt)){\mathcal{O}}\left(\Delta t^{1/2}\log(\Delta t)\right) and the bias of σ^ε2\widehat{\sigma}^{2}_{\varepsilon} is order 𝒪(Δt2log(Δt)).{\mathcal{O}}\left(\Delta t^{2}\log(\Delta t)\right). These contributions become negligible, and are of lesser importance compared to the variance.

To show large sample properties we Taylor expand the multiscale likelihood with 𝝈^\widehat{\bm{\sigma}} corresponding to the estimated maximum likelihood, and 𝝈\bm{\sigma}^{\prime} is lying between 𝝈^\widehat{\bm{\sigma}} and 𝝈{\bm{\sigma}}^{\star}. Then

˙τ(𝝈^)\displaystyle\dot{\ell}_{\tau}(\widehat{\bm{\sigma}}) =˙τ(𝝈)+¨ττ(𝝈)[σ^X2σX2]/Δt+¨τε(𝝈)[σ^ε2σε2]\displaystyle=\dot{\ell}_{\tau}(\bm{\sigma}^{\star})+\ddot{\ell}_{\tau\tau}(\bm{\sigma}^{\prime})\left[\widehat{\sigma}^{2}_{X}-\sigma_{X}^{\star 2}\right]/\Delta t+\ddot{\ell}_{\tau\varepsilon}(\bm{\sigma}^{\prime})\left[\widehat{\sigma}^{2}_{\varepsilon}-\sigma_{\varepsilon}^{\star 2}\right]
˙ε(𝝈^)\displaystyle\dot{\ell}_{\varepsilon}(\widehat{\bm{\sigma}}) =˙ε(𝝈)+¨ετ(𝝈)[σ^X2σX2]/Δt+¨εε(𝝈)[σ^ε2σε2].\displaystyle=\dot{\ell}_{\varepsilon}(\bm{\sigma}^{\star})+\ddot{\ell}_{\varepsilon\tau}(\bm{\sigma}^{\prime})\left[\widehat{\sigma}^{2}_{X}-\sigma_{X}^{\star 2}\right]/\Delta t+\ddot{\ell}_{\varepsilon\varepsilon}(\bm{\sigma}^{\prime})\left[\widehat{\sigma}^{2}_{\varepsilon}-\sigma_{\varepsilon}^{\star 2}\right].

We note with the observed Fisher information

𝑭=[¨ττ(𝝈)¨τε(𝝈);¨ετ(𝝈)¨εε(𝝈)]\bm{F}=\left[\ddot{\ell}_{\tau\tau}(\bm{\sigma}^{\prime})\quad\ddot{\ell}_{\tau\varepsilon}(\bm{\sigma}^{\prime});\ddot{\ell}_{\varepsilon\tau}(\bm{\sigma}^{\prime})\quad\ddot{\ell}_{\varepsilon\varepsilon}(\bm{\sigma}^{\prime})\right]

that

((σ^X2σX2)/Δtσ^ε2σε2)=𝑭1(˙τ(𝝈^)˙τ(𝝈)˙ε(𝝈^)˙ε(𝝈)).\displaystyle\begin{pmatrix}(\widehat{\sigma}^{2}_{X}-\sigma_{X}^{\star 2})/\Delta t\\ \widehat{\sigma}^{2}_{\varepsilon}-\sigma_{\varepsilon}^{\star 2}\end{pmatrix}=\bm{F}^{-1}\begin{pmatrix}\dot{\ell}_{\tau}(\widehat{\bm{\sigma}})-\dot{\ell}_{\tau}(\bm{\sigma}^{\star})\\ \dot{\ell}_{\varepsilon}(\widehat{\bm{\sigma}})-\dot{\ell}_{\varepsilon}(\bm{\sigma}^{\star})\end{pmatrix}. (A-42)

We henceforth ignore the term Jk(μ)=Jk(X)J~k(X)J^{(\mu)}_{k}=J^{(X)}_{k}-\tilde{J}^{(X)}_{k} as this will not contribute to leading order, and write Jk(X)J^{(X)}_{k} where formally we would write J~k(X)\tilde{J}^{(X)}_{k} or Jk(X){J}^{(X)}_{k}. We can observe the suitability of this directly from (18) and use bounds for Jk(μ)J^{(\mu)}_{k}, where we could formally apply these to get bounds on each derivative of l(𝝈)l(\bm{\sigma}) (note that we cannot differentiate bounds). To avoid needless technicalities, the details of this approach will not be reported. To leading order

var(˙τ(𝝈))\displaystyle\mathrm{var}\left(\dot{\ell}_{\tau}({\bm{\sigma}})\right) =\displaystyle= l=1N/21k=1N/21Δt2cov(𝒮^kk(Y),𝒮^ll(Y))(σ¯X2+σε2|2sin(πfkΔt)|2)2(σ¯X2+σε2|2sin(πflΔt)|2)2\displaystyle\sum_{l=1}^{N/2-1}\sum_{k=1}^{N/2-1}\frac{\Delta t^{2}{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right)}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}
var(˙ε(𝝈))\displaystyle\mathrm{var}\left(\dot{\ell}_{\varepsilon}({\bm{\sigma}})\right) =\displaystyle= l=1N/21k=1N/21|2sin(πfkΔt)|2|2sin(πflΔt)|2cov(𝒮^kk(Y),𝒮^ll(Y))(σ¯X2+σε2|2sin(πfkΔt)|2)2(σ¯X2+σε2|2sin(πflΔt)|2)2\displaystyle\sum_{l=1}^{N/2-1}\sum_{k=1}^{N/2-1}\frac{\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right)}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}
cov(˙τ(𝝈),˙ε(𝝈))\displaystyle{\mathrm{cov}}\left(\dot{\ell}_{\tau}({\bm{\sigma}}),\dot{\ell}_{\varepsilon}({\bm{\sigma}})\right) =\displaystyle= l=1N/21k=1N/21Δt|2sin(πflΔt)|2cov(𝒮^kk(Y),𝒮^ll(Y))(σ¯X2+σε2|2sin(πfkΔt)|2)2(σ¯X2+σε2|2sin(πflΔt)|2)2.\displaystyle\sum_{l=1}^{N/2-1}\sum_{k=1}^{N/2-1}\frac{\Delta t\left|2\sin(\pi f_{l}\Delta t)\right|^{2}{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right)}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}.

We now need to calculate cov(𝒮^kk(Y),𝒮^ll(Y)){\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right) which is

cov(𝒮^kk(Y),𝒮^ll(Y))\displaystyle{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right) =\displaystyle= E{Jk(Y)[Jk(Y)][Jl(Y)]Jl(Y)}E{𝒮^kk(Y)}E{𝒮^ll(Y)}\displaystyle\mathrm{E}\left\{J_{k}^{(Y)}[J^{(Y)}_{k}]^{*}[J_{l}^{(Y)}]^{*}J_{l}^{(Y)}\right\}-\mathrm{E}\left\{\widehat{\mathcal{S}}^{(Y)}_{kk}\right\}\mathrm{E}\left\{\widehat{\mathcal{S}}^{(Y)}_{ll}\right\} (A-43)
=\displaystyle= ρkl(Y)𝒮kk(Y)𝒮ll(Y).\displaystyle\rho_{kl}^{(Y)}{\mathcal{S}}^{(Y)}_{kk}{\mathcal{S}}^{(Y)}_{ll}.

Furthermore

E{Jk(Y)[Jk(Y)][Jl(Y)]Jl(Y)}E{Jk(Y)[Jk(Y)]}E{[Jl(Y)]Jl(Y)}\displaystyle\mathrm{E}\left\{J_{k}^{(Y)}[J^{(Y)}_{k}]^{*}[J_{l}^{(Y)}]^{*}J_{l}^{(Y)}\right\}-\mathrm{E}\left\{J_{k}^{(Y)}[J^{(Y)}_{k}]^{*}\right\}\mathrm{E}\left\{[J_{l}^{(Y)}]^{*}J_{l}^{(Y)}\right\}
=\displaystyle= E{(Jk(X)+Jk(ε))[(Jk(X)+Jk(ε))][(Jl(X)+Jl(ε))](Jl(X)+Jl(ε))}\displaystyle\mathrm{E}\left\{(J_{k}^{(X)}+J_{k}^{(\varepsilon)})[(J_{k}^{(X)}+J_{k}^{(\varepsilon)})]^{*}[(J_{l}^{(X)}+J_{l}^{(\varepsilon)})]^{*}(J_{l}^{(X)}+J_{l}^{(\varepsilon)})\right\}
E{Jk(Y)[Jk(Y)]}E{[Jl(Y)]Jl(Y)}\displaystyle-\mathrm{E}\left\{J_{k}^{(Y)}[J^{(Y)}_{k}]^{*}\right\}\mathrm{E}\left\{[J_{l}^{(Y)}]^{*}J_{l}^{(Y)}\right\}
=\displaystyle= cov{𝒮^kk(X),𝒮^ll(X)}+cov{𝒮^kk(ε),𝒮^ll(ε)}+𝒮kl(X)𝒮kl(ε)+𝒮kl(X)𝒮kl(ε).\displaystyle{\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(X)}_{kk},\widehat{\mathcal{S}}^{(X)}_{ll}\right\}+{\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(\varepsilon)}_{kk},\widehat{\mathcal{S}}^{(\varepsilon)}_{ll}\right\}+{\mathcal{S}}^{(X)}_{kl}{\mathcal{S}}^{(\varepsilon)*}_{kl}+{\mathcal{S}}^{(X)*}_{kl}{\mathcal{S}}^{(\varepsilon)}_{kl}.

We therefore need to calculate the individual terms of this expression. We note

cov{𝒮^kk(ε),𝒮^ll(ε)}=δkl[𝒮kk(ε)]2,𝒮kl(X)𝒮kl(ε)+𝒮kl(X)𝒮kl(ε)=2δkl𝒮kk(X)𝒮kk(ε).{\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(\varepsilon)}_{kk},\widehat{\mathcal{S}}^{(\varepsilon)}_{ll}\right\}=\delta_{kl}[{\mathcal{S}}^{(\varepsilon)}_{kk}]^{2},\quad{\mathcal{S}}^{(X)}_{kl}{\mathcal{S}}^{(\varepsilon)*}_{kl}+{\mathcal{S}}^{(X)*}_{kl}{\mathcal{S}}^{(\varepsilon)}_{kl}=2\delta_{kl}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(\varepsilon)}_{kk}.

Then it follows

cov{𝒮^kk(Y),𝒮^ll(Y)}=cov{𝒮^kk(X),𝒮^ll(X)}+δkl[𝒮kk(ε)]2+2δkl𝒮kk(X)𝒮kk(ε).{\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right\}={\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(X)}_{kk},\widehat{\mathcal{S}}^{(X)}_{ll}\right\}+\delta_{kl}[{\mathcal{S}}^{(\varepsilon)}_{kk}]^{2}+2\delta_{kl}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(\varepsilon)}_{kk}. (A-44)

We therefore only need to worry about cov{𝒮^kk(X),𝒮^ll(X)}.{\mathrm{cov}}\left\{\widehat{\mathcal{S}}^{(X)}_{kk},\widehat{\mathcal{S}}^{(X)}_{ll}\right\}. We need

E{Jk(X)[Jk(X)][Jl(X)]Jl(X)}=1N2E{n=1N(n1)ΔtnΔtσsdWse2iπknN\displaystyle\mathrm{E}\left\{J_{k}^{(X)}[J^{(X)}_{k}]^{*}[J_{l}^{(X)}]^{*}J_{l}^{(X)}\right\}=\frac{1}{N^{2}}\mathrm{E}\left\{\sum_{n=1}^{N}\int_{(n-1)\Delta t}^{n\Delta t}\sigma_{s}dW_{s}e^{-2i\pi\frac{kn}{N}}\right.
×p=1N(p1)ΔtpΔtσtdWte2iπkpNm=1N(m1)ΔtmΔtσudWue2iπlmNw=1N(w1)ΔtwΔtσvdWve2iπlwN}\displaystyle\left.\times\sum_{p=1}^{N}\int_{(p-1)\Delta t}^{p\Delta t}\sigma_{t}dW_{t}e^{2i\pi\frac{kp}{N}}\sum_{m=1}^{N}\int_{(m-1)\Delta t}^{m\Delta t}\sigma_{u}dW_{u}e^{-2i\pi\frac{lm}{N}}\sum_{w=1}^{N}\int_{(w-1)\Delta t}^{w\Delta t}\sigma_{v}dW_{v}e^{2i\pi\frac{lw}{N}}\right\}
=:1N2n=1Np=1Nm=1Nρ=1N(eknekpemeρE(MnMpMmMρ)),\displaystyle=:\frac{1}{N^{2}}\sum_{n=1}^{N}\sum_{p=1}^{N}\sum_{m=1}^{N}\sum_{\rho=1}^{N}\left(e_{kn}e^{*}_{kp}e_{\ell m}^{*}e_{\ell\rho}\mathrm{E}\Big{(}M_{n}M_{p}M_{m}M_{\rho}\Big{)}\right),

where Mn:=(n1)ΔtnΔtσs𝑑WsM_{n}:=\int_{(n-1)\Delta t}^{n\Delta t}\sigma_{s}\,dW_{s} and ekn:=e2iπknNe_{kn}:=e^{-\frac{2i\pi kn}{N}}. Since Brownian motion has independent increments, we have that E(MnMpMmMρ)=EMn4ifn=p=m=ρ\mathrm{E}\Big{(}M_{n}M_{p}M_{m}M_{\rho}\Big{)}=\mathrm{E}M_{n}^{4}\;{\mathrm{if}}\;n=p=m=\rho, E(MnMkMmMρ)=EMn2EMk2\mathrm{E}\Big{(}M_{n}M_{k}M_{m}M_{\rho}\Big{)}=\mathrm{E}M_{n}^{2}\mathrm{E}M_{k}^{2} if n=k,m=ρn=k,\;m=\rho and E(MnMpMmMρ)=0\mathrm{E}\Big{(}M_{n}M_{p}M_{m}M_{\rho}\Big{)}=0, otherwise. Consequently,

E{Jk(X)[Jk(X)][Jl(X)]Jl(X)}\displaystyle\mathrm{E}\left\{J_{k}^{(X)}[J^{(X)}_{k}]^{*}[J_{l}^{(X)}]^{*}J_{l}^{(X)}\right\} =\displaystyle= 1N2n=1NEMn4+1N2(n=1NEMn2)2\displaystyle\frac{1}{N^{2}}\sum_{n=1}^{N}\ EM_{n}^{4}+\frac{1}{N^{2}}\left(\sum_{n=1}^{N}\ EM_{n}^{2}\right)^{2}
+1N2n=1Np=1NeknenekpepEMn2EMp2\displaystyle+\frac{1}{N^{2}}\sum_{n=1}^{N}\sum_{p=1}^{N}e_{kn}e_{\ell n}^{*}e_{kp}^{*}e_{\ell p}\ EM_{n}^{2}\ EM_{p}^{2}
+1N2n=1Np=1NeknepekpenEMn2EMp2\displaystyle+\frac{1}{N^{2}}\sum_{n=1}^{N}\sum_{p=1}^{N}e_{kn}e_{\ell p}^{*}e_{kp}^{*}e_{\ell n}\ EM_{n}^{2}\ EM_{p}^{2}

We use standard bounds on moments of stochastic integrals [19] to obtain the bound

1N2n=1NEMn41N2n=1N36Δt(n1)ΔtnΔtEσs4𝑑sC(Δt)3,\frac{1}{N^{2}}\sum_{n=1}^{N}\ EM_{n}^{4}\leq\frac{1}{N^{2}}\sum_{n=1}^{N}36\Delta t\int_{(n-1)\Delta t}^{n\Delta t}\mathrm{E}\sigma_{s}^{4}\,ds\leq C(\Delta t)^{3},

since, by assumption, Eσs4=𝒪(1)\mathrm{E}\sigma^{4}_{s}=\mathcal{O}(1)222CC in this paper denotes a generic constant, rather than the same constant.. We have:

ρkl(X)𝒮kk(X)𝒮ll(X)\displaystyle\rho_{kl}^{(X)}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(X)}_{ll} =\displaystyle= E{Jk(X)[Jk(X)][Jl(X)]Jl(X)}E|Jk(X)|2E|Jl(X)|2\displaystyle\mathrm{E}\left\{J_{k}^{(X)}[J^{(X)}_{k}]^{*}[J_{l}^{(X)}]^{*}J_{l}^{(X)}\right\}-\mathrm{E}\left|J_{k}^{(X)}\right|^{2}\mathrm{E}\left|J_{l}^{(X)}\right|^{2}
=\displaystyle= 1N20T0T(cos(2π(k+l)(suT))+cos(2π(kl)(suT)))\displaystyle\frac{1}{N^{2}}\int_{0}^{T}\int_{0}^{T}\left(\cos(2\pi(k+l)(\frac{s-u}{T}))+\cos(2\pi(k-l)(\frac{s-u}{T}))\right)
×E{σs2}E{σu2}dsdu+𝒪((Δt)3)\displaystyle\times\mathrm{E}\left\{\sigma_{s}^{2}\right\}\mathrm{E}\left\{\sigma_{u}^{2}\right\}dsdu+\mathcal{O}((\Delta t)^{3})
=\displaystyle= 12N20T0TE{σs2}E{σu2}(e2iπ(k+l)(suT)+e2iπ(k+l)(suT)\displaystyle\frac{1}{2N^{2}}\int_{0}^{T}\int_{0}^{T}\mathrm{E}\left\{\sigma_{s}^{2}\right\}\mathrm{E}\left\{\sigma_{u}^{2}\right\}\left(e^{2i\pi(k+l)(\frac{s-u}{T})}+e^{-2i\pi(k+l)(\frac{s-u}{T})}\right.
+e2iπ(kl)(suT)+e2iπ(kl)(suT))dsdu+𝒪((Δt)3)\displaystyle\left.+e^{2i\pi(k-l)(\frac{s-u}{T})}+e^{-2i\pi(k-l)(\frac{s-u}{T})}\right)dsdu+\mathcal{O}((\Delta t)^{3})
=\displaystyle= 12N2(Σ(k+lT)Σ(k+lT)+Σ(k+lT)Σ(k+lT)\displaystyle\frac{1}{2N^{2}}\left(\Sigma(-\frac{k+l}{T})\Sigma(\frac{k+l}{T})+\Sigma(\frac{k+l}{T})\Sigma(-\frac{k+l}{T})\right.
+Σ(klT)Σ(klT)+Σ(klT)Σ(klT))+𝒪((Δt)3).\displaystyle\left.+\Sigma(-\frac{k-l}{T})\Sigma(\frac{k-l}{T})+\Sigma(\frac{k-l}{T})\Sigma(-\frac{k-l}{T})\right)+\mathcal{O}((\Delta t)^{3}).

Since Eσt2\mathrm{E}\sigma^{2}_{t} is a smooth function of time we can bound the decay of Σ(f)1f\Sigma(f)\propto\frac{1}{f} so that:

ρkl(X)𝒮kk(X)𝒮ll(X)\displaystyle\rho_{kl}^{(X)}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(X)}_{ll} =\displaystyle= Δt2(𝒪(1(k+l)2)+𝒪(1(kl)2)).\displaystyle\Delta t^{2}\left(\mathcal{O}\left(\frac{1}{(k+l)^{2}}\right)+\mathcal{O}\left(\frac{1}{(k-l)^{2}}\right)\right). (A-45)

We combine the foregoing calculations with (A-44)

var{𝒮^kk(Y)}=(𝒮kk(X)+𝒮kk(ε))2.\mathrm{var}\left\{\widehat{\mathcal{S}}^{(Y)}_{kk}\right\}=\left({\mathcal{S}}^{(X)}_{kk}+{\mathcal{S}}^{(\varepsilon)}_{kk}\right)^{2}.
var(˙τ(𝝈^))\displaystyle\mathrm{var}\left(\dot{\ell}_{\tau}(\widehat{\bm{\sigma}})\right) =\displaystyle= l=1N/21k=1N/21Δt2cov(𝒮^kk(Y),𝒮^ll(Y))(σ¯X2+σε2|2sin(πfkΔt)|2)2(σ¯X2+σε2|2sin(πflΔt)|2)2.\displaystyle\sum_{l=1}^{N/2-1}\sum_{k=1}^{N/2-1}\frac{\Delta t^{2}{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right)}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}. (A-46)

We note that

cov(𝒮^kk(Y),𝒮^ll(Y))=ρkl(X)𝒮kk(X)𝒮ll(X)+δkl[𝒮ll(ε)]2+2δkl𝒮kk(X)𝒮kk(ε).{\mathrm{cov}}\left(\widehat{\mathcal{S}}^{(Y)}_{kk},\widehat{\mathcal{S}}^{(Y)}_{ll}\right)=\rho_{kl}^{(X)}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(X)}_{ll}+\delta_{kl}\left[{\mathcal{S}}^{(\varepsilon)}_{ll}\right]^{2}+2\delta_{kl}{\mathcal{S}}^{(X)}_{kk}{\mathcal{S}}^{(\varepsilon)}_{kk}.

Thus it follows that:

var(˙τ(𝝈^))\displaystyle\mathrm{var}\left(\dot{\ell}_{\tau}(\widehat{\bm{\sigma}})\right) =\displaystyle= l=1N/21Δt2(σ¯X2+σε2|2sin(πflΔt)|2)2+C+𝒪(log(Δt)Δt1/4)\displaystyle\sum_{l=1}^{N/2-1}\frac{\Delta t^{2}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}+C+\mathcal{O}(\log(\Delta t)\Delta t^{-1/4})
=\displaystyle= 𝒪(Δt1/2)+C+𝒪(log(Δt)Δt1/4).\displaystyle\mathcal{O}(\Delta t^{-1/2})+C+\mathcal{O}(\log(\Delta t)\Delta t^{-1/4}).

The extra order terms acknowledge potential effects from the drift. We need to establish the size of CC. Using (A-44) we find that:

|C|\displaystyle|C| \displaystyle\leq lkN/21Δt4C2((k+l)2+(kl)2)(σ¯X2+σε2|2sin(πfkΔt)|2)2(σ¯X2+σε2|2sin(πflΔt)|2)2\displaystyle\sum_{l\neq k}^{N/2-1}\frac{\Delta t^{4}C_{2}((k+l)^{-2}+(k-l)^{-2})}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{l}\Delta t)\right|^{2}\right)^{2}}
\displaystyle\leq 2k=1N/21τ=1kΔt4C2((2kτ)2+τ2)(σ¯X2+σε2|2sin(πfkτΔt)|2)2(σ¯X2+σε2|2sin(πfkΔt)|2)2\displaystyle 2\sum_{k=1}^{N/2-1}\sum_{\tau=1}^{k}\frac{\Delta t^{4}C_{2}((2k-\tau)^{-2}+\tau^{-2})}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k-\tau}\Delta t)\right|^{2}\right)^{2}\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}
\displaystyle\sim 2k=1N/21τ=1kC2((2kτ)2+τ2)(τX2+σε2|2sin(πfkτΔt)|2/Δt)2(τX2+σε2|2sin(πfkΔt)|2/Δt)2\displaystyle 2\sum_{k=1}^{N/2-1}\sum_{\tau=1}^{k}\frac{C_{2}((2k-\tau)^{-2}+\tau^{-2})}{\left({\tau}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k-\tau}\Delta t)\right|^{2}/\Delta t\right)^{2}\left({\tau}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}/\Delta t\right)^{2}}
=\displaystyle= 𝒪(log(Δt)).\displaystyle{\mathcal{O}}(\log(\Delta t)).

This is negligible in size compared to Δt1/2\Delta t^{-1/2}. Similar calculations can bound contributions from the off diagonals in the other two calculations. Also as σ¯X2=τXΔt\overline{\sigma}^{2}_{X}=\tau_{X}\Delta t

E{¨ττ(𝝈)}\displaystyle-\mathrm{E}\left\{\ddot{\ell}_{\tau\tau}(\bm{\sigma})\right\} =k=1N/21Δt2(σ¯X2+σε2|2sin(πfkΔt)|2)2+𝒪(log(Δt))=𝒪(Δt1/2)\displaystyle=\sum_{k=1}^{N/2-1}\frac{\Delta t^{2}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}+\mathcal{O}(\log(\Delta t))={\mathcal{O}}(\Delta t^{-1/2}) (A-48)
E{¨εε(𝝈)}\displaystyle-\mathrm{E}\left\{\ddot{\ell}_{\varepsilon\varepsilon}(\bm{\sigma})\right\} =k=1N/21|2sin(πfk)|4(σ¯X2+σε2|2sin(πfkΔt)|2)2+𝒪(log(Δt))=𝒪(Δt1)\displaystyle=\sum_{k=1}^{N/2-1}\frac{\left|2\sin(\pi f_{k})\right|^{4}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}+\mathcal{O}(\log(\Delta t))={\mathcal{O}}(\Delta t^{-1})
E{¨τε(𝝈)}\displaystyle-\mathrm{E}\left\{\ddot{\ell}_{\tau\varepsilon}(\bm{\sigma})\right\} =k=1N/21Δt|2sin(πfk)|2(σ¯X2+σε2|2sin(πfkΔt)|2)2+𝒪(log(Δt))=𝒪(Δt1/2).\displaystyle=\sum_{k=1}^{N/2-1}\frac{\Delta t\left|2\sin(\pi f_{k})\right|^{2}}{\left(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right)^{2}}+{\mathcal{O}}(\log(\Delta t))={\mathcal{O}}(\Delta t^{-1/2}).

The order terms follow from usual spectral theory on the white noise process, as well as bounds on Jk(μ)J_{k}^{(\mu)}. We can also by considering the variance of the observed Fisher information deduce that renormalized versions of the entries of the observed Fisher information converge in probability to a constant, or

diag(Δt1/4,Δt1/2)𝑭diag(Δt1/4,Δt1/2)𝓕,{\mathrm{diag}}(\Delta t^{1/4},\Delta t^{1/2}){\bm{F}}{\mathrm{diag}}(\Delta t^{1/4},\Delta t^{1/2})\longrightarrow{\bm{\mathcal{F}}},

and thus using Slutsky’s theorem we can deduce that:

diag(Δt1/4,Δt1/2)[(σ^X2/Δtσ^ε2)(σ¯X2/Δtσε2)]diag(Δt1/4,Δt1/2)𝐿N(𝟎,𝓕1),{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\left[\begin{pmatrix}\widehat{\sigma}^{2}_{X}/\Delta t\\ \widehat{\sigma}^{2}_{\varepsilon}\end{pmatrix}-\begin{pmatrix}\overline{\sigma}_{X}^{\ast 2}/\Delta t\\ \sigma^{\ast 2}_{\varepsilon}\end{pmatrix}\right]{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\overset{L}{\longrightarrow}N\left(\bm{0},\bm{\mathcal{F}}^{-1}\right),

where the entries of 𝓕\bm{\mathcal{F}} can be found from (A-48), (A-46) and (A), and

diag(Δt1/4,Δt1/2)var{[(σ^X2/Δtσ^ε2)(σ¯X2/Δtσε2)]}diag(Δt1/4,Δt1/2)\displaystyle{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\mathrm{var}\left\{\left[\begin{pmatrix}\widehat{\sigma}^{2}_{X}/\Delta t\\ \widehat{\sigma}^{2}_{\varepsilon}\end{pmatrix}-\begin{pmatrix}\overline{\sigma}_{X}^{\ast 2}/\Delta t\\ \sigma^{\ast 2}_{\varepsilon}\end{pmatrix}\right]\right\}{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})
=\displaystyle= diag(Δt1/4,Δt1/2)𝑭1𝑭𝑭1diag(Δt1/4,Δt1/2)\displaystyle{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\bm{F}^{-1}\bm{F}\bm{F}^{-1}{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})
=\displaystyle= diag(Δt1/4,Δt1/2)𝑭1diag(Δt1/4,Δt1/2)𝓕1.\displaystyle{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\bm{F}^{-1}{\mathrm{diag}}(\Delta t^{-1/4},\Delta t^{-1/2})\longrightarrow{\bm{\mathcal{F}}}^{-1}.

We have

𝓕\displaystyle{\bm{\mathcal{F}}} =\displaystyle= (Tσε161τX3/2002Tσε4)=(ττ00εε).\displaystyle\begin{pmatrix}\frac{T}{\sigma_{\varepsilon}16}\frac{1}{\tau_{X}^{3/2}}&0\\ 0&\frac{2T}{\sigma^{4}_{\varepsilon}}\end{pmatrix}=\begin{pmatrix}{\cal I}_{\tau\tau}&0\\ 0&{\cal I}_{\varepsilon\varepsilon}\end{pmatrix}. (A-49)

This expression follows by direct calculation. Asymptotic normality of both τ^x\widehat{\tau}_{x} and σ^ε2\widehat{\sigma}_{\varepsilon}^{2} follows by the usual arguments. We can determine the asymptotic variance of X,X^(w)\widehat{\langle X,X\rangle}^{(w)} via

var{X,X^(w)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}^{(w)}\right\} =\displaystyle= T2var{τ^x}\displaystyle T^{2}\mathrm{var}\left\{\widehat{\tau}_{x}\right\} (A-50)
=\displaystyle= TσετX1/216τX2Δt.\displaystyle T\frac{\sigma_{\varepsilon}}{\tau_{X}^{1/2}}16\tau_{X}^{2}\sqrt{\Delta t}.

We see that the variance depends on the length of the time course, the inverse of the signal to noise ratio, the square root of the sampling period and the fourth power of the “average standard deviation” of the XtX_{t} process.

B Proof of Theorem 2

We now wish to use these results to deduce properties of σ^\widehat{\sigma}. Firstly using the well known invariance of maximum likelihood estimators to transfer the estimators of σ¯X2\overline{\sigma}_{X}^{2} and σε2\sigma^{2}_{\varepsilon} to estimators of X,XT\langle X,X\rangle_{T}. We therefore take

X,X^T(m1)=k=0N1𝒮^kk(X)(L^k)=k=0N1L^k𝒮^kk(Y)\displaystyle\widehat{\langle X,X\rangle}_{T}^{(m_{1})}=\sum_{k=0}^{N-1}\widehat{\mathcal{S}}^{(X)}_{kk}(\widehat{L}_{k})=\sum_{k=0}^{N-1}\widehat{L}_{k}\widehat{\mathcal{S}}^{(Y)}_{kk}

It therefore follows that with τ^X=τX+δτX\widehat{\tau}_{X}=\tau_{X}+\delta\tau_{X} and σ^ε2=σε2+δσε2\widehat{\sigma}_{\varepsilon}^{2}={\sigma}_{\varepsilon}^{2}+\delta\sigma^{2}_{\varepsilon}

E{X,X^T(m1)}\displaystyle\mathrm{E}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= k=0N1E{(σ¯X2+δσX2σ¯X2+δσX2+(σε2+δσε2)|2sin(πfkΔt)|2)𝒮^kk(Y)}\displaystyle\sum_{k=0}^{N-1}\mathrm{E}\left\{\left(\frac{\overline{\sigma}^{2}_{X}+\delta\sigma^{2}_{X}}{\overline{\sigma}^{2}_{X}+\delta\sigma^{2}_{X}+(\sigma^{2}_{\varepsilon}+\delta\sigma^{2}_{\varepsilon})\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}\right)\widehat{\mathcal{S}}^{(Y)}_{kk}\right\}
=\displaystyle= k=0N1E{(σ¯X2+δσX21+[δσX2+δσε2|2sin(πfkΔt)|2]σ¯X2+σε2|2sin(πfkΔt)|2)𝒮^kk(Y)σ¯X2+σε2|2sin(πfkΔt)|2}\displaystyle\sum_{k=0}^{N-1}\mathrm{E}\left\{\left(\frac{\overline{\sigma}^{2}_{X}+\delta\sigma^{2}_{X}}{1+\frac{\left[\delta\sigma^{2}_{X}+\delta\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right]}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}}\right)\frac{\widehat{\mathcal{S}}^{(Y)}_{kk}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}\right\}
=\displaystyle= k=0N1E{(σ¯X2+δσX2)(1[δσX2+δσε2|2sin(πfkΔt)|2](σ¯X2+σε2|2sin(πfkΔt)|2))𝒮^kk(Y)σ¯X2+σε2|2sin(πfkΔt)|2}\displaystyle\sum_{k=0}^{N-1}\mathrm{E}\left\{\left(\overline{\sigma}^{2}_{X}+\delta\sigma^{2}_{X}\right)\left(1-\frac{\left[\delta\sigma^{2}_{X}+\delta\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}\right]}{(\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2})}\right)\frac{\widehat{\mathcal{S}}^{(Y)}_{kk}}{\overline{\sigma}^{2}_{X}+\sigma^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}\right\}
=\displaystyle= k=0N1[σ¯X2+O(Δt5/4)]+𝒪(Δtlog(Δt))=E{X,XT}+𝒪(Δtlog(Δt))\displaystyle\sum_{k=0}^{N-1}[\overline{\sigma}^{2}_{X}+O\left(\Delta t^{5/4}\right)]+{\mathcal{O}}\left(\sqrt{\Delta t}\log(\Delta t)\right)=\mathrm{E}\left\{\langle X,X\rangle_{T}\right\}+{\mathcal{O}}\left(\sqrt{\Delta t}\log(\Delta t)\right)
+𝒪(Δt4).\displaystyle+{\mathcal{O}}\left(\sqrt[4]{\Delta t}\right).

This implies that the estimator is asymptotically unbiased. We can also note that the variance of the new estimator is given by:

var{X,X^T(m1)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= jkcov{L^j𝒮^jj(Y),L^k𝒮^ll(Y)}\displaystyle\sum_{j}\sum_{k}{\mathrm{cov}}\{\widehat{L}_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\widehat{L}_{k}\widehat{\mathcal{S}}_{ll}^{(Y)}\}
=\displaystyle= jkcov{L^jLjLj𝒮^jj(Y),L^kLkLk𝒮^kk(Y)}\displaystyle\sum_{j}\sum_{k}{\mathrm{cov}}\{\frac{\widehat{L}_{j}}{L_{j}}L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\frac{\widehat{L}_{k}}{L_{k}}L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}
=\displaystyle= jkcov{(1+δτXτXδτXΔt+δσε2|2sin(πfjΔt)|2τXΔt+σε2|2sin(πfjΔt)|2)Lj𝒮^jj(Y),\displaystyle\sum_{j}\sum_{k}{\mathrm{cov}}\left\{\left(1+\frac{\delta\tau_{X}}{\tau_{X}}-\frac{\delta\tau_{X}\Delta t+\delta{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{j}\Delta t)\right|^{2}}{\tau_{X}\Delta t+{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{j}\Delta t)\right|^{2}}\right)L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\right.
(1+δτXτXδτXΔt+δσε2|2sin(πfkΔt)|2τXΔt+σε2|2sin(πfkΔt)|2)Lk𝒮^kk(Y)}.\displaystyle\left.\left(1+\frac{\delta\tau_{X}}{\tau_{X}}-\frac{\delta\tau_{X}\Delta t+\delta{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}{\tau_{X}\Delta t+{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}\right)L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\right\}.

Then

var{X,X^T(m1)}\displaystyle\mathrm{var}\left\{\widehat{\langle X,X\rangle}_{T}^{(m_{1})}\right\} =\displaystyle= jk{cov{Lj𝒮^jj(Y),Lk𝒮^kk(Y)}+cov{δτXτXLj𝒮^jj(Y),Lk𝒮^kk(Y)}\displaystyle\sum_{j}\sum_{k}\left\{{\mathrm{cov}}\{L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}+{\mathrm{cov}}\{\frac{\delta\tau_{X}}{\tau_{X}}L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}\right.
+cov{Lj𝒮^jj(Y),δτXτXLk𝒮^kk(Y)}\displaystyle+{\mathrm{cov}}\{L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\frac{\delta\tau_{X}}{\tau_{X}}L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}
cov{δτXΔt+δσε2|2sin(πfjΔt)|2τXΔt+σε2|2sin(πfjΔt)|2Lj𝒮^jj(Y),Lk𝒮^kk(Y)}\displaystyle-{\mathrm{cov}}\{\frac{\delta\tau_{X}\Delta t+\delta{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{j}\Delta t)\right|^{2}}{\tau_{X}\Delta t+{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{j}\Delta t)\right|^{2}}L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}
cov{Lj𝒮^jj(Y),δτXΔt+δσε2|2sin(πfkΔt)|2τXΔt+σε2|2sin(πfkΔt)|2Lk𝒮^kk(Y)}\displaystyle-{\mathrm{cov}}\{L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\frac{\delta\tau_{X}\Delta t+\delta{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}{\tau_{X}\Delta t+{\sigma}^{2}_{\varepsilon}\left|2\sin(\pi f_{k}\Delta t)\right|^{2}}L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}
+cov{δτXτXLj𝒮^jj(Y),δτXτXLk𝒮^kk(Y)}+}\displaystyle\left.+{\mathrm{cov}}\{\frac{\delta\tau_{X}}{\tau_{X}}L_{j}\widehat{\mathcal{S}}_{jj}^{(Y)},\frac{\delta\tau_{X}}{\tau_{X}}L_{k}\widehat{\mathcal{S}}_{kk}^{(Y)}\}+\dots\right\}
=\displaystyle= jk{δjkσX4+LjLkcov{δτXτX𝒮^jj(Y),𝒮^kk(Y)}+}\displaystyle\sum_{j}\sum_{k}\left\{\delta_{jk}\sigma^{4}_{X}+L_{j}L_{k}{\mathrm{cov}}\{\frac{\delta\tau_{X}}{\tau_{X}}\widehat{\mathcal{S}}_{jj}^{(Y)},\widehat{\mathcal{S}}_{kk}^{(Y)}\}+\dots\right\}

By looking at the individual terms of this expression, and noting that the estimated renormalized variance τ^X=τX+δτX\widehat{\tau}_{X}=\tau_{X}+\delta\tau_{X} and σ^ε2=σXε+δσε2\widehat{\sigma}_{\varepsilon}^{2}={\sigma}^{\varepsilon}_{X}+\delta\sigma^{2}_{\varepsilon} are linear combinations of 𝒮^kk(Y),\widehat{\mathcal{S}}_{kk}^{(Y)}, we can deduce the stated order terms, by again noting the Δt\sqrt{\Delta t} order of the important terms. However to leading order, this estimator will perform identically to X,X^(w)\widehat{\langle X,X\rangle}^{(w)} in terms of variance.

C Proof of Time Domain Form

The integral can be calculated from first principles using complex-variables with z=e2iπfz=e^{2i\pi f}. Thus dz/df=2iπzdz/df=2i\pi z or df=dz/(2iπz)df=dz/(2i\pi z). (28) takes the form

τ\displaystyle\ell_{\tau} =\displaystyle= 12iπ|z|=1σ¯X2σ¯X2zσε2[z1]2zτ𝑑z.\displaystyle\frac{1}{2i\pi}\oint_{|z|=1}\frac{\overline{\sigma}_{X}^{2}}{\overline{\sigma}^{2}_{X}z-\sigma^{2}_{\varepsilon}[z-1]^{2}}\;z^{\tau}\;dz. (C-51)

We need the poles, or:

σ¯X2zσε2[z1]2=0z=1+σ¯X22σε2±σ¯X2σε2+σ¯X44σε4=z±\displaystyle\overline{\sigma}^{2}_{X}z-\sigma^{2}_{\varepsilon}[z-1]^{2}=0\;\Longleftrightarrow\;z=1+\frac{\overline{\sigma}^{2}_{X}}{2\sigma^{2}_{\varepsilon}}\pm\sqrt{\frac{\overline{\sigma}^{2}_{X}}{\sigma^{2}_{\varepsilon}}+\frac{\overline{\sigma}^{4}_{X}}{4\sigma^{4}_{\varepsilon}}}=z^{\pm}

If |σε2σ¯X2|<1\left|\frac{\sigma^{2}_{\varepsilon}}{\overline{\sigma}^{2}_{X}}\right|<1 we have

z\displaystyle z^{-} =\displaystyle= 1+σ¯X22σε2σ¯X22σε21+4σε2σX2\displaystyle 1+\frac{\overline{\sigma}^{2}_{X}}{2\sigma^{2}_{\varepsilon}}-\frac{\overline{\sigma}_{X}^{2}}{2\sigma^{2}_{\varepsilon}}\sqrt{1+\frac{4\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}}
=1+σ¯X22σε2σ¯X22σε2(1+124σε2σX2+14(1)2[4σε2σX2]2+𝒪(σε6σX6))\displaystyle=1+\frac{\overline{\sigma}_{X}^{2}}{2\sigma^{2}_{\varepsilon}}-\frac{\overline{\sigma}_{X}^{2}}{2\sigma^{2}_{\varepsilon}}\left(1+\frac{1}{2}\frac{4\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}+\frac{1}{4}\frac{(-1)}{2}\left[\frac{4\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}\right]^{2}+{\mathcal{O}}\left(\frac{\sigma^{6}_{\varepsilon}}{\sigma^{6}_{X}}\right)\right)
=\displaystyle= σε2σX2+𝒪(σε4σX4)\displaystyle\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}+{\mathcal{O}}\left(\frac{\sigma^{4}_{\varepsilon}}{\sigma^{4}_{X}}\right)
z+\displaystyle z^{+} =\displaystyle= σX2σε2+\displaystyle\frac{\sigma^{2}_{X}}{\sigma^{2}_{\varepsilon}}+\dots

We then note that:

τ\displaystyle\ell_{\tau} =\displaystyle= 12iπ|z|=1σ¯X2/(σε2)σ¯X2/(σε2)z+[z1]2zτ𝑑z=12iπ|z|=1σ¯X2/(σε2)(zz)(zz+)zτ𝑑z\displaystyle-\frac{1}{2i\pi}\oint_{|z|=1}\frac{\overline{\sigma}_{X}^{2}/(\sigma^{2}_{\varepsilon})}{-\overline{\sigma}_{X}^{2}/(\sigma^{2}_{\varepsilon})z+[z-1]^{2}}\;z^{\tau}\;dz=-\frac{1}{2i\pi}\oint_{|z|=1}\frac{\overline{\sigma}_{X}^{2}/(\sigma^{2}_{\varepsilon})}{(z-z^{-})(z-z^{+})}\;z^{\tau}\;dz
=\displaystyle= 2iπ2iπσ¯X2/(σε2)(σε2σX2)τz+σε2σX2+𝒪(σε4σX4)=(σε2σX2)τ+𝒪(σε2τ+2σ¯X2τ+2)\displaystyle\frac{2i\pi}{2i\pi}\overline{\sigma}_{X}^{2}/(\sigma^{2}_{\varepsilon})\frac{\left(\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}\right)^{\tau}}{z^{+}-\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}+{\mathcal{O}}\left(\frac{\sigma^{4}_{\varepsilon}}{\sigma^{4}_{X}}\right)}=\left(\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}\right)^{\tau}+{\mathcal{O}}\left(\frac{\sigma^{2\tau+2}_{\varepsilon}}{\overline{\sigma}^{2\tau+2}_{X}}\right)

If on the other hand you consider |σε2σX2|>1\left|\frac{\sigma^{2}_{\varepsilon}}{\sigma^{2}_{X}}\right|>1 which in many scenarios is more realistic then we find that:

z\displaystyle z^{-} =\displaystyle= 1+σ¯X22σε2σ¯Xσε1+σX24Δtσε2=1+σ¯X22σε2σ¯Xσε(1+12σX24σε2)\displaystyle 1+\frac{\overline{\sigma}_{X}^{2}}{2\sigma^{2}_{\varepsilon}}-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\sqrt{1+\frac{\sigma^{2}_{X}}{4\Delta t\sigma^{2}_{\varepsilon}}}=1+\frac{\overline{\sigma}_{X}^{2}}{2\sigma^{2}_{\varepsilon}}-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\left(1+\frac{1}{2}\frac{\sigma^{2}_{X}}{4\sigma^{2}_{\varepsilon}}\right)
=\displaystyle= 1σ¯Xσε+𝒪(σ¯X2σε2)\displaystyle 1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}+{\mathcal{O}}\left(\frac{\overline{\sigma}_{X}^{2}}{\sigma^{2}_{\varepsilon}}\right)
z+\displaystyle z^{+} =\displaystyle= 1+σ¯Xσε\displaystyle 1+\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}

In this case we find that

τ\displaystyle\ell_{\tau} =\displaystyle= σ¯X2/(σε2)[1σ¯Xσε]τ2σ¯Xσε+𝒪(σ¯X2σε2)=σ¯X2σε(1σ¯Xσε)τ+𝒪(σ¯X22σε2(1σ¯Xσε)τ).\displaystyle\overline{\sigma}_{X}^{2}/(\sigma^{2}_{\varepsilon})\frac{[1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}]^{\tau}}{2\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}+{\mathcal{O}}\left(\frac{\overline{\sigma}_{X}^{2}}{\sigma^{2}_{\varepsilon}}\right)}=\frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}\left(1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\right)^{\tau}+{\mathcal{O}}\left(\frac{\overline{\sigma}_{X}^{2}}{2\sigma_{\varepsilon}^{2}}\left(1-\frac{\overline{\sigma}_{X}}{\sigma_{\varepsilon}}\right)^{\tau}\right).

In both cases the decay of the filter is geometric. We note that in most practical examples LkL_{k} decays very rapidly in kk. Therefore, we do not need to integrate between 1/2-1/2 to 1/21/2, and only need to integrate over 1/π-1/\pi to 1/π1/\pi. In this range of ff we find that for smallish remainder term R3R_{3} we have: sin2(πf)=π2f2+R3(fπ)\sin^{2}(\pi f)=\pi^{2}f^{2}+R_{3}(f\pi). Then we note

τ\displaystyle\ell_{\tau} =\displaystyle= 1π1πσ¯X2σ¯X2+4σε2π2f2+R3(fπ)e2iπfτ𝑑f+C\displaystyle\int_{-\frac{1}{\pi}}^{\frac{1}{\pi}}\frac{\overline{\sigma}_{X}^{2}}{\overline{\sigma}_{X}^{2}+4\sigma^{2}_{\varepsilon}\pi^{2}f^{2}+R_{3}(f\pi)}\;e^{2i\pi f\tau}\;df+C
=\displaystyle= σ¯X2σε[2σ¯X/(σε)σX2σε2+4π2f2+R4(f)]e2iπfτ𝑑f+C\displaystyle\frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}\int_{-\infty}^{\infty}\left[2\frac{\overline{\sigma}_{X}/(\sigma_{\varepsilon})}{\frac{\sigma^{2}_{X}}{\sigma^{2}_{\varepsilon}}+4\pi^{2}f^{2}}+R_{4}(f)\right]\;e^{2i\pi f\tau}\;df+C
=\displaystyle= σ¯X2σεeσ¯X|τ|σε+C.\displaystyle\frac{\overline{\sigma}_{X}}{2\sigma_{\varepsilon}}e^{-\frac{\overline{\sigma}_{X}|\tau|}{\sigma_{\varepsilon}}}+C.

Thus we are smoothing the autocovariance sequence with a smoothing window that becomes a delta function as σ¯X/σε\overline{\sigma}_{X}/\sigma_{\varepsilon}\rightarrow\infty. It is reasonable that this non-dimensional quantity arises as an important factor.