Parameter Estimation for Multivariate Diffusion Systems

Melvin M. Varughese¹¹1Email: melvin.varughese@uct.ac.za. Department of Statistical Sciences, University of Cape Town, Rondebosch, Cape Town 7707, South Africa.

Abstract

Diffusion processes are widely used for modelling real-world phenomena. Except for select cases however, analytical expressions do not exist for a diffusion process’ transitional probabilities. It is proposed that the cumulant truncation procedure can be applied to predict the evolution of the cumulants of the system. These predictions may be subsequently used within the saddlepoint procedure to approximate the transitional probabilities. An approximation to the likelihood of the diffusion system is then easily derived. The method is applicable for a wide-range of diffusion systems - including multivariate, irreducible diffusion systems that existing estimation schemes struggle with. Not only is the accuracy of the saddlepoint comparable with the Hermite expansion - a popular approximation to a diffusion system’s transitional density - it also appears to be less susceptible to increasing lags between successive samplings of the diffusion process. Furthermore, the saddlepoint is more stable in regions of the parameter space that are far from the maximum likelihood estimates. Hence, the saddlepoint method can be naturally incorporated within a Markov Chain Monte Carlo (MCMC) routine in order to provide reliable estimates and credibility intervals of the diffusion model’s parameters. The method is applied to fit the Heston model to daily observations of the S&P 500 and VIX indices from December 2009 to November 2010.

keywords:

Diffusion process , Fokker-Planck equation , Cumulant truncation procedure , Saddlepoint approximation , Markov Chain Monte Carlo

1 Introduction

Diffusion processes are continuous-time, continuous-space stochastic processes that have proven to be natural modelling frameworks for many real world phenomena. Over an infinitesimal interval $dt$ , the evolution of a multivariate diffusion process $\phi_{t}^{*}$ is represented by the following, possibly time inhomogeneous, stochastic differential equation (SDE):

d\phi_{t}^{*}=\mu\left(\phi_{t}^{*},t;\theta\right)dt+\sigma\left(\phi_{t}^{*},t;\theta\right)dB_{t},

(1)

where $\phi_{t}^{*}=\left(\phi_{i}\right)_{i=1,...,m}$ ; $\theta=\left(\theta_{i}\right)_{i=1,...,p}$ is the parameter vector; $\mu\left(\phi_{t}^{*},t;\theta\right)=\left(\mu_{i}\right)_{i=1,...,m}$ ; $\sigma^{2}\left(\phi_{t}^{*},t;\theta\right)=$ $\left(\sigma_{ij}\right)_{i,j=1,...,m}$ with $\sigma^{2}\left(\phi_{t}^{*},t;\theta\right)=\sigma\left(\phi_{t}^{*},t;\theta\right)^{T}\sigma\left(\phi_{t}^{*},t;\theta\right)$ and $B_{t}$ is an $m$ -dimensional vector of independent Brownian motions. The $m$ -dimensional vector $\phi_{t}^{*}$ represents a set of state variables which characterizes the diffusion system through time. The assumption that the $m$ Brownian motions are independent does not lead to any loss of generality since allowance is made for off-diagonal terms within the diffusion matrix $\sigma^{2}\left(\phi_{t}^{*},t;\theta\right)$ . Within this framework, the primary focus is on estimating the parameter vector $\theta$ from discretely-sampled data.

The drift vector $\mu\left(\phi_{t}^{*},t;\theta\right)$ and the diffusion matrix $\sigma^{2}\left(\phi_{t}^{*},t;\theta\right)$ characterize the evolution of $\phi_{t}^{*}$ . Their individual elements are defined as:

\mu_{i}=\lim_{\Delta t\rightarrow 0}\frac{\mbox{E}[\Delta\phi_{i}|\phi_{t}^{*}]}{\Delta t},\;\;\sigma_{ij}=\lim_{\Delta t\rightarrow 0}\frac{\mbox{Cov}[\Delta\phi_{i},\Delta\phi_{j}|\phi_{t}^{*}]}{\Delta t}.

Over an infinitesimally small interval, the diffusion system is distributed as follows:

\phi_{t+\Delta t}^{*}-\phi_{t}^{*}\sim\mbox{Normal}\left(\mu\left(\phi_{t}^{*},t;\theta\right)\Delta t,\sigma^{2}\left(\phi_{t}^{*},t;\theta\right)\Delta t\right).

(2)

That is, the diffusion system has a multivariate-normal distribution characterized by the drift vector and diffusion matrix over any infinitesimal interval. Since diffusion processes are Markovian, equation (2) may be used to derive the likelihood for continuously sampled diffusion paths. However with discretely sampled diffusion paths, statistical inference is considerably more challenging. This is because the distribution of the diffusion increments over discretely sampled time points is often unknown.

Instead of representing a multivariate diffusion system $\phi_{t}^{*}$ as a stochastic differential equation, one may instead focus on the Kolmogorov forward equation, which dictates the evolution of its probability density function $p(\phi_{t}^{*})$ . This is given by:

	$\displaystyle\frac{\partial p(\phi_{t}^{*})}{\partial t}$	$\displaystyle=$	$\displaystyle-\sum_{i=1}^{m}\frac{\partial}{\partial\phi_{i}}\left[\mu\left(\phi_{t}^{},t;\theta\right)p(\phi_{t}^{})\right]$		(3)
		$\displaystyle+$	$\displaystyle\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{\partial^{2}}{\partial\phi_{i}\partial\phi_{j}}\left[\sigma_{ij}(\phi_{t}^{},t;\theta)p(\phi_{t}^{})\right].$		(3)

This is also known as the Fokker-Planck equation. Except where required, the dependence of the drift vector and diffusion tensor on the parameter vector shall be suppressed within the notation. Since a diffusion process is Markovian, the likelihood of a diffusion system sampled at discrete time points $(t_{1},t_{2},...,t_{N})$ is given by:

L(\theta)=p(\phi_{t_{1}}^{*})\prod_{i=2}^{N}p\left(\phi_{t_{i}}^{*}|\phi_{t_{i-1}}^{*}\right).

(4)

Asymptotically, for large $N$ , the term $p(\phi_{t_{1}}^{*})$ may be ignored whilst the transitional probability distribution $p(\phi_{t_{i}}^{*}|\phi_{t_{i-1}}^{*})$ is the solution to equation (3) at time $t_{i}$ with the boundary condition that $p(\phi_{t}^{*})$ is given by the Dirac delta function centered around $\phi_{t_{i-1}}^{*}$ at time $t_{i-1}$ . The likelihood function is central to many inference procedures: it enables us to derive parameter estimates, confidence intervals and to conduct hypothesis tests. Unfortunately, except for a few special cases, equation (3) (and hence also the likelihood) is analytically intractable.

The inability to solve equation (3) is an impediment to statistical inference. This may be circumvented by attempting to match, by choice of parameters, characteristics of the sampled path with characteristics of the diffusion model. For example, one may choose to estimate the instantaneous means and variances using the corresponding sample moments of the differenced data (Gallant and Long, 1997; Ragwitz and Kantz, 2001). Alternatively one may employ Bayesian imputation to augment the observed data so that the diffusion increments are approximately normally distributed (Roberts and Stramer, 2001).

Since likelihood based methods tend to give more precise parameter estimates than method of moments estimators (Hurn et al., 2007), we instead seek an approximation to the transitional probability distribution of the diffusion process. Monte Carlo methods may be used to approximate the likelihood (Kleinhans and Friedrich, 2007; Durham and Gallant, 2002). Alternatively, equation (3) could be solved numerically. Wojtkiewicz and Bergman (2000) discretized the spatial domain and solved the partial differential equation numerically at each point on the lattice. The finite-difference method discretizes the time domain, taking advantage of the fact that over an infinitesimally small time period, the diffusion process is normally distributed (Wehner and Wolfer, 1987). Huang (2012) developed a quasi-maximum likelihood estimator that approximates the first two conditional moments using a Wagner-Platen approximation. The resulting normal distribution can be used to approximate the transitional probability.

Another possibility is to approximate the transitional probability distribution by a closed form analytic function; for example, a Hermite polynomial expansion (Ait-Sahalia, 2002). This method was shown by Ait-Sahalia to be superior to many of the competing methods - both in terms of the accuracy of the transitional distribution approximation as well as the speed of the algorithm. Stramer et al. (2010) created an MCMC procedure based on the Hermite approximation which could allow for measurement errors in the diffusion process.

It must be stressed that the Hermite approximation is only applicable for reducible diffusion processes. A diffusion process $X$ is reducible if there exists a one-to-one transformation $Y=h(X,\theta)$ such that the covariance function of $Y$ is the identity matrix. Though all univariate diffusion processes are reducible, only some multivariate diffusions share this property. Ait-Sahalia (2008) extended the method to irreducible, multivariate diffusions, but not only is the procedure more difficult to implement, there is also a reduction in the accuracy of the closed-form approximation to the transitional density. Furthermore, the Hermite approximation does not in general integrate to one. Indeed, for parameter values far from the maximum likelihood estimates, the normalizer can be very far from one. This often prevents convergence when applying the Hermite approximation within an MCMC setting (Stramer et al., 2010). Consequently, without modification, the resulting MCMC credibility intervals often suffer from considerable undercoverage.

It is proposed that the transitional probability may rather be estimated by a saddlepoint approximation (Daniels, 1954). The saddlepoint approximation is an algebraic expression based on a random variable’s cumulant generation function (CGF). In cases where the first few moments of a random variable is known but the corresponding probability density is difficult to obtain, the saddlepoint approximation to the density can be calculated. The tails of a saddlepoint approximation are more accurate than those of a Edgeworth-expansion (Barndorff-Nielsen and Klüppelberg, 1999). Saddlepoint methods have already been used to approximate the transition densities of diffusions (Ait-Sahalia and Yu, 2006; Preston and Wood, 2012). Preston and Wood apply the saddlepoint approximation to the CGF of a truncated small-time sample-path expansion whilst Ait-Sahalia and Yu (2006) apply the saddlepoint to a truncated expansion, in small-time, of the characteristic function of the transition density.

The saddlepoint approximation will be applied to a truncated expansion of the CGF of $\phi_{t}^{*}$ with respect to the CGF parameters. In contrast to the aforementioned small-time methods, the focus is instead on predicting the evolution of the diffusion system’s cumulants through time, which may be achieved with the cumulant truncation procedure (Whittle, 1957; Gillespie and Renshaw, 2007). The cumulant truncation procedure may also be applied to multivariate diffusion processes (Varughese and Fatti, 2008; Varughese, 2011). These predicted cumulants can be subsequently substituted within a saddlepoint to approximate the transitional densities. After fitting a diffusion model, one may subsequently test for model misspecification of the SDE (Zhang et al., 2012).

In Section 2, the saddlepoint approximation to the transitional probability distribution of the diffusion process is introduced. In Section 3, an MCMC algorithm which uses this approximation for parameter estimation is introduced. Since Ait-Sahalia (2002) demonstrated that the Hermite approximation is superior to a number of alternative approximations, we compare the saddlepoint and the Hermite approximation in Section 4. The two methods are tested on univariate diffusion processes. The Heston model (an irreducible, multivariate diffusion process) is fitted in Section 5 to the S&P 500 and VIX indices. This is followed by a discussion of the various results and some conclusions being drawn.

2 Approximating the transitional probability distribution

2.1 The saddlepoint approximation

In cases where the distribution of a random vector $\mathbf{X}^{T}=(X_{1},X_{2},...,X_{m})$ is unknown, the saddlepoint method provides an algebraic approximation based on $\mathbf{X}$ ’s cumulant generating function. Let $\Lambda^{T}=(\upsilon_{1},...,\upsilon_{m})$ be the parameters for $\mathbf{X}$ ’s moment generating function (MGF) $M(\Lambda)=E\left[\exp[\upsilon_{1}X_{1}+...+\upsilon_{m}X_{m}]\right]$ . Also, let $K(\Lambda)=\ln M(\Lambda)$ denote the cumulant generating function (CGF). Assuming the MGF exists in an open neighbourhood around the origin (Daniels, 1954), the leading order $m$ -dimensional saddlepoint approximation is given by:

f(\mathbf{x})=(2\pi)^{-m/2}\left|\nabla^{2}K(\Lambda)\right|^{-1/2}\exp\left\{K(\Lambda)-\Lambda^{T}\mathbf{x}\right\},

(5)

where $\mathbf{x}$ is an $m$ -dimensional vector and $\nabla^{2}K(\Lambda)$ represents the $m\times m$ Hessian matrix for $K(\Lambda)$ and

\nabla K(\Lambda)=\mathbf{x}.

Except for some select cases such as the Normal and the Gamma distribution, substituting the true CGF within equation (5) does not simplify to the true distribution, but rather to an approximation thereof.

In many cases, the CGF $K(\Lambda)$ is unknown, but it may be approximated. Given the first $n$ cumulants $(\kappa_{1},\kappa_{2},...,\kappa_{n})$ for a univariate random variable $X$ , $K(\theta)=E[\exp\{\theta X\}]$ may be approximated by:

K(\theta)\approx\sum_{i=1}^{n}{\frac{\theta^{n}}{n!}\kappa_{n}}.

(6)

In the univariate case, the saddlepoint is consequently given by:

\begin{split}&f_{n}(x)=\left(2\pi\sum_{i=0}^{n-2}\frac{\kappa_{i+2}\theta_{0}^{i}}{i!}\right)^{-\frac{1}{2}}\exp\left[\sum_{i=1}^{n}\frac{\kappa_{i}\theta_{0}^{i}}{i!}-\theta_{0}x\right],\\ &\mbox{ where: }x=\sum_{i=0}^{n-1}\frac{\kappa_{i+1}\theta_{0}^{i}}{i!}.\end{split}

(7)

The approximation within (7) is two-fold as not only is the saddlepoint an approximation to the true density, but the saddlepoint is now based on a truncated approximation to the true CGF. For a diffusion process, the approximate evolution of the cumulants (and hence also the approximation evolution of equation (6)) can be calculated through the cumulant truncation procedure.

2.2 Applying the cumulant truncation procedure to a diffusion process

Consider a diffusion process $\phi_{t}^{*}=\left(\phi_{i}\right)_{i=1,...,m}$ with MGF $M(\Lambda,t)=E\left[\exp[\upsilon_{1}\phi_{1}(t)+\upsilon_{2}\phi_{2}(t)+...+\upsilon_{m}\phi_{m}(t)]\right]$ and CGF $K(\Lambda,t)=\ln M(\Lambda,t)$ . In order to calculate the likelihood given in (4), we approximate the transitional density $p(\phi_{t_{i}}^{*}|\phi_{t_{i-1}}^{*})$ by a saddlepoint $f(\mathbf{x})$ . This necessitates the calculation of an approximation to $K(\Lambda,t)$ at time $t_{i}$ given that the diffusion process is equal to $\phi_{t_{i-1}}^{*}$ at time $t_{i-1}$ .

Let $\Lambda^{*}=(\partial/\partial\upsilon_{1},...,\partial/\partial\upsilon_{m})$ . Like the partial differential equation for the probability density (given in (3)), there is a corresponding partial differential equation for the MGF $M(\Lambda,t)$ , the solution of which enables us to predict the evolution of the system’s cumulants. The MGF obeys the following relation (see Appendix):

\frac{\partial M(\Lambda,t)}{\partial t}=\bigg{[}\sum_{i=1}^{m}\upsilon_{i}\mu_{i}(\Lambda^{*},t)+\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\upsilon_{i}\upsilon_{j}\sigma_{ij}(\Lambda^{*},t)\bigg{]}M(\Lambda,t).

(8)

Though the terms $\mu_{i}(\Lambda,t)$ and $\sigma_{ij}(\Lambda,t)$ denote functions of $\Lambda$ and $t$ , the terms $\mu_{i}(\Lambda^{*},t)$ and $\sigma_{ij}(\Lambda^{*},t)$ represent differential operators that act on the MGF $M=M(\Lambda,t)$ . So, for example, if we have a 2-dimensional diffusion system with drift term $\mu_{1}(\phi_{1},\phi_{2})=a\phi_{1}-b\phi_{1}^{2}+c\phi_{1}\phi_{2}$ , then $\mu_{1}\left(\frac{\partial}{\partial\upsilon_{1}},\frac{\partial}{\partial\upsilon_{2}}\right)=a\frac{\partial}{\partial\upsilon_{1}}-b\frac{\partial^{2}}{\partial\upsilon_{1}^{2}}+c\frac{\partial^{2}}{\partial\upsilon_{1}\partial\upsilon_{2}}$ is a differential operator and $\mu_{1}\left(\frac{\partial}{\partial\upsilon_{1}},\frac{\partial}{\partial\upsilon_{2}}\right)M=a\frac{\partial M}{\partial\upsilon_{1}}-b\frac{\partial^{2}M}{\partial\upsilon_{1}^{2}}+c\frac{\partial^{2}M}{\partial\upsilon_{1}\partial\upsilon_{2}}$ represents the action of the operator on the MGF $M(\Lambda,t)$ . The MGF may be used to characterize the evolution of the diffusion system through time. Unfortunately, as with equation (3), equation (8) is generally analytically intractable.

The cumulant truncation procedure enables us to convert the partial differential equation for the MGF into a system of ordinary differential equations that also depends on the system parameters $\theta$ . The resulting set of equations proves to be far easier to solve numerically than the original partial differential equation. Though, in principle, one can derive a system of ordinary differential equations directly from equation (8), the cumulant truncation procedure is instead based on the corresponding partial differential equation for the CGF $K(\Lambda,t)=\ln M(\Lambda,t)$ . This is because $K(\Lambda,t)$ can be expressed as a series expansion of the diffusion system’s cumulants.

K=\sum_{r_{1}\geq 0}\sum_{r_{2}\geq 0}...\sum_{r_{m}\geq 0}\frac{\left(\prod_{i=1}^{m}\upsilon_{i}^{r_{i}}\right)\kappa_{r_{1},r_{2},...,r_{m}}(t)}{\prod_{i=1}^{m}r_{i}!},

(9)

where:

\kappa_{r_{1},...,r_{m}}(t)=\left\{\begin{array}[]{ll}\mbox{E}[\phi_{i}]&(r_{i}=1,r_{j}=0;j\neq i)\\ \mbox{E}\left[\prod_{i=1}^{m}\left(\phi_{i}-E[\phi_{i}]\right)^{r_{i}}\right]&(4>\sum_{j=1}^{m}r_{j}>1).\end{array}\right.

In order to be able to predict the evolution of the diffusion system’s cumulants, we need to derive a partial differential equation governing the evolution of $K(\Lambda,t)$ . Since $K(\Lambda,t)=\ln M(\Lambda,t)$ , we have:

\frac{1}{M}\frac{\partial M}{\partial x}=\frac{\partial K}{\partial x},

(10)

where $M$ and $K$ depend on some variable $x$ . By repeatedly applying the above relation, it is possible to derive the corresponding partial differential equation for the CGF from the partial differential equation for the MGF given by equation (8). The general form of this equation cannot be shown, but rather must be independently derived for each system of interest. The CGF partial differential equation is derived for selected examples within Sections 4 and 5.

By substituting the series expansion in (9), truncated to order $n$ , within the partial differential equation for $K(\Lambda,t)$ and subsequently matching the cumulant coefficients $\left\{\left(\prod_{i=1}^{m}\upsilon_{i}^{r_{i}}\bigg{/}\prod_{i=1}^{m}r_{i}!\right):r_{i}=0,1,2,...;1\leq i\leq m\right\}$ , it is possible to derive a system of ordinary differential equations for the centered moments or cumulants. Consequently, by solving this system of differential equations, it is possible to predict the approximate evolution of the cumulants for each proposed parameter vector $\theta$ . Throughout the paper this analysis was performed using MATHEMATICA v6.0. These predictions are accurate across a wide range of parameter values (Varughese, 2009). This is key, as we can subsequently use the predicted cumulants within the saddlepoint to approximate $p(\phi_{t_{i}}^{*}|\phi_{t_{i-1}}^{*})$ and hence derive the approximate likelihood at $\theta$ . The parameter estimates are taken to be the values that maximize this approximate likelihood, which may be estimated by a Markov Chain Monte Carlo (MCMC) procedure.

3 A modified MCMC parameter estimation algorithm

In this section, a parameter estimation algorithm for a diffusion system is proposed. The algorithm uses the saddlepoint approximation introduced in Section 2 to recreate the transitional probability distribution. This is subsequently used to approximate the likelihood function of the diffusion system. The approximate likelihoods enable us to explore the parameter space of the diffusion system with a modified MCMC procedure. The algorithm is described below.

1.

Apply the cumulant truncation procedure to the diffusion system thus deriving a system of ordinary differential equations that describe the cumulants’ evolution. (Note these differential equations will depend on the model parameters $\theta$ .)
2.

Choose an arbitrary, starting set of parameter values $\theta_{0}$ . Set $\theta_{old}=\theta_{0}$ .
3.

Propose a jump from the old set of parameter values $\theta_{old}$ to a new set of parameter values, $\theta_{new}=\theta_{old}+\Delta\theta$ using a suitably chosen proposal distribution $q(\theta_{new}|\theta_{old})$ .
4.
Calculate the likelihoods $L(\theta_{old})$ and $L(\theta_{new})$ . $L(\theta)$ is calculated as follows:
1. i.
  
  Set the likelihood, $L(\theta)=1$ .
2. ii.
  
  Set $i$ to 1.
3. iii.
  
  Use the system of ordinary differential equations from Step 1 together with the data values observed at time $t_{i}$ , $\phi_{t_{i}}^{*}=(\phi_{1},\phi_{2},...,\phi_{m})$ to predict the values of the cumulants at time $t_{i+1}$ .
4. iv.
  
  Given the data $\phi_{t_{i}}^{*}$ at time $t_{i}$ , the transitional probability distribution at time $t_{i+1}$ can be approximated by a saddlepoint approximation $f_{n}(\phi_{t_{i+1}}^{*}|\phi_{t_{i}}^{*})$ after substituting for the cumulants derived from step iii.
5. v.
  
  Set the likelihood $L(\theta)=L(\theta)\times f_{n}(\phi_{t_{i+1}}^{*}|\phi_{t_{i}}^{*})$ . This is an implementation of equation (4).
6. vi.
  
  Set $i=i+1$ and if $i<N$ (where $N$ is the number of data points) go to step iii.

Calculate the acceptance ratio $R$ . This is given by:

R=\frac{L(\theta_{new})}{L(\theta_{old})}\frac{\pi(\theta_{new})}{\pi(\theta_{old})}\frac{q(\theta_{old}|\theta_{new})}{q(\theta_{new}|\theta_{old})},

where $\pi(\theta)$ is the prior density. We then accept the proposed parameter values $\theta_{new}$ with probability $\min(1,R)$ . That is, set $\theta_{new}=\theta_{old}$ with probability $1-\min(1,R)$ . Otherwise leave $\theta_{new}$ unchanged.

6.

Go back to step 3.

After a suitably chosen burn-in period, the MCMC chains sample from the posterior distribution of the diffusion parameters. Hence, in addition to using the medians of the chains as estimates of the diffusion parameters, the $\alpha/2$ -th and $(1-\alpha/2)$ -th percentile may be used to construct $(100-\alpha)$ % credibility intervals.

4 A comparative study of the Saddlepoint and Hermite methods

Since the Hermite approximation (Ait-Sahalia, 2002) is a popular method for estimating the parameters of a diffusion system, we compare its performance against the saddlepoint. In order to judge the relative accuracies of the saddlepoint and Hermite procedures, the true transitional probability distribution is required as a baseline. Unfortunately, the diffusion models for which the transitional distribution is known are few. Amongst, diffusion processes with analytical, non-normal transitional distributions, the Cox-Ingersoll-Ross (CIR) process and Geometric Brownian motion are arguably the most well known. Neither the saddlepoint nor the Hermite approximation in general integrate to one and the approximations are not normalized when performing the comparisons in this section.

Within this section, the CIR process is analyzed didactically: the analysis is meant to illustrate the general derivation of the saddlepoint approximation to a transitional distribution. This is followed by a study of the relative accuracy, for the CIR process, of the Hermite and saddlepoint approximations to the transitional distribution. The section concludes by comparing MCMC implementations for the Hermite and saddlepoint procedures for both the CIR process as well as Geometric Brownian motion.

4.1 Example: Deriving the saddlepoint approximation for the CIR process

The CIR process is commonly used to model financial data such as short-term interest rates. The model may be represented as follows:

dX_{t}=b(\mu-X_{t})dt+\sigma\sqrt{X_{t}}dB_{t}.

(11)

The CIR process is mean-reverting. Furthermore, provided $2b\mu>\sigma^{2}$ , the CIR process is always positive. This is advantageous as many real-world phenomena are positive and display mean reversion. The probability distribution of the process obeys the following partial differential equation:

\frac{\partial p(X_{t})}{\partial t}=-\frac{\partial}{\partial X_{t}}\left[b(\mu-X_{t})p(X_{t})\right]+\frac{1}{2}\frac{\partial^{2}}{\partial X_{t}^{2}}\left[\sigma^{2}X_{t}p(X_{t})\right].

(12)

An analytical solution to equation (12) exists (Cox et al., 1985). Hence it is possible to derive an analytical expression for the transitional distribution which may be subsequently compared with the Hermite and the saddlepoint approximations. This enables us to compare the relative accuracies of the two approximation schemes.

The first step of the cumulant truncation procedure is to derive the evolution of the moment generating function through time. This requires the drift and diffusion terms of the process. These are given by:

\mu(X_{t},t)=b(\mu-X_{t});\;\sigma(X_{t},t)=\sigma\sqrt{X_{t}}.

By substituting the drift and diffusion terms within equation (8), we derive the partial differential equation for the MGF $M(\lambda,t)=E\left[\exp[\lambda X_{t}]\right]$ :

\frac{\partial M}{\partial t}=\lambda b\mu M-\left[\lambda b-\frac{1}{2}\lambda^{2}\sigma^{2}\right]\frac{\partial M}{\partial\lambda}.

(13)

By dividing both sides of equation (13) by $M(\lambda,t)$ and applying equation (10), it is possible to derive the differential equation for the CGF:

\frac{\partial K}{\partial t}=\lambda b\mu-\left[\lambda b-\frac{1}{2}\lambda^{2}\sigma^{2}\right]\frac{\partial K}{\partial\lambda}.

(14)

In most cases the partial differential equation for the CGF will be analytically intractable. Under such scenarios, one may instead substitute a truncated expansion of the CGF in place of the full CGF. As the order of the expansion increases, the accuracy of the approximation tends to increase (Varughese, 2009). Suppose we approximate $K(\lambda,t)$ by a fourth-order expansion of the CGF:

K(\lambda,t)\approx 1+\lambda\kappa_{1}(t)+\frac{\lambda^{2}}{2}\kappa_{2}(t)+\frac{\lambda^{3}}{3!}\kappa_{3}(t)+\frac{\lambda^{4}}{4!}\kappa_{4}(t).

(15)

By substituting the above expansion within equation (14) and subsequently equating the various coefficients of $\left\{\lambda^{i}:i=1,..,4\right\}$ , it is possible to derive a system of ordinary differential equations that describe the evolution of the cumulants:

	$\displaystyle\dot{\kappa}_{1}(t)$	$\displaystyle=b\left[\mu-\kappa_{1}(t)\right]$
	$\displaystyle\dot{\kappa}_{2}(t)$	$\displaystyle=\sigma^{2}\kappa_{1}(t)-2b\kappa_{2}(t)$
	$\displaystyle\dot{\kappa}_{3}(t)$	$\displaystyle=-3b\kappa_{3}(t)+3\sigma^{2}\kappa_{2}(t)$
	$\displaystyle\dot{\kappa}_{4}(t)$	$\displaystyle=-4b\kappa_{4}(t)+6\sigma^{2}\kappa_{3}(t).$

Note that there is a trade-off as the order of the approximation to the CGF increases. Not only does the approximation’s accuracy increase, but the complexity of the system of equations also increases, which will lead to a rise in computing time. A fourth order approximation is chosen as it seems to be a good balance between accuracy and speed.

Boundary conditions must be specified in order to make the solution of the above system of equations unique. When predicting the cumulants at time $t_{i}$ , the boundary conditions are taken to be: $\kappa_{1}(t_{i-1})=\phi_{t_{i-1}}$ , $\kappa_{2}(t_{i-1})=0$ , $\kappa_{3}(t_{i-1})=0$ , $\kappa_{4}(t_{i-1})=0$ . That is, in predicting the cumulants at any future time $t_{i}$ , we condition on the observed value of the process at time $t_{i-1}$ . This enables the system of ordinary differential equations to be solved numerically. The resulting predictions may be inserted into equation (7) to give us the saddlepoint approximation to the transitional probability distribution $p(X_{t_{i}}|X_{t_{i-1}})$ .

Since the CGF is truncated after the fourth cumulant, we shall approximate the transitional probability distribution by $f_{4}(x)$ . From equation (7) it follows that:

f_{4}(x)=\left[2\pi\left(\kappa_{2}+\kappa_{3}\theta_{0}+\frac{\kappa_{4}}{2}\theta_{0}^{2}\right)\right]^{-\frac{1}{2}}\exp\left[-\frac{\theta_{0}^{2}}{2}\kappa_{2}-\frac{\theta_{0}^{3}}{3}\kappa_{3}-\frac{\theta_{0}^{4}}{8}\kappa_{4}\right],

(16)

where:

x=\kappa_{1}+\theta_{0}\kappa_{2}+\frac{\theta_{0}^{2}}{2}\kappa_{3}+\frac{\theta_{0}^{3}}{6}\kappa_{4}.

Note that $\theta_{0}$ is a function of $x$ . A cursory glance at the above equation might suggest that $f_{4}(x)$ does not depend on the first cumulant. This however is incorrect since $\theta_{0}$ depends on $\kappa_{1}$ .

4.2 The relative accuracies of the transitional distribution approximations

The transitional distribution of the CIR process may be approximated using both the saddlepoint and Hermite approximations. Suppose the current value of the process $X_{t}$ is assumed to be 50 and we are interested in the probability distribution one month from now. That is, $t_{i}-t_{i-1}=1/12$ . Figure 1 compares the relative accuracies of the two methods, implemented in MATHEMATICA v6.0, at the parameter values: $b=1.5$ ; $\mu=58$ ; $\sigma^{2}=15$ . On the whole, the saddlepoint approximation appears to be more accurate.

Refer to caption — Figure 1: The relative accuracies of the saddlepoint and Hermite approximations for the CIR process at $b=1.5$ , $\mu=58$ and $\sigma^{2}=15$ . The right panel is a plot of three curves: the true transitional probability together with the saddlepoint and Hermite approximations. The saddlepoint and Hermite approximations follow the true transitional probability distribution (a non-central Chi-squared distribution) very closely. The left panel highlights the absolute difference between the true transitional density and the two approximations. Overall, the saddlepoint approximation is more accurate

Let the Integrated Error of an approximation $\widehat{f(x)}$ be defined as:

\int_{0}^{\infty}{|f(x)-\widehat{f(x)}|dx}.

To get a better idea of how the relative accuracy of the two approximations behaves throughout the parameter space, we plot the Integrated Error of the two models for varying values of the parameters $b$ and $\sigma^{2}$ .

One can see from Figure 2 that there are regions in the parameter space where the Hermite approximation is more accurate and regions where the saddlepoint approximation is more accurate.

As a final point, note that unlike the saddlepoint approximation, the Hermite approximation is a small-time expansion. Consequently, the larger $t_{i}-t_{i-1}$ becomes, the less accurate the Hermite approximation will be. Figure 3 shows how the Integrated Error for both the saddlepoint and the Hermite approximations changes with $t_{i}-t_{i-1}$ . The Hermite approximation loses accuracy more quickly for larger time scales.

This investigation has been restricted to a subset of the parameter space for the CIR model. In some sense however, the accuracy of the transitional distribution is only important in as far as it enables us to obtain precise parameter estimates for the diffusion model. With multiple parameters, the computational efficiency of the MCMC procedure makes it suitable for parameter estimation. This motivates a comparison of the modified saddlepoint MCMC algorithm described in Section 3 with modified MCMC algorithms that uses the Hermite approximation.

4.3 A comparison of MCMC implementations of the two approximations

Both the saddlepoint MCMC and the Hermite MCMC are judged by comparing the coverage of their resulting credibility intervals with the corresponding coverage obtained from the true MCMC. The traditional Hermite MCMC is known to fail since the Hermite approximation can explode to infinity for parameter values that are far from the maximum likelihood estimate (Stramer et al., 2010). We thus run two versions of the Hermite MCMC: the traditional version that is known to fail and a modified algorithm developed by Stramer et al. (2010) that avoids computation of the posterior’s normalizing constant. To make the analysis more robust, the MCMC implementations are compared for both the CIR process as well as for Geometric Brownian motion.

The empirical coverage of the credibility intervals is calculated from 100 simulated time series; each of length 40. For each of the time series, the four variants of the MCMC procedure (the saddlepoint MCMC, the two Hermite MCMCs and the true MCMC) are run and a credibility interval is calculated. We use improper, uniform priors and a normal proposal density for the parameter values when running the MCMC chains. If a parameter must be positive, any negative value sampled from the proposal distribution is thrown out and another sample is taken. The coverage of each variant of the MCMC procedure is estimated as the proportion of the 100 credibility intervals that contain the true parameter values i.e. the parameter values used to simulate the time series. In calculating the credibility intervals, an MCMC chain of length 20,000 was run and the first 10,000 steps were trimmed as part of the burn-in period. 90% credibility intervals will correspond to the 5-th and 95-th percentiles of the resulting chains. It is desirable that the credibility intervals of a proposed MCMC implementation match as closely as possible the observed coverages of the true MCMC.

Despite the computational efficiency of both the saddlepoint and the two versions of the Hermite procedures, a study of the coverage of their corresponding implementations is computationally intensive. 100 MCMC chains, each of length 20,000, must be run for the three MCMC implementations. This restricts us to comparing the coverage of their credibility intervals for a single set of parameter values.

In order to make our analysis as robust as possible, we chose to compare the coverage of the credibility intervals at points in the parameter space where the saddlepoint approximation performs particularly badly in comparison to the Hermite approximation. We compare the relative accuracies of the two approximations using the following statistic:

B=\frac{\int_{0}^{\infty}\left|f(x)-\mbox{Saddle}(x)\right|dx}{\int_{0}^{\infty}\left|f(x)-\mbox{Hermite}(x)\right|dx}.

(17)

where $f(x)$ represents the true distribution and $\mbox{Hermite}(x)$ and $\mbox{Saddle}(x)$ represent the Hermite approximation and saddlepoint approximation respectively. If the saddlepoint approximation is more accurate than the Hermite approximation, one would expect $B$ to be less than 1.

First, for the CIR process, we choose to simulate 100 time series with the parameter values:

b=0.12,\ \mu=0.05,\ \sigma=0.02,\ x_{0}=0.049,\ \Delta t=1/52.

For the above parameter values, the relative efficiency statistic, $B=27.96$ . Second, for Geometric Brownian motion, we choose to simulate 100 time series with the parameter values:

\mu=0.12,\ \sigma=0.2,\ x_{0}=0.049,\ \Delta t=1/12.

For the above parameter values, $B=2.99$ . For both diffusion processes, the saddlepoint approximation to the transitional distribution is considerably less accurate than the Hermite approximation.

Table 1 shows the respective coverages of the saddlepoint MCMC, the two Hermite MCMCs (in the table, the traditional algorithm is referred to as Hermite whilst the modified algorithm developed by Stramer et al. (2010) is referred to as Hermite*) and the true MCMC for both the CIR process as well as Geometric Brownian Motion. A Euler-Maruyama scheme was used to determine the proposal density for the external variates of the Hermite* procedure. Despite running the comparison for both models at points where the saddlepoint is less accurate than the Hermite approximation, the credibility intervals of the saddlepoint MCMC is closer to the true credibility intervals. The results indicate that the saddlepoint MCMC may be reliably used both to estimate the parameters of a diffusion model as well as to derive their corresponding credibility intervals. This suggests that the stability of the approximations throughout the parameter space is more important than their accuracy at the true parameter values. The credibility intervals produced by the traditional Hermite MCMC have severe undercoverage. This is particularly true for the Geometric Brownian motion since the MCMC chains were started far from the true parameter values ( $\mu_{0}=0.3$ , $\sigma_{0}=0.1$ ) and the chains often became stuck. For the CIR process, all the chains were started at the true parameter values ( $b_{0}=0.12$ , $\mu_{0}=0.05$ , $\sigma_{0}=0.02$ ). This serves to highlight the convergence problems that the Hermite MCMC chains suffer from. Though, the modified Hermite* MCMC improves the coverage of the credibility intervals, there is still evidence of undercoverage for the Geometric Brownian motion. This seems to be due to the modified MCMC sometimes taking longer than the burn-in period to converge rather than the chains getting stuck. This is a topic for future research.

Table 1: Observed coverage for the four methods for constructing 90% credibility intervals. Hermite refers to a traditional MCMC implementation of the Hermite approximation and Hermite* refers to the modified algorithm developed by Stramer et al. (2010).

CIR process					Geometric Brownian
Par.	True	Saddle	Hermite	Hermite*	Par.	True	Saddle	Hermite	Hermite*
$b$	0.81	0.79	0.61	0.70	$\mu$	0.90	0.90	0.14	0.44
$\mu$	0.82	0.79	0.60	0.81	$\sigma$	0.90	0.93	0.14	0.48
$\sigma$	0.88	0.86	0.66	0.84

Table 2 shows the time taken to study the coverage properties of the credibility intervals as well as the average acceptance ratios for the proposed MCMC steps. There are indications that the acceptance ratios for the Hermite procedures are lower. Both the saddlepoint and the Hermite approximations schemes take considerably longer to run than the case where the true distribution is known. The Saddle MCMC takes 70% longer to run than the traditional Hermite for the CIR process and 217% longer for the Geometric Brownian motion. However, the modifications to the traditional algorithm, Hermite* MCMC take considerably longer to run. Hermite* MCMC takes 81% longer than the Saddle MCMC for the CIR process and 6% longer for the Geometric Brownian motion.

Table 2: Time (in hours) required to run 100 MCMC chains of length 20,000 for the four methods as well as the corresponding average acceptance ratios. The chains were run in MATHEMATICA v6.0 on a Duo Core 2.00 GHz CPU with 4GB RAM.

	CIR process				Geometric Brownian
	True	Saddle	Hermite	Hermite*	True	Saddle	Hermite	Hermite*
time	9.80	102.30	60.35	184.76	1.60	98.29	31.04	104.22
ratio	0.63	0.69	0.49	0.54	0.73	0.55	0.23	0.37

4.4 Coverage for non-standard diffusion processes

We extend our study of the coverage of the saddlepoint procedure to a nonlinear, multivariate diffusion process with unknown transitional distribution. Consider a bivariate-system that behaves as follows:

	$\displaystyle d\phi_{1}$	$\displaystyle=$	$\displaystyle\left(a\phi_{1}\phi_{2}-b\phi_{1}^{2}\right)dt+c\phi_{2}dB^{(1)}_{t}$
	$\displaystyle d\phi_{2}$	$\displaystyle=$	$\displaystyle g\left(\phi^{*}-\phi_{2}\right)+\sigma dB^{(2)}_{t},$		(18)

where the Brownian motions are independent. Both processes are mean-reverting: $\phi_{2}$ is an Ornstein-Uhlenbeck process and for a fixed value of $\phi_{2}$ , the process $\phi_{1}$ will have long-term mean $a\phi_{2}/b$ . As $\phi_{2}$ increases, the instantaneous mean and variance of $\phi_{1}$ also increases. From equations (8) and (10), the corresponding partial differential for $K(\nu_{1},\nu_{2},t)=\ln\mbox{E}[\exp\{\nu_{1}\phi_{1}+\nu_{2}\phi_{2}\}]$ is given by:

	$\displaystyle\frac{\partial K}{\partial t}$	$\displaystyle=$	$\displaystyle a\nu_{1}\left[\frac{\partial K}{\partial\nu_{1}}\frac{\partial K}{\partial\nu_{2}}+\frac{\partial^{2}K}{\partial\nu_{1}\partial\nu_{2}}\right]-b\nu_{1}\left[\left(\frac{\partial K}{\partial\nu_{1}}\right)^{2}+\frac{\partial^{2}K}{\partial\nu_{1}^{2}}\right]$		(19)
		$\displaystyle+$	$\displaystyle\left[\frac{c\nu_{1}^{2}}{2}-g\nu_{2}\right]\frac{\partial K}{\partial\nu_{2}}+g\phi^{*}\nu_{2}+\frac{\sigma^{2}}{2}\nu_{2}^{2}.$		(19)

A third order cumulant truncation of $K(\nu_{1},\nu_{2},t)$ yields:

	$\displaystyle K(\nu_{1},\nu_{2},t)$	$\displaystyle\approx$	$\displaystyle 1+\nu_{1}\kappa_{10}(t)+\nu_{2}\kappa_{01}(t)+\nu_{1}\nu_{2}\kappa_{11}(t)$		(20)
		$\displaystyle+$	$\displaystyle\frac{\nu_{1}^{2}}{2!}\kappa_{20}(t)+\frac{\nu_{2}^{2}}{2!}\kappa_{02}(t).$		(20)

(20) may be substituted within (19) to give a system of ordinary differential equations for the cumlulants that can subsequently be solved and used within the saddlepoint approximation.

The diffusion system has six parameters: $a$ , $b$ , $c$ , $g$ , $\phi^{*}$ and $\sigma$ . Suppose from (18) we simulate time series, each of length 100, for $\phi_{1}$ and $\phi_{2}$ . As in the previous subsection, an investigation into the coverage of the credibility intervals is computationally expensive and hence is only feasible for a single set of parameter values. We simulate 100 time series using the parameter values:

a=0.1,b=0.02,c=1.8,g=0.5,\phi^{*}=5,\sigma=1.

For each time series, the saddlepoint procedure is used to construct 90% credibility intervals for the six parameters. The observed coverages for the parameters are:

a:\frac{80}{100};\mbox{\ \ }b:\frac{82}{100};\mbox{\ \ }c:\frac{83}{100};\mbox{\ \ }g:\frac{87}{100};\mbox{\ \ }\phi^{*}:\frac{82}{100};\mbox{\ \ }\sigma:\frac{81}{100}.

These coverages are close to the advertised values, suggesting that the procedure works well in this nonlinear, multivariate example.

5 Application to Financial Data

We study the S&P 500 and it’s relation to the VIX index (a popular measure of implied volatility) over the period December 2009 to November 2010. Figure 4 shows the two indices over this period. The Heston model is fitted to the dataset. It uses proxies for market volatility - such as the VIX index - to account for market movements. The key feature of this model is that the volatility of the asset price movements are assumed to be stochastic.

d\left[\begin{array}[]{c}S_{t}\\ V_{t}\end{array}\right]=\left[\begin{array}[]{c}rS_{t}\\ \delta(\theta-V_{t})\end{array}\right]dt+\left[\begin{array}[]{cc}S_{t}\sqrt{(1-\rho^{2})V_{t}}&\rho S_{t}\sqrt{V_{t}}\\ 0&\sigma\sqrt{V_{t}}\end{array}\right]d\left[\begin{array}[]{c}B_{t}^{(1)}\\ B_{t}^{(2)}\end{array}\right],

(21)

where $S_{t}$ represents the asset price, $V_{t}$ is the underlying volatility that drives the asset movements and $r$ , $\delta$ , $\theta$ , $\rho$ and $\sigma$ are parameters. The Heston model is a multivariate, irreducible diffusion processes for which many of the competing methods would not be applicable.

We wish to run the saddle MCMC in order to obtain both estimates and credibility intervals for the parameters for the Heston model. To run the saddle MCMC, the diffusion tensor $\sigma_{ij}(S_{t},V_{t};\theta)$ is needed. This is given by:

\left[\begin{array}[pos]{cc}S_{t}\sqrt{(1-\rho^{2})V_{t}}&0\\ \rho S_{t}\sqrt{V_{t}}&\sigma\sqrt{V_{t}}\end{array}\right]^{T}\left[\begin{array}[pos]{cc}S_{t}\sqrt{(1-\rho^{2})V_{t}}&0\\ \rho S_{t}\sqrt{V_{t}}&\sigma\sqrt{V_{t}}\end{array}\right]=\left[\begin{array}[pos]{cc}S_{t}^{2}V_{t}&\rho\sigma V_{t}S_{t}\\ \rho\sigma V_{t}S_{t}&\sigma^{2}V_{t}\end{array}\right].

By applying equations (8) and (10) to the above diffusion process, we obtain the following partial differential equation for $K(\nu_{1},\nu_{2},t)=\ln\mbox{E}[\exp\{\nu_{1}S_{t}+\nu_{2}V_{t}\}]$ :

$\displaystyle\frac{\partial K(\nu_{1},\nu_{2},t)}{\partial t}$	$\displaystyle=\nu_{1}r\frac{\partial K}{\partial\nu_{1}}+\nu_{2}\delta\theta-\nu_{2}\delta\frac{\partial K}{\partial\nu_{2}}$
	$\displaystyle+\frac{1}{2}\nu_{1}^{2}\left(\frac{\partial K}{\partial\nu_{1}}\right)^{2}\left(\frac{\partial K}{\partial\nu_{2}}\right)+\nu_{1}^{2}\left(\frac{\partial K}{\partial\nu_{1}}\right)\left(\frac{\partial^{2}K}{\partial\nu_{1}\partial\nu_{2}}\right)$
	$\displaystyle+\frac{1}{2}\nu_{1}^{2}\left(\frac{\partial^{2}K}{\partial\nu_{1}^{2}}\right)\left(\frac{\partial K}{\partial\nu_{2}}\right)+\frac{1}{2}\nu_{1}^{2}\frac{\partial^{3}K}{\partial\nu_{1}^{2}\partial\nu_{2}}$
	$\displaystyle+\frac{1}{2}\nu_{2}^{2}\sigma^{2}\frac{\partial K}{\partial\nu_{2}}+\nu_{1}\nu_{2}\rho\sigma\left(\frac{\partial K}{\partial\nu_{1}}\right)\left(\frac{\partial K}{\partial\nu_{2}}\right)$
	$\displaystyle+\nu_{1}\nu_{2}\rho\sigma\frac{\partial^{2}K}{\partial\nu_{1}\partial\nu_{2}}.$	(22)

A third order cumulant truncation is performed to yield a system of ordinary differential equations for the cumulants. These may be subsequently used to approximate the likelihood of the Heston model. The modified MCMC (described in section 3) may then be run on the S&P and scaled VIX dataset.

The resulting parameter estimates and 95% credibility intervals are shown in Table 3. One may see that there is strong negative correlation between the two Brownian motions driving $S_{t}$ and $V_{t}$ . The parameters $r$ and $\delta$ have wide credibility intervals, suggesting that they are not strongly constrained by the data.

Table 3: Parameter estimates for the S&P and scaled VIX indices over the period December 2009 to November 2010.

Par.	Estimate	95% C.I.
$r$	0.118	(0.008; 0.430)
$\delta$	9.863	(1.410; 13.464)
$\theta$	0.034	(0.020; 0.058)
$\rho$	-0.855	(-0.880; -0.824)
$\sigma$	0.250	(0.234; 0.266)

6 Conclusion

A general statistical framework for estimating the parameters of a diffusion system is proposed. It is assumed that the elements of both the mean vector and the diffusion matrix may be represented by finite order polynomials. Apart from this, the method may be readily applied to multivariate diffusion systems. Many existing parameter estimation methods are only applicable for univariate diffusion processes or, if applicable to multivariate systems, require the diffusion system to be reducible (Ait-Sahalia, 2008; Beskos et al., 2006).

The transitional distribution of a diffusion system is approximated with a saddlepoint. We show how the cumulants of the system may be predicted and subsequently used to approximate the true transitional distribution of the system. The proposed method is compared with the Hermite approximation proposed by Ait-Sahalia (2002). For both the Cox-Ingersoll Ross model and Geometric Brownian motion, the accuracy of the saddlepoint is comparable to the Hermite approximation at the true parameter values. Furthermore, since the Hermite approximation is a small-time expansion, the Hermite approximation rapidly becomes inaccurate as the time interval between successive observations increases.

The saddlepoint approximations may be easily incorporated within an MCMC algorithm. Since, in regions far from the maximum likelihood estimates, the saddlepoint approximation to the transitional density is better behaved than the Hermite approximation, the saddlepoint MCMC chains have better convergence properties. Consequently, the coverage of the credibility intervals for the saddlepoint MCMC is close to those obtained from the true MCMC. In contrast, the credibility intervals for the unmodified Hermite MCMC suffer from undercoverage. This undercoverage is especially severe when the chains are not started at the true parameter values since the chains often become stuck. This is mitigated by the modified algorithm developed by Stramer et al. (2010) which does not require the computation of the posterior’s normalizing constant. However, the modified chains are not only slower than the saddlepoint MCMC, but they take longer to converge, which caused undercoverage for the Geometric Brownian motion example studied in Section 4 since the chains had not converged by the end of the burn-in period.

The comparison between the saddlepoint and Hermite approximations was only performed for univariate diffusion processes and hence caution must be exercised in making claims for multivariate diffusion processes. However, the good coverage observed for the credibility intervals of the saddlepoint MCMC for the non-linear, multivariate example suggests that the procedure still works well for multivariate diffusion processes. Also, for irreducible multivariate processes, the Hermite procedure requires a two-fold approximation, which leads to a further drop in its accuracy (Ait-Sahalia, 2008).

The saddlepoint MCMC is used to fit the Heston model to the S&P 500 and the VIX indices over the period December 2009 to November 2010. Since the Heston model is a multivariate, irreducible diffusion process, the estimation of the model parameters presents a formidable challenge, but the saddlepoint method provides reliable estimates and credibility intervals for the model parameters. As expected, there was significant negative correlation between the two Brownian motions driving the asset prices and the volatility process.

The proposed estimation algorithm has several virtues: it is fast; applicable for a wide range of diffusion systems and gives parameter estimates close to the true values. Furthermore, the corresponding credibility intervals for these estimates have coverage close to their advertised values. This makes the algorithm suitable for estimating the parameters of many diffusion systems.

Acknowledgments

This material is based upon work supported financially by the National Research Foundation of South Africa. The author is grateful to two referees for their constructive comments which have greatly improved the manuscript. Thanks are also due to Drs Trevor Hastie, Leonard Stefanski and Eric Renshaw for helpful discussions. Part of this work was performed whilst hosted by the Department of Mathematics and Statistics at the University of Melbourne.

Appendix: Derivation of equation (8)

The MGF of a diffusion system behaves according to the following partial differential equation:

\frac{\partial M(\Lambda,t)}{\partial t}=\bigg{[}\sum_{i=1}^{m}\upsilon_{i}\mu_{i}(\Lambda^{*},t)+\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\upsilon_{i}\upsilon_{j}\sigma_{ij}(\Lambda^{*},t)\bigg{]}M(\Lambda,t).

(A.1)

Proof: Consider the multivariate diffusion system $\phi_{t}^{*}$ with instantaneous mean vector $\mu(\phi_{t}^{*},t)$ and diffusion matrix $\sigma(\phi_{t}^{*},t)$ . We denote the transition probability by:

Pr\left[\Delta\phi_{t}^{*}=\Delta x|\phi_{t}^{*}\right]=g\left(\Delta x|\phi_{t}^{*}\right).

Let $\Lambda^{T}=(\upsilon_{1},...,\upsilon_{m})$ be the parameters of the MGF and let $\Lambda^{*}=(\partial/\partial\upsilon_{1},...,\partial/\partial\upsilon_{m})$ . The MGF, $M(\Lambda,t)=E\left[\exp[\upsilon_{1}\phi_{1}(t)+...+\upsilon_{m}\phi_{m}(t)]\right]$ obeys the following differential equation (Barbour, 1972):

\frac{\partial M(\Lambda,t)}{\partial t}=\sum_{\Delta x}\left(e^{\Lambda^{T}\Delta x}-1\right)g\left(\Delta x|\Lambda^{*}\right)M(\Lambda,t).

(A.2)

For small $\Delta x$ , we have:

e^{\Lambda^{T}\Delta x}-1=e^{\sum\upsilon_{i}\Delta x_{i}}-1=\sum_{i=1}^{m}\upsilon_{i}\Delta x_{i}+\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\upsilon_{i}\upsilon_{j}\Delta x_{i}\Delta x_{j}.

(A.3)

For diffusion processes the higher order terms are negligible. We thus only require the joint and marginal transitional rates of the diffusion system to characterize the evolution of the MGF. We denote the joint transition probabilities for $\phi_{i}$ and $\phi_{j}$ by:

Pr\left[\Delta\phi_{i}=x_{i},\Delta\phi_{j}=x_{j}|\phi_{t}^{*}\right]=g_{ij}\left(x_{i},x_{j}|\phi_{t}^{*}\right).

The marginal transition rate is denoted by:

Pr\left[\Delta\phi_{i}=x_{i}|\phi_{t}^{*}\right]=g_{i}\left(x_{i}|\phi_{t}^{*}\right).

Note that the instantaneous mean $\mu_{i}(\phi_{t}^{*},t)$ and covariances $\sigma_{i}(\phi_{t}^{*},t)$ can be represented in terms of the joint and marginal transitional rates:

\mu_{i}(\phi_{t}^{*},t)=\lim_{\Delta t,\Delta x_{i}\rightarrow 0}\frac{\Delta x_{i}}{\Delta t}\left(g_{i}\left(\Delta x_{i}|\phi_{t}^{*}\right)-g_{i}\left(-\Delta x_{i}|\phi_{t}^{*}\right)\right).

(A.4)

\sigma_{ij}(\phi_{t}^{*},t)=\lim_{\Delta t,\Delta x_{i},\Delta x_{j}\rightarrow 0}\frac{(\Delta x_{i}\Delta x_{j})}{\Delta t}\bigg{(}g_{ij}\left(\Delta x_{i},\Delta x_{j}|\phi_{t}^{*}\right)\\ +g_{ij}\left(-\Delta x_{i},-\Delta x_{j}|\phi_{t}^{*}\right)-g_{ij}\left(\Delta x_{i},-\Delta x_{j}|\phi_{t}^{*}\right)\\ -g_{ij}\left(-\Delta x_{i},\Delta x_{j}|\phi_{t}^{*}\right)\bigg{)}.

(23)

By substituting (A.3) within (A.2), we obtain:

\frac{\partial M(\Lambda,t)}{\partial t}=\sum_{i=1}^{m}\Delta x_{i}\left[\frac{g_{i}(\Delta x_{i}|\Lambda^{*})}{\Delta t}+\frac{g_{i}(-\Delta x_{i}|\Lambda^{*})}{\Delta t}\right]\upsilon_{i}M\\ +\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\Delta x_{i}\Delta x_{j}\bigg{[}\frac{g_{ij}(\Delta x_{i},\Delta x_{j}|\Lambda^{*})+g_{ij}(-\Delta x_{i},-\Delta x_{j}|\Lambda^{*})}{\Delta t}\\ -\frac{g_{ij}(\Delta x_{i},-\Delta x_{j}|\Lambda^{*})+g_{ij}(-\Delta x_{i},\Delta x_{j}|\Lambda^{*})}{\Delta t}\bigg{]}.

(24)

The expressions for the instantaneous mean and variance may be substituted within (24). This gives us:

\frac{\partial M(\Lambda,t)}{\partial t}=\bigg{[}\sum_{i=1}^{m}\upsilon_{i}\mu_{i}(\Lambda^{*},t)+\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\upsilon_{i}\upsilon_{j}\sigma_{ij}(\Lambda^{*},t)\bigg{]}M(\Lambda,t).

(A.7)

References

Ait-Sahalia (2002) Ait-Sahalia, Y., 2002. Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach. Econometrica 70, 223–262.
Ait-Sahalia (2008) Ait-Sahalia, Y., 2008. Closed-form likelihood expansions for multivariate diffusions. Ann. Stat. 36, 906–937.
Ait-Sahalia and Yu (2006) Ait-Sahalia, Y., Yu, J., 2006. Saddlepoint approximations for continuous-time markov processes. J. Econometrics 134, 507–551.
Barbour (1972) Barbour, A., 1972. The principle of the diffusion of arbitrary constants. J. Appl. Prob. 9, 519–541.
Barndorff-Nielsen and Klüppelberg (1999) Barndorff-Nielsen, O., Klüppelberg, C., 1999. Tail exactness of multivariate saddlepoint approximations. Scand. J. Stat. 26, 253–264.
Beskos et al. (2006) Beskos, A., Papaspiliopoulos, O., Roberts, G., Fearnhead, P., 2006. Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes. J. R. Statist. Soc. B 68, 333–382.
Cox et al. (1985) Cox, J., Ingersoll, J., Ross, S., 1985. A theory of the term structure of interest rates. Econometrica 53, 385–407.
Daniels (1954) Daniels, H., 1954. Saddlepoint approximations in statistics. Ann. Math. Stat. 25, 631–650.
Durham and Gallant (2002) Durham, G., Gallant, A., 2002. Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes (with discussion). J. Bus. Econ. Statist. 20, 297–338.
Gallant and Long (1997) Gallant, A., Long, J., 1997. Estimating stochastic differential equations efficiently by minimum chi-squared. Biometrika 84, 125–141.
Gillespie and Renshaw (2007) Gillespie, C., Renshaw, E., 2007. An improved saddlepoint approximation. Math. Biosci. 208, 359–374.
Huang (2012) Huang, X., 2012. Quasi-maximum likelihood estimation of multivariate diffusions. Studies in Nonlinear Dynamics & Econometrics. 5, 390–455.
Hurn et al. (2007) Hurn, A., Jeisman, J., Lindsay, K., 2007. Seeing the wood for the trees: a critical evaluation of methods to estimate the parameters of stochastic differential equations. J. Financ. Econometrics 5, 390–455.
Kleinhans and Friedrich (2007) Kleinhans, D., Friedrich, R., 2007. Maximum likelihood estimation of drift and diffusion functions. Phys. Lett. A 368, 194–198.
Preston and Wood (2012) Preston, S., Wood, A., 2012. Approximation of transition densities of stochastic differential equations by saddlepoint methods applied to small-time ito-taylor sample-path expansions. Stat. Comput. 22, 205–217.
Ragwitz and Kantz (2001) Ragwitz, M., Kantz, H., 2001. Indispensable finite time corrections for fokker-planck equations from time series data. Phys. Rev. D 87, 25–29.
Roberts and Stramer (2001) Roberts, G., Stramer, O., 2001. On inference for partially observed nonlinear diffusion models using the metropolis-hastings algorithm. Biometrika 88, 603–621.
Stramer et al. (2010) Stramer, O., Bognar, M., Schneider, P., 2010. Bayesian inference for discretely sampled markov processes with closed-form likelihood expansions. Journal of Financial Econometrics 8, 450–480.
Varughese (2009) Varughese, M., 2009. On the accuracy of a diffusion approximation to a discrete state-space markovian model of a population. Theor. Popul. Biol. 76, 241–247.
Varughese (2011) Varughese, M., 2011. A framework for modelling ecological communities and their interactions with the environment. Ecol. Complex. 8, 105–112.
Varughese and Fatti (2008) Varughese, M., Fatti, L., 2008. Incorporating environmental stochasticity within a biological population model. Theor. Popul. Biol.
Wehner and Wolfer (1987) Wehner, M., Wolfer, W., 1987. Numerical evaluation of path-integral solutions to fokker-planck equations. Phys. Rev. A 35, 1795–1801.
Whittle (1957) Whittle, P., 1957. On the use of the normal approximation in the treatment of stochastic processes. J. Royal Stat. Soc. 19, 268–281.
Wojtkiewicz and Bergman (2000) Wojtkiewicz, S., Bergman, L., 2000. Numerical solution of high dimensional fokker planck equations. In: Proceedings of the 8th Specialty Conference on Probabilistic Mechanics and Structural Reliability,. Paper PMC2000-167.
Zhang et al. (2012) Zhang, S., Song, P., Shi, D., Zhou, Q., 2012. Information ratio test for model misspecification on parametric structures in stochastic diffusion models. Comput. Stat. Data. An. 56, 3975–3987.