
Multilevel rejection sampling for approximate Bayesian computation

David J. Warne$^{1,*}$, Ruth E. Baker$^{2}$, Matthew J. Simpson$^{1}$

$^{1}$School of Mathematical Sciences, Queensland University of Technology,
Brisbane, Queensland 4001, Australia
$^{2}$Mathematical Institute, University of Oxford,
Oxford, OX2 6GG, United Kingdom
$^{*}$To whom correspondence should be addressed; E-mail: david.warne@qut.edu.au
(September 17, 2025)
Abstract

Likelihood-free methods, such as approximate Bayesian computation, are powerful tools for practical inference problems with intractable likelihood functions. Markov chain Monte Carlo and sequential Monte Carlo variants of approximate Bayesian computation can be effective techniques for sampling posterior distributions in an approximate Bayesian computation setting. However, without careful consideration of convergence criteria and selection of proposal kernels, such methods can lead to very biased inference or computationally inefficient sampling. In contrast, rejection sampling for approximate Bayesian computation, despite being computationally intensive, results in independent, identically distributed samples from the approximated posterior. An alternative method is proposed for the acceleration of likelihood-free Bayesian inference that applies multilevel Monte Carlo variance reduction techniques directly to rejection sampling. The resulting method retains the accuracy advantages of rejection sampling while significantly improving the computational efficiency.

1 Introduction

Statistical inference is of fundamental importance to all areas of science. Inference enables the testing of theoretical models against observations, and provides a rational means of quantifying uncertainty in existing models. Modern approaches to statistical inference, based on Monte Carlo sampling techniques, provide insight into many complex phenomena (Beaumont et al., 2002; Pooley et al., 2015; Ross et al., 2017; Stumpf, 2014; Sunnåker et al., 2013; Tavaré et al., 1997; Thorne and Stumpf, 2012; Vo et al., 2015).

Suppose we have: a set of observations, $\mathcal{D}$; a method of determining the likelihood of these observations, $\mathcal{L}(\bm{\theta};\mathcal{D})$, under the assumption of some model characterised by parameter vector $\bm{\theta}\in\bm{\Theta}$; and a prior probability density, $p(\bm{\theta})$. The posterior probability density, $p(\bm{\theta}\mid\mathcal{D})$, can be computed using Bayes' Theorem,

p(\bm{\theta}\mid\mathcal{D}) = \frac{\mathcal{L}(\bm{\theta};\mathcal{D})\,p(\bm{\theta})}{\int_{\bm{\Theta}}\mathcal{L}(\bm{\theta};\mathcal{D})\,p(\bm{\theta})\,\text{d}\bm{\theta}}.    (1)

Explicit expressions for likelihood functions are rarely available (Tavaré et al., 1997; Warne et al., 2017; Wilkinson, 2009), motivating the development of likelihood-free methods, such as approximate Bayesian computation (ABC) (Stumpf, 2014; Sunnåker et al., 2013). ABC methods approximate the likelihood by evaluating the discrepancy between data generated by a simulation of the model of interest and the observations, yielding an approximate posterior,

p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon) \propto \mathbb{P}(d(\mathcal{D},\mathcal{D}_{s})<\epsilon\mid\bm{\theta})\,p(\bm{\theta}).    (2)

Here, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta})$ is data generated by the model simulation process, $f(\mathcal{D}\mid\bm{\theta})$, $d$ is a discrepancy metric, and $\epsilon>0$ is the acceptance threshold. Due to this approximation, Monte Carlo estimators based on Equation (2) are biased (Barber et al., 2015). In spite of this bias, however, ABC methods have proven to be very powerful tools for practical inference applications in many scientific areas, including evolutionary biology (Beaumont et al., 2002; Tavaré et al., 1997; Thorne and Stumpf, 2012), ecology (Stumpf, 2014), cell biology (Ross et al., 2017; Johnston et al., 2014; Vo et al., 2015) and systems biology (Wilkinson, 2009).

1.1 Sampling algorithms for ABC

The most elementary implementation of ABC is ABC rejection sampling (Pritchard et al., 1999; Sunnåker et al., 2013), see Algorithm 1. This method generates $N$ independent and identically distributed samples $\bm{\theta}^{1},\ldots,\bm{\theta}^{N}$ from the posterior distribution by accepting proposals, $\bm{\theta}^{*}\sim p(\bm{\theta})$, when the data generated by the model simulation process $f(\mathcal{D}\mid\bm{\theta}^{*})$ is within $\epsilon$ of the observed data, $\mathcal{D}$, under the discrepancy metric $d(\mathcal{D},\cdot)$. ABC rejection sampling is simple to implement, and samples are independent and identically distributed. Therefore ABC rejection sampling is widely used in many applications (Browning et al., 2018; Grelaud et al., 2009; Navascués et al., 2017; Ross et al., 2017; Vo et al., 2015). However, ABC rejection sampling can be computationally prohibitive in practice (Barber et al., 2015; Fearnhead and Prangle, 2012). This is especially true when the prior density is highly diffuse compared with the target posterior density (Marin et al., 2012), as most proposals are rejected.

Algorithm 1 ABC rejection sampler
1: for $i=1,\ldots,N$ do
2:   repeat
3:     Sample prior, $\bm{\theta}^{*}\sim p(\bm{\theta})$.
4:     Generate data, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta}^{*})$.
5:   until $d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon$
6:   Set $\bm{\theta}^{i}\leftarrow\bm{\theta}^{*}$.
7: end for
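The following is a minimal Python sketch of Algorithm 1. The prior sampler, model simulator and discrepancy metric are placeholders that would be supplied for a specific inference problem; they are not part of the algorithm itself.

    import numpy as np

    def abc_rejection(N, eps, sample_prior, simulate, discrepancy, data):
        """Return N i.i.d. samples from the ABC posterior with threshold eps."""
        samples = []
        while len(samples) < N:
            theta = sample_prior()                 # theta* ~ p(theta)
            data_s = simulate(theta)               # D_s ~ f(D | theta*)
            if discrepancy(data, data_s) <= eps:   # accept if within threshold
                samples.append(theta)
        return np.array(samples)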

To improve the efficiency of ABC rejection sampling, one can consider a likelihood-free modification of Markov chain Monte Carlo (MCMC) (Beaumont et al., 2002; Marjoram et al., 2003; Tanaka et al., 2006) in which a Markov chain is constructed with a stationary distribution identical to the desired posterior. Given the Markov chain is in state $\bm{\theta}^{i}$, a state transition is proposed via a proposal kernel, $K(\bm{\theta}\mid\bm{\theta}^{i})$.

The Metropolis-Hastings (Hastings, 1970; Metropolis et al., 1953) state transition probability, $h$, can be modified within an ABC framework to yield

h = \begin{cases}\min\left(\frac{p(\bm{\theta}^{*})K(\bm{\theta}^{i}\mid\bm{\theta}^{*})}{p(\bm{\theta}^{i})K(\bm{\theta}^{*}\mid\bm{\theta}^{i})},1\right)&\text{if }d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon,\\ 0&\text{otherwise}.\end{cases}

The stationary distribution of such a Markov chain is the desired approximate posterior (Marjoram et al., 2003). Algorithm 2 provides a method for computing $N_{T}$ iterations of this Markov chain.

While MCMC-ABC sampling can be highly efficient, the samples in the sequence, $\bm{\theta}^{1},\ldots,\bm{\theta}^{N_{T}}$, are not independent. This can be problematic as it is possible for the Markov chain to take long excursions into regions of low posterior probability, which incurs additional, potentially significant, bias (Sisson et al., 2007). A poor choice of proposal kernel can also have a considerable impact upon the efficiency of MCMC-ABC (Green et al., 2015). The question of how to choose the proposal kernel is non-trivial; typically, proposal kernels are determined heuristically, although automatic and adaptive schemes are available to assist in obtaining near optimal proposals in some cases (Cabras et al., 2015; Roberts and Rosenthal, 2009). A further complication is determining when the Markov chain has converged, which is a challenging problem in practice (Roberts and Rosenthal, 2004).

Algorithm 2 MCMC-ABC
1: Given initial sample $\bm{\theta}^{1}\sim p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon)$.
2: for $i=2,\ldots,N_{T}$ do
3:   Sample transition kernel, $\bm{\theta}^{*}\sim K(\bm{\theta}\mid\bm{\theta}^{i-1})$.
4:   Generate data, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta}^{*})$.
5:   if $d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon$ then
6:     Set $h\leftarrow\min\left(p(\bm{\theta}^{*})K(\bm{\theta}^{i-1}\mid\bm{\theta}^{*})/p(\bm{\theta}^{i-1})K(\bm{\theta}^{*}\mid\bm{\theta}^{i-1}),1\right)$.
7:     Sample uniform distribution, $u\sim\mathcal{U}(0,1)$.
8:     if $u\leq h$ then
9:       Set $\bm{\theta}^{i}\leftarrow\bm{\theta}^{*}$.
10:    else
11:      Set $\bm{\theta}^{i}\leftarrow\bm{\theta}^{i-1}$.
12:    end if
13:  else
14:    Set $\bm{\theta}^{i}\leftarrow\bm{\theta}^{i-1}$.
15:  end if
16: end for
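As an illustration, a minimal Python sketch of Algorithm 2 follows. It assumes a symmetric Gaussian random walk proposal kernel, so the kernel terms cancel in the acceptance probability $h$; the prior density, simulator, discrepancy and step size are user-supplied placeholders, not quantities prescribed by the algorithm.

    import numpy as np

    def mcmc_abc(theta0, NT, eps, prior_pdf, simulate, discrepancy, data, step=0.1):
        rng = np.random.default_rng()
        chain = [np.asarray(theta0, dtype=float)]
        for _ in range(1, NT):
            theta = chain[-1]
            theta_star = theta + step * rng.standard_normal(theta.shape)  # K(. | theta)
            accept = False
            if discrepancy(data, simulate(theta_star)) <= eps:
                # symmetric kernel: h = min(p(theta*) / p(theta), 1)
                h = min(prior_pdf(theta_star) / prior_pdf(theta), 1.0)
                accept = rng.uniform() <= h
            chain.append(theta_star if accept else theta)
        return np.array(chain)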

Sequential Monte Carlo (SMC) sampling was introduced to address these potential inefficiencies (Del Moral et al., 2006) and later extended within an ABC context (Sisson et al., 2007; Drovandi and Pettitt, 2011; Toni et al., 2009). A set of samples, referred to as particles, is evolved through a sequence of ABC posteriors defined through a sequence of $T$ acceptance thresholds, $\epsilon_{1},\ldots,\epsilon_{T}$ (Sisson et al., 2007; Beaumont et al., 2009). At each step, $t=1,\ldots,T$, the current ABC posterior, $p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{t})$, is approximated by a discrete distribution constructed from a set of $N_{P}$ particles $\bm{\theta}_{t}^{1},\ldots,\bm{\theta}_{t}^{N_{P}}$ with importance weights $W_{t}^{1},\ldots,W_{t}^{N_{P}}$, that is, $\mathbb{P}(\bm{\theta}=\bm{\theta}_{t}^{i})=W_{t}^{i}$ for $i=1,\ldots,N_{P}$. The collection is updated to represent the ABC posterior of the next step, $p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{t+1})$, through application of rejection sampling on particles perturbed by a proposal kernel, $K(\bm{\theta}\mid\bm{\theta}^{*})$. If $p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{t})$ is similar to $p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{t+1})$, the acceptance rate should be high. The importance weights of the new family of particles are updated using an optimal backwards kernel (Sisson et al., 2007; Beaumont et al., 2009). Algorithm 3 outlines the process.

Algorithm 3 SMC-ABC
1: Initialise $\bm{\theta}_{1}^{i}\sim p(\bm{\theta})$ and $W_{1}^{i}=1/N_{P}$, for $i=1,\ldots,N_{P}$.
2: for $t=2,\ldots,T$ do
3:   for $i=1,\ldots,N_{P}$ do
4:     repeat
5:       Set $\bm{\theta}^{*}\leftarrow\bm{\theta}_{t-1}^{j}$ with probability $W_{t-1}^{j}$.
6:       Sample transition kernel, $\bm{\theta}^{**}\sim K(\bm{\theta}\mid\bm{\theta}^{*})$.
7:       Generate data, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta}^{**})$.
8:     until $d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{t}$
9:     Set $\bm{\theta}_{t}^{i}\leftarrow\bm{\theta}^{**}$.
10:    Set $W_{t}^{i}\leftarrow p(\bm{\theta}_{t}^{i})/\sum_{j=1}^{N_{P}}W_{t-1}^{j}K(\bm{\theta}_{t}^{i}\mid\bm{\theta}_{t-1}^{j})$.
11:  end for
12:  Normalise weights so that $\sum_{i=1}^{N_{P}}W_{t}^{i}=1$.
13: end for
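A minimal Python sketch of a single threshold update (steps 3-12 of Algorithm 3) is given below, assuming an isotropic Gaussian perturbation kernel with a user-chosen step size; the prior density, simulator and discrepancy are placeholders. The Gaussian normalising constant is omitted from the weight denominator since it cancels when the weights are normalised.

    import numpy as np

    def smc_abc_step(particles, weights, eps_t, prior_pdf, simulate,
                     discrepancy, data, step=0.1):
        rng = np.random.default_rng()
        NP, k = particles.shape
        new_particles = np.empty_like(particles)
        new_weights = np.empty(NP)
        for i in range(NP):
            while True:
                j = rng.choice(NP, p=weights)                          # resample a particle
                theta = particles[j] + step * rng.standard_normal(k)   # perturb
                if prior_pdf(theta) == 0.0:
                    continue                                           # outside prior support
                if discrepancy(data, simulate(theta)) <= eps_t:
                    break                                              # accept
            new_particles[i] = theta
            # importance weight with the optimal backwards kernel (up to a constant)
            kernel = np.exp(-np.sum((theta - particles) ** 2, axis=1) / (2.0 * step ** 2))
            new_weights[i] = prior_pdf(theta) / np.sum(weights * kernel)
        return new_particles, new_weights / new_weights.sum()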

Through the use of independent weighted particles, SMC-ABC avoids the long excursions into the distribution tails that are possible with MCMC-ABC. However, SMC-based methods can be affected by particle degeneracy, and the efficiency of each step still depends on the choice of proposal kernel (Green et al., 2015; Filippi et al., 2013). Just as with MCMC-ABC, adaptive schemes are available to guide proposal kernel selection (Beaumont et al., 2009; Del Moral et al., 2012). The choice of the sequence of acceptance thresholds is also important for the efficiency of SMC-ABC, although effective adaptive schemes for generating these sequences are available (Drovandi and Pettitt, 2011; Silk et al., 2013).

1.2 Multilevel Monte Carlo

Multilevel Monte Carlo (MLMC) is a recent development that can significantly reduce the computational burden in the estimation of expectations (Giles, 2008). To demonstrate the basic idea of MLMC, consider computing the expectation of a continuous-time stochastic process $X_{t}$ at time $T$. Let $Z_{t}^{\tau}$ denote a discrete-time approximation to $X_{t}$ with time step $\tau$: the expectations of $X_{T}$ and $Z_{T}^{\tau}$ are related according to

\mathbb{E}\left[X_{T}\right] = \mathbb{E}\left[Z_{T}^{\tau}\right] + \mathbb{E}\left[X_{T}-Z_{T}^{\tau}\right].

That is, an estimate of $\mathbb{E}\left[Z_{T}^{\tau}\right]$ can be treated as a biased estimate of $\mathbb{E}\left[X_{T}\right]$. By taking a sequence of time steps $\tau_{1}>\cdots>\tau_{L}$, the indices of which are referred to as levels, we can arrive at a telescoping sum,

\mathbb{E}\left[Z_{T}^{\tau_{L}}\right] = \mathbb{E}\left[Z_{T}^{\tau_{1}}\right] + \sum_{\ell=2}^{L}\mathbb{E}\left[Z_{T}^{\tau_{\ell}}-Z_{T}^{\tau_{\ell-1}}\right].    (3)

Computing this form of the expectation incurs the same bias as computing $\mathbb{E}\left[Z_{T}^{\tau_{L}}\right]$ directly. However, Giles (2008) demonstrates that a Monte Carlo estimator for the telescoping sum can be computed more efficiently than a direct estimate of $\mathbb{E}\left[Z_{T}^{\tau_{L}}\right]$ in the context of stochastic differential equations (SDEs). This efficiency comes from exploiting the fact that the bias correction terms, $\mathbb{E}\left[Z_{T}^{\tau_{\ell}}-Z_{T}^{\tau_{\ell-1}}\right]$, measure the expected difference between the estimates on levels $\ell$ and $\ell-1$. Therefore, sample paths of $Z_{T}^{\tau_{\ell-1}}$ need not be independent of sample paths of $Z_{T}^{\tau_{\ell}}$. In the case of SDEs, samples may be generated in pairs driven by the same underlying Brownian motion, that is, the pair is coupled. By the strong convergence properties of numerical schemes for SDEs, Giles (2008) shows that this coupling is sufficient to reduce the variance of the Monte Carlo estimator. This reduction in variance is achieved through optimally trading off statistical error and computational cost across all levels. Through this trade-off, an estimator is obtained with the same mean-square accuracy as standard Monte Carlo, but at a reduced computational cost, since fewer samples of the most accurate discretisation are required.
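To make the telescoping sum concrete, the following toy Python sketch estimates $\mathbb{E}\left[X_{T}\right]$ for the SDE $\text{d}X=-X\,\text{d}t+\text{d}W$ (an illustrative example, not one considered in this paper) using Euler-Maruyama approximations with $\tau_{\ell}=2^{-\ell}$; each bias correction term is computed from coarse and fine paths driven by the same Brownian increments.

    import numpy as np

    def coupled_level(level, n_samples, T=1.0, rng=np.random.default_rng(0)):
        """Samples of Z_T^{tau_l} - Z_T^{tau_{l-1}} (or of Z_T^{tau_1} when level == 1)."""
        nf = 2 ** level                          # number of fine time steps
        tau_f = T / nf
        dW = rng.normal(0.0, np.sqrt(tau_f), size=(n_samples, nf))
        Zf = np.ones(n_samples)                  # X_0 = 1
        for n in range(nf):
            Zf = Zf - Zf * tau_f + dW[:, n]      # fine Euler-Maruyama path
        if level == 1:
            return Zf
        Zc = np.ones(n_samples)
        tau_c = 2.0 * tau_f
        for n in range(nf // 2):
            # coarse path driven by the same Brownian increments (the coupling)
            Zc = Zc - Zc * tau_c + dW[:, 2 * n] + dW[:, 2 * n + 1]
        return Zf - Zc

    # MLMC estimate of E[X_1] with L = 5 levels, as in Equation (3)
    estimate = sum(np.mean(coupled_level(l, 10_000)) for l in range(1, 6))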

1.3 Related work

Recently, several examples of MLMC applications to Bayesian inference problems have appeared in the literature. One of the biggest challenges in the application of MLMC to inverse problems is the introduction of a strong coupling between levels, that is, the construction of a coupling mechanism that reduces the variances of the bias correction terms enough to enable the MLMC estimator to be computed more efficiently than standard Monte Carlo. Dodwell et al. (2015) demonstrate a MLMC scheme for MCMC sampling applicable to high-dimensional Bayesian inverse problems with closed-form likelihood expressions. The coupling of Dodwell et al. (2015) is based on correlating Markov chains defined on a hierarchy in parameter space; a similar approach is also employed by Efendiev et al. (2015). A multilevel method for ensemble particle filtering is proposed by Gregory et al. (2016) that employs an optimal transport problem to correlate a sequence of particle filters of different levels of accuracy. Due to the computational cost of the transport problem, a local approximation scheme is introduced for multivariate parameters (Gregory et al., 2016). Beskos et al. (2017) look more generally at the case of applying MLMC methods when independent and identically distributed sampling of the distributions on some levels is infeasible; the result is an extension of MLMC in which coupling is replaced with sequential importance sampling, that is, a multilevel variant of SMC (MLSMC).

MLMC has also recently been considered in an ABC context. Guha and Tan (2017) extend the work of Efendiev et al. (2015) by replacing the Metropolis-Hastings acceptance probability in a similar way to the MCMC-ABC method (Marjoram et al., 2003). The MLSMC method (Beskos et al., 2017) is exploited to achieve coupling in an ABC context by Jasra et al. (2017).

1.4 Aims and contribution

ABC samplers based on MCMC and SMC are generally more computationally efficient than ABC rejection sampling (Marjoram et al., 2003; Sisson et al., 2007; Toni et al., 2009). However, there are many advantages to using ABC rejection sampling. Specifically: (i) it is simple to implement; (ii) it produces truly independent and identically distributed samples, that is, there is no need to re-weight samples; and (iii) it has no algorithm parameters that affect the computational efficiency. In particular, no proposal kernels need be heuristically defined for ABC rejection sampling.

The aim of this work is to design a new algorithm for ABC inference that retains the aforementioned advantages of ABC rejection sampling, while still being efficient and accurate. The aim is not to develop a method that is always superior to MCMC and SMC methods, but rather to provide a method that requires little user-defined configuration and is still computationally reasonable. To this end, we investigate MLMC methods applied directly to ABC rejection sampling.

Various applications of MLMC to ABC inference have been demonstrated very recently in the literature (Guha and Tan, 2017; Jasra et al., 2017), with each implementation contributing different ideas for constructing effective control variates in an ABC context. It is not yet clear when one approach will be superior to another; thus the question of how best to apply MLMC in an ABC context remains an open problem. Since there are no examples of an application of MLMC methods to the most fundamental of ABC samplers, that is, rejection sampling, such an application is a significant contribution to this field.

We describe a new algorithm for ABC rejection sampling, based on MLMC. Our new algorithm, called MLMC-ABC, is as general as standard ABC rejection sampling but is more computationally efficient through the use of variance reduction techniques that employ a novel construction for the coupling problem. We describe the algorithm, its implementation, and validate the method using a tractable Bayesian inverse problem. We also compare the performance of the new method against MCMC-ABC and SMC-ABC using a standard benchmark problem from epidemiology.

Our method benefits from the simplicity of ABC rejection sampling. We require only the discrepancy metric and a sequence of acceptance thresholds to be defined for a given inverse problem. Our approach is also efficient, and achieves comparable or superior performance to MCMC-ABC and SMC-ABC methods, at least for the examples considered in this paper. Therefore, we demonstrate that our algorithm is a promising method that could be extended to design viable alternatives to current state-of-the-art approaches. This work, along with that of Guha and Tan (2017) and Jasra et al. (2017), provides an additional set of computational tools to further enhance the utility of ABC methods in practice.

2 Methods

In this section, we demonstrate our application of MLMC ideas to the likelihood-free inference problem given in Equation (2). The initial aim is to compute an accurate approximation to the joint posterior cumulative distribution function (CDF) of $\bm{\theta}$, using as few data generation steps as possible. We define a data generation step to be a simulation of the model of interest given a proposed parameter vector. While the initial presentation is given in the context of estimating the joint CDF, the method naturally extends to expectations of other functions (see Appendix A). However, the joint CDF is considered first for clarity in the introduction of the MLMC coupling strategy.

We apply MLMC methods to likelihood-free Bayesian inference by reposing the problem of computing the posterior CDF as an expectation calculation. This allows the MLMC telescoping sum idea, as in Equation (3), to be applied. In this context, the levels of the MLMC estimator are parameterised by a sequence of $L$ acceptance thresholds $\epsilon_{1},\ldots,\epsilon_{L}$ with $\epsilon_{\ell}>\epsilon_{\ell+1}$ for all $\ell\in[1,L)$. The efficiency of MLMC relies upon computing the terms of the telescoping sum with low variance. Variance reduction is achieved through exploiting features of the telescoping sum for CDF approximation, and further computational gains are achieved by using properties of nested ABC rejection samplers.

2.1 ABC posterior as an expectation

We first reformulate the Bayesian inference problem (Equation (1)) as an expectation. To this end, note that, given a $k$-dimensional parameter space, $\bm{\Theta}$, and the random parameter vector, $\bm{\theta}$, if the joint posterior CDF, $F(\mathbf{s})=\mathbb{P}(\theta_{1}\leq s_{1},\ldots,\theta_{k}\leq s_{k})$, is differentiable, that is, its probability density function (PDF) exists, then

\mathbb{P}(\theta_{1}\leq s_{1},\ldots,\theta_{k}\leq s_{k}) = \int_{-\infty}^{s_{1}}\cdots\int_{-\infty}^{s_{k}}p(\bm{\theta}\mid\mathcal{D})\,\text{d}\theta_{k}\ldots\text{d}\theta_{1} = \int_{A_{\mathbf{s}}}p(\bm{\theta}\mid\mathcal{D})\,\text{d}\theta_{k}\ldots\text{d}\theta_{1},

where $A_{\mathbf{s}}=\{\bm{\theta}\in\bm{\Theta}:\theta_{1}\leq s_{1},\ldots,\theta_{k}\leq s_{k}\}$. This can be expressed as an expectation by noting

\int_{A_{\mathbf{s}}}p(\bm{\theta}\mid\mathcal{D})\,\text{d}\theta_{k}\ldots\text{d}\theta_{1} = \int_{\bm{\Theta}}\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})\,p(\bm{\theta}\mid\mathcal{D})\,\text{d}\theta_{k}\ldots\text{d}\theta_{1} = \mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})\mid\mathcal{D}\right],

where $\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})$ is the indicator function: $\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})=1$ whenever $\bm{\theta}\in A_{\mathbf{s}}$ and $\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})=0$ otherwise. Now, consider the ABC approximation given in Equation (2) with discrepancy metric $d(\mathcal{D},\mathcal{D}_{s})$ and acceptance threshold $\epsilon$. The ABC posterior CDF, denoted by $F_{\epsilon}(\mathbf{s})$, will be

F_{\epsilon}(\mathbf{s}) = \int_{A_{\mathbf{s}}}p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon)\,\text{d}\theta_{k}\ldots\text{d}\theta_{1} = \mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon\right].    (4)

The marginal ABC posterior CDFs, $F_{\epsilon,1}(s),\ldots,F_{\epsilon,k}(s)$, are

F_{\epsilon,j}(s) = \lim_{s_{i\neq j}\rightarrow\infty}F_{\epsilon}(\mathbf{s}).

2.2 Multilevel estimator formulation

We now introduce some notation that will simplify further derivations. We define $\bm{\theta}_{\epsilon}$ to be a random vector distributed according to the ABC posterior CDF, $F_{\epsilon}(\mathbf{s})$, with acceptance threshold, $\epsilon$, as given in Equation (4). This provides us with simplified notation for the ABC posterior PDF, $p(\bm{\theta}_{\epsilon})=p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon)$, and the conditional expectation, $\mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon})\right]=\mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta})\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon\right]$. For any expectation, $P$, we use $\hat{P}^{N}$ to denote the Monte Carlo estimate of this expectation using $N$ samples.

The standard Monte Carlo integration approach is to generate $N$ samples $\bm{\theta}_{\epsilon}^{1},\ldots,\bm{\theta}_{\epsilon}^{N}$ from the ABC posterior, $p(\bm{\theta}_{\epsilon})$, then evaluate the empirical CDF (eCDF),

\hat{F}^{N}_{\epsilon}(\mathbf{s}) = \frac{1}{N}\sum_{i=1}^{N}\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon}^{i}),    (5)

for $\mathbf{s}\in\mathcal{S}$, where $\mathcal{S}$ is a discretisation of the parameter space $\bm{\Theta}$. For simplicity, we will consider $\mathcal{S}$ to be a $k$-dimensional regular lattice. In general, a regular lattice will not be well suited for high dimensional problems. However, since this is the first time that this MLMC approach has been presented, it is most natural to begin the exposition with a regular lattice, and then discuss other more computationally efficient approaches later. More attention is given to other possibilities in Section 4.
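A minimal Python sketch of the eCDF in Equation (5), evaluated on a tensor-product lattice over a $k$-dimensional parameter space, is given below; the lattice axes are assumed to be supplied by the user.

    import numpy as np

    def empirical_cdf(samples, lattice_axes):
        """samples: (N, k) array of posterior samples; lattice_axes: list of k 1-D grids."""
        grids = np.meshgrid(*lattice_axes, indexing="ij")   # lattice coordinates s
        ecdf = np.zeros(grids[0].shape)
        for theta in samples:
            inside = np.ones(grids[0].shape, dtype=bool)
            for j, g in enumerate(grids):
                inside &= (theta[j] <= g)                   # indicator that theta lies in A_s
            ecdf += inside
        return ecdf / len(samples)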

The eCDF is not, however, the only Monte Carlo approximation to the CDF one may consider. In particular, Giles et al. (2015) demonstrate the application of MLMC to a univariate CDF approximation. We now present a multivariate equivalent of the MLMC CDF estimator of Giles et al. (2015) in the context of ABC posterior CDF estimation. Consider a sequence of $L$ acceptance thresholds, $\{\epsilon_{\ell}\}_{\ell=1}^{L}$, that is strictly decreasing, that is, $\epsilon_{\ell}>\epsilon_{\ell+1}$. In this work, the problem of constructing optimal sequences is not considered, as the focus is the initial development of the method; more discussion around this problem is given in Section 4. Given such a sequence, $\{\epsilon_{\ell}\}_{\ell=1}^{L}$, we can represent the CDF (Equation (4)) using the telescoping sum

F_{\epsilon_{L}}(\mathbf{s}) = \mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon_{L}})\right] = \sum_{\ell=1}^{L}Y_{\ell}(\mathbf{s}),    (6)

where

Y_{\ell}(\mathbf{s}) = \begin{cases}\mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon_{1}})\right]&\text{if }\ell=1,\\ \mathbb{E}\left[\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon_{\ell}})-\mathds{1}_{A_{\mathbf{s}}}(\bm{\theta}_{\epsilon_{\ell-1}})\right]&\text{if }\ell>1.\end{cases}    (7)

Using our notation, the MLMC estimator for Equation (6) and Equation (7) is given by

\hat{F}^{N_{1},\ldots,N_{L}}_{\epsilon_{L}}(\mathbf{s}) = \sum_{\ell=1}^{L}\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s}),    (8)

where

\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s}) = \begin{cases}\frac{1}{N_{1}}\sum_{i=1}^{N_{1}}g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{1}}^{i})&\text{if }\ell=1,\\ \frac{1}{N_{\ell}}\sum_{i=1}^{N_{\ell}}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}}^{i})-g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}}^{i})\right]&\text{if }\ell>1,\end{cases}    (9)

and $g_{\mathbf{s}}(\bm{\theta})$ is a Lipschitz continuous approximation to the indicator function; this approximation is computed using a tensor product of cubic polynomials,

g_{\mathbf{s}}(\bm{\theta}) = \prod_{j=1}^{k}\xi\left(\frac{s_{j}-\theta_{j}}{\delta_{j}}\right),

where $\delta_{j}$ is the lattice spacing in the $j$th dimension, and $\xi(x)$ is a piecewise continuous polynomial,

\xi(x) = \begin{cases}1&x\leq-1,\\ \frac{5}{8}x^{3}-\frac{9}{8}x+\frac{1}{2}&-1<x<1,\\ 0&x\geq 1.\end{cases}

This expression is based on the rigorous treatment of smoothing required in the univariate case given by Giles et al. (2015). While other polynomials that satisfy certain conditions are possible (Giles et al., 2015; Reiss, 1981), here we restrict ourselves to this relatively simple form. Application of the smoothing function improves the quality of the final CDF estimate, just as using smoothing kernels improves the quality of PDF estimators (Silverman, 1986). Such a smoothing is also necessary to avoid convergence issues with MLMC caused by the discontinuity of the indicator function (Avikainen, 2009; Giles et al., 2015).

To compute the $\hat{Y}^{N_{1}}_{1}(\mathbf{s})$ term (Equation (9)), we generate $N_{1}$ samples $\bm{\theta}_{\epsilon_{1}}^{1},\ldots,\bm{\theta}_{\epsilon_{1}}^{N_{1}}$ from $p(\bm{\theta}_{\epsilon_{1}})$; this represents a biased estimate of $F_{\epsilon_{L}}(\mathbf{s})$. To compensate for this bias, correction terms, $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$, are evaluated for $\ell>1$, each requiring the generation of $N_{\ell}$ samples $\bm{\theta}_{\epsilon_{\ell}}^{1},\ldots,\bm{\theta}_{\epsilon_{\ell}}^{N_{\ell}}$ from $p(\bm{\theta}_{\epsilon_{\ell}})$ and $N_{\ell}$ samples $\bm{\theta}_{\epsilon_{\ell-1}}^{1},\ldots,\bm{\theta}_{\epsilon_{\ell-1}}^{N_{\ell}}$ from $p(\bm{\theta}_{\epsilon_{\ell-1}})$, as given in Equation (9). It is important to note that the samples, $\bm{\theta}_{\epsilon_{\ell-1}}^{1},\ldots,\bm{\theta}_{\epsilon_{\ell-1}}^{N_{\ell}}$, used to compute $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$ are independent of the samples, $\bm{\theta}_{\epsilon_{\ell-1}}^{1},\ldots,\bm{\theta}_{\epsilon_{\ell-1}}^{N_{\ell-1}}$, used to compute $\hat{Y}^{N_{\ell-1}}_{\ell-1}(\mathbf{s})$.

2.3 Variance reduction

The goal is to introduce a coupling between levels that controls the variance of the bias correction terms. With an effective coupling, the result is an estimator with lower variance, hence the number of samples required to obtain an accurate estimate is reduced. Let $v_{\ell}$ denote the variance of the estimator $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$. For $\ell\geq 2$ this can be expressed as

v_{\ell} = \mathbb{V}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}})-g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}})\right] = \mathbb{V}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}})\right]+\mathbb{V}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}})\right]-2\,\mathbb{C}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}}),g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}})\right],

where $\mathbb{V}\left[\cdot\right]$ and $\mathbb{C}\left[\cdot,\cdot\right]$ denote the variance and covariance, respectively. Introducing a positive correlation between the random variables $\bm{\theta}_{\epsilon_{\ell}}$ and $\bm{\theta}_{\epsilon_{\ell-1}}$ will have the desired effect of reducing the variance of $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$.

In many applications of MLMC, a positive correlation is introduced by driving samplers at both the $\ell$ and $\ell-1$ levels with the same randomness. Properties of Brownian motion or Poisson processes are typically used for the estimation of expectations involving SDEs or Markov processes (Giles, 2008; Anderson and Higham, 2012; Lester et al., 2016). In the context of ABC methods, however, simulation of the quantity of interest is necessarily based on rejection sampling. The reliance on rejection sampling makes a strong coupling, in the true sense of MLMC, a difficult, if not impossible, task. Rather, here we introduce a weaker form of coupling by exploiting the fact that our MLMC estimator is performing the task of computing an estimate of the ABC posterior CDF. We combine this with a property of nested ABC rejection samplers to arrive at an efficient algorithm for computing $\hat{F}_{\epsilon_{L}}(\mathbf{s})$.

We proceed to establish a correlation between levels as follows. Assume we have computed, for some $\ell<L$, the terms $\hat{Y}^{N_{1}}_{1}(\mathbf{s}),\ldots,\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$ in Equation (8). That is, we have an estimator of the CDF at level $\ell$ by taking the sum

\hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s}) = \sum_{m=1}^{\ell}\hat{Y}^{N_{m}}_{m}(\mathbf{s}),

with marginal distributions $\hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell},j}(s_{j})$ for $j=1,\ldots,k$. We can use this to determine a coupling based on matching marginal probabilities when computing $\hat{Y}^{N_{\ell+1}}_{\ell+1}(\mathbf{s})$. After generating $N_{\ell+1}$ samples $\bm{\theta}_{\epsilon_{\ell+1}}^{1},\ldots,\bm{\theta}_{\epsilon_{\ell+1}}^{N_{\ell+1}}$ from $p(\bm{\theta}_{\epsilon_{\ell+1}})$, we compute the eCDF, $\hat{F}^{N_{\ell+1}}_{\epsilon_{\ell+1}}(\mathbf{s})$, using Equation (5) and obtain the marginal distributions, $\hat{F}^{N_{\ell+1}}_{\epsilon_{\ell+1},j}(s_{j})$ for $j=1,\ldots,k$. We can thus generate $N_{\ell+1}$ coupled pairs $\{\bm{\theta}^{i}_{\epsilon_{\ell+1}},\bm{\theta}^{i}_{\epsilon_{\ell}}\}$ by choosing the $\bm{\theta}^{i}_{\epsilon_{\ell}}$ with the same marginal probabilities as the empirical probability of $\bm{\theta}^{i}_{\epsilon_{\ell+1}}$. That is, the $j$th component of $\bm{\theta}^{i}_{\epsilon_{\ell}}$ is given by

\theta^{i}_{\epsilon_{\ell},j} = \hat{G}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell},j}\left(\hat{F}^{N_{\ell+1}}_{\epsilon_{\ell+1},j}(\theta^{i}_{\epsilon_{\ell+1},j})\right),

where $\theta^{i}_{\epsilon_{\ell+1},j}$ is the $j$th component of $\bm{\theta}^{i}_{\epsilon_{\ell+1}}$ and $\hat{G}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell},j}(s)$ is the inverse of the $j$th marginal distribution of $\hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s})$. This introduces a positive correlation between the sample pairs, $\{\bm{\theta}_{\epsilon_{\ell+1}}^{i},\bm{\theta}_{\epsilon_{\ell}}^{i}\}$, since an increase in any of the components of $\bm{\theta}_{\epsilon_{\ell+1}}^{i}$ will cause an increase in the same component of $\bm{\theta}_{\epsilon_{\ell}}^{i}$. This correlation reduces the variance in the bias correction estimator $\hat{Y}^{N_{\ell+1}}_{\epsilon_{\ell+1}}(\mathbf{s})$ computed according to Equation (9). We can then update the MLMC CDF to get an improved estimator by using

\hat{F}^{N_{1},\ldots,N_{\ell+1}}_{\epsilon_{\ell+1}}(\mathbf{s}) = \hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s}) + \hat{Y}^{N_{\ell+1}}_{\epsilon_{\ell+1}}(\mathbf{s}),

and apply an adjustment that ensures monotonicity. We continue this process iteratively to obtain $\hat{F}^{N_{1},\ldots,N_{L}}_{\epsilon_{L}}(\mathbf{s})$.
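The marginal-matching step can be sketched in Python as follows for a single component $j$ of the parameter vector; the marginal CDF of the new level and the inverse marginal CDF of the previous level are represented here by simple sample-based interpolants, which stand in for the lattice-based estimators used in Algorithm 4.

    import numpy as np

    def couple_marginal(theta_fine_j, marginal_cdf_fine, inverse_cdf_coarse):
        """Pair level-(l+1) components with level-l values of equal marginal probability."""
        u = marginal_cdf_fine(theta_fine_j)     # empirical probabilities F_{eps_{l+1}, j}
        return inverse_cdf_coarse(u)            # matched quantiles via G_{eps_l, j}

    # Illustrative stand-ins for posterior samples on two consecutive levels
    rng = np.random.default_rng(1)
    coarse = rng.normal(0.0, 1.0, 2000)         # level-l samples (one component)
    fine = rng.normal(0.0, 0.8, 500)            # level-(l+1) samples (one component)

    cdf_fine = lambda x: np.searchsorted(np.sort(fine), x, side="right") / len(fine)
    inv_cdf_coarse = lambda u: np.quantile(coarse, np.clip(u, 0.0, 1.0))

    coupled = couple_marginal(fine, cdf_fine, inv_cdf_coarse)
    # coupled[i] increases with fine[i], giving the positive correlation that
    # reduces the variance of the bias correction term.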

It must be noted here that this coupling mechanism introduces an approximation for the general inference problem; therefore some additional bias can be introduced. This is made clear when one considers the process in terms of the copula distributions of $\bm{\theta}_{\epsilon_{\ell+1}}$ and $\bm{\theta}_{\epsilon_{\ell}}$. If these copula distributions are the same, then the coupling is exact and there is no additional bias. The coupling is also exact for the univariate case ($k=1$). Therefore, under the assumption that the correlation structure does not change significantly between levels, the bias should be low. In practice, this requirement affects the choice of the acceptance threshold sequence, $\epsilon_{1},\ldots,\epsilon_{L}$; we discuss this in more detail in Section 4. In Section 3.1, we demonstrate that for sensible choices of this sequence, the introduced bias is small compared with the bias that is inherent in the ABC approximation.

2.4 Optimal sample sizes

We now require the sample numbers $N_{1},\ldots,N_{L}$ that provide the optimal trade-off between accuracy and efficiency. Let $d_{\ell}$ denote the number of data generation steps required during the computation of $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$ and let $c_{\ell}=d_{\ell}/N_{\ell}$ be the average number of data generation steps per accepted ABC posterior sample using acceptance threshold $\epsilon_{\ell}$.

Given $v_{1}=\mathbb{V}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{1}})\right]$ and $v_{\ell}=\mathbb{V}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}})-g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}})\right]$, for $\ell>1$, one can construct the optimal $N_{\ell}$ under the constraint $\mathbb{V}\left[\hat{F}^{N_{1},\ldots,N_{L}}_{\epsilon_{L}}(\mathbf{s})\right]=\mathcal{O}(h^{2})$, where $h^{2}$ is the target variance of the MLMC CDF estimator. As shown by Giles (2008), using a Lagrange multiplier method, the optimal $N_{1},\ldots,N_{L}$ are given by

N_{\ell} = \mathcal{O}\left(h^{-2}\right)\sqrt{\frac{v_{\ell}}{c_{\ell}}}\sum_{m=1}^{L}\sqrt{v_{m}c_{m}},\quad\ell=1,\ldots,L.    (10)

In practice, the values for $v_{1},\ldots,v_{L}$ and $c_{1},\ldots,c_{L}$ will not have analytic expressions available; rather, we perform a low accuracy trial simulation with all $N_{1}=\cdots=N_{L}=c$, for some comparatively small constant, $c$, to obtain the relative scaling of variances and data generation requirements.
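A minimal sketch of this allocation, assuming the trial run has already produced arrays of level-wise correction samples and average per-sample data generation counts, is given below.

    import numpy as np

    def optimal_sample_sizes(trial_values, trial_costs, h):
        """trial_values: list of arrays of trial samples of each Y_l term;
        trial_costs: average data generation steps per accepted sample on each level;
        h: target root variance. Implements Equation (10) with the O(h^{-2}) factor
        taken as h^{-2}."""
        v = np.array([np.var(y) for y in trial_values])
        c = np.array(trial_costs, dtype=float)
        N = (1.0 / h**2) * np.sqrt(v / c) * np.sum(np.sqrt(v * c))
        return np.ceil(N).astype(int)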

2.5 Improving acceptance rates

A MLMC method based on the estimator in Equation (8) and the variance reduction strategy given in Section 2.3 would depend on standard ABC rejection sampling (Algorithm 1) for the final bias correction term $\hat{Y}^{N_{L}}_{\epsilon_{L}}$. For many ABC applications of interest, the computation of this final term will dominate the computational costs. Therefore, the potential computational gains depend entirely on the size of $N_{L}$ compared to the number of samples, $N$, required for the equivalent standard Monte Carlo approach (Equation (5)). While this approach is often an improvement over rejection sampling, we can achieve further computational gains by exploiting the iterative computation of the bias correction terms.

Let $\operatorname{supp}(f(x))$ denote the support of a function $f(x)$, and note that, for any $\ell\in[2,L]$, $\operatorname{supp}(p(\bm{\theta}_{\epsilon_{\ell}}))\subseteq\operatorname{supp}(p(\bm{\theta}_{\epsilon_{\ell-1}}))$. This follows from the fact that if, for any $\bm{\theta}$, $\mathbb{P}(d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{\ell}\mid\bm{\theta})>0$, then $\mathbb{P}(d(\mathcal{D},\mathcal{D}_{s})<\epsilon_{\ell-1}\mid\bm{\theta})>0$ since $\epsilon_{\ell}<\epsilon_{\ell-1}$. That is, any simulated data generated using parameter values taken from outside $\operatorname{supp}(p(\bm{\theta}_{\epsilon_{\ell-1}}))$ cannot be accepted on level $\ell$ since $d(\mathcal{D},\mathcal{D}_{s})>\epsilon_{\ell-1}$ almost surely. Therefore, we can truncate the prior to the support of $p(\bm{\theta}_{\epsilon_{\ell-1}})$ when computing $\hat{Y}^{N_{\ell}}_{\ell}(\mathbf{s})$, thus increasing the acceptance rate of level $\ell$ samples. In practice, we need to approximate the support regions through sampling. For simplicity, in this work we restrict sampling of the prior at level $\ell$ to within the bounding box that contains all the samples generated at level $\ell-1$. However, more sophisticated approaches could be considered, and may result in further computational improvements.
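The bounding-box truncation can be sketched as follows; the prior sampler is a placeholder, and the box is simply the componentwise range of the samples accepted at the previous level.

    import numpy as np

    def truncated_prior_sampler(sample_prior, prev_level_samples):
        """Rejection sample the prior restricted to the bounding box of level l-1 samples."""
        lo = prev_level_samples.min(axis=0)     # componentwise lower bounds
        hi = prev_level_samples.max(axis=0)     # componentwise upper bounds
        def sample():
            while True:
                theta = np.asarray(sample_prior())
                if np.all(theta >= lo) and np.all(theta <= hi):
                    return theta
        return sample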

2.6 The algorithm

We now have all the components to construct our MLMC-ABC algorithm. We compute the MLMC CDF estimator (Equation (8)) using the coupling technique for the bias correction terms (Section 2.3) and prior truncation for improved acceptance rates (Section 2.5).

Optimal $N_{1},\ldots,N_{L}$ are estimated as per Equation (10) and Section 2.4. Once $N_{1},\ldots,N_{L}$ have been estimated, computation of the MLMC-ABC posterior CDF $\hat{F}_{\epsilon_{L}}(\mathbf{s})$ proceeds according to Algorithm 4.

Algorithm 4 MLMC-ABC
1: Initialise $\epsilon_{1},\ldots,\epsilon_{L}$, $N_{1},\ldots,N_{L}$ and prior $p(\bm{\theta})$.
2: Set $p(\bm{\theta}_{\epsilon_{0}})\leftarrow p(\bm{\theta})$.
3: for $\ell=1,\ldots,L$ do
4:   for $i=1,\ldots,N_{\ell}$ do
5:     repeat
6:       Sample $\bm{\theta}^{*}\sim p(\bm{\theta})$ restricted to $\operatorname{supp}(p(\bm{\theta}_{\epsilon_{\ell-1}}))$.
7:       Generate data, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta}^{*})$.
8:     until $d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{\ell}$
9:     Set $\bm{\theta}_{\epsilon_{\ell}}^{i}\leftarrow\bm{\theta}^{*}$.
10:  end for
11:  for $\mathbf{s}\in\mathcal{S}$ do
12:    Set $\hat{F}_{\epsilon_{\ell}}^{N_{\ell}}(\mathbf{s})\leftarrow\sum_{i=1}^{N_{\ell}}g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}}^{i})/N_{\ell}$.
13:  end for
14:  if $\ell>1$ then
15:    for $i=1,\ldots,N_{\ell}$ do
16:      for $j=1,\ldots,k$ do
17:        Set $\theta_{\epsilon_{\ell-1},j}^{i}\leftarrow\hat{G}^{N_{1},\ldots,N_{\ell-1}}_{\epsilon_{\ell-1},j}\left(\hat{F}_{\epsilon_{\ell},j}^{N_{\ell}}\left(\theta_{\epsilon_{\ell},j}^{i}\right)\right)$.
18:      end for
19:    end for
20:    for $\mathbf{s}\in\mathcal{S}$ do
21:      Set $\hat{Y}^{N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s})\leftarrow\sum_{i=1}^{N_{\ell}}\left[g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell}}^{i})-g_{\mathbf{s}}(\bm{\theta}_{\epsilon_{\ell-1}}^{i})\right]/N_{\ell}$.
22:      Set $\hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s})\leftarrow\hat{F}^{N_{1},\ldots,N_{\ell-1}}_{\epsilon_{\ell-1}}(\mathbf{s})+\hat{Y}^{N_{\ell}}_{\epsilon_{\ell}}(\mathbf{s})$.
23:    end for
24:  end if
25: end for

The computational complexity of MLMC-ABC (Algorithm 4) is roughly $\mathcal{O}(c_{S}+N_{L}(c_{L}+c_{G}))$, where $c_{L}$ is the expected cost of generating a single sample of the ABC posterior with threshold $\epsilon_{L}$, $c_{S}$ is the cost of updating the CDF estimate and $c_{G}$ is the cost of the coupling, which involves the marginal CDF inverses. From Algorithm 4, we have $c_{S}=\mathcal{O}(N_{1}|\mathcal{S}|)$ and from Barber et al. (2015) we have $c_{L}=\mathcal{O}(\epsilon^{-n})$, where $n$ is the dimensionality of $\mathcal{D}$. The marginal inverse operations require only two steps:

  1. find $\mathbf{s}\in\mathcal{S}$ such that $s_{j}\leq\theta_{\epsilon_{\ell},j}^{i}<s_{j}+\delta_{j}$; such an operation is, at most, $\mathcal{O}(\log_{k}|\mathcal{S}|)$;

  2. invert the interpolating cubic spline, which can be done in $\mathcal{O}(1)$.

It follows that $c_{G}=\mathcal{O}(\log_{k}|\mathcal{S}|)$. For any practical application, $\epsilon_{L}$ will need to be sufficiently small, that is, $(N_{1}|\mathcal{S}|/N_{L}-\log_{k}|\mathcal{S}|)\ll c_{L}$, so that the cost of generating posterior samples at level $L$ dominates the cost of the marginal CDF inverse operations and the lattice updating. Computational gains over ABC rejection sampling are achieved through decreasing $N_{L}$ and $c_{L}$, via variance reduction and prior truncation.

Our primary focus in Algorithm 4 is on using MLMC to estimate the posterior CDF, as the coupling mechanism is more readily communicated in this case. However, the MLMC-ABC method is more general and can be used to estimate $\mathbb{E}\left[U(\bm{\theta}_{\epsilon_{\ell}})\right]$ where $U(\bm{\theta}_{\epsilon_{\ell}})$ is any Lipschitz continuous function. In this more general case, only the marginal CDFs need be accumulated to facilitate the coupling mechanism. For more details see Appendix A.

3 Results

In this section, we provide numerical results to demonstrate the validity, accuracy and performance of our MLMC-ABC method using some common models from epidemiology. In the first instance, we consider a tractable compartmental model, the stochastic SIS (Susceptible-Infected-Susceptible) model (Weiss and Dishon, 1971), to confirm the convergence and accuracy of MLMC-ABC. We then consider the Tuberculosis transmission model introduced by Tanaka et al. (2006) as a benchmark to compare our method with MCMC-ABC (Algorithm 2) and SMC-ABC (Algorithm 3). While we have chosen to perform our evaluation using an epidemiological model due to the particular prevalence of ABC methods in the life sciences, the techniques outlined in this manuscript are completely general and applicable to many areas of science.

3.1 A tractable example

The SIS model is a common model from epidemiology that describes the spread of a disease or infection for which no significant immunity is obtained after recovery; the common cold, for example. The model is given by

S+I \overset{\beta}{\rightarrow} 2I,
I \overset{\gamma}{\rightarrow} S,

with parameters $\bm{\theta}=\{\beta,\gamma\}$ and hazard functions for infection and recovery given by

h_{I}(S,I) = \beta SI \quad\text{and}\quad h_{R}(S,I) = \gamma I,

respectively. This process defines a discrete-state continuous-time Markov process with a forward transitional density function that is computationally feasible to evaluate exactly for small populations $N_{pop}=S+I$. That is, for $t>s$, the probability of $S(t)=x$ given $S(s)=y$, with density denoted by $p(x,t\mid y,s;\beta,\gamma)$, has a solution obtained through the $x$th element of the vector

P(y,\beta,\gamma) = \exp(Q(\beta,\gamma)(t-s))\,\mathbf{y},    (11)

where the $y$th element of the column vector, $\mathbf{y}$, is one and all other elements are zero, $\exp(\cdot)$ denotes the matrix exponential and $Q(\beta,\gamma)$ is the infinitesimal generator matrix of the Markov process; for the SIS model, $Q(\beta,\gamma)$ is a tri-diagonal matrix dependent only on the parameters of the model.
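One way to evaluate Equation (11) in Python is sketched below: the generator is assembled over the susceptible counts $S=0,\ldots,N_{pop}$ and the transition probabilities are obtained with a matrix exponential (here via scipy.linalg.expm); the function name is illustrative only.

    import numpy as np
    from scipy.linalg import expm

    def sis_transition_matrix(beta, gamma, Npop, dt):
        """Entry P[y, x] gives p(S(t + dt) = x | S(t) = y) for the SIS model."""
        n = Npop + 1
        Q = np.zeros((n, n))
        for S in range(n):
            I = Npop - S
            inf_rate = beta * S * I      # S + I -> 2I, susceptible count decreases
            rec_rate = gamma * I         # I -> S, susceptible count increases
            if S > 0:
                Q[S, S - 1] = inf_rate
            if S < Npop:
                Q[S, S + 1] = rec_rate
            Q[S, S] = -(inf_rate + rec_rate)
        return expm(Q * dt)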

Let $S_{obs}(t)$ be an observation at time $t$ of the number of susceptible individuals in the population. We generate observations using a single realisation of the SIS model with parameters $\beta=0.003$ and $\gamma=0.1$, population size $N_{pop}=101$, and initial conditions $S(0)=100$ and $I(0)=1$; observations are taken at times $t_{1}=4,\,t_{2}=8,\ldots,\,t_{10}=40$. Using the analytic solution to the SIS transitional density (Equation (11)), we arrive at the likelihood function

\mathcal{L}(\beta,\gamma;\mathcal{D}) = \prod_{i=1}^{10}p(S_{obs}(t_{i}),t_{i}\mid S_{obs}(t_{i-1}),t_{i-1};\beta,\gamma),    (12)

where $S_{obs}(t_{0})=s_{0}$ almost surely. Hence, we can obtain an exact solution to the SIS posterior $p(\beta,\gamma\mid\mathcal{D})$ given the priors $\beta\sim\mathcal{U}(0,0.06)$ and $\gamma\sim\mathcal{U}(0,2)$. Given this exact posterior, quadrature can be applied to compute the posterior CDF, given by

F(s_{1},s_{2}) = \int_{-\infty}^{s_{1}}\int_{-\infty}^{s_{2}}p(\beta,\gamma\mid\mathcal{D})\,\text{d}\beta\,\text{d}\gamma.
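For completeness, a sketch of the log of the exact likelihood in Equation (12), assuming the sis_transition_matrix helper above and the equally spaced observation times $t_{i}=4i$ with $S_{obs}(t_{0})=100$, is:

    import numpy as np

    def sis_log_likelihood(beta, gamma, s_obs, Npop=101, dt=4.0):
        """s_obs: the ten observed susceptible counts at times 4, 8, ..., 40."""
        P = sis_transition_matrix(beta, gamma, Npop, dt)   # one matrix, equal spacing
        log_like = 0.0
        prev = 100                                         # S_obs(t_0) = s_0
        for s in s_obs:
            log_like += np.log(P[prev, s])
            prev = s
        return log_like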

For the ABC approximation we use a discrepancy metric based on the sum of squared errors,

d(\mathcal{D},\mathcal{D}_{s}) = \left[\sum_{i=1}^{10}\left(S_{obs}(t_{i})-S^{*}(t_{i})\right)^{2}\right]^{1/2},

where $S^{*}(t)$ is a realisation of the model generated using the Gillespie algorithm (Gillespie, 1977) for a given set of parameters, and $\mathcal{D}_{s}=[S^{*}(t_{1}),S^{*}(t_{2}),\ldots,S^{*}(t_{10})]$. An appropriate acceptance threshold sequence for this metric is $\epsilon_{1},\ldots,\epsilon_{L}$, with $\epsilon_{\ell}=\epsilon_{1}m^{1-\ell}$, $m=2$ and $\epsilon_{1}=75$ (Toni et al., 2009).
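The data generation step for this example can be sketched as a Gillespie simulation of the SIS model recorded at the observation times, together with the discrepancy above; the function and argument names are illustrative only.

    import numpy as np

    def simulate_sis(beta, gamma, S0=100, I0=1,
                     obs_times=(4, 8, 12, 16, 20, 24, 28, 32, 36, 40), rng=None):
        rng = np.random.default_rng() if rng is None else rng
        t, S, I = 0.0, S0, I0
        out = []
        for t_obs in obs_times:
            while True:
                a1, a2 = beta * S * I, gamma * I     # infection and recovery propensities
                a0 = a1 + a2
                if a0 == 0.0:
                    t = t_obs                        # absorbing state (I = 0)
                    break
                dt = rng.exponential(1.0 / a0)
                if t + dt > t_obs:
                    # record the state at the observation time; by memorylessness the
                    # waiting time can be resampled after t_obs
                    t = t_obs
                    break
                t += dt
                if rng.uniform() < a1 / a0:
                    S, I = S - 1, I + 1              # infection: S + I -> 2I
                else:
                    S, I = S + 1, I - 1              # recovery: I -> S
            out.append(S)
        return np.array(out)

    def discrepancy(data_obs, data_sim):
        return np.sqrt(np.sum((np.asarray(data_obs) - np.asarray(data_sim)) ** 2))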

We can use the exact likelihood (Equation (12)) to evaluate the convergence properties of our MLMC-ABC method. Based on the analysis of ABC rejection sampling by Barber et al. (2015), the rate of decay of the root mean-squared error (RMSE) as the computational cost is increased is slower for ABC than for standard Monte Carlo. We find experimentally for ABC rejection sampling that the decay of the RMSE for the SIS model is approximately $\mathcal{O}(C^{-1/4})$, where $C$ is the expected computational cost of generating $N$ ABC posterior samples. The $95\%$ confidence interval of this experimental rate is $[-0.27,-0.23]$, which is consistent with the expected theoretical result of $-0.25$ (Barber et al., 2015). This is slower than the expected $\mathcal{O}(C^{-1/2})$ decay typically achieved with standard Monte Carlo. We compare the ABC rejection sampler decay rate with that achieved by our MLMC-ABC method.

We use the $L_{\infty}$ norm for the RMSE, that is,

\text{RMSE} = \sqrt{\mathbb{E}\left[\left\|F-\hat{F}_{\epsilon_{L}}\right\|_{\infty}^{2}\right]},

where $F$ is the exact posterior CDF, evaluated directly from the likelihood (Equation (12)), and $\hat{F}_{\epsilon_{L}}$ is the Monte Carlo estimate of the CDF. A regular lattice that consists of $300\times 300$ points is used for this estimate. The computation required for Gillespie simulations completely dominates the added computation of updating the lattice. The RMSE is computed using $20$ independent MLMC-ABC CDF estimations for $L=1,\ldots,3$. The sequence of sample numbers, $N_{1},\ldots,N_{L}$, is computed using Equation (10) with a target variance of $\mathcal{O}(\epsilon^{2})$ and $100$ trial samples. Figure 1 demonstrates the improved convergence rate over the ABC rejection sampler convergence rate. Based on the least-squares fit, the RMSE decay is approximately $\mathcal{O}(C^{-1/3})$. The $95\%$ confidence interval for the rate is $[-0.34,-0.26]$, which is consistent with theory from Giles et al. (2015) for the univariate situation. For smaller target RMSE, this results in an order of magnitude reduction in the computational cost.

Figure 1: RMSE convergence for MLMC-ABC compared with ABC rejection sampling. The RMSE is computed using the exact solution to the posterior density of the SIS model.

We also consider the effect of the additional bias introduced through the coupling mechanism as presented in Section 2.3. We set the sample numbers at all levels to be $10^{4}$ to ensure the Monte Carlo error is negligible and compare the bias for different values of the acceptance threshold scaling factor $m$. The bias is computed according to the $L_{\infty}$ norm, that is,

\text{Bias} = \mathbb{E}\left[\left\|\hat{F}_{\epsilon_{L}}^{c}-\hat{F}_{\epsilon_{L}}^{u}\right\|_{\infty}\right],

where $\hat{F}_{\epsilon_{L}}^{c}$ denotes the estimator computed according to Algorithm 4 and $\hat{F}_{\epsilon_{L}}^{u}$ denotes the estimator computed without any coupling. That is, $\hat{F}_{\epsilon_{L}}^{u}$ is computed using standard Monte Carlo to evaluate each term in the MLMC telescoping sum (Equation (6)). Note that, computationally, $\hat{F}_{\epsilon_{L}}^{u}$ will always be inferior to a standard Monte Carlo estimate. Figure 2 shows that, not only does the bias decay as $m$ decreases, but also the additional bias is well within the order of bias expected from the ABC approximation. This result, along with Figure 1, demonstrates that the reduction in estimator variance can dominate the increase in bias. Thus, compared with standard Monte Carlo, a significantly lower RMSE for the same computational effort is achieved with MLMC using this coupling strategy.

Figure 2: Convergence of the coupling bias as a function of $m$.

3.2 Performance evaluation

We now evaluate the performance of MLMC-ABC using the model developed by Tanaka et al. (2006) in the study of tuberculosis transmission rate parameters using DNA fingerprint data (Small et al., 1994). This model has been selected due to the availability of published comparative performance evaluations of MCMC-ABC and SMC-ABC (Tanaka et al., 2006; Sisson et al., 2007).

The model proposed by Tanaka et al. (2006) describes the occurrence of tuberculosis infections over time and the mutation of the bacterium responsible, Mycobacterium tuberculosis. The number of infections, $I_{t}$, at time $t$ is

I_{t} = \sum_{i=1}^{G_{t}}X_{i,t},

where $G_{t}$ is the number of distinct genotypes and $X_{i,t}$ is the number of infections caused by the $i$th genotype at time $t$. For each genotype, new infections occur with rate $\alpha$, infections terminate with rate $\delta$, and mutation occurs with rate $\mu$, causing an increase in the number of genotypes. This process, as with the SIS model, can be described by a discrete-state continuous-time Markov process. In this case, however, the likelihood is intractable, but the model can still be simulated using the Gillespie algorithm (Gillespie, 1977). After a realisation of the model is completed, either by extinction or when a maximum infection count is reached, a sub-sample of $473$ cases is collected and compared against the IS6110 DNA fingerprint data of tuberculosis bacteria samples (Small et al., 1994). The dataset consists of $326$ distinct genotypes; the infection cases are clustered according to the genotype responsible for the infection. The collection of clusters can be summarised succinctly as $30^{1}\,23^{1}\,15^{1}\,10^{1}\,8^{1}\,5^{2}\,4^{4}\,3^{13}\,2^{20}\,1^{282}$, where $n^{j}$ denotes that there are $j$ clusters of size $n$. The discrepancy metric used is

d(\mathcal{D},\mathcal{D}_{s}) = \frac{1}{n}\left|g(\mathcal{D})-g(\mathcal{D}_{s})\right| + \left|H(\mathcal{D})-H(\mathcal{D}_{s})\right|,    (13)

where $n$ is the size of the population sub-sample (e.g., $n=473$), $g(\mathcal{D})$ denotes the number of distinct genotypes in the dataset (e.g., $g(\mathcal{D})=326$) and the genetic diversity is

H(\mathcal{D}) = 1 - \frac{1}{n^{2}}\sum_{i=1}^{g(\mathcal{D})}n_{i}(\mathcal{D})^{2},

where $n_{i}(\mathcal{D})$ is the cluster size of the $i$th genotype in the dataset.
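These summary statistics and the discrepancy in Equation (13) can be sketched in Python as follows, with the observed and simulated datasets represented as arrays of cluster sizes; the observed clustering reproduces the summary $30^{1}\,23^{1}\,15^{1}\,10^{1}\,8^{1}\,5^{2}\,4^{4}\,3^{13}\,2^{20}\,1^{282}$.

    import numpy as np

    def genetic_diversity(cluster_sizes):
        sizes = np.asarray(cluster_sizes, dtype=float)
        n = sizes.sum()
        return 1.0 - np.sum(sizes ** 2) / n ** 2                    # H(D)

    def tb_discrepancy(obs_clusters, sim_clusters):
        n = np.sum(obs_clusters)                                    # sub-sample size, 473
        g_term = abs(len(obs_clusters) - len(sim_clusters)) / n     # distinct genotypes
        h_term = abs(genetic_diversity(obs_clusters) - genetic_diversity(sim_clusters))
        return g_term + h_term

    # Observed cluster sizes: 326 genotypes covering 473 sampled cases
    obs = np.array([30, 23, 15, 10, 8] + [5] * 2 + [4] * 4 + [3] * 13 + [2] * 20 + [1] * 282)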

We perform likelihood-free inference on the tuberculosis model for the parameters $\bm{\theta}=\{\alpha,\delta,\mu\}$ with the goal of evaluating the efficiencies of MLMC-ABC, MCMC-ABC and SMC-ABC. We use a target posterior distribution of $p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon)$ with $d(\mathcal{D},\mathcal{D}_{s})$ as defined in Equation (13) and $\epsilon=0.0025$. The acceptance threshold sequence, $\epsilon_{1},\ldots,\epsilon_{10}$, used for both SMC-ABC and MLMC-ABC is $\epsilon_{i}=\epsilon_{10}+(\epsilon_{i-1}-\epsilon_{10})/2$ with $\epsilon_{1}=1$ and $\epsilon_{10}=0.0025$. The improper prior is given by $\alpha\sim\mathcal{U}(0,5)$, $\delta\sim\mathcal{U}(0,\alpha)$ and $\mu\sim\mathcal{N}(0.198,0.06735^{2})$ (Sisson et al., 2007; Tanaka et al., 2006). For the MCMC-ABC and SMC-ABC algorithms we apply a typical Gaussian proposal kernel,

K(\bm{\theta}^{(i)}\mid\bm{\theta}^{(i-1)}) = \mathcal{N}\left(\bm{\theta}^{(i-1)},\Sigma\right),

with covariance matrix

\Sigma = \begin{bmatrix}0.75^{2}&0&0\\ 0&0.75^{2}&0\\ 0&0&0.03^{2}\end{bmatrix}.    (14)

Such a proposal kernel is a reasonable characterisation of the initial exploration of an ABC posterior, as no correlations between parameters are assumed.

While MLMC-ABC does not require a proposal kernel function, some knowledge of the variance of each bias correction term is needed to determine the optimal sample numbers, $N_{1},\ldots,N_{L}$. This is achieved using $100$ trial samples of each level. The number of data generation steps is also recorded to compute $N_{1},\ldots,N_{L}$, as per Equation (10). The resulting sample numbers are then scaled such that $N_{L}$ is a user prescribed value.

The efficiency metric we use is the root mean-squared error (RMSE) of the CDF estimate versus the total number of data generation steps, $N_{s}$. The mean-squared error (MSE) is taken under the $L_{\infty}$ norm, that is, $\text{MSE}=\mathbb{E}\left[\|F_{\epsilon_{\ell}}-\hat{F}_{\epsilon_{\ell}}\|_{\infty}^{2}\right]$, where $F_{\epsilon_{\ell}}$ is the exact ABC posterior CDF and $\hat{F}_{\epsilon_{\ell}}$ is the Monte Carlo estimate using a regular lattice with $100\times 100\times 100$ points. To compute the RMSE, a high precision solution is computed using $10^{6}$ ABC rejection samples of the target ABC posterior. This is computed over a period of 48 hours using 500 processor cores.

Table 1 presents the RMSE for MCMC-ABC, SMC-ABC and MLMC-ABC for different configurations. The RMSE values are computed using 10 independent estimator calculations. The algorithm parameter varied is the number of particles, $N_{P}$, for SMC-ABC, the number of iterations, $N_{T}$, for MCMC-ABC and the level $L$ sample number, $N_{L}$, for MLMC-ABC.

MLMC-ABC                          MCMC-ABC                               SMC-ABC
$N_L$    $N_s$        RMSE        $N_T$        $N_s$        RMSE         $N_P$    $N_s$        RMSE
100      102,246      0.1362      160,000      163,800      0.2434       400      347,281      0.1071
200      255,443      0.1136      320,000      323,841      0.1828       800      701,100      0.0954
400      577,312      0.1067      640,000      643,930      0.1832       1,600    1,385,790    0.0858
800      1,040,140    0.0861      1,280,000    1,283,590    0.1345       3,200    2,792,510    0.0784
Table 1: Comparison of MLMC-ABC against MCMC-ABC and SMC-ABC using a naive proposal kernel.

Using the proposal kernel provided in Equation (14), SMC-ABC requires approximately $30\%$ more data generation steps than MLMC-ABC to obtain the same RMSE. MLMC-ABC obtains nearly double the accuracy of MCMC-ABC for the same number of data generation steps. Figure 3 shows an example of the high precision marginal posterior CDFs, $F_{\alpha}(s)$, $F_{\delta}(s)$ and $F_{\mu}(s)$, compared with the numerical solutions computed using the three methods.

Figure 3: Estimated marginal CDFs for (A) $\alpha$, (B) $\delta$ and (C) $\mu$ for the tuberculosis transmission stochastic model. Estimates computed using MLMC-ABC with $N_{L}=800$ (solid yellow), MCMC-ABC over $1.2\times 10^{6}$ iterations (solid blue), SMC-ABC with $3{,}200$ particles (solid red) and the high precision solution (dashed black).

We note that these results represent a typical scenario when solving this problem with a standard choice of proposal densities for MCMC-ABC and SMC-ABC. However, obtaining a good proposal kernel is a difficult open problem, and is infeasible to do heuristically for high-dimensional parameter spaces. Therefore, efficient proposal kernels are almost never obtained without significant manual adjustment or additional algorithmic modifications, such as adaptive schemes (Beaumont et al., 2009; Del Moral et al., 2012; Roberts and Rosenthal, 2009). Nevertheless, we demonstrate in Table 2 the increased efficiency of MCMC-ABC and SMC-ABC when using a highly tuned Gaussian proposal kernel with the covariance matrix determined by Tanaka et al. (2006),

\Sigma=\begin{bmatrix}0.5^{2}&0.225&0\\ 0.225&0.5^{2}&0\\ 0&0&0.015^{2}\end{bmatrix}. (15)

MLMC-ABC                          MCMC-ABC                               SMC-ABC
$N_L$    $N_s$        RMSE        $N_T$        $N_s$        RMSE         $N_P$    $N_s$      RMSE
100      102,246      0.1362      160,000      161,604      0.1718       400      102,902    0.1270
200      255,443      0.1136      320,000      322,962      0.1254       800      198,803    0.0767
400      577,312      0.1067      640,000      642,412      0.1127       1,600    402,334    0.0652
800      1,040,140    0.0861      1,280,000    1,305,340    0.0934       3,200    797,893    0.0560
Table 2: Comparison of MLMC-ABC against MCMC-ABC and SMC-ABC using heuristically chosen proposal densities.

We note that Sisson et al. (2007) use a similar proposal kernel; however, they do not explicitly state their choice of covariance matrix, $\Sigma$. We therefore assume that Equation (15) represents a proposal kernel that is heuristically optimised for the target posterior density. We emphasise that it would be very rare to arrive at such an optimal proposal kernel without additional experimentation. Even in this unlikely case, MLMC-ABC is still comparable with MCMC-ABC. However, MLMC-ABC is clearly not as efficient as SMC-ABC in this heuristically optimised scenario. The scenario is intentionally biased toward MCMC-ABC and SMC-ABC, so this result is not unexpected. Future research could consider a comparison of the methods when the extra computational burden of determining the optimal $\Sigma$, such as the implementation of an adaptive scheme, is taken into account.

4 Discussion

Our results indicate that, while SMC-ABC and MCMC-ABC can be heuristically optimised to be highly efficient, an accurate estimate of the parameter posterior can be obtained using the MLMC-ABC method presented here in a relatively automatic fashion. Furthermore, the efficiency of MLMC-ABC is comparable to, or better than, that of MCMC-ABC and SMC-ABC, even when efficient proposal kernels are employed.

The need to estimate the variances of each bias correction term could be considered a limitation of the MLMC-ABC approach. However, we find in practice that these need not be computed to high accuracy and can often be estimated with a relatively small number of samples. There could be examples of Bayesian inference problems where MLMC-ABC is inefficient on account of inaccuracy in these variance estimates. We have so far, however, failed to find an example for which $100$ samples of each bias correction term are insufficient to obtain a good MLMC-ABC estimator.

There are many modifications one could consider to further improve MLMC-ABC. In this work, we explicitly specify the sequence of acceptance thresholds in advance for both MLMC-ABC and SMC-ABC. As this is the initial presentation of the method, it is appropriate to consider this idealised case. However, it is unclear if an optimal sequence of thresholds for SMC-ABC will also be optimal for MLMC-ABC, and vice versa. Furthermore, as mentioned in Section 1, practical applications of SMC-ABC often determine such sequences adaptively (Drovandi and Pettitt, 2011; Silk et al., 2013). A modification of MLMC-ABC to allow for adaptive acceptance thresholds would make MLMC-ABC even more practical as it could be used to minimise the coupling bias. The exact mechanisms to achieve this could be based on similar ideas to adaptive SMC-ABC (Drovandi and Pettitt, 2011). Given the solution at level $\ell-1$, the next level $\ell$ could be determined through: (i) sampling a fixed set of prior samples; (ii) sorting these samples based on the discrepancy metric; (iii) selecting a discrepancy threshold, $\epsilon_{\ell}$, that balances the coupling bias and the variance of the bias correction term (see the sketch below). Future work should address these open problems.
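
As an indication only, steps (i)-(iii) could be realised with a quantile rule in the spirit of adaptive SMC-ABC. The sketch below is hypothetical: the simulate and discrepancy callables and the choice of quantile are placeholders rather than part of the method presented here.

import numpy as np

def next_threshold(prior_samples, simulate, discrepancy, quantile=0.5):
    # (i) simulate data for a fixed set of prior samples,
    # (ii) sort the resulting discrepancies,
    # (iii) take a quantile of the sorted discrepancies as the next acceptance threshold.
    d = np.sort([discrepancy(simulate(theta)) for theta in prior_samples])
    return d[int(quantile * (len(d) - 1))]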

Other improvements could focus on the discretisation used for the eCDF calculations. The MLMC-ABC method has no requirement for a regular lattice, and alternative choices would likely enable MLMC-ABC to scale to much higher dimensional parameter spaces. Adaptive grids that refine with each level are one option that could be considered; however, unstructured grids or kernel-based methods also have potential.

Improvements to the coupling scheme are also possible avenues for future consideration. The coupling approach we have considered depends only on the computation of the marginal posterior CDFs and assumes nothing about the underlying model. It may be possible to take advantage of certain model specific features to improve variance reduction. There may also be cases where the rejection sampling scheme is prohibitive for the more accurate acceptance levels. The combination of our coupling scheme with the MLSMC scheme recently proposed by Beskos et al. (2017) and Jasra et al. (2017) is a promising possibility to mitigate these issues. Future work should investigate, compare and contrast the variety of available coupling strategies in the growing body of literature on MLMC for ABC and Bayesian inverse problems.

We have shown, in a practical way, how MLMC techniques can be applied to ABC inference. We also demonstrate that variance reduction strategies, even when applied to simple methods such as rejection sampling, can achieve performance improvements comparable, and in some cases superior, to modern advanced ABC methods based on MCMC and SMC. Therefore, the MLMC framework is a promising avenue for designing improved samplers for complex statistical problems with intractable likelihoods.

Acknowledgements

This work is supported by the Australian Research Council (FT130100148). REB would like to thank the Royal Society for a Royal Society Wolfson Research Merit Award and the Leverhulme Trust for a Leverhulme Research Fellowship. REB would also like to acknowledge support from the Biotechnology and Biological Sciences Research Council (BB/R000816/1). Computational resources were provided by High Performance Computing and Advanced Research Computing support (HPC-ARCs), Queensland University of Technology (QUT). The authors thank Mike Giles and Chris Drovandi for helpful discussions, and two anonymous reviewers and the Associate Editor for insightful comments and suggestions.

Appendix A MLMC for ABC with general Lipschitz functions

Minor modifications of the MLMC-ABC method (Algorithm 4) are possible to enable the computation of expectations of the form

\mathbb{E}\left[U(\bm{\theta})\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon\right]=\int_{\bm{\Theta}}U(\bm{\theta})p(\bm{\theta}\mid d(\mathcal{D},\mathcal{D}_{s})<\epsilon)\,\text{d}\theta_{k}\ldots\text{d}\theta_{1},

where $U(\bm{\theta})$ is any Lipschitz continuous function.

Using the same notation as defined in Section 2.2, the MLMC telescoping sum may be formed for a given sequence of acceptance thresholds, $\epsilon_{1}>\cdots>\epsilon_{L}$, to compute the expectation $E_{\epsilon_{L}}=\mathbb{E}\left[U(\bm{\theta}_{\epsilon_{L}})\right]$. That is,

E_{\epsilon_{L}}=\sum_{\ell=1}^{L}P_{\epsilon_{\ell}},\quad P_{\epsilon_{\ell}}=\begin{cases}\mathbb{E}\left[U(\bm{\theta}_{\epsilon_{1}})\right],&\ell=1,\\ \mathbb{E}\left[U(\bm{\theta}_{\epsilon_{\ell}})-U(\bm{\theta}_{\epsilon_{\ell-1}})\right],&\ell>1.\end{cases} (16)

From Equation (16), it is straightforward to obtain equivalent expressions to Equation (8) and Equation (9).
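
For illustration, once coupled samples $\bm{\theta}_{\epsilon_{\ell}}^{i}$ and $\bm{\theta}_{\epsilon_{\ell-1}}^{i}$ are available (as generated in Algorithm 5), the estimator implied by Equation (16) is a simple sum of per-level sample averages. The following sketch assumes the coupled samples have already been generated and is intended only to make the telescoping structure explicit.

import numpy as np

def mlmc_expectation(U, samples_fine, samples_coarse):
    # samples_fine[l]: level-(l+1) samples theta_{eps_{l+1}}^i, for l = 0, ..., L-1
    # samples_coarse[l]: coupled samples theta_{eps_l}^i, for l >= 1 (samples_coarse[0] unused)
    E = np.mean([U(theta) for theta in samples_fine[0]])        # P_{eps_1}
    for l in range(1, len(samples_fine)):                       # bias corrections, ell > 1
        E += np.mean([U(tf) - U(tc)
                      for tf, tc in zip(samples_fine[l], samples_coarse[l])])
    return E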

Algorithm 5 Modified MLMC-ABC
Initialise $\epsilon_{1},\ldots,\epsilon_{L}$, $N_{1},\ldots,N_{L}$ and prior $p(\bm{\theta})$.
Set $p(\bm{\theta}_{\epsilon_{0}})\leftarrow p(\bm{\theta})$.
for $\ell=1,\ldots,L$ do
  for $i=1,\ldots,N_{\ell}$ do
   repeat
     Sample $\bm{\theta}^{*}\sim p(\bm{\theta})$ restricted to $\operatorname{supp}(p(\bm{\theta}_{\epsilon_{\ell-1}}))$.
     Generate data, $\mathcal{D}_{s}\sim f(\mathcal{D}\mid\bm{\theta}^{*})$.
   until $d(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{\ell}$.
   Set $\bm{\theta}_{\epsilon_{\ell}}^{i}\leftarrow\bm{\theta}^{*}$.
  end for
  for $j=1,\ldots,k$ do
   for $s\in\mathcal{S}_{j}$ do
     Set $\hat{F}_{\epsilon_{\ell},j}^{N_{\ell}}(s)\leftarrow\sum_{i=1}^{N_{\ell}}\xi\left((s-\theta_{\epsilon_{\ell},j}^{i})/\delta_{j}\right)/N_{\ell}$.
   end for
  end for
  if $\ell=1$ then
   Set $\hat{E}_{\epsilon_{\ell}}^{N_{\ell}}\leftarrow\sum_{i=1}^{N_{\ell}}U(\bm{\theta}^{i}_{\epsilon_{\ell}})/N_{\ell}$.
  else
   for $i=1,\ldots,N_{\ell}$ do
     for $j=1,\ldots,k$ do
      Set $\theta_{\epsilon_{\ell-1},j}^{i}\leftarrow\hat{G}^{N_{1},\ldots,N_{\ell-1}}_{\epsilon_{\ell-1},j}\left(\hat{F}_{\epsilon_{\ell},j}^{N_{\ell}}\left(\theta_{\epsilon_{\ell},j}^{i}\right)\right)$.
     end for
   end for
   for $j=1,\ldots,k$ do
     for $s\in\mathcal{S}_{j}$ do
      Set $\hat{Y}^{N_{\ell}}_{\epsilon_{\ell},j}(s)\leftarrow\sum_{i=1}^{N_{\ell}}\left[\xi\left((s-\theta_{\epsilon_{\ell},j}^{i})/\delta_{j}\right)-\xi\left((s-\theta_{\epsilon_{\ell-1},j}^{i})/\delta_{j}\right)\right]/N_{\ell}$.
      Set $\hat{F}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell},j}(s)\leftarrow\hat{F}^{N_{1},\ldots,N_{\ell-1}}_{\epsilon_{\ell-1},j}(s)+\hat{Y}^{N_{\ell}}_{\epsilon_{\ell},j}(s)$.
     end for
   end for
   Set $\hat{P}_{\epsilon_{\ell}}^{N_{\ell}}\leftarrow\sum_{i=1}^{N_{\ell}}\left[U(\bm{\theta}^{i}_{\epsilon_{\ell}})-U(\bm{\theta}^{i}_{\epsilon_{\ell-1}})\right]/N_{\ell}$.
   Set $\hat{E}^{N_{1},\ldots,N_{\ell}}_{\epsilon_{\ell}}\leftarrow\hat{E}^{N_{1},\ldots,N_{\ell-1}}_{\epsilon_{\ell-1}}+\hat{P}_{\epsilon_{\ell}}^{N_{\ell}}$.
  end if
end for

The modified MLMC-ABC algorithm proceeds in a very similar fashion to Algorithm 4; however, there is no need to hold a complete discretisation of the parameter space, $\bm{\Theta}$. This is because only the $k$ marginal CDFs, $F_{\epsilon_{\ell},1}\left(s_{1}\right),\ldots,F_{\epsilon_{\ell},k}\left(s_{k}\right)$, are required to form the coupling strategy in Section 2.3. Thus, we denote $\mathcal{S}_{j}$ to be a discretisation of the $j$th coordinate axis. This significantly reduces the computational burden of the lattice in higher dimensions since $|\mathcal{S}_{j}|=\mathcal{O}(\log_{k}|\mathcal{S}|)$. The resulting algorithm is given in Algorithm 5.
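
The per-marginal coupling step in Algorithm 5, $\theta_{\epsilon_{\ell-1},j}^{i}\leftarrow\hat{G}_{\epsilon_{\ell-1},j}(\hat{F}_{\epsilon_{\ell},j}(\theta_{\epsilon_{\ell},j}^{i}))$, can be realised with one-dimensional interpolation over the axis discretisation $\mathcal{S}_{j}$. The sketch below is one way this step could be implemented, using np.interp; it assumes the marginal eCDF values on the grid have already been computed.

import numpy as np

def couple_marginal(theta_fine_j, grid_j, F_fine_j, F_coarse_j):
    # theta_fine_j: level-l samples of the j-th parameter
    # grid_j: discretisation S_j of the j-th coordinate axis (increasing)
    # F_fine_j, F_coarse_j: marginal eCDF values on grid_j at levels l and l-1 (non-decreasing)
    u = np.interp(theta_fine_j, grid_j, F_fine_j)   # evaluate F_{eps_l,j} at the samples
    return np.interp(u, F_coarse_j, grid_j)         # approximate inverse eCDF G_{eps_{l-1},j}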

References

  • Anderson and Higham (2012) Anderson, D.F., Higham, D.J., 2012. Multilevel Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics. Multiscale Modeling & Simulation 10(1):146–179. DOI 10.1137/110840546
  • Avikainen (2009) Avikainen, R., 2009. On irregular functionals of SDEs and the Euler scheme. Finance and Stochastics 13(3):381–401. DOI 10.1007/s00780-009-0099-7
  • Barber et al. (2015) Barber, S., Voss, J., Webster, M., 2015. The rate of convergence for approximate Bayesian computation. Electronic Journal of Statistics 9:80–105. DOI 10.1214/15-EJS988
  • Beaumont et al. (2002) Beaumont, M.A., Zhang, W., Balding, D.J., 2002. Approximate Bayesian computation in population genetics. Genetics 162(4):2025–2035.
  • Beaumont et al. (2009) Beaumont, M.A., Cornuet, J.M., Marin, J.M., Robert, C.P., 2009. Adaptive approximate Bayesian computation. Biometrika 96(4):983–990.
  • Beskos et al. (2017) Beskos, A., Jasra, A., Law, K., Tempone, R., Zhou, Y., 2017. Multilevel sequential Monte Carlo samplers. Stochastic Processes and their Applications 127(5):1417–1440. DOI 10.1016/j.spa.2016.08.004
  • Browning et al. (2018) Browning, A.P., McCue, S.W., Binny, R.N., Plank, M.J., Shah, E.T., Simpson, M.J., 2018. Inferring parameters for a lattice-free model of cell migration and proliferation using experimental data. Journal of Theoretical Biology 437:251–260. DOI 10.1016/j.jtbi.2017.10.032
  • Cabras et al. (2015) Cabras, S., Castellanos Neuda, M.E., Ruli, E., 2015. Approximate Bayesian computation by modelling summary statistics in a quasi-likelihood framework. Bayesian Analysis 10(2):411–439. DOI 10.1214/14-BA921
  • Del Moral et al. (2006) Del Moral, P., Doucet, A., Jasra, A., 2006. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(3):411–436. DOI 10.1111/j.1467-9868.2006.00553.x
  • Del Moral et al. (2012) Del Moral, P., Doucet, A., Jasra, A., 2012. An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing 22(5):1009–1020. DOI 10.1007/s11222-011-9271-y
  • Dodwell et al. (2015) Dodwell, T.J., Ketelsen, C., Scheichl, R., Teckentrup, A.L., 2015. A hierarchical multilevel Markov chain Monte Carlo algorithm with applications to uncertainty quantification in subsurface flow. SIAM/ASA Journal on Uncertainty Quantification 3(1):1075–1108. DOI 10.1137/130915005
  • Drovandi and Pettitt (2011) Drovandi, C.C., Pettitt, A.N., 2011. Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67(1):225–233. DOI 10.1111/j.1541-0420.2010.01410.x
  • Efendiev et al. (2015) Efendiev, Y., Jin, B., Michael, P., Tan, X., 2015. Multilevel Markov chain Monte Carlo method for high-contrast single-phase flow problems. Communications in Computational Physics 17(1):259–286. DOI 10.4208/cicp.021013.260614a
  • Fearnhead and Prangle (2012) Fearnhead, P., Prangle, D., 2012. Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society Series B (Statistical Methodology) 74(3):419–474. DOI 10.1111/j.1467-9868.2011.01010.x
  • Filippi et al. (2013) Filippi, S., Barnes, C.P., Cornebise, J., Stumpf, M.P.H., 2013. On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statistical Applications in Genetics and Molecular Biology 12(1):87–107. DOI 10.1515/sagmb-2012-0069
  • Giles (2008) Giles, M.B., 2008. Multilevel Monte Carlo path simulation. Operations Research 56(3):607–617. DOI 10.1287/opre.1070.0496
  • Giles et al. (2015) Giles, M.B., Nagapetyan, T., Ritter, K., 2015. Multilevel Monte Carlo approximation of cumulative distribution function and probability densities. SIAM/ASA Journal on Uncertainty Quantification 3(1):267–295. DOI 10.1137/140960086
  • Gillespie (1977) Gillespie, D.T., 1977. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry 81(25):2340–2361. DOI 10.1021/j100540a008
  • Green et al. (2015) Green, P.J., Łatuszyński, K., Pereyra, M., Robert, C.P., 2015. Bayesian computation: a summary of the current state, and samples backwards and forwards. Statistics and Computing 25(4):835–862. DOI 10.1007/s11222-015-9574-5
  • Gregory et al. (2016) Gregory, A., Cotter, C.J., Reich, S., 2016. Multilevel ensemble transform particle filtering. SIAM Journal on Scientific Computing 38(3):A1317–A1338. DOI 10.1137/15M1038232
  • Grelaud et al. (2009) Grelaud, A., Marin, J.-M., Robert, C.P., Rodolphe, F., Tallay, J.-F., 2009. ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis 4(2):317–335. DOI 10.1214/09-BA412
  • Guha and Tan (2017) Guha, N., Tan, X., 2017. Multilevel approximate Bayesian approaches for flows in highly heterogeneous porous media and their applications. Journal of Computational and Applied Mathematics 317:700 – 717. DOI 10.1016/j.cam.2016.10.008
  • Hastings (1970) Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109.
  • Jasra et al. (2017) Jasra, A., Jo, S., Nott, D., Shoemaker, C., Tempone, R., 2017. Multilevel Monte Carlo in approximate Bayesian computation. ArXiv e-prints URL https://arxiv.org/abs/1702.03628, 1702.03628.
  • Johnston et al. (2014) Johnston, S.T., Simpson, M.J., McElwain, D.L.S., Binder, B.J., Ross, J.V., 2014. Interpreting scratch assays using pair density dynamics and approximate Bayesian computation. Open Biology 4(9):140097. DOI 10.1098/rsob.140097
  • Lester et al. (2016) Lester, C., Baker, R.E., Giles, M.B., Yates, C.A., 2016. Extending the multi-level method for the simulation of stochastic biological systems. Bulletin of Mathematical Biology 78(8):1640–1677. DOI 10.1007/s11538-016-0178-9
  • Marjoram et al. (2003) Marjoram, P., Molitor, J., Plagnol, V., Tavaré, S., 2003. Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America 100(26):15324–15328. DOI 10.1073/pnas.0306899100
  • Marin et al. (2012) Marin, J.-M., Pudlo, P., Robert, C.P., Ryder, R.J., 2012. Approximate Bayesian computational methods. Statistics and Computing 22(6):1167–1180. DOI 10.1007/s11222-011-9288-2
  • Metropolis et al. (1953) Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21(6):1087–1092. DOI 10.1063/1.1699114
  • Navascués et al. (2017) Navascués, M., Leblois, R., Burgarella, C., 2017. Demographic inference through approximate-Bayesian-computation skyline plots. PeerJ 5:e3530. DOI 10.7717/peerj.3530
  • Pooley et al. (2015) Pooley, C.M., Bishop, S.C., Marion, G., 2015. Using model-based proposals for fast parameter inference on discrete state space, continuous-time Markov processes. Journal of the Royal Society Interface 12(107):20150225. DOI 10.1098/rsif.2015.0225
  • Pritchard et al. (1999) Pritchard, J.K., Seielstad, M.T., Perez-Lezaun, A., Feldman, M.W., 1999. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 16(12):1791–1798. DOI 10.1093/oxfordjournals.molbev.a026091
  • Reiss (1981) Reiss, R.-D., 1981. Nonparametric estimation of smooth distribution functions. Scandinavian Journal of Statistics 8:116–119.
  • Roberts and Rosenthal (2004) Roberts, G.O., Rosenthal, J.S., 2004. General state space Markov chains and MCMC algorithms. Probability Surveys 1:20–71. DOI 10.1214/154957804100000024
  • Roberts and Rosenthal (2009) Roberts, G.O., Rosenthal, J.S., 2009. Examples of adaptive MCMC. Journal of Computational and Graphical Statistics 18(2):349–367. DOI 10.1198/jcgs.2009.06134
  • Ross et al. (2017) Ross, R.J.H., Baker, R.E., Parker, A., Ford, M.J., Mort, R.L., Yates, C.A., 2017. Using approximate Bayesian computation to quantify cell-cell adhesion parameters in cell migratory process. npj Systems Biology and Applications 3(1):9. DOI 10.1038/s41540-017-0010-7
  • Silk et al. (2013) Silk, D., Filippi, S., Stumpf, M.P.H., 2013. Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems. Statistical Applications in Genetics and Molecular Biology 12(3):603–618. DOI 10.1515/sagmb-2012-004
  • Silverman (1986) Silverman, B.W., 1986. Density estimation for statistics and data analysis. Monographs on Statistics and Applied Probability. Boca Raton: Chapman & Hall/CRC.
  • Sisson et al. (2007) Sisson, S.A., Fan, Y., Tanaka, M.M., 2007. Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States of America 104(6):1760–1765. DOI 10.1073/pnas.0607208104
  • Small et al. (1994) Small, P.M., Hopewell, P.C., Singh, S.P., Paz, A., Parsonnet, J., Ruston, D.C., Schecter, G.F., Daley, C.L., Schoolnik, G.K., 1994. The epidemiology of tuberculosis in San Francisco – a population-based study using conventional and molecular methods. New England Journal of Medicine 330(24):1703–1709. DOI 10.1056/NEJM199406163302402
  • Stumpf (2014) Stumpf, M.P.H., 2014. Approximate Bayesian inference for complex ecosystems. F1000Prime Reports 6:60. DOI 10.12703/P6-60
  • Sunnåker et al. (2013) Sunnåker, M., Busetto, A.G., Numminen, E., Corander, J., Foll, M., Dessimoz, C., 2013. Approximate Bayesian computation. PLOS Computational Biology 9(1):e1002803. DOI 10.1371/journal.pcbi.1002803
  • Tanaka et al. (2006) Tanaka, M.M., Francis, A.R., Luciani, F., Sisson, S.A., 2006. Using approximate Bayesian computation to estimate tuberculosis transmission parameters from genotype data. Genetics 173(3):1511–1520. DOI 10.1534/genetics.106.055574
  • Tavaré et al. (1997) Tavaré, S., Balding, D.J., Griffiths, R.C., Donnelly, P., 1997. Inferring coalescence times from DNA sequence data. Genetics 145(2):505–518.
  • Thorne and Stumpf (2012) Thorne, T., Stumpf, M.P.H., 2012. Graph spectral analysis of protein interaction network evolution. Journal of The Royal Society Interface 9(75):2653–2666. DOI 10.1098/rsif.2012.0220
  • Toni et al. (2009) Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P.H., 2009. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of the Royal Society Interface 6:187–202. DOI 10.1098/rsif.2008.0172
  • Vo et al. (2015) Vo, B.N., Drovandi, C.C., Pettitt, A.N., Simpson, M.J., 2015. Quantifying uncertainty in parameter estimates for stochastic models of collective cell spreading using approximate Bayesian computation. Mathematical Biosciences 263:133–142. DOI 10.1016/j.mbs.2015.02.010
  • Warne et al. (2017) Warne, D.J., Baker, R.E., Simpson M.J., 2017. Optimal quantification of contact inhibition in cell populations. Biophysical Journal 113(9):1920–1924. DOI 10.1016/j.bpj.2017.09.016
  • Weiss and Dishon (1971) Weiss, G.H., Dishon, M., 1971. On the asymptotic behavior of the stochastic and deterministic models of an epidemic. Mathematical Biosciences 11(3):261–265. DOI 10.1016/0025-5564(71)90087-3
  • Wilkinson (2009) Wilkinson, D.J., 2009. Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics 10:122–133. DOI 10.1038/nrg2509