
Rapid Bayesian inference for expensive stochastic models

David J. Warne (School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland 4001, Australia; Centre for Data Science, Queensland University of Technology, Brisbane, Queensland 4001, Australia). To whom correspondence should be addressed. E-mail: david.warne@qut.edu.au
Ruth E. Baker (Mathematical Institute, University of Oxford, Oxford, OX2 6GG, United Kingdom)
Matthew J. Simpson (School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland 4001, Australia; Centre for Data Science, Queensland University of Technology, Brisbane, Queensland 4001, Australia)
Abstract

Almost all fields of science rely upon statistical inference to estimate unknown parameters in theoretical and computational models. While the performance of modern computer hardware continues to grow, the computational requirements for the simulation of models are growing even faster. This is largely due to the increase in model complexity, often including stochastic dynamics, that is necessary to describe and characterize phenomena observed using modern, high resolution, experimental techniques. Such models are rarely analytically tractable, meaning that extremely large numbers of stochastic simulations are required for parameter inference. In such cases, parameter inference can be practically impossible. In this work, we present new computational Bayesian techniques that accelerate inference for expensive stochastic models by using computationally inexpensive approximations to inform feasible regions in parameter space, and through learning transforms that adjust the biased approximate inferences to more closely represent the correct inferences under the expensive stochastic model. Using topical examples from ecology and cell biology, we demonstrate a speed improvement of an order of magnitude without any loss in accuracy. This represents a substantial improvement over current state-of-the-art methods for Bayesian computation when appropriate model approximations are available.

Keywords: Parameter estimation; continuum-limit approximation; Bayesian inference; approximate Bayesian computation; sequential Monte Carlo

1 Introduction

Modern experimental techniques allow us to observe the natural world in unprecedented detail and resolution (Chen and Zhang, 2014). Advances in machine learning and artificial intelligence provide many new techniques for pattern recognition and prediction; however, in almost all scientific inquiry there is a need for detailed mathematical models that provide mechanistic insight into the phenomena observed (Baker et al., 2018; Coveney et al., 2016). This is particularly true in the biological and ecological sciences, where detailed stochastic models are routinely applied to develop and validate theory as well as to interpret and analyze data (Black and McKane, 2012; Drawert et al., 2017; Wilkinson, 2009).

Two distinct computational challenges arise when stochastic models are considered: (i) the forwards problem; and (ii) the inverse problem, sometimes called the backwards problem (Warne et al., 2019a). While the computational generation of a single sample path, that is, the forwards problem, may be feasible, hundreds or thousands of such sample paths may be required to gain insight into the range of possible model predictions and to conduct parameter sensitivity analysis (Gunawan et al., 2005; Lester et al., 2017; Marino et al., 2008). The problem is further compounded if the models must be calibrated using experimental data, that is, the inverse problem of parameter estimation, since millions of sample paths may be necessary.

In many cases, the forwards problem can be sufficiently computationally expensive to render both parameter sensitivity analysis and the inverse problem completely intractable, despite recent advances in computational inference (Sisson et al., 2018). This has prompted recent interest in the use of mathematical approximations to circumvent the computational burden, both in the context of the forwards and inverse problems. For example, linear approximations are applied to the forwards problem of chemical reaction networks with bimolecular and higher-order reactions (Cao and Grima, 2018), and various approximations, including surrogate models (Rynn et al., 2019), emulators (Buzbas and Rosenberg, 2015) and transport maps (Parno and Marzouk, 2018), are applied to inverse problems with expensive forwards models, for example, in the study of climate science (Holden et al., 2018). Furthermore, a number of developments, such as multilevel Monte Carlo methods (Giles, 2015), have demonstrated that families of approximations can be combined to improve computational performance without sacrificing accuracy.

In recent years, the Bayesian approach to the inverse problem of model calibration and parameter inference has been particularly successful in many fields of science, including astronomy (EHT Collaboration et al., 2019), anthropology and archaeology (King et al., 2014; Malaspinas et al., 2016), paleontology and evolution (O’Dea et al., 2016; Pritchard et al., 1999; Tavaré et al., 1997), epidemiology (Liu et al., 2018), biology (Lawson et al., 2018; Guindani et al., 2014; Woods and Barnes, 2016; Vo et al., 2015), and ecology (Ellison, 2004; Stumpf, 2014). For complex stochastic models, parameterized by \boldsymbol{\theta}\in\boldsymbol{\Theta}, computing the likelihood of observing data \mathcal{D}\in\mathbb{D} is almost always impossible (Browning et al., 2018; Johnston et al., 2014; Vankov et al., 2019). Thus, approximate Bayesian computation (ABC) methods (Sisson et al., 2018) are essential. ABC methods replace likelihood evaluation with an approximation based on stochastic simulations of the proposed model. This is captured directly in ABC rejection sampling (Beaumont et al., 2002; Pritchard et al., 1999; Tavaré et al., 1997) (Section 2), where samples are generated from an approximate posterior using stochastic simulations of the forwards problem as a replacement for the likelihood.

Unfortunately, ABC rejection sampling can be computationally expensive or even completely prohibitive, especially for high-dimensional parameter spaces, since a very large number of stochastic simulations is required to generate enough samples from the approximate Bayesian posterior distribution (Sisson et al., 2018; Warne et al., 2019c). This is further compounded when the forwards problem is computationally expensive. In contrast, an appropriately chosen approximate model may yield a tractable likelihood that removes the need for ABC methods entirely (Browning et al., 2019; Warne et al., 2017, 2019b). However, approximations can perform poorly in terms of their predictive capability, and inference based on such models will always be biased, with the extent of the bias dependent on the level of accuracy of the approximation.

We consider ABC-based inference algorithms for the challenging problem of parameter inference for computationally expensive stochastic models when an appropriate approximation is available to inform the search in parameter space. Under our approach, the approximate model need not be quantitatively accurate in terms of the forwards problem, but must qualitatively respond to changes in parameter values in a similar way to the stochastic model. In particular, we extend the sequential Monte Carlo ABC sampler (SMC-ABC) of Sisson et al. (2007) (Section 2) to exploit the approximate model in two ways: (i) to generate an intermediate proposal distribution, that we call a preconditioner, to improve ABC acceptance rates for the stochastic model; and (ii) to construct a biased ABC posterior, then reduce this bias using a moment-matching transform. We describe both methods and then present relevant examples from ecology and cell biology. Example calculations demonstrate that our methods generate ABC posteriors with a significant reduction in the number of required expensive stochastic simulations, leading to as much as a tenfold computational speedup. The methods we demonstrate here enable substantial acceleration of accurate statistical inference for a broad range of applications, since many areas of science utilize model approximations out of necessity despite potential inference inaccuracies.

As a motivating case study for this work, we focus on stochastic models that can replicate many spatiotemporal patterns that naturally arise in biological and ecological systems. Stochastic discrete random walk models (Section 3), henceforth called discrete models, can accurately characterize the microscale interactions of individual agents, such as animals, plants, micro-organisms, and cells (Agnew et al., 2014; Codling et al., 2008; von Hardenberg et al., 2001; Law et al., 2003; Taylor and Hastings, 2005; Vincenot et al., 2016). Mathematical modeling of populations as complex systems of agents can enhance our understanding of real biological and ecological populations with applications in cancer treatment (Böttger et al., 2015), wound healing (Callaghan et al., 2006), wildlife conservation (McLane et al., 2011; DeAngelis and Grimm, 2014), and the management of invasive species (Chkrebtii et al., 2015; Taylor and Hastings, 2005).

For example, the discrete model formulation can replicate many realistic spatiotemporal patterns observed in cell biology. Figure 1(A),(B) shows typical microscopy images obtained from in vitro cell culture assays, which are ubiquitous and important experimental techniques used in the study of cell motility, cell proliferation and drug design. Various patterns are observed: prostate cancer cells (PC-3 line) tend to be highly motile and spread uniformly to invade vacant spaces (Figure 1(A)); in contrast, breast cancer cells (MDA-MB-231 line) tend to be relatively stationary, with proliferation events driving the formation of aggregations (Figure 1(B)). These phenomena may be captured using a lattice-based discrete model framework by varying the ratio P_{p}/P_{m}, where P_{p}\in[0,1] and P_{m}\in[0,1] are, respectively, the probabilities that an agent attempts to proliferate and attempts to move during a time interval of duration \tau>0 (see Section 3.2). For P_{p}/P_{m}\ll 1, behavior akin to PC-3 cells is recovered (Figure 1(C)–(F)) (Jin et al., 2016). Setting P_{p}/P_{m}\gg 1, as in Figure 1(H)–(K), leads to clusters of occupied lattice sites that are similar to the aggregates of MDA-MB-231 cells (Agnew et al., 2014; Simpson et al., 2013).
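To make the discrete framework concrete, the following is a minimal sketch of a lattice-based exclusion-process simulation in the spirit described above; the specific update rule, the periodic boundaries, and all function names are our own illustrative choices and should not be read as the exact scheme of Section 3.2.

```python
import numpy as np

def simulate_step(lattice, Pm, Pp, rng):
    """One time step of duration tau for a 2D lattice exclusion process:
    each agent attempts a move with probability Pm and a proliferation
    with probability Pp, each into a uniformly chosen nearest-neighbour
    site; the event is aborted if the target site is occupied."""
    rows, cols = lattice.shape
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    agents = list(zip(*np.nonzero(lattice)))
    rng.shuffle(agents)
    for (i, j) in agents:
        if rng.random() < Pm:                        # motility attempt
            di, dj = moves[rng.integers(4)]
            ni, nj = (i + di) % rows, (j + dj) % cols
            if not lattice[ni, nj]:
                lattice[i, j], lattice[ni, nj] = 0, 1
                i, j = ni, nj
        if rng.random() < Pp:                        # proliferation attempt
            di, dj = moves[rng.integers(4)]
            ni, nj = (i + di) % rows, (j + dj) % cols
            if not lattice[ni, nj]:
                lattice[ni, nj] = 1                  # daughter agent
    return lattice

rng = np.random.default_rng(1)
lattice = (rng.random((50, 50)) < 0.05).astype(int)   # ~5% initial occupancy
for _ in range(100):
    simulate_step(lattice, Pm=1.0, Pp=0.01, rng=rng)  # Pp/Pm << 1 regime
density = lattice.mean()
```

With P_{p}/P_{m}\ll 1 as here, agents spread out before dividing and the occupancy stays close to spatially uniform; raising P_{p}/P_{m} produces the clustered patterns of Figure 1(H)–(K).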

It is common practice to derive approximate continuum-limit differential equation descriptions of discrete models (Callaghan et al., 2006; Jin et al., 2016; Simpson et al., 2010) (Supplementary Material). Such approximations provide a means of performing analysis with significantly reduced computational requirements, since evaluating an exact analytical solution, if available, or otherwise numerically solving a differential equation is typically several orders of magnitude faster than generating a single realization of the discrete model, of which hundreds or thousands may be required for reliable ABC sampling (Browning et al., 2018). However, such approximations are generally only valid within certain parameter regimes, for example here when P_{p}/P_{m}\ll 1 (Callaghan et al., 2006; Simpson et al., 2010). In Figure 1(G), the population density growth curve from the continuum-limit logistic growth model is superimposed on stochastic data for four realizations of the discrete model with P_{p}/P_{m}\ll 1 and P_{p}/P_{m}\gg 1, under initial conditions simulating a proliferation assay, where each lattice site is randomly populated with constant probability such that there are no macroscopic gradients present at t=0. The continuum-limit logistic growth model is an excellent match for the P_{p}/P_{m}\ll 1 case (Figure 1(C)–(F)), but severely overestimates the population density when P_{p}/P_{m}\gg 1, since the mean-field assumptions underpinning the continuum-limit model are violated by the presence of clustering (Figure 1(H)–(K)) (Agnew et al., 2014; Simpson et al., 2013).
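The computational asymmetry described above is stark because the continuum limit for a proliferation assay has a simple closed form. A sketch, assuming the non-dimensional logistic equation dC/dT = C(1 − C) with T = P_{p}t/\tau as used in Figure 1(G):

```python
import numpy as np

def logistic_density(T, C0):
    """Closed-form solution of the non-dimensional continuum-limit logistic
    model dC/dT = C(1 - C) with C(0) = C0, where T = Pp*t/tau."""
    return C0 / (C0 + (1.0 - C0) * np.exp(-T))

T = np.linspace(0.0, 10.0, 5)
C = logistic_density(T, C0=0.05)   # density saturates towards carrying capacity 1
```

Evaluating this expression costs essentially nothing, whereas one discrete realization requires simulating every agent at every time step; this is the cost gap the preconditioning method of Section 2.2 exploits.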

As we demonstrate in Section 3, our methods generate accurate ABC posteriors for inference on the discrete problem for a range of biologically relevant parameter regimes, including those where the continuum-limit approximation is poor. In this respect we demonstrate a novel use of approximations that qualitatively respond to changes in parameters in a similar way to the full exact stochastic model.

Figure 1: Discrete random walk models can replicate observed spatial patterns in cell culture: (A) PC-3 prostate cancer cells (reprinted from Jin et al. (2017) with permission); and (B) MDA-MB-231 breast cancer cells (reprinted from Simpson et al. (2013) with permission). (C)–(F) Discrete simulations with P_{p}/P_{m}\ll 1 replicate the uniform distribution of (A) PC-3 cells. (H)–(K) Discrete simulations with P_{p}/P_{m}\gg 1 replicate the spatial clustering of (B) MDA-MB-231 cells. (G) Averaged population density profiles \mathcal{C}(t) for the discrete model with highly motile agents, P_{m}=1 (dashed green), and near-stationary agents, P_{m}=5\times 10^{-4} (dashed orange), compared with the logistic growth continuum limit (solid black); time is non-dimensionalized with T=P_{p}t/\tau.

2 Methods

In this section, we present details of two new algorithms for the acceleration of ABC inference for expensive stochastic models when an appropriate approximation is available. First, we present essential background in ABC inference and sequential Monte Carlo (SMC) samplers for ABC (Sisson et al., 2007; Toni et al., 2009). We then describe our extensions to SMC samplers for ABC and demonstrate our approaches using topical examples from ecology and cell biology.

2.1 Sequential Monte Carlo for approximate Bayesian computation

Bayesian analysis techniques are powerful tools for the quantification of uncertainty in parameters, models and predictions (Gelman et al., 2014). Unfortunately, for many stochastic models of practical interest, the likelihood function is intractable. ABC methods replace likelihood evaluation with an approximation based on stochastic simulations of the proposed model. This is captured directly in ABC rejection sampling (Pritchard et al., 1999; Tavaré et al., 1997), in which \mathcal{M} samples are generated from an approximate posterior, denoted by p(\boldsymbol{\theta}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon). Here \mathcal{D}_{s}\sim s(\mathcal{D}\mid\boldsymbol{\theta}) is a data generation process based on simulation of the model, \rho(\mathcal{D},\mathcal{D}_{s}) is a discrepancy metric, and \epsilon is the discrepancy threshold. The accepted parameter samples are distributed according to p(\boldsymbol{\theta}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon), with p(\boldsymbol{\theta}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon)\to p(\boldsymbol{\theta}\mid\mathcal{D}) as \epsilon\to 0.
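The ABC rejection scheme just described can be sketched generically; the toy Poisson model below is our own illustrative choice, not one of the examples treated in this paper.

```python
import numpy as np

def abc_rejection(data, prior_sample, simulate, rho, eps, M, rng):
    """Generic ABC rejection sampler: keep M prior draws whose simulated
    data D_s fall within discrepancy eps of the observed data."""
    samples = []
    while len(samples) < M:
        theta = prior_sample(rng)           # theta* ~ p(theta)
        D_s = simulate(theta, rng)          # D_s ~ s(D | theta*)
        if rho(data, D_s) <= eps:           # accept if within threshold
            samples.append(theta)
    return np.array(samples)

# toy example: infer a Poisson rate from a single observed count of 7
rng = np.random.default_rng(0)
post = abc_rejection(
    7.0,
    prior_sample=lambda r: r.uniform(0.0, 20.0),   # uniform prior p(theta)
    simulate=lambda th, r: float(r.poisson(th)),   # stochastic forwards problem
    rho=lambda D, Ds: abs(D - Ds),                 # discrepancy metric
    eps=0.0, M=200, rng=rng)
```

Note that the acceptance rate collapses as eps shrinks or the data dimension grows, which is precisely the inefficiency the SMC samplers below are designed to mitigate.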

The average acceptance probability of a proposed parameter sample \boldsymbol{\theta}^{*} is \mathcal{O}(\epsilon^{d}) (Fearnhead and Prangle, 2012), where d is the dimensionality of the data space, \mathbb{D}. This renders rejection sampling computationally expensive, or even completely prohibitive, especially for high-dimensional parameter spaces (Marjoram et al., 2003; Sisson et al., 2007). Summary statistics can reduce the data dimensionality, but they often incur information loss (Barnes et al., 2012; Blum et al., 2013; Fearnhead and Prangle, 2012). However, strategies such as regression adjustment and marginal adjustment can improve the accuracy of dimension reductions (Beaumont et al., 2002; Nott et al., 2014).

In the SMC-ABC method, importance resampling is applied to a sequence of R ABC posteriors with discrepancy thresholds \epsilon_{1}>\cdots>\epsilon_{R}, with \epsilon_{R} indicating the target ABC posterior. Given \mathcal{M} weighted samples \{(\boldsymbol{\theta}^{i},w^{i})\}_{i=1}^{\mathcal{M}}, called particles, from the prior p(\boldsymbol{\theta}), particles are filtered through each ABC posterior using three main steps for each particle \boldsymbol{\theta}^{i}: (i) the particle is perturbed using a proposal kernel density q_{r}(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{i}); (ii) an accept/reject step is performed; and (iii) importance weights are updated. Once all particles have been updated and reweighted, resampling of particles is performed to avoid particle degeneracy. For reference, the SMC-ABC algorithm as initially developed by Sisson et al. (2007) and Toni et al. (2009) is given in Algorithm 1. The number of particles, \mathcal{M}, and the number of intermediate distributions, R, influence the accuracy and performance, respectively, of the sampler. Setting \mathcal{M} too small can lead to large estimator variability and particle degeneracy, and setting R too small leads to large divergence between successive distributions that can result in high rejection rates.

Algorithm 1 SMC-ABC
1: Initialize \boldsymbol{\theta}_{0}^{i}\sim p(\boldsymbol{\theta}) and w_{0}^{i}=1/\mathcal{M}, for i=1,\ldots,\mathcal{M};
2: for r=1,\ldots,R do
3:   for i=1,\ldots,\mathcal{M} do
4:     repeat
5:       Draw \boldsymbol{\theta}^{*} from \{\boldsymbol{\theta}_{r-1}^{j}\}_{j=1}^{\mathcal{M}} according to the probability mass function
\mathbb{P}(\boldsymbol{\theta}^{*}=\boldsymbol{\theta}_{r-1}^{j})=\frac{w_{r-1}^{j}}{\sum_{k=1}^{\mathcal{M}}w_{r-1}^{k}},\quad\text{for}\,j=1,\ldots,\mathcal{M};
6:       Sample transition kernel, \boldsymbol{\theta}^{**}\sim q_{r}(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{*});
7:       Generate data, \mathcal{D}_{s}\sim s(\mathcal{D}\mid\boldsymbol{\theta}^{**});
8:     until \rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}
9:     Set \boldsymbol{\theta}_{r}^{i}\leftarrow\boldsymbol{\theta}^{**};
10:    Set w_{r}^{i}\leftarrow p(\boldsymbol{\theta}_{r}^{i})/\left[\sum_{j=1}^{\mathcal{M}}w_{r-1}^{j}q_{r}(\boldsymbol{\theta}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{j})\right];
11:   end for
12:   Resample weighted particles, \{(\boldsymbol{\theta}_{r}^{i},w_{r}^{i})\}_{i=1}^{\mathcal{M}}, with replacement;
13:   Set w_{r}^{i}\leftarrow 1/\mathcal{M} for all i=1,\ldots,\mathcal{M};
14: end for
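Algorithm 1 can be sketched in code for a scalar parameter as follows; the Gaussian kernel, the variance rule (twice the particle variance, anticipating the adaptive scheme at the end of this section), and the toy Poisson model are our own illustrative choices.

```python
import numpy as np

def smc_abc(data, prior_sample, prior_pdf, simulate, rho, eps_seq, M, rng):
    """Sketch of Algorithm 1 (SMC-ABC) for a scalar parameter with
    Gaussian proposal kernels."""
    theta = np.array([prior_sample(rng) for _ in range(M)])
    w = np.full(M, 1.0 / M)
    for eps in eps_seq:                      # thresholds eps_1 > ... > eps_R
        sigma2 = 2.0 * np.var(theta)         # adaptive kernel variance
        new_theta, new_w = np.empty(M), np.empty(M)
        for i in range(M):
            while True:
                # lines 5-8: resample, perturb, simulate, accept/reject
                t_star = rng.choice(theta, p=w / w.sum())
                t_prop = t_star + rng.normal(0.0, np.sqrt(sigma2))
                if prior_pdf(t_prop) == 0.0:
                    continue                 # outside prior support
                if rho(data, simulate(t_prop, rng)) <= eps:
                    break
            new_theta[i] = t_prop
            # line 10: importance weight against the mixture proposal
            kern = np.exp(-(t_prop - theta) ** 2 / (2.0 * sigma2)) \
                   / np.sqrt(2.0 * np.pi * sigma2)
            new_w[i] = prior_pdf(t_prop) / np.sum(w * kern)
        # lines 12-13: resample with replacement, reset to uniform weights
        theta = rng.choice(new_theta, size=M, p=new_w / new_w.sum())
        w = np.full(M, 1.0 / M)
    return theta

# toy example: infer a Poisson rate from a single observed count of 7
rng = np.random.default_rng(0)
post = smc_abc(7.0,
               prior_sample=lambda r: r.uniform(0.0, 20.0),
               prior_pdf=lambda t: 1.0 / 20.0 if 0.0 <= t <= 20.0 else 0.0,
               simulate=lambda th, r: float(r.poisson(th)),
               rho=lambda D, Ds: abs(D - Ds),
               eps_seq=[4.0, 2.0, 1.0, 0.0], M=100, rng=rng)
```

Because each threshold's proposals are concentrated near the previous population, far fewer simulations are wasted than in plain rejection sampling at the final threshold.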

Note that throughout all algorithms used in this manuscript, we assume that the initial set of weighted particles, \{(\boldsymbol{\theta}_{0}^{i},w_{0}^{i})\}_{i=1}^{\mathcal{M}}, are independent, identically distributed samples from the prior, p(\boldsymbol{\theta}), and therefore have uniform weights, w_{0}^{i}=1/\mathcal{M}, for all i=1,\ldots,\mathcal{M}. However, the methods are general enough to deal with prior distributions that require importance sampling to draw truly weighted particles.

For a fixed choice of R, efficient use of SMC-ABC depends critically on the selection of appropriate proposal kernels and threshold sequences. The process of sampling at the target threshold, \epsilon_{r}, given the weights at the previous threshold, \epsilon_{r-1}, is described by Filippi et al. (2013) as

\xi_{r}\left(\boldsymbol{\theta}_{r}\mid\mathcal{D}\right)=\frac{1}{a_{r,r-1}}\int_{\boldsymbol{\Theta}}\int_{\mathbb{D}}\mathds{1}_{B(\mathcal{D},\epsilon_{r})}\left(\mathcal{D}_{s}\right)s(\mathcal{D}_{s}\mid\boldsymbol{\theta}_{r})q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1})w(\boldsymbol{\theta}_{r-1})\,\text{d}\mathcal{D}_{s}\text{d}\boldsymbol{\theta}_{r-1}, (1)

where the data space, \mathbb{D}, has dimensionality d, B(\mathcal{D},\epsilon_{r}) is a d-dimensional ball centered on the data with radius \epsilon_{r}, and \mathds{1}_{A}\left(x\right) denotes the indicator function with \mathds{1}_{A}\left(x\right)=1 if x\in A, and \mathds{1}_{A}\left(x\right)=0 otherwise. The normalization constant, a_{r,r-1}, can be interpreted as the average acceptance probability across all particles. We see this by noting that Equation (1) can be reduced to

\xi_{r}\left(\boldsymbol{\theta}_{r}\mid\mathcal{D}\right)=\frac{\eta_{r-1}(\boldsymbol{\theta}_{r})\mathbb{P}(\mathcal{D}_{s}\in B(\mathcal{D},\epsilon_{r})\mid\boldsymbol{\theta}_{r})}{a_{r,r-1}}. (2)

Here the distribution \eta_{r-1}(\boldsymbol{\theta}_{r}) represents the proposal mechanism,

\eta_{r-1}(\boldsymbol{\theta}_{r})=\sum_{j=1}^{\mathcal{M}}q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1}^{j})w(\boldsymbol{\theta}_{r-1}^{j}), (3)

and \mathbb{P}(\mathcal{D}_{s}\in B(\mathcal{D},\epsilon_{r})\mid\boldsymbol{\theta}_{r}) is the probability that simulated data are within \epsilon_{r} of the data \mathcal{D} given a parameter value \boldsymbol{\theta}_{r}. Therefore, the normalizing constant is

a_{r,r-1}=\mathbb{E}\left[\mathbb{P}(\mathcal{D}_{s}\in B(\mathcal{D},\epsilon_{r})\mid\boldsymbol{\theta}_{r})\right], (4)

that is, a_{r,r-1} is the average acceptance probability.

From a computational perspective, the goal is to choose \eta_{r-1}(\boldsymbol{\theta}_{r}) to maximize a_{r,r-1}. However, this would not necessarily result in a \xi_{r}\left(\boldsymbol{\theta}_{r}\mid\mathcal{D}\right) that is an accurate approximation to the true target ABC posterior p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}). To achieve this goal, we require \eta_{r-1}(\boldsymbol{\theta}_{r}) such that the Kullback-Leibler divergence (Kullback and Leibler, 1951), D_{\text{KL}}\left(\xi_{r}\left(\cdot\mid\mathcal{D}\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right), is minimized. Beaumont et al. (2009) and Filippi et al. (2013) demonstrate how the latter goal provides insight into how to optimally choose \eta_{r-1}(\boldsymbol{\theta}_{r}). The key is to note that D_{\text{KL}}\left(\xi_{r}\left(\cdot\mid\mathcal{D}\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right) can be decomposed as follows,

\displaystyle\begin{split}D_{\text{KL}}\left(\xi_{r}\left(\cdot\mid\mathcal{D}\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right)&=D_{\text{KL}}\left(\eta_{r-1}\left(\cdot\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right)\\ &\quad-E(\boldsymbol{\theta}_{r})+\log_{e}a_{r,r-1},\end{split} (5)

where E(\boldsymbol{\theta}_{r})=\mathbb{E}\left[\log_{e}\left(\int_{\mathbb{D}}\mathds{1}_{B(\mathcal{D},\epsilon_{r})}\left(\mathcal{D}_{s}\right)s(\mathcal{D}_{s}\mid\boldsymbol{\theta}_{r})\,\text{d}\mathcal{D}_{s}\right)\right] is independent of \eta_{r-1}(\boldsymbol{\theta}_{r}). By rearranging Equation (5), we obtain

D_{\text{KL}}\left(\eta_{r-1}\left(\cdot\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right)=D_{\text{KL}}\left(\xi_{r}\left(\cdot\mid\mathcal{D}\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right)+E(\boldsymbol{\theta}_{r})-\log_{e}a_{r,r-1}.

That is, minimizing D_{\text{KL}}\left(\eta_{r-1}\left(\cdot\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right) is equivalent to simultaneously minimizing D_{\text{KL}}\left(\xi_{r}\left(\cdot\mid\mathcal{D}\right);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\right) and maximizing a_{r,r-1}. Therefore, any proposal mechanism that is closer, in the Kullback-Leibler sense, to p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) is more efficient.

We apply the optimal adaptive scheme of Beaumont et al. (2009) and Filippi et al. (2013) for multivariate Gaussian proposals. That is, we set

q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1})=\frac{1}{\sqrt{(2\pi)^{n}\det(\Sigma)}}\exp\left(-(\boldsymbol{\theta}_{r}-\boldsymbol{\theta}_{r-1})^{\text{T}}\Sigma^{-1}(\boldsymbol{\theta}_{r}-\boldsymbol{\theta}_{r-1})/2\right),

where n is the dimensionality of parameter space, \boldsymbol{\Theta}, and

\Sigma=\frac{2}{\mathcal{M}-1}\sum_{i=1}^{\mathcal{M}}(\boldsymbol{\theta}_{r-1}^{i}-\mu_{r-1})(\boldsymbol{\theta}_{r-1}^{i}-\mu_{r-1})^{\text{T}}\quad\text{with}\quad\mu_{r-1}=\frac{1}{\mathcal{M}}\sum_{i=1}^{\mathcal{M}}\boldsymbol{\theta}_{r-1}^{i}.
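A sketch of this adaptive proposal, assuming the previous population is stored as an \mathcal{M}\times n array; \Sigma here is exactly twice the (unweighted) empirical covariance of the previous population.

```python
import numpy as np

def adaptive_proposal_cov(particles):
    """Sigma = twice the empirical covariance of the previous population,
    following the adaptive scheme of Beaumont et al. (2009) and Filippi
    et al. (2013); `particles` has shape (M, n)."""
    mu = particles.mean(axis=0)
    dev = particles - mu
    return 2.0 * dev.T @ dev / (particles.shape[0] - 1)

def propose(theta, Sigma, rng):
    """Draw theta_r ~ q_r(. | theta_{r-1}) = N(theta_{r-1}, Sigma)."""
    return rng.multivariate_normal(theta, Sigma)

rng = np.random.default_rng(0)
pop = rng.normal(size=(500, 2))        # mock previous particle population
Sigma = adaptive_proposal_cov(pop)     # roughly 2 * identity for this mock data
theta_new = propose(pop[0], Sigma, rng)
```

Scaling the covariance by two inflates the kernel just enough that proposals explore beyond the current population while remaining concentrated where particles already have support.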

2.2 Preconditioning SMC-ABC

Consider a fixed sequence of ABC posteriors for the stochastic model inference problem, \{p(\boldsymbol{\theta}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\}_{r=1}^{R}. We want to apply SMC-ABC (Algorithm 1) to efficiently sample from this sequence with adaptive proposal kernels, \{q_{r}(\boldsymbol{\theta}^{*}\mid\boldsymbol{\theta})\}_{r=1}^{R} (Beaumont et al., 2009; Filippi et al., 2013). Our method exploits an approximate model to further improve the average acceptance probability.

2.2.1 Algorithm development

Say we have a set of weighted particles that represent the ABC posterior at threshold \epsilon_{r-1} under the stochastic model, that is, \{(\boldsymbol{\theta}_{r-1}^{i},w_{r-1}^{i})\}_{i=1}^{\mathcal{M}}\approx p(\boldsymbol{\theta}_{r-1}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1}). Now, consider applying the next importance resampling step using an approximate data generation step, \tilde{\mathcal{D}}_{s}\sim\tilde{s}(\tilde{\mathcal{D}}_{s}\mid\boldsymbol{\theta}), where \tilde{s}(\tilde{\mathcal{D}}_{s}\mid\boldsymbol{\theta}) is the simulation process of an approximate model (throughout, the tilde notation, e.g. \tilde{x}, refers to ABC entities related to the approximate model, whereas quantities without the tilde, e.g. x, refer to ABC entities related to the exact model). Furthermore, assume the computational cost of simulating the approximate model, \text{Cost}(\tilde{\mathcal{D}}_{s}), is significantly less than that of the exact model, \text{Cost}(\mathcal{D}_{s}), that is, \text{Cost}(\tilde{\mathcal{D}}_{s})/\text{Cost}(\mathcal{D}_{s})\ll 1. The result will be a new set of particles that represent the ABC posterior at threshold \epsilon_{r} under this approximate model, denoted \{(\tilde{\boldsymbol{\theta}}_{r}^{i},\tilde{w}_{r}^{i})\}_{i=1}^{\mathcal{M}}\approx\tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}). As noted in the examples in Section 1, approximate models are not always valid. This implies that \tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}) is always biased and will not in general converge to p(\boldsymbol{\theta}\mid\mathcal{D}) as \epsilon_{r}\to 0.
However, since \text{Cost}(\tilde{\mathcal{D}}_{s})/\text{Cost}(\mathcal{D}_{s})\ll 1, it is computationally inexpensive to compute the distribution

\tilde{\eta}_{r}(\boldsymbol{\theta}_{r})=\sum_{j=1}^{\mathcal{M}}\tilde{q}_{r}(\boldsymbol{\theta}_{r}\mid\tilde{\boldsymbol{\theta}}_{r}^{j})\tilde{w}_{r}^{j}, (6)

in comparison to computing \eta_{r-1}(\boldsymbol{\theta}_{r}) (Equation (3)). In Equation (6), the proposal kernel \tilde{q}_{r}(\boldsymbol{\theta}_{r}\mid\tilde{\boldsymbol{\theta}}_{r}^{j}) is possibly distinct from the q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1}^{j}) used in \eta_{r-1}(\boldsymbol{\theta}_{r}) (Equation (3)). To improve the efficiency of the sampling process we simply require

D_{\text{KL}}(\eta_{r-1}(\cdot);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}))>D_{\text{KL}}(\tilde{\eta}_{r}(\cdot);p(\cdot\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})), (7)

for \tilde{\eta}_{r}(\boldsymbol{\theta}_{r}) (Equation (6)) to be more efficient as a proposal mechanism than \eta_{r-1}(\boldsymbol{\theta}_{r}) (Equation (3)). Provided the condition \text{Cost}(\tilde{\mathcal{D}}_{s})/\text{Cost}(\mathcal{D}_{s})\ll 1 holds, any improvement in sampling efficiency translates directly into improved computational performance. That is, it does not matter that \tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}) is biased; it just needs to be less biased than p(\boldsymbol{\theta}_{r-1}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1}) and computationally inexpensive.

This idea yields an intuitive new algorithm for SMC-ABC: proceed through the sequential sampling of \{p(\boldsymbol{\theta}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\}_{r=1}^{R} by applying two resampling steps in each iteration. The first moves the particles from acceptance threshold \epsilon_{r-1} to \epsilon_{r} using the computationally inexpensive approximate model, and the second corrects for the bias between \tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}) and p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) using the expensive stochastic model, but at an improved acceptance rate. Since the intermediate distribution acts on the proposal mechanism to accelerate the convergence of SMC-ABC, we denote the sequence \{\tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})\}_{r=1}^{R} as the preconditioner distribution sequence. The algorithm, called preconditioned SMC-ABC (PC-SMC-ABC), is given in Algorithm 2. We note that similar notions of preconditioning with approximation-informed proposals have been applied in the context of Markov chain Monte Carlo samplers (Parno and Marzouk, 2018). However, to the best of our knowledge, our approach represents the first application of preconditioning ideas to SMC-ABC.

Algorithm 2 Preconditioned SMC-ABC
1:Initialize 𝜽0ip(𝜽)\boldsymbol{\theta}_{0}^{i}\sim p(\boldsymbol{\theta}) and w0i=1/w_{0}^{i}=1/{\mathcal{M}}, for i=1,,i=1,\ldots,\mathcal{M};
2:for r=1,,Rr=1,\ldots,R do
3:  for i=1,,i=1,\ldots,\mathcal{M} do
4:   repeat
5:     Draw 𝜽\boldsymbol{\theta}^{*} from {𝜽r1j}j=1\{\boldsymbol{\theta}_{r-1}^{j}\}_{j=1}^{\mathcal{M}} according to the probability mass function
(𝜽=𝜽r1j)=wr1jk=1wr1k,forj=1,,;\mathbb{P}(\boldsymbol{\theta}^{*}=\boldsymbol{\theta}_{r-1}^{j})=\frac{w_{r-1}^{j}}{\sum_{k=1}^{\mathcal{M}}w_{r-1}^{k}},\quad\text{for}\,j=1,\ldots,\mathcal{M};
6:     Sample transition kernel, 𝜽~qr(𝜽~𝜽)\tilde{\boldsymbol{\theta}}^{**}\sim q_{r}(\tilde{\boldsymbol{\theta}}\mid\boldsymbol{\theta}^{*});
7:     Generate data, 𝒟~ss~(𝒟𝜽~)\tilde{\mathcal{D}}_{s}\sim\tilde{s}(\mathcal{D}\mid\tilde{\boldsymbol{\theta}}^{**});
8:   until ρ(𝒟,𝒟~s)ϵr\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}
9:   Set 𝜽~ri𝜽~\tilde{\boldsymbol{\theta}}_{r}^{i}\leftarrow\tilde{\boldsymbol{\theta}}^{**};
10:   Set w~rip(𝜽~ri)/[j=1wr1jqr(𝜽~ri𝜽r1j)]\tilde{w}_{r}^{i}\leftarrow p(\tilde{\boldsymbol{\theta}}_{r}^{i})/\left[{\sum_{j=1}^{\mathcal{M}}w_{r-1}^{j}q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{j})}\right];
11:  end for
12:  for i=1,,i=1,\ldots,\mathcal{M} do
13:   repeat
14:     Draw 𝜽~\tilde{\boldsymbol{\theta}}^{*} from {𝜽~r1j}j=1\{\tilde{\boldsymbol{\theta}}_{r-1}^{j}\}_{j=1}^{\mathcal{M}} according to the probability mass function
(𝜽~=𝜽~r1j)=w~r1jk=1w~r1k,forj=1,,;\mathbb{P}(\tilde{\boldsymbol{\theta}}^{*}=\tilde{\boldsymbol{\theta}}_{r-1}^{j})=\frac{\tilde{w}_{r-1}^{j}}{\sum_{k=1}^{\mathcal{M}}\tilde{w}_{r-1}^{k}},\quad\text{for}\,j=1,\ldots,\mathcal{M};
15:     Sample transition kernel, 𝜽q~r(𝜽𝜽~)\boldsymbol{\theta}^{**}\sim\tilde{q}_{r}(\boldsymbol{\theta}\mid\tilde{\boldsymbol{\theta}}^{*});
16:     Generate data, 𝒟ss(𝒟𝜽)\mathcal{D}_{s}\sim s(\mathcal{D}\mid\boldsymbol{\theta}^{**});
17:   until ρ(𝒟,𝒟s)ϵr\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}
18:   Set 𝜽ri𝜽\boldsymbol{\theta}_{r}^{i}\leftarrow\boldsymbol{\theta}^{**};
19:   Set wrip(𝜽ri)/[j=1w~rjq~r(𝜽ri𝜽~rj)]w_{r}^{i}\leftarrow p(\boldsymbol{\theta}_{r}^{i})/\left[{\sum_{j=1}^{\mathcal{M}}\tilde{w}_{r}^{j}\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{j})}\right];
20:  end for
21:  Resample weighted particles, {(𝜽ri,wri)}i=1\{(\boldsymbol{\theta}_{r}^{i},w_{r}^{i})\}_{i=1}^{\mathcal{M}}, with replacement;
22:  Set wri1/w_{r}^{i}\leftarrow 1/{\mathcal{M}} for all i=1,,i=1,\ldots,\mathcal{M};
23:end for
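To make the two-stage structure of each iteration concrete, the following minimal sketch runs one preconditioned iteration on a toy problem. The uniform prior, the Gaussian "expensive" simulator, the biased cheap approximation, the discrepancy, and the thresholds are all illustrative stand-ins of our own choosing, not models from this paper:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 200          # number of particles (illustrative)
data = 2.0       # observed summary statistic (toy)

def prior_pdf(theta):
    return 0.2 if 0.0 <= theta <= 5.0 else 0.0   # U(0, 5) prior density

def cheap_sim(theta):
    # inexpensive approximate model: deterministic bias plus small noise
    return theta + 0.3 + 0.1 * rng.standard_normal()

def expensive_sim(theta):
    # stand-in for the expensive stochastic model
    return theta + 0.5 * rng.standard_normal()

def gauss_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

def abc_stage(parts, wts, simulate, eps, s=0.5):
    """One importance-resampling ABC stage: weighted draw, perturb with a
    Gaussian kernel q_r, and simulate until the discrepancy is below eps."""
    probs = wts / wts.sum()
    new_parts = np.empty(M)
    for i in range(M):
        while True:
            theta = rng.choice(parts, p=probs)
            prop = theta + s * rng.standard_normal()
            if prior_pdf(prop) > 0 and abs(simulate(prop) - data) <= eps:
                break
        new_parts[i] = prop
    # SMC weights: prior density over the kernel mixture (cf. lines 10 and 19)
    new_wts = np.array([prior_pdf(t) / (wts * gauss_pdf(t, parts, s)).sum()
                        for t in new_parts])
    return new_parts, new_wts

parts = rng.uniform(0.0, 5.0, M)          # initialise from the prior
wts = np.full(M, 1.0 / M)
# stage 1: move particles to threshold eps_r using the cheap approximation
pre_parts, pre_wts = abc_stage(parts, wts, cheap_sim, eps=0.5)
# stage 2: correct for the approximation bias using the expensive model
parts, wts = abc_stage(pre_parts, pre_wts, expensive_sim, eps=0.5)
```

Stage 1 uses only cheap simulations to concentrate the particles near the ABC posterior, so the expensive simulations in stage 2 are proposed from a better-adapted distribution and accepted at a higher rate.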

One particular advantage of the PC-SMC-ABC method, demonstrated in the next section, is that it is unbiased. Effectively, one can consider PC-SMC-ABC as a standard SMC-ABC method with a specialized proposal mechanism based on the preconditioner distribution. This means that PC-SMC-ABC is completely general, as discussed in Section 4, and is independent of the specific stochastic models that we consider here. This unbiasedness holds even when the approximate model is a poor approximation of the forward dynamics of the model. However, the closer η~r(𝜽r)\tilde{\eta}_{r}(\boldsymbol{\theta}_{r}) is to p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}), the greater the performance improvement, as we demonstrate in Section 3.

2.3 Moment-matching SMC-ABC

The PC-SMC-ABC method is a promising modification to SMC-ABC that can accelerate inference for expensive stochastic models without introducing bias. However, other approaches can be used to obtain further computational improvements. Here, we consider an alternative approach to utilizing approximate models that aims to get the most out of a small sample of expensive stochastic simulations. Unlike PC-SMC-ABC, this method is generally biased, but it has the advantage of yielding a small and fixed computational budget. Specifically, we define a parameter α[0,1]\alpha\in[0,1], such that 1/α1/\alpha is the target computational speedup; for example, α=1/10\alpha=1/10 should result in approximately a 1010-fold speedup. We apply the SMC-ABC method using ~=(1α)\tilde{\mathcal{M}}=\lfloor(1-\alpha)\mathcal{M}\rfloor particles based on the approximate model, and then use ^=α\hat{\mathcal{M}}=\lceil\alpha\mathcal{M}\rceil particles based on the stochastic model to construct a hybrid population of =^+~\mathcal{M}=\hat{\mathcal{M}}+\tilde{\mathcal{M}} particles that will represent the final inference on the stochastic model. The key idea is that we use the ^\hat{\mathcal{M}} particles of the expensive stochastic model to inform a transformation on the ~\tilde{\mathcal{M}} particles of the approximation such that they emulate particles of the expensive stochastic model. Here, \lfloor\cdot\rfloor and \lceil\cdot\rceil are, respectively, the floor and ceiling functions.

2.3.1 Algorithm development

Assume that we have applied SMC-ABC to sequentially sample ~\tilde{\mathcal{M}} particles through the ABC posteriors from the approximate model, {p~(𝜽~rρ(𝒟,𝒟~s)ϵr)}r=1R\{\tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})\}_{r=1}^{R}, with ϵR=ϵ\epsilon_{R}=\epsilon. For the sake of the derivation, say that for all r[1,R]r\in[1,R] we have available the mean vector, 𝝁r\boldsymbol{\mu}_{r}, and the covariance matrix, 𝚺r\boldsymbol{\Sigma}_{r}, of the ABC posterior p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) under the stochastic model. In this case, we use particles 𝜽~r1,,𝜽~r~p~(𝜽~rρ(𝒟,𝒟~s)ϵr)\tilde{\boldsymbol{\theta}}_{r}^{1},\ldots,\tilde{\boldsymbol{\theta}}_{r}^{\tilde{\mathcal{M}}}\sim\tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r}) to emulate particles 𝜽r1,,𝜽r^p(𝜽rρ(𝒟,𝒟s)ϵr)\boldsymbol{\theta}_{r}^{1},\ldots,\boldsymbol{\theta}_{r}^{\hat{\mathcal{M}}}\sim p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) by using the moment matching transform (Lei and Bickel, 2011; Sun et al., 2016)

𝜽ri=𝐋r[𝐋~r1(𝜽~ri𝝁~r)]+𝝁r,i=1,2,,~,\boldsymbol{\theta}_{r}^{i}=\mathbf{L}_{r}\left[\tilde{\mathbf{L}}_{r}^{-1}\left(\tilde{\boldsymbol{\theta}}_{r}^{i}-\tilde{\boldsymbol{\mu}}_{r}\right)\right]+\boldsymbol{\mu}_{r},\quad i=1,2,\ldots,\tilde{\mathcal{M}}, (8)

where 𝝁~r\tilde{\boldsymbol{\mu}}_{r} and 𝚺~r\tilde{\boldsymbol{\Sigma}}_{r} are the empirical mean vector and covariance matrix of particles 𝜽~r1,,𝜽~r~\tilde{\boldsymbol{\theta}}_{r}^{1},\ldots,\tilde{\boldsymbol{\theta}}_{r}^{\tilde{\mathcal{M}}}, 𝐋r\mathbf{L}_{r} and 𝐋~r\tilde{\mathbf{L}}_{r} are lower triangular matrices obtained through the Cholesky factorization (Press et al., 1997) of 𝚺r\boldsymbol{\Sigma}_{r} and 𝚺~r\tilde{\boldsymbol{\Sigma}}_{r}, respectively, and 𝐋~r1\tilde{\mathbf{L}}_{r}^{-1} is the matrix inverse of 𝐋~r\tilde{\mathbf{L}}_{r}. This transform will produce a collection of particles that has a sample mean vector of 𝝁r\boldsymbol{\mu}_{r} and covariance matrix 𝚺r\boldsymbol{\Sigma}_{r}. That is, the transformed sample matches the ABC posterior under the stochastic model up to the first two moments. In Section 3, we demonstrate that matching two moments is sufficient for the problems we investigate here; however, in principle, we could extend this matching to higher-order moments if required. For discussion on the advantages and disadvantages of matching higher moments, see Section 4.

In practice, it would be rare that 𝝁r\boldsymbol{\mu}_{r} and 𝚺r\boldsymbol{\Sigma}_{r} are known. If p(𝜽r1ρ(𝒟,𝒟s)ϵr1)p(\boldsymbol{\theta}_{r-1}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1}) is available, then we can use importance resampling to obtain ^\hat{\mathcal{M}} particles, 𝜽r1,,𝜽r^\boldsymbol{\theta}_{r}^{1},\ldots,\boldsymbol{\theta}_{r}^{\hat{\mathcal{M}}}, from p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}), that is, we perform a step from SMC-ABC using the expensive stochastic model. We can then use the unbiased estimators

𝝁^r=1^i=1^𝜽riand𝚺^r=1^1i=1^(𝜽ri𝝁^r)(𝜽ri𝝁^r)T,\hat{\boldsymbol{\mu}}_{r}=\frac{1}{\hat{\mathcal{M}}}\sum_{i=1}^{\hat{\mathcal{M}}}\boldsymbol{\theta}_{r}^{i}\quad\text{and}\quad\hat{\boldsymbol{\Sigma}}_{r}=\frac{1}{\hat{\mathcal{M}}-1}\sum_{i=1}^{\hat{\mathcal{M}}}(\boldsymbol{\theta}_{r}^{i}-\hat{\boldsymbol{\mu}}_{r})(\boldsymbol{\theta}_{r}^{i}-\hat{\boldsymbol{\mu}}_{r})^{\text{T}}, (9)

to obtain estimates of 𝝁r\boldsymbol{\mu}_{r} and 𝚺r\boldsymbol{\Sigma}_{r}. Substituting Equation (9) into Equation (8) gives an approximate transform

𝜽^ri=𝐋^r[𝐋~r1(𝜽~ri𝝁~r)]+𝝁^r,i=1,2,,~,\hat{\boldsymbol{\theta}}_{r}^{i}=\hat{\mathbf{L}}_{r}\left[\tilde{\mathbf{L}}_{r}^{-1}\left(\tilde{\boldsymbol{\theta}}_{r}^{i}-\tilde{\boldsymbol{\mu}}_{r}\right)\right]+\hat{\boldsymbol{\mu}}_{r},\quad i=1,2,\ldots,\tilde{\mathcal{M}}, (10)

where 𝚺^r=𝐋^r𝐋^rT\hat{\boldsymbol{\Sigma}}_{r}=\hat{\mathbf{L}}_{r}\hat{\mathbf{L}}_{r}^{\text{T}}. This enables us to construct an estimate of p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) by applying the moment-matching transform (Equation (10)) to the particles 𝜽~r1,,𝜽~r~\tilde{\boldsymbol{\theta}}_{r}^{1},\ldots,\tilde{\boldsymbol{\theta}}_{r}^{\tilde{\mathcal{M}}} and then pooling the transformed particles 𝜽^r1,,𝜽^r~\hat{\boldsymbol{\theta}}_{r}^{1},\ldots,\hat{\boldsymbol{\theta}}_{r}^{\tilde{\mathcal{M}}} with the particles 𝜽r1,,𝜽r^\boldsymbol{\theta}_{r}^{1},\ldots,\boldsymbol{\theta}_{r}^{\hat{\mathcal{M}}} that were used in the estimates 𝝁^r\hat{\boldsymbol{\mu}}_{r} and 𝚺^r\hat{\boldsymbol{\Sigma}}_{r}. The goal of applying the approximate transform is for the transformed particles 𝜽^r1,,𝜽^r~\hat{\boldsymbol{\theta}}_{r}^{1},\ldots,\hat{\boldsymbol{\theta}}_{r}^{\tilde{\mathcal{M}}} to be more accurate in the higher moments, despite only matching the first two moments (see Section 3.6 for numerical justification of this property for some specific examples). This results in an approximation of p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) using a set of \mathcal{M} particles 𝜽r1,,𝜽r\boldsymbol{\theta}_{r}^{1},\ldots,\boldsymbol{\theta}_{r}^{\mathcal{M}} with 𝜽ri+^=𝜽^ri\boldsymbol{\theta}_{r}^{i+\hat{\mathcal{M}}}=\hat{\boldsymbol{\theta}}_{r}^{i} where 1i~1\leq i\leq\tilde{\mathcal{M}}.
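As a concrete illustration, the transform in Equations (9) and (10) can be implemented with Cholesky factorizations of the empirical covariance matrices. The sketch below uses synthetic Gaussian particle sets whose means and covariances are hypothetical placeholders, not posteriors from any model in this paper:

```python
import numpy as np

rng = np.random.default_rng(7)
M_tilde, M_hat = 900, 100   # illustrative split for alpha = 0.1, M = 1000

# hypothetical particle sets: a large cheap-model sample (biased) and a
# small expensive-model sample at the same threshold epsilon_r
approx = rng.multivariate_normal([1.0, 3.0], [[1.0, 0.4], [0.4, 0.5]], M_tilde)
exact = rng.multivariate_normal([0.5, 2.0], [[0.5, 0.1], [0.1, 0.3]], M_hat)

def moment_match(approx, exact):
    """Map `approx` so its first two sample moments match those of `exact`,
    as in Equations (9) and (10)."""
    mu_t, mu_h = approx.mean(axis=0), exact.mean(axis=0)       # Equation (9)
    L_t = np.linalg.cholesky(np.cov(approx, rowvar=False))
    L_h = np.linalg.cholesky(np.cov(exact, rowvar=False))
    # theta_hat = L_hat [ L_tilde^{-1} (theta_tilde - mu_tilde) ] + mu_hat
    z = np.linalg.solve(L_t, (approx - mu_t).T)
    return (L_h @ z).T + mu_h

transformed = moment_match(approx, exact)
pooled = np.vstack([exact, transformed])   # hybrid population of M particles
```

Because the transform is affine, the transformed sample matches the expensive-model sample mean and covariance exactly (up to floating point), while retaining the higher-order shape of the approximate particle cloud.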

Algorithm 3 Moment-matching SMC-ABC
1:Given α[0,1]\alpha\in[0,1], initialize ^=α\hat{\mathcal{M}}=\lceil\alpha\mathcal{M}\rceil and ~=(1α)\tilde{\mathcal{M}}=\lfloor(1-\alpha)\mathcal{M}\rfloor;
2:Initialize 𝜽~0ip(𝜽)\tilde{\boldsymbol{\theta}}_{0}^{i}\sim p(\boldsymbol{\theta}) and w~0i=1/~\tilde{w}_{0}^{i}=1/{\tilde{\mathcal{M}}}, for i=1,,~i=1,\ldots,\tilde{\mathcal{M}};
3:Initialize 𝜽0ip(𝜽)\boldsymbol{\theta}_{0}^{i}\sim p(\boldsymbol{\theta}) and w0i=1/w_{0}^{i}=1/{\mathcal{M}}, for i=1,,i=1,\ldots,\mathcal{M};
4:Apply SMC-ABC to generate the sequence of approximate particles {(𝜽~1i,w~1i)}i=1~\{(\tilde{\boldsymbol{\theta}}_{1}^{i},\tilde{w}_{1}^{i})\}_{i=1}^{\tilde{\mathcal{M}}}, {(𝜽~2i,w~2i)}i=1~,,{(𝜽~Ri,w~Ri)}i=1~\{(\tilde{\boldsymbol{\theta}}_{2}^{i},\tilde{w}_{2}^{i})\}_{i=1}^{\tilde{\mathcal{M}}},\ldots,\{(\tilde{\boldsymbol{\theta}}_{R}^{i},\tilde{w}_{R}^{i})\}_{i=1}^{\tilde{\mathcal{M}}}
5:for r=1,,Rr=1,\ldots,R do
6:  for i=1,,^i=1,\ldots,\hat{\mathcal{M}} do
7:   repeat
8:     Draw 𝜽\boldsymbol{\theta}^{*} from {𝜽r1j}j=1\{\boldsymbol{\theta}_{r-1}^{j}\}_{j=1}^{\mathcal{M}} according to the probability mass function
\mathbb{P}(\boldsymbol{\theta}^{*}=\boldsymbol{\theta}_{r-1}^{j})=\frac{w_{r-1}^{j}}{\sum_{k=1}^{\mathcal{M}}w_{r-1}^{k}},\quad\text{for}\,j=1,\ldots,\mathcal{M};
9:     Sample transition kernel, 𝜽qr(𝜽𝜽)\boldsymbol{\theta}^{**}\sim q_{r}(\boldsymbol{\theta}\mid\boldsymbol{\theta}^{*});
10:     Generate data, 𝒟ss(𝒟𝜽)\mathcal{D}_{s}\sim s(\mathcal{D}\mid\boldsymbol{\theta}^{**});
11:   until ρ(𝒟,𝒟s)ϵr\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}
12:   Set 𝜽ri𝜽\boldsymbol{\theta}_{r}^{i}\leftarrow\boldsymbol{\theta}^{**};
13:   Set wrip(𝜽ri)/[j=1wr1jqr(𝜽ri𝜽r1j)]w_{r}^{i}\leftarrow p(\boldsymbol{\theta}_{r}^{i})/\left[{\sum_{j=1}^{\mathcal{M}}w_{r-1}^{j}q_{r}(\boldsymbol{\theta}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{j})}\right];
14:  end for
15:  Estimate means and covariances 𝝁~r\tilde{\boldsymbol{\mu}}_{r}, 𝚺~r\tilde{\boldsymbol{\Sigma}}_{r}, 𝝁^r\hat{\boldsymbol{\mu}}_{r}, and 𝚺^r\hat{\boldsymbol{\Sigma}}_{r};
16:  Compute Cholesky decompositions 𝚺~r=𝐋~r𝐋~rT\tilde{\boldsymbol{\Sigma}}_{r}=\tilde{\mathbf{L}}_{r}\tilde{\mathbf{L}}_{r}^{\text{T}} and 𝚺^r=𝐋^r𝐋^rT\hat{\boldsymbol{\Sigma}}_{r}=\hat{\mathbf{L}}_{r}\hat{\mathbf{L}}_{r}^{\text{T}}
17:  for i=1,,~i=1,\ldots,\tilde{\mathcal{M}} do
18:   Set 𝜽ri+^𝐋^r[𝐋~r1(𝜽~ri𝝁~r)]+𝝁^r\boldsymbol{\theta}_{r}^{i+\hat{\mathcal{M}}}\leftarrow\hat{\mathbf{L}}_{r}[\tilde{\mathbf{L}}_{r}^{-1}(\tilde{\boldsymbol{\theta}}_{r}^{i}-\tilde{\boldsymbol{\mu}}_{r})]+\hat{\boldsymbol{\mu}}_{r} and wri+^w~riw_{r}^{i+\hat{\mathcal{M}}}\leftarrow\tilde{w}_{r}^{i};
19:  end for
20:  Resample weighted particles {(𝜽ri,wri)}i=1\{(\boldsymbol{\theta}_{r}^{i},w_{r}^{i})\}_{i=1}^{\mathcal{M}} with replacement;
21:  Set wri1/w_{r}^{i}\leftarrow 1/\mathcal{M} for all i=1,,i=1,\ldots,\mathcal{M};
22:end for

This leads to our moment-matching SMC-ABC (MM-SMC-ABC) method. First, SMC-ABC inference is applied using the approximate model with ~\tilde{\mathcal{M}} particles. Then, given \mathcal{M} samples from the prior, p(𝜽)p(\boldsymbol{\theta}), we can sequentially approximate
{p(𝜽rρ(𝒟,𝒟s)ϵr)}r=1R\{p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\}_{r=1}^{R}. At each iteration the following steps are performed: (i) generate a small number of particles from p(𝜽rρ(𝒟,𝒟s)ϵr)p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r}) using importance resampling and stochastic model simulations; (ii) compute 𝝁^r\hat{\boldsymbol{\mu}}_{r} and 𝚺^r=𝐋^r𝐋^rT\hat{\boldsymbol{\Sigma}}_{r}=\hat{\mathbf{L}}_{r}\hat{\mathbf{L}}_{r}^{\text{T}}; (iii) apply the transform from Equation (10) to the particles at ϵr\epsilon_{r} from the approximate model; (iv) pool the resulting particles with the stochastic model samples; and (v) reweight particles and resample. The final MM-SMC-ABC algorithm is provided in Algorithm 3.

The performance of this method depends on the choice of α\alpha. Note that in Algorithm 3, standard SMC-ABC for the expensive stochastic model is recovered as α1\alpha\to 1
(no speedup, inference unbiased), and standard SMC-ABC using the approximate model is recovered as α0\alpha\to 0 (maximum speedup, but inference biased). Therefore, we expect there is a choice of α(0,1)\alpha\in(0,1) that provides an optimal trade-off between computational improvement and accuracy. Clearly, the expected speed improvement is proportional to 1/α1/\alpha; however, if α\alpha is chosen to be too small, then the statistical error in the estimates in Equation (9) will be too high. We explore this trade-off in detail in Section 3.6 and find that 0.05α0.20.05\leq\alpha\leq 0.2 gives reasonable results.

3 Results

In this section, we provide numerical examples to demonstrate the accuracy and performance of the PC-SMC-ABC and MM-SMC-ABC methods. First, we apply PC-SMC-ABC to a tractable example to demonstrate the mechanisms of the method and provide insight into effective choices of approximate model. The tractable example considered here is inference for an Ornstein–Uhlenbeck process (Uhlenbeck and Ornstein, 1930). We then consider two intractable problems based on expensive discrete models. For our first example, we consider the analysis of spatially averaged population growth data. The discrete model used in this instance is relevant in the ecological sciences as it describes population growth subject to a weak Allee effect (Taylor and Hastings, 2005). We then analyze data that is typical of in vitro cell culture scratch assays in experimental cell biology using a discrete model that leads to the well-studied Fisher-KPP model (Edelstein-Keshet, 2005; Murray, 2002). In both examples, we present the discrete model and its continuum limit, then compute the full Bayesian posterior for the model parameters using the PC-SMC-ABC (Algorithm 2) and MM-SMC-ABC (Algorithm 3) methods, and compare the results with SMC-ABC (Supplementary Material) using either the discrete model or continuum limit alone. We also provide numerical experiments to evaluate the effect of the tuning parameter α\alpha on the accuracy and performance of the MM-SMC-ABC method.

It is important to clarify that when we refer to the accuracy of our methods, we refer to their ability to sample from the target ABC posterior under the expensive stochastic model. The evaluation of this accuracy requires sampling from the target ABC posterior under the expensive stochastic model using SMC-ABC. As a result, the target acceptance thresholds are chosen to ensure this is computationally feasible.

3.1 A tractable example: Ornstein–Uhlenbeck process

The Ornstein–Uhlenbeck process is a mean reverting stochastic process with many applications in finance, biology and physics (Uhlenbeck and Ornstein, 1930). The evolution of the continuous state XtX_{t} is given by an Itō stochastic differential equation (SDE) of the form

dXt=γ(μXt)dt+σdWt,if t>0,X0=x0,if t=0,\begin{split}\text{d}X_{t}=\gamma(\mu-X_{t})\text{d}t+\sigma\text{d}W_{t},\quad\text{if }t>0,\\ X_{0}=x_{0},\quad\text{if }t=0,\end{split} (11)

where μ\mu is the long-term mean, γ\gamma is the rate of mean reversion, σ\sigma is the process volatility, WtW_{t} is a Wiener process and x0x_{0} is a constant initial condition. Example realisations are shown in Figure 2(A).

Refer to caption
Figure 2: (A) Four example realisations of the Ornstein–Uhlenbeck process with parameters μ=1\mu=1, γ=2\gamma=2, σ=25\sigma=2\sqrt{5} and initial condition x0=10x_{0}=10. (B) Empirical distribution of XTX_{T} for T=1T=1 compared with the exact Fokker–Planck solution and the stationary solution as TT\to\infty. Simulations are performed using the Euler–Maruyama discretization with time step Δt=0.001\Delta t=0.001.

We consider data consisting of NN independent realisations of the Ornstein–Uhlenbeck process at time T<T<\infty, that is, 𝒟=[XT1,XT2,,XTN]\mathcal{D}=[X_{T}^{1},X_{T}^{2},\ldots,X_{T}^{N}]. This inference problem is analytically tractable since the Fokker–Planck equation can be solved to obtain a Gaussian distribution for the data,

XT𝒩(μ+(x0μ)eγT,σ22γ(1e2γT)).X_{T}\sim\mathcal{N}\left(\mu+(x_{0}-\mu)e^{-\gamma T},\frac{\sigma^{2}}{2\gamma}\left(1-e^{-2\gamma T}\right)\right). (12)

For demonstration purposes, we will assume that a solution to the full Fokker–Planck equation is unavailable and perform ABC inference to estimate the volatility parameter, σ\sigma, using stochastic simulation with the Euler–Maruyama discretization (Maruyama, 1955),

Xt+Δt=Xt+γ(μXt)Δt+σΔtξt,X_{t+\Delta t}=X_{t}+\gamma(\mu-X_{t})\Delta t+\sigma\sqrt{\Delta t}\xi_{t}, (13)

where ξt\xi_{t} is a standard normal variate and Δt\Delta t is a small time step. For the approximate model we take the stationary distribution of the Ornstein–Uhlenbeck process obtained by taking TT\to\infty and solving the steady state of the Fokker–Planck equation,

X𝒩(μ,σ22γ).X_{\infty}\sim\mathcal{N}\left(\mu,\frac{\sigma^{2}}{2\gamma}\right). (14)

This kind of approximation will often be possible since the steady state Fokker–Planck equation is more likely to be tractable than the transient solution for most SDE models. As shown in Figure 2(B), the stationary solution is a better approximation for the true variance than for the true mean. Therefore, for small TT this approximation would be more appropriate as a preconditioner for inference of the volatility parameter, σ\sigma, in Equation (11) than for the long-time mean, μ\mu. In general, the performance expected from preconditioning will increase as TT increases since the stationary approximation will become more accurate (Supplementary Material).
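This behavior can be checked directly. The sketch below simulates Equation (13) with the parameters of Figure 2 and compares sample moments against the transient and stationary Fokker–Planck moments; the path count and step sizes are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, gamma, sigma, x0 = 1.0, 2.0, 2.0 * np.sqrt(5.0), 10.0  # as in Figure 2
T, dt, n_paths = 1.0, 0.001, 20000

def euler_maruyama_ou(n):
    """Simulate n independent OU paths (Equation (13)) up to time T."""
    x = np.full(n, x0)
    for _ in range(int(round(T / dt))):
        x = x + gamma * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    return x

XT = euler_maruyama_ou(n_paths)
mean_T = mu + (x0 - mu) * np.exp(-gamma * T)                   # transient mean
var_T = sigma ** 2 / (2.0 * gamma) * (1.0 - np.exp(-2.0 * gamma * T))
var_stat = sigma ** 2 / (2.0 * gamma)                          # stationary variance
```

With these parameter values, the stationary variance is already within about two percent of the transient variance at T=1, while the stationary mean μ=1 remains far from the transient mean of approximately 2.2, which is why the stationary distribution suits preconditioning inference for σ better than for μ.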

Figure 3 demonstrates the results of applying PC-SMC-ABC (Algorithm 2) with =1000\mathcal{M}=1000 particles, showing intermediate distributions, for inferring D=σ2/2D=\sigma^{2}/2 given data at T=1T=1, using stochastic simulation for the exact model (Equation (13)) with Δt=0.01\Delta t=0.01, and the stationary distribution (Equation (14)) for the approximate model used in the preconditioning step.

In this example, all other parameters are treated as known with x0=10x_{0}=10, μ=1\mu=1 and γ=2\gamma=2. Data is generated using D=10D=10 as in Figure 2. In each step, the preconditioner distribution for threshold ϵr\epsilon_{r} (dashed line) is a better proposal distribution for the target (solid line) than that of the exact model at threshold ϵr1\epsilon_{r-1} (dotted line). The overall speedup factor is approximately 1.51.5 for this example, and it continues to improve as TT increases since Equation (14) becomes an increasingly better approximation to Equation (12) (see Supplementary Material). For truly intractable problems, such as the lattice-based random walk models presented in Sections 3.4 and 3.5, we obtain superior performance gains of up to a factor of four.

Refer to caption
Figure 3: PC-SMC-ABC intermediate steps for the Ornstein–Uhlenbeck SDE example. Each panel demonstrates the transition from threshold ϵr1\epsilon_{r-1} (dotted line) to ϵr\epsilon_{r} (solid line) via the preconditioner (dashed line). (A) ϵ0\epsilon_{0} to ϵ1\epsilon_{1}; (B) ϵ1\epsilon_{1} to ϵ2\epsilon_{2}; (C) ϵ2\epsilon_{2} to ϵ3\epsilon_{3}; and (D) ϵ3\epsilon_{3} to ϵ4\epsilon_{4}. The threshold sequence is ϵr=ϵr1/2\epsilon_{r}=\epsilon_{r-1}/2 for r=1,2,3,4r=1,2,3,4 with ϵ0=6.4\epsilon_{0}=6.4.

3.2 Lattice-based stochastic discrete random walk model

The stochastic discrete model we consider is a lattice-based random walk model that is often used to describe populations of motile cells (Jin et al., 2016). The model involves initially placing a population of NN agents of size δ\delta on a lattice, LL (Callaghan et al., 2006; Simpson et al., 2010), for example an I×JI\times J hexagonal lattice (Jin et al., 2016). This hexagonal lattice is defined by a set of indices
L={(i,j):i[0,1,,I1],j[0,1,,J1]}L=\{(i,j):i\in[0,1,\ldots,I-1],j\in[0,1,\ldots,J-1]\}, and a neighborhood function,

𝒩(i,j)={{(i1,j1),(i,j1),(i+1,j1),(i+1,j),(i,j+1),(i1,j)}if i is even,{(i1,j),(i,j1),(i+1,j),(i+1,j+1),(i,j+1),(i1,j+1)}if i is odd.\mathcal{N}(i,j)=\begin{cases}\{(i-1,j-1),(i,j-1),(i+1,j-1),(i+1,j),(i,j+1),(i-1,j)\}&\text{if }i\text{ is even},\\ \{(i-1,j),(i,j-1),(i+1,j),(i+1,j+1),(i,j+1),(i-1,j+1)\}&\text{if }i\text{ is odd}.\end{cases}

Lattice indices are mapped to Cartesian coordinates using

(xi,yj)={(i32δ,jδ)if i is even,(i32δ,(j+12)δ)if i is odd.(x_{i},y_{j})=\begin{cases}\left(i\dfrac{\sqrt{3}}{2}\delta,j\delta\right)&\text{if }i\text{ is even},\\ \left(i\dfrac{\sqrt{3}}{2}\delta,\left(j+\dfrac{1}{2}\right)\delta\right)&\text{if }i\text{ is odd}.\end{cases} (15)

We define an occupancy function such that C(,t)=1C(\ell,t)=1 if site \ell is occupied by an agent at time t0t\geq 0, otherwise C(,t)=0C(\ell,t)=0. This means that in our discrete model each lattice site can be occupied by, at most, one agent.
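The lattice geometry above can be encoded compactly. The following minimal sketch implements the neighborhood function and the index-to-coordinate map of Equation (15); the function names are our own:

```python
import math

def neighbourhood(i, j):
    """Hexagonal neighbourhood N(i, j), using the even/odd column rule."""
    if i % 2 == 0:
        offsets = [(-1, -1), (0, -1), (1, -1), (1, 0), (0, 1), (-1, 0)]
    else:
        offsets = [(-1, 0), (0, -1), (1, 0), (1, 1), (0, 1), (-1, 1)]
    return [(i + di, j + dj) for di, dj in offsets]

def to_cartesian(i, j, delta=1.0):
    """Map lattice indices (i, j) to Cartesian coordinates, Equation (15)."""
    x = i * math.sqrt(3.0) / 2.0 * delta
    y = j * delta if i % 2 == 0 else (j + 0.5) * delta
    return (x, y)
```

A quick consistency check is that every neighbour of a site sits exactly one lattice spacing δ away in Cartesian coordinates.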

During each discrete time step of duration τ\tau, agents attempt to move with probability Pm[0,1]P_{m}\in[0,1] and attempt to proliferate with probability Pp[0,1]P_{p}\in[0,1]. If an agent at site \ell attempts a motility event, then a neighboring site will be selected uniformly at random. The motility event is aborted if the selected site is occupied, otherwise the agent will move to the selected site (Figure 4(A)–(C)).

Refer to caption
Figure 4: Example of movement and proliferation events in a lattice-based random walk model, using a hexagonal lattice with lattice spacing, δ\delta. (A) An example hexagonal lattice neighborhood 𝒩()\mathcal{N}(\ell). An agent at site \ell attempts a motility event (A)–(C) with probability PmP_{m}. (B) Motility events are aborted when the randomly selected neighbor site is occupied. (C) The agent moves to the selected site, if unoccupied. An agent at site \ell attempts a proliferation event (A),(D)–(E) with probability PpP_{p}. (D) Proliferation events are successful with probability f(C^(,t))f(\hat{C}(\ell,t)), resulting in an unoccupied site being selected. (E) The daughter agent is placed at the selected site and the number of agents in the populations is increased by one.

For proliferation events, the local neighborhood average occupancy,

C^(,t)=16𝒩()C(,t),\hat{C}(\ell,t)=\frac{1}{6}\sum_{\ell^{\prime}\in\mathcal{N}(\ell)}C(\ell^{\prime},t),

is calculated and a uniform random number u𝒰(0,1)u\sim\mathcal{U}(0,1) is drawn. If u>f(C^(,t))u>f(\hat{C}(\ell,t)), where f(C^(,t))[0,1]f(\hat{C}(\ell,t))\in[0,1] is called the crowding function (Browning et al., 2017; Jin et al., 2016), then the proliferation event is aborted due to local crowding effects and contact inhibition. If uf(C^(,t))u\leq f(\hat{C}(\ell,t)), then proliferation is successful and a daughter agent is placed at a randomly chosen unoccupied lattice site in 𝒩()\mathcal{N}(\ell) (Figure 4(A),(D)–(E)). The evolution of the model is generated by repeating this process through MM time steps, t1=τ,t2=2τ,,t_{1}=\tau,\,t_{2}=2\tau,\ldots, tM=Mτt_{M}=M\tau. This approach, based on the work by Jin et al. (2016), supports a generic proliferation mechanism since f(C^(,t))f(\hat{C}(\ell,t)) is an arbitrary smooth function satisfying f(0)=1f(0)=1 and f(K)=0f(K)=0, where K>0K>0 is the carrying capacity density. However, in the literature there are also examples that include other mechanisms such as cell-cell adhesion (Johnston et al., 2013), directed motility (Binny et al., 2016), and Allee effects (Böttger et al., 2015).
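The per-step update rules can be sketched as follows. This is a simplified toy, not the paper's simulation code: the probabilities and lattice size are illustrative, the crowding function is taken to be the logistic choice f(c)=1-c/K, boundaries are periodic rather than reflecting for brevity, and agents are updated sequentially in snapshot order:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J = 20, 20
Pm, Pp, K = 0.5, 0.01, 1.0          # illustrative probabilities and capacity

def neighbourhood(i, j):
    offsets = ([(-1, -1), (0, -1), (1, -1), (1, 0), (0, 1), (-1, 0)]
               if i % 2 == 0 else
               [(-1, 0), (0, -1), (1, 0), (1, 1), (0, 1), (-1, 1)])
    return [((i + di) % I, (j + dj) % J) for di, dj in offsets]  # periodic

def crowding(c_hat):
    return max(0.0, 1.0 - c_hat / K)     # logistic crowding function f

def step(C):
    """One time step: each agent attempts a move (prob Pm) and a
    crowding-limited proliferation event (prob Pp)."""
    for pos in [tuple(p) for p in np.argwhere(C == 1)]:
        i, j = pos
        if rng.random() < Pm:                       # motility attempt
            target = neighbourhood(i, j)[rng.integers(6)]
            if C[target] == 0:                      # abort if occupied
                C[i, j], C[target] = 0, 1
                i, j = target
        if rng.random() < Pp:                       # proliferation attempt
            nbrs = neighbourhood(i, j)
            c_hat = sum(C[n] for n in nbrs) / 6.0   # local average occupancy
            if rng.random() <= crowding(c_hat):
                empty = [n for n in nbrs if C[n] == 0]
                if empty:                           # place daughter agent
                    C[empty[rng.integers(len(empty))]] = 1
    return C

C = (rng.random((I, J)) < 0.25).astype(int)   # uniform initial condition
n0 = C.sum()
for _ in range(50):
    C = step(C)
```

Since agents never die in this model, the population count is non-decreasing, and exclusion guarantees at most one agent per site throughout.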

3.3 Approximate continuum-limit descriptions

Discrete models do not generally lend themselves to analytical methods; consequently, their application is intrinsically tied to computationally intensive stochastic simulations and Monte Carlo methods (Jin et al., 2016). As a result, it is common practice to approximate mean behavior using differential equations by invoking mean-field assumptions, that is, to treat the occupancy status of lattice sites as independent (Callaghan et al., 2006; Simpson et al., 2010). The resulting approximate continuum-limit descriptions (Supplementary Material) are partial differential equations (PDEs) of the form

𝒞(x,y,t)t=D2𝒞(x,y,t)+λ𝒞(x,y,t)f(𝒞(x,y,t)),\frac{\partial\mathcal{C}(x,y,t)}{\partial t}=D\nabla^{2}\mathcal{C}(x,y,t)+\lambda\mathcal{C}(x,y,t)f(\mathcal{C}(x,y,t)), (16)

where 𝒞(x,y,t)=𝔼[C(,t)]\mathcal{C}(x,y,t)=\mathbb{E}\left[C(\ell,t)\right], D=limδ0,τ0Pmδ2/(4τ)D=\lim_{\delta\to 0,\tau\to 0}P_{m}\delta^{2}/(4\tau) is the diffusivity, λ=limτ0Pp/τ\lambda=\lim_{\tau\to 0}P_{p}/\tau is the proliferation rate with Pp=𝒪(τ)P_{p}=\mathcal{O}(\tau), and f()f(\cdot) is the crowding function that is related to the proliferation mechanism implemented in the discrete model (Browning et al., 2017; Jin et al., 2016). For spatially uniform populations there will be no macroscopic spatial gradients on average, that is 𝒞(x,y,t)=𝟎\nabla\mathcal{C}(x,y,t)=\mathbf{0}. Thus, 𝒞(x,y,t)\mathcal{C}(x,y,t) is just a function of time, 𝒞(t)\mathcal{C}(t), and the continuum limit reduces to an ordinary differential equation (ODE) describing the net population growth,

d𝒞(t)dt=λ𝒞(t)f(𝒞(t)).\frac{\text{d}\mathcal{C}(t)}{\text{d}t}=\lambda\mathcal{C}(t)f(\mathcal{C}(t)). (17)

For many standard discrete models, the crowding function is implicitly f(𝒞)=1𝒞f(\mathcal{C})=1-\mathcal{C} (Callaghan et al., 2006). That is, the continuum limits in Equation (16) and Equation (17) yield the Fisher-KPP model (Edelstein-Keshet, 2005; Murray, 2002) and the logistic growth model (Tsoularis and Wallace, 2002; Warne et al., 2017), respectively. However, non-logistic growth models, for example, f(𝒞)=(1𝒞)nf(\mathcal{C})=(1-\mathcal{C})^{n} for n>1n>1, have also been considered (Jin et al., 2016; Simpson et al., 2010; Tsoularis and Wallace, 2002).
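As a quick check of this reduction, with f(𝒞)=1−𝒞 Equation (17) has the familiar closed-form logistic solution, which gives a correctness test for any numerical continuum-limit solver. The rate and initial density below are illustrative choices of ours:

```python
import math

lam, C0 = 0.01, 0.1   # illustrative proliferation rate and initial density

def logistic_exact(t):
    """Closed-form solution of dC/dt = lam * C * (1 - C) with C(0) = C0."""
    return C0 / (C0 + (1.0 - C0) * math.exp(-lam * t))

def logistic_euler(t, dt=0.01):
    """Forward-Euler integration of the same ODE, for comparison."""
    c = C0
    for _ in range(int(round(t / dt))):
        c += dt * lam * c * (1.0 - c)
    return c
```

The numerical trajectory should track the closed form closely for small time steps, and both saturate monotonically toward the carrying capacity.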

3.4 Temporal example: a weak Allee model

The Allee effect refers to the reduction in growth rate of a population at low densities. This is particularly well studied in ecology where there are many mechanisms that give rise to this phenomenon (Taylor and Hastings, 2005; Johnston et al., 2017). We introduce an Allee effect into our discrete model by choosing a crowding function of the form

f(C^(,t))=(1C^(,t)K)(A+C^(,t)K),f(\hat{C}(\ell,t))=\left(1-\frac{\hat{C}(\ell,t)}{K}\right)\left(\frac{A+\hat{C}(\ell,t)}{K}\right),

where C^(,t)[0,1]\hat{C}(\ell,t)\in[0,1] is the local density at the lattice site L\ell\in L, at time tt, K>0K>0 is the carrying capacity density, and AA is the Allee parameter which yields a weak Allee effect for A0A\geq 0 (Wang et al., 2019). Note that smaller values of AA entail a more pronounced Allee effect with A<0A<0 leading to a strong Allee effect that can lead to species extinction (Wang et al., 2019). For simplicity, we only consider the weak Allee effect here, but our methods are general enough to consider any sufficiently smooth f()f(\cdot).

Studies in ecology often involve population counts of a particular species over time (Taylor and Hastings, 2005). In the discrete model, the initial occupancy of each lattice site is independent, and hence there are no macroscopic spatial gradients on average. It is reasonable to summarize simulations of the discrete model at time tt by the average occupancy over the entire lattice, C¯(t)=(1/IJ)LC(,t)\bar{C}(t)=(1/IJ)\sum_{\ell\in L}C(\ell,t). Therefore, the continuum limit for this case is given by (Wang et al., 2019)

d𝒞(t)dt=λ𝒞(t)(1𝒞(t)K)(A+𝒞(t)K),\frac{\text{d}\mathcal{C}(t)}{\text{d}t}=\lambda\mathcal{C}(t)\left(1-\frac{\mathcal{C}(t)}{K}\right)\left(\frac{A+\mathcal{C}(t)}{K}\right), (18)

with 𝒞(t)=𝔼[C¯(t)]\mathcal{C}(t)=\mathbb{E}\left[\bar{C}(t)\right], λ=limτ0Pp/τ\lambda=\lim_{\tau\to 0}P_{p}/\tau, and 𝒞(0)=𝔼[C¯(0)]\mathcal{C}(0)=\mathbb{E}\left[\bar{C}(0)\right].
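Equation (18) is straightforward to integrate numerically; the sketch below uses a classic fourth-order Runge-Kutta scheme with the parameter values from the synthetic data study later in this section (λ=Pp/τ=10⁻³, K=5/6, A=1/10, 𝒞(0)=1/4). This is an illustrative solver, not necessarily the one used for the paper's results (see Supplementary Material):

```python
lam, K, A, C0 = 1.0e-3, 5.0 / 6.0, 0.1, 0.25   # values used in Section 3.4

def allee_rhs(c):
    """Right-hand side of the weak Allee continuum limit, Equation (18)."""
    return lam * c * (1.0 - c / K) * ((A + c) / K)

def rk4_step(c, dt):
    # classic fourth-order Runge-Kutta update for dC/dt = allee_rhs(C)
    k1 = allee_rhs(c)
    k2 = allee_rhs(c + 0.5 * dt * k1)
    k3 = allee_rhs(c + 0.5 * dt * k2)
    k4 = allee_rhs(c + dt * k3)
    return c + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# density at the observation times t_k = k * 10^3 (with tau = 1)
traj, c = [C0], C0
for _ in range(10):
    for _ in range(1000):          # unit time steps between observations
        c = rk4_step(c, 1.0)
    traj.append(c)
```

For a weak Allee effect (A ≥ 0) and 0 < 𝒞 < K, the right-hand side is strictly positive, so the solution grows monotonically toward the carrying capacity K.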

We generate synthetic time-series ecological data using the discrete model, with observations made at times t1=τ×103,t2=2τ×103,,t10=τ×104t_{1}=\tau\times 10^{3},t_{2}=2\tau\times 10^{3},\ldots,t_{10}=\tau\times 10^{4}, resulting in data 𝒟=[Cobs(t1),Cobs(t2),,Cobs(t10)]\mathcal{D}=[C_{\text{obs}}(t_{1}),C_{\text{obs}}(t_{2}),\ldots,C_{\text{obs}}(t_{10})] with Cobs(t)=C¯(t)C_{\text{obs}}(t)=\bar{C}(t) where C¯(t)\bar{C}(t) is the average occupancy at time tt for a single realization of the discrete model (Supplementary Material). For this example, we consider an I×JI\times J hexagonal lattice with I=80I=80, J=68J=68, and parameters Pp=1/1000P_{p}=1/1000, Pm=0P_{m}=0, δ=τ=1\delta=\tau=1, K=5/6K=5/6, and A=1/10A=1/10. Reflecting boundary conditions are applied at all boundaries and a uniform initial condition is applied, specifically, each site is occupied with probability (C(,0)=1)=1/4\mathbb{P}(C(\ell,0)=1)=1/4 for all L\ell\in L, giving 𝒞(0)=1/4\mathcal{C}(0)=1/4. This combination of parameters is selected since it is known that the continuum limit (Equation (18)) will not accurately predict the population growth dynamics of the discrete model in this regime since Pp/Pm1P_{p}/P_{m}\gg 1 (Supplementary Material).

For the inference problem we assume PmP_{m} is known, and we seek to compute p(𝜽𝒟)p(\boldsymbol{\theta}\mid\mathcal{D}) under the discrete model with 𝜽=[λ,A,K]\boldsymbol{\theta}=[\lambda,A,K] and λ=Pp/τ\lambda=P_{p}/\tau. We utilize uninformative priors, Pp𝒰(0,0.005)P_{p}\sim\mathcal{U}(0,0.005), K𝒰(0,1)K\sim\mathcal{U}(0,1) and A𝒰(0,1)A\sim\mathcal{U}(0,1) with the additional constraint that AKA\leq K, that is, AA and KK are not independent in the prior. The discrepancy metric used is the Euclidean distance. For the discrete model, this is

ρ(𝒟,𝒟s)=[k=110(Cobs(tk)C¯(tk))2]1/2,\rho(\mathcal{D},\mathcal{D}_{s})=\left[\sum_{k=1}^{10}\left(C_{\text{obs}}(t_{k})-\bar{C}(t_{k})\right)^{2}\right]^{1/2},

where C¯(tk)\bar{C}(t_{k}) is the average occupancy at time tkt_{k} of a realization of the discrete model given 𝜽\boldsymbol{\theta}. Similarly, for the continuum limit we have

ρ(𝒟,𝒟~s)=[k=110(Cobs(tk)𝒞(tk))2]1/2,\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})=\left[\sum_{k=1}^{10}\left(C_{\text{obs}}(t_{k})-\mathcal{C}(t_{k})\right)^{2}\right]^{1/2},

where 𝒞(tk)\mathcal{C}(t_{k}) is the solution to the continuum limit (Equation (18)), computed numerically (Supplementary Material). We compute the posterior using our PC-SMC-ABC and MM-SMC-ABC methods to compare with SMC-ABC under the continuum limit and SMC-ABC under the discrete model. In each instance, =1000\mathcal{M}=1000 particles are used to approach the target threshold ϵ=0.125\epsilon=0.125 using the sequence ϵ1,ϵ2,,ϵ5\epsilon_{1},\epsilon_{2},\ldots,\epsilon_{5} with ϵr=ϵr1/2\epsilon_{r}=\epsilon_{r-1}/2. In the case of MM-SMC-ABC the tuning parameter is α=0.1\alpha=0.1. The Gaussian proposal kernels, qr(𝜽r𝜽r1)q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1}) and q~r(𝜽r𝜽~r)\tilde{q}_{r}(\boldsymbol{\theta}_{r}\mid\tilde{\boldsymbol{\theta}}_{r}), are selected adaptively (Fehlberg, 1969; Iserles, 2008) (Supplementary Material).

Figure 5 and Table 1 present the results. SMC-ABC using the continuum-limit model is a poor approximation to SMC-ABC using the discrete model, especially for the proliferation rate parameter, \lambda (Figure 5(a)), which is expected because P_{m}=0. However, the posteriors estimated using PC-SMC-ABC are an excellent match to the target posteriors estimated using SMC-ABC with the expensive discrete model, yet the PC-SMC-ABC method requires only half the number of stochastic simulations (Table 1). The MM-SMC-ABC method is not quite as accurate as the PC-SMC-ABC method; however, the number of expensive stochastic simulations is reduced by more than a factor of eight (Table 1), leading to a considerable increase in computational efficiency.

Refer to caption
Figure 5: Comparison of estimated posterior marginal densities for the weak Allee model. There is a distinct bias in the SMC-ABC density estimate using the continuum limit (CL) (black dashed) compared with SMC-ABC with the discrete model (DM) (green dashed). However, the density estimates computed using the PC-SMC-ABC (orange solid) and MM-SMC-ABC (purple solid) methods match well with a significantly reduced computational overhead.
Table 1: Computational performance comparison of the SMC-ABC, PC-SMC-ABC, and MM-SMC-ABC methods for the weak Allee model inference problem. Computations are performed using an Intel® Xeon™ E5-2680v3 CPU (2.5 GHz).
Method Stochastic samples Continuum samples Run time (hours) Speedup
SMC-ABC 28,588 0 47.1 1×
PC-SMC-ABC 13,799 58,752 21.1 2×
MM-SMC-ABC 3,342 36,908 5.6 8×

3.5 Spatiotemporal example: a scratch assay

We now look to a discrete model commonly used in studies of cell motility and proliferation, and use spatially extended data that is typical of in vitro cell culture experiments, specifically scratch assays (Liang et al., 2007).

In this case we use a crowding function of the form f(\hat{C}(\ell,t))=1-\hat{C}(\ell,t)/K, where K>0 is the carrying capacity density, since it leads to a logistic growth source term in Equation (16) that characterizes the growth dynamics of many cell types (Simpson et al., 2010; Warne et al., 2017). The discrete model is initialized such that the initial density is independent of y. Therefore, we summarize the discrete simulation by computing the average occupancy for each x coordinate, that is, by averaging over the y-axis of the hexagonal lattice (Jin et al., 2016): \bar{C}(x,t)=(1/J)\sum_{(x,y)\in L}C((x,y),t). Thus, one arrives at the Fisher-KPP model (Edelstein-Keshet, 2005; Murray, 2002) for the continuum limit,

\frac{\partial\mathcal{C}(x,t)}{\partial t}=D\frac{\partial^{2}\mathcal{C}(x,t)}{\partial{x}^{2}}+\lambda\mathcal{C}(x,t)\left(1-\frac{\mathcal{C}(x,t)}{K}\right), (19)

where \mathcal{C}(x,t)=\mathbb{E}\left[\bar{C}(x,t)\right], D=\lim_{\delta\to 0,\tau\to 0}P_{m}\delta^{2}/(4\tau), and \lambda=\lim_{\tau\to 0}P_{p}/\tau.
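To make Equation (19) concrete, the following is a minimal forward-time, centered-space finite-difference sketch with no-flux boundaries (NumPy assumed). Note this explicit scheme is an illustrative simplification, not the implicit backward-time scheme used for the results in this paper; the parameter values mirror the scratch assay regime, with D = P_{m}\delta^{2}/(4\tau) = 0.25:

```python
import numpy as np

def fisher_kpp_step(C, D, lam, K, dx, dt):
    # one forward-time, centered-space step of Equation (19)
    Cp = np.pad(C, 1, mode="edge")  # ghost cells implement no-flux (reflecting) boundaries
    lap = (Cp[2:] - 2.0 * Cp[1:-1] + Cp[:-2]) / dx**2
    return C + dt * (D * lap + lam * C * (1.0 - C / K))

# illustrative run: uniform density 1/4 with a scratched (empty) central region
C = np.full(80, 0.25)
C[30:50] = 0.0
for _ in range(1000):  # evolve to t = 500 (dt = 0.5 satisfies dt <= dx^2/(2D))
    C = fisher_kpp_step(C, D=0.25, lam=0.001, K=5.0 / 6.0, dx=1.0, dt=0.5)
```

Diffusion closes the scratch while the logistic source term slowly grows the population toward the carrying capacity K.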

Just as with the weak Allee model, here we generate synthetic spatiotemporal cell culture data using the discrete model. Observations are made at times t_{1}=3\tau\times 10^{2}, t_{2}=6\tau\times 10^{2},\ldots,t_{10}=3\tau\times 10^{3}, resulting in data

\mathcal{D}=\left[\begin{matrix}C_{\text{obs}}(x_{1},t_{1})&C_{\text{obs}}(x_{1},t_{2})&\ldots&C_{\text{obs}}(x_{1},t_{10})\\ C_{\text{obs}}(x_{2},t_{1})&C_{\text{obs}}(x_{2},t_{2})&\ldots&C_{\text{obs}}(x_{2},t_{10})\\ \vdots&\vdots&\ddots&\vdots\\ C_{\text{obs}}(x_{I},t_{1})&C_{\text{obs}}(x_{I},t_{2})&\ldots&C_{\text{obs}}(x_{I},t_{10})\end{matrix}\right],

with C_{\text{obs}}(x,t)=\bar{C}(x,t), where \bar{C}(x,t) is the average occupancy over sites (x,y_{1}),(x,y_{2}),\ldots,(x,y_{J}) at time t for a single realization of the discrete model. As with the weak Allee model, we consider an I\times J hexagonal lattice with I=80, J=68, and parameters P_{p}=1/1000, P_{m}=1, \delta=\tau=1, and K=5/6. We simulate a scratch assay by specifying the center 20 cell columns (31\leq i\leq 50) to be initially unoccupied, and apply a uniform initial condition outside the scratch area such that \mathbb{E}\left[C(\ell,0)\right]=1/4 overall. Reflecting boundary conditions are applied at all boundaries. Note that we have selected a parameter regime with P_{p}/P_{m}\ll 1, for which the continuum limit is an accurate representation of the discrete model average behavior (Supplementary Material).
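The initial condition and the column-averaged summary \bar{C}(x,t) can be sketched as follows (NumPy assumed; occupying each site outside the scratch independently with probability 1/4 is one plausible reading of the uniform initial condition, and the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
I, J = 80, 68
# occupy each site independently with probability 1/4 ...
lattice = (rng.random((I, J)) < 0.25).astype(int)
# ... then clear the 20 scratched columns (1-based columns 31..50)
lattice[30:50, :] = 0
# column-averaged occupancy \bar{C}(x, 0): average over the y-axis
C_bar = lattice.mean(axis=1)
```

Averaging over the y-axis is what reduces the two-dimensional lattice state to the one-dimensional density profile that the continuum limit (Equation (19)) describes.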

Since we have spatial information for this problem, we assume P_{m} is also an unknown parameter and perform inference on the discrete model to compute p(\boldsymbol{\theta}\mid\mathcal{D}) with \boldsymbol{\theta}=[\lambda,D,K], \lambda=P_{p}/\tau, and D=P_{m}\delta^{2}/(4\tau). We utilize uninformative priors, P_{p}\sim\mathcal{U}(0,0.008), P_{m}\sim\mathcal{U}(0,1), and K\sim\mathcal{U}(0,1). For the discrepancy metric we use the Frobenius norm; for the discrete model, this is

\rho(\mathcal{D},\mathcal{D}_{s})=\left[\sum_{k=1}^{10}\sum_{i=1}^{I}\left(C_{\text{obs}}(x_{i},t_{k})-\bar{C}(x_{i},t_{k})\right)^{2}\right]^{1/2},

where \bar{C}(x_{i},t_{k}) is the average occupancy at site x_{i} at time t_{k} of a realization of the discrete model given parameters \boldsymbol{\theta}. Similarly, for the continuum limit we have

\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})=\left[\sum_{k=1}^{10}\sum_{i=1}^{I}\left(C_{\text{obs}}(x_{i},t_{k})-\mathcal{C}(x_{i},t_{k})\right)^{2}\right]^{1/2},

where \mathcal{C}(x_{i},t_{k}) is the solution to the continuum-limit PDE (Equation (19)), computed using a backward-time, centered-space finite difference scheme with fixed-point iteration and adaptive time steps (Simpson et al., 2007; Sloan and Abbo, 1999) (Supplementary Material). We estimate the posterior using our PC-SMC-ABC and MM-SMC-ABC methods to compare with SMC-ABC using the continuum limit and SMC-ABC using the discrete model. In each case, \mathcal{M}=1000 particles are used to approach the target threshold, \epsilon=2, using the sequence \epsilon_{1},\epsilon_{2},\ldots,\epsilon_{5} with \epsilon_{r}=\epsilon_{r-1}/2. In the case of MM-SMC-ABC the tuning parameter is \alpha=0.1. Again, Gaussian proposal kernels, q_{r}(\boldsymbol{\theta}_{r}\mid\boldsymbol{\theta}_{r-1}) and \tilde{q}_{r}(\boldsymbol{\theta}_{r}\mid\tilde{\boldsymbol{\theta}}_{r}), are selected adaptively (Supplementary Material).
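In code, this discrepancy is simply the Frobenius norm of the difference between the observed and simulated I\times 10 data matrices; a minimal sketch (NumPy assumed, hypothetical function name):

```python
import numpy as np

def discrepancy(D_obs, D_sim):
    # Frobenius norm of the I x 10 matrix of occupancy differences,
    # equivalent to the Euclidean norm of the flattened matrices
    return float(np.linalg.norm(np.asarray(D_obs) - np.asarray(D_sim), ord="fro"))
```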

Results are shown in Figure 6 and Table 2. Despite the continuum limit being a good approximation of the discrete model average behavior, relying solely on the continuum limit for inference still leads to bias. Just as with the weak Allee model, both the PC-SMC-ABC and MM-SMC-ABC methods produce a more accurate estimate of the SMC-ABC posterior density under the discrete model. Overall, PC-SMC-ABC is unbiased and, although MM-SMC-ABC incurs a small bias, it is still very accurate. The main point for our work is that the PC-SMC-ABC and MM-SMC-ABC methods both produce posteriors that are accurate compared with the expensive stochastic inference problem, whereas the approximate model alone does not. From Table 2, both PC-SMC-ABC and MM-SMC-ABC require fewer stochastic simulations of the discrete model than direct SMC-ABC: for PC-SMC-ABC, the reduction is almost a factor of four and, for MM-SMC-ABC, almost a factor of eleven.

Refer to caption
Figure 6: Comparison of estimated posterior marginal densities for the scratch assay model. There is a distinct bias in the SMC-ABC density estimate using the continuum limit (CL) (black dashed) compared with SMC-ABC with the discrete model (DM) (green dashed). However, the density estimates computed using the PC-SMC-ABC (orange solid) and MM-SMC-ABC (purple solid) methods match well with a reduced computational overhead.
Table 2: Computational performance comparison of the SMC-ABC, PC-SMC-ABC, and MM-SMC-ABC methods for the scratch assay model inference problem. Computations are performed using an Intel® Xeon™ E5-2680v3 CPU (2.5 GHz).
Method Stochastic samples Continuum samples Run time (hours) Speedup
SMC-ABC 46,435 0 20.6 1×
PC-SMC-ABC 13,949 13,179 5.6 4×
MM-SMC-ABC 4,457 10,594 1.9 11×

3.6 A guide to selection of \alpha for MM-SMC-ABC

The performance of MM-SMC-ABC depends on the tuning parameter \alpha\in[0,1]. Since MM-SMC-ABC propagates only \lceil\alpha\mathcal{M}\rceil particles based on the expensive stochastic model, \alpha can be considered a target computational cost reduction factor, with 1/\alpha being the target speedup factor. However, intuitively, there is a limit to how small \alpha can be chosen before the statistical error in the estimates of \boldsymbol{\mu}_{r} and \boldsymbol{\Sigma}_{r} is large enough to render the approximate moment-matching transform inaccurate. It is non-trivial to analyze MM-SMC-ABC to obtain a theoretical guideline for choosing \alpha, so we perform a computational benchmark to obtain a heuristic. Note that we use an SMC-ABC sampler with \mathcal{M}=1000 as a benchmark for accuracy and performance; as a result, repeating the analysis with larger \mathcal{M} could lead to a smaller optimal \alpha.

Here, using different values of \alpha, we repeatedly solve the weak Allee model (Section 3.4) and the scratch assay model (Section 3.5) inference problems. For both inverse problems we apply MM-SMC-ABC under conditions identical to those in Sections 3.4 and 3.5, with the exception of the tuning parameter \alpha, which takes values from the sequence \left\{\alpha_{k}\right\}_{k=0}^{5} with \alpha_{0}=0.8 and \alpha_{k}=\alpha_{k-1}/2 for k>0. For each \alpha_{k} in the sequence, we consider N independent applications of MM-SMC-ABC. The computational cost for each \alpha_{k} is denoted \text{Cost}(\alpha_{k}) and represents the run time in seconds of an application of MM-SMC-ABC with tuning parameter \alpha_{k}. We also calculate an error metric,

\text{Error}(\alpha_{k})=\mathcal{E}\left(\boldsymbol{\Theta}_{R},\boldsymbol{\Theta}_{R}(\alpha_{k}),P\right),

where \boldsymbol{\Theta}_{R}=\left\{\boldsymbol{\theta}_{R}^{i}\right\}_{i=1}^{\mathcal{M}} is a set of particles from an application of SMC-ABC using the expensive stochastic model, and \boldsymbol{\Theta}_{R}(\alpha_{k})=\left\{\boldsymbol{\theta}_{R}^{i}\right\}_{i=1}^{\lceil\alpha_{k}\mathcal{M}\rceil}\cup\left\{\bar{\boldsymbol{\theta}}_{R}^{i}\right\}_{i=1}^{\lfloor(1-\alpha_{k})\mathcal{M}\rfloor} is the pool of exact and approximate transformed particles from an application of MM-SMC-ABC. For P\in\mathbb{N}, the function \mathcal{E}(\cdot,\cdot,P) is the P-th order empirical moment-matching distance function (Liao et al., 2015; Lillacci and Khammash, 2010; Zechner et al., 2012), given by

\mathcal{E}(\mathbf{X},\mathbf{Y},P)=\sum_{m=0}^{P}\sum_{\mathbf{b}\in S_{m}}\frac{1}{|S_{m}|^{m}}\left(\frac{\hat{\mu}(\mathbf{X})^{\mathbf{b}}-\hat{\mu}(\mathbf{Y})^{\mathbf{b}}}{\hat{\mu}(\mathbf{X})^{\mathbf{b}}}\right)^{2},

for two sample sets \mathbf{X}=\left\{\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{\mathcal{M}}\right\} and \mathbf{Y}=\left\{\mathbf{y}_{1},\mathbf{y}_{2},\ldots,\mathbf{y}_{\mathcal{M}}\right\} with \mathbf{x}_{i},\mathbf{y}_{i}\in\mathbb{R}^{n} for n\geq 1, and S_{m}=\left\{\mathbf{b}:\mathbf{b}\in\mathbb{N}^{n},\|\mathbf{b}\|=m\right\}. For any n-dimensional discrete vector \mathbf{b}=\left[b_{1},b_{2},\ldots,b_{n}\right]^{\text{T}}\in\mathbb{N}^{n}, \hat{\mu}(\mathbf{X})^{\mathbf{b}} is the \mathbf{b}-th empirical raw moment of the sample set \mathbf{X},

\hat{\mu}(\mathbf{X})^{\mathbf{b}}=\frac{1}{\mathcal{M}}\sum_{i=1}^{\mathcal{M}}\mathbf{x}_{i}^{\mathbf{b}},

where \mathbf{x}_{i}^{\mathbf{b}}=x_{i,1}^{b_{1}}\times x_{i,2}^{b_{2}}\times\cdots\times x_{i,n}^{b_{n}}. Note that P must be greater than the number of moments matched in the approximate transform (Equation (10)) to ensure that MM-SMC-ABC also improves accuracy in the higher moments.
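A direct implementation of this error metric, following the definitions above, is sketched below (NumPy assumed; the relative normalization presumes the reference moments \hat{\mu}(\mathbf{X})^{\mathbf{b}} are nonzero, and the m=0 term is skipped since it always vanishes):

```python
import itertools
import numpy as np

def raw_moment(X, b):
    # b-th empirical raw moment of an M x n sample set X
    return np.mean(np.prod(X ** np.asarray(b), axis=1))

def moment_match_error(X, Y, P):
    # P-th order empirical moment-matching distance E(X, Y, P)
    n = X.shape[1]
    err = 0.0
    for m in range(1, P + 1):
        # S_m: all multi-indices b in N^n with |b| = m
        S_m = [b for b in itertools.product(range(m + 1), repeat=n) if sum(b) == m]
        for b in S_m:
            mu_X, mu_Y = raw_moment(X, b), raw_moment(Y, b)
            err += ((mu_X - mu_Y) / mu_X) ** 2 / len(S_m) ** m
    return err
```

For identical sample sets the distance is exactly zero, and it grows as the empirical moments of the two particle sets diverge.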

We estimate the average \text{Cost}(\alpha_{k}) and \text{Error}(\alpha_{k}) for each value of \alpha_{k} for both the weak Allee effect and the scratch assay inverse problems. Figure 7 displays the estimates and standard errors given P=6 and N=10, with the value of \alpha_{k} indicated. We emphasize that P>2; that is, our error measure compares the first six moments while only two moments are matched.

Refer to caption
Figure 7: Error versus cost plots for different values of the tuning parameter \alpha. Averages and standard errors are shown for N=10 independent applications of the MM-SMC-ABC method to (A) the weak Allee effect model and (B) the scratch assay model. Results in (A)–(B) are shown in (C)–(D) on a log scale for clarity.

There is clearly a threshold for \alpha below which the error becomes highly variable. For both the weak Allee effect model (Figure 7(A),(C)) and the scratch assay model (Figure 7(B),(D)), the optimal choice of \alpha is located between \alpha_{2}=0.2 and \alpha_{4}=0.05. Therefore, we suggest \alpha\in[0.1,0.2] as a reliable heuristic choice. If extra performance is needed, \alpha\in[0.05,0.1) may also be acceptable, but if accuracy is of the utmost importance then \alpha\approx 0.2 seems to be the most robust choice. This experiment also provides insight into the consistency and stability of the MM-SMC-ABC method: \alpha\geq 0.1 leads to results that are consistently fast and have low variability in the error metric. While further work is required to assess the stability and consistency properties of this method theoretically, these numerical results are promising. In general, the choice of optimal \alpha is still an open problem and is likely to be impacted by the specific relationship between the exact and approximate models.

3.7 Summary

This section presented numerical examples to demonstrate our new methods, PC-SMC-ABC and MM-SMC-ABC, for ABC inference with expensive stochastic discrete models. The tractable Ornstein–Uhlenbeck process was used to highlight the mechanisms leading to the performance improvements of PC-SMC-ABC. Then two examples based on lattice-based random walks were used to demonstrate the efficacy of both PC-SMC-ABC and MM-SMC-ABC. In the weak Allee model example, data were generated using parameters that violate standard continuum-limit assumptions; in the scratch assay model example, the Fisher-KPP continuum limit is known to be a good approximation in the parameter regime of the generated data. In both examples, final inferences are biased when the continuum limit is relied on exclusively in the SMC-ABC sampler. However, the results from our new algorithms, PC-SMC-ABC and MM-SMC-ABC, show that significantly more accurate posteriors can be computed at a fraction of the cost of full SMC-ABC using the discrete model, with speed improvements of over an order of magnitude.

As mentioned in Section 2.3, the tuning parameter, \alpha, in the MM-SMC-ABC method effectively determines the trade-off between the computational speed of the approximate model and the accuracy of the expensive stochastic model. The values \alpha=0 and \alpha=1 correspond to performing inference exclusively with, respectively, the continuum limit and the stochastic discrete model. Based on numerical experimentation, we find that \alpha\approx 0.1 is quite reasonable; however, this conclusion will depend on the specific model, the parameter space dimensionality, and the number of particles used in the SMC scheme.

4 Discussion

In the life sciences, computationally challenging stochastic discrete models are routinely used to characterize the dynamics of biological populations (Codling et al., 2008; Callaghan et al., 2006; Simpson et al., 2010). In practice, approximations such as the mean-field continuum limit are often derived and used in place of the discrete model for analysis and, more recently, for inference. However, parameter inferences will be biased when the approximate model is solely utilized for inference, even in cases when the approximate model provides an accurate description of the average behavior of the stochastic model.

We provide a new approach to inference for stochastic models that maintains all the methodological benefits of working with discrete mathematical models, while avoiding the computational bottlenecks of relying solely upon repeated expensive stochastic simulations. Our two new algorithms, PC-SMC-ABC and MM-SMC-ABC, utilize samples from the approximate model inference problem in different ways to accelerate SMC-ABC sampling. The PC-SMC-ABC method is asymptotically unbiased, and we demonstrate that computational improvements of up to a factor of almost four are possible. While potentially biased, MM-SMC-ABC can provide further improvements; in general, the expected speedup is around 1/\alpha, and \alpha\approx 0.1 is reasonable based on our numerical investigations. For larger values of \mathcal{M} it may be that even smaller values of \alpha could be effective.

There are some assumptions in our approach that could be generalized in future work. First, in PC-SMC-ABC, we assume that the condition in Equation (7) holds for all \epsilon_{r}; this is reasonable for the models we consider, since we never observe a decrease in performance. However, it may be possible for the bias in the approximate model to be so extreme for some \epsilon_{r} that the condition in Equation (7) is violated, leading to a decrease in performance at specific generations. Acceptance probabilities could be estimated by performing a small set of trial samples from both the \eta_{r-1}(\boldsymbol{\theta}_{r-1}) and \tilde{\eta}_{r}(\boldsymbol{\theta}_{r}) proposal mechanisms, enabling automatic selection of the optimal proposal mechanism. Second, in the moment-matching transform proposed in Equation (8), we use only two moments, as this is sufficient for the problems we consider here, with numerical examples demonstrating accuracy in the first six moments. However, our methodology is sufficiently flexible that additional moments can be incorporated if necessary. While including higher moments will improve the accuracy of the moment-matching transform, more samples from the exact model will be required to obtain initial estimates of these moments, eroding performance.

While the performance improvements we demonstrate here are significant, it is also possible to obtain improvements of similar order through detailed code optimization applied to standard SMC-ABC. We emphasize that our schemes would equally benefit from such optimizations, including advanced vectorization and parallelization, to further improve their performance (Lee et al., 2010; Warne et al., 2021). Our algorithm extensions are also more straightforward to implement than advanced high-performance computing techniques for accelerating computational schemes.

There are many extensions to our methods that could be considered. We have based our presentation on a form of SMC-ABC sampler that uses a fixed sequence of thresholds. However, the ideas of using the preconditioning distribution, as in PC-SMC-ABC, and the moment-matching transform, as in MM-SMC-ABC, are applicable to SMC schemes that adaptively select thresholds (Drovandi and Pettitt, 2011). Recently, a number of state-of-the-art inference schemes have been introduced based on multilevel Monte Carlo (MLMC) (Giles, 2015; Warne et al., 2019a). Our new SMC-ABC schemes could exploit MLMC to combine samples from all acceptance thresholds using a coupling scheme and bias-correcting telescoping summation, as in the work of Jasra et al. (2019) or Warne et al. (2018). Early accept/reject schemes, such as those considered by Prangle (2016), Prescott and Baker (2020), and Lester (2020), could also be introduced for the sampling steps involving the expensive discrete model. Lastly, the PC-SMC-ABC and MM-SMC-ABC methods could be applied together, possibly leading to a compounding effect on performance. Delayed acceptance schemes (Banterle et al., 2019; Everitt and Rowińska, 2020; Golightly et al., 2015) are an alternative approach with similar motivations to the methods we propose in this work. However, these approaches can be highly sensitive to false negatives, that is, cases where a particular value of \boldsymbol{\theta} would be rejected under the approximate model but accepted under the exact model. Our PC-SMC-ABC approach is not affected by false negatives due to the use of the second set of proposal kernels.

We have demonstrated our methods using a two-dimensional lattice-based discrete random-walk model that leads to mean-field continuum-limit approximations with linear diffusion and a source term of the form \lambda\mathcal{C}f(\mathcal{C}). However, our methods are more widely applicable. We could further generalize the model to deal with a more general class of reaction-diffusion continuum limits involving nonlinear diffusion (Warne et al., 2019b; Witelski, 1995) and generalized proliferation mechanisms (Simpson et al., 2013; Tsoularis and Wallace, 2002). Our framework is also relevant to lattice-free discrete models (Codling et al., 2008; Browning et al., 2018) and higher dimensional lattice-based models (Browning et al., 2019); we expect the computational improvements to be even more significant in these cases. Many other forms of model combination are also possible. For example, a sequence of continuum models of increasing complexity could be considered, as in Browning et al. (2019). Alternatively, a sequence of numerical approximations of increasing accuracy could be used for inference with a complex target PDE model (Cotter et al., 2010). Linear mapping approximations of higher order chemical reaction network models, such as in Cao and Grima (2018), could also exploit our approach. Another relevant and very general application in systems biology is to utilize reaction rate equations, which are deterministic ODEs, as approximations to stochastic chemical kinetics models (Higham, 2008; Wilkinson, 2009).

Of course, not all approximate models will necessarily provide performance improvements. As demonstrated for the Ornstein–Uhlenbeck example (Section 3.1), the stationary distribution is more appropriate for inference of \sigma than of \mu, with the approximation improving as the model sample time T increases. However, as shown for the lattice-based random walk models (Sections 3.4 and 3.5), even when the assumptions associated with the approximation do not hold, it is still possible to improve sampling with PC-SMC-ABC and MM-SMC-ABC. Therefore, we suggest that approximations derived from some limiting, averaged behavior of the exact model will be good initial candidates for our methods. Semi-automated model reduction techniques are also potential approaches to obtaining approximations (Transtrum and Qiu, 2014) that could be investigated in the future.

In this work, novel methods have been presented for exploiting approximate models to accelerate Bayesian inference for expensive stochastic models. We have shown that, even when the approximation leads to biased parameter inferences, it can still inform the proposal mechanisms for ABC samplers using the stochastic model. Our numerical examples show performance improvements of more than tenfold. These substantial computational improvements are promising and expand the feasibility of Bayesian analysis for problems involving expensive stochastic models.

Software availability

Numerical examples presented in this work are available from GitHub
https://github.com/ProfMJSimpson/Warne_RapidBayesianInference_2019.

Acknowledgements

This work was supported by the Australian Research Council (DP170100474). D.J.W. and M.J.S. acknowledge continued support from the Centre for Data Science at the Queensland University of Technology. D.J.W. and M.J.S. are members of the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers. R.E.B. would like to thank the Leverhulme Trust for a Leverhulme Research Fellowship, the Royal Society for a Wolfson Research Merit Award, and the BBSRC for funding via BB/R00816/1. M.J.S. appreciates support from the University of Canterbury Erskine Fellowship. Computational resources were provided by the eResearch Office, Queensland University of Technology. The authors thank Wang Jin for helpful discussions.

References

  • Agnew et al. (2014) Agnew, D., J. Green, T. Brown, M. Simpson, and B. Binder (2014). Distinguishing between mechanisms of cell aggregation using pair-correlation functions. Journal of Theoretical Biology 352, 16–23.
  • Baker et al. (2018) Baker, R. E., J.-M. Peña, J. Jayamohan, and A. Jérusalem (2018). Mechanistic models versus machine learning, a fight worth fighting for the biological community? Biology Letters 14(5), 20170660.
  • Banterle et al. (2019) Banterle, M., C. Grazian, A. Lee, and C. P. Robert (2019). Accelerating Metropolis-Hastings algorithms by delayed acceptance. Foundations of Data Science 1(2), 103–128.
  • Barnes et al. (2012) Barnes, C. P., S. Filippi, M. P. H. Stumpf, and T. Thorne (2012). Considerate approaches to constructing summary statistics for ABC model selection. Statistics and Computing 22(6), 1181–1197.
  • Beaumont et al. (2009) Beaumont, M. A., J.-M. Cornuet, J.-M. Marin, and C. P. Robert (2009). Adaptive approximate Bayesian computation. Biometrika 96(4), 983–990.
  • Beaumont et al. (2002) Beaumont, M. A., W. Zhang, and D. J. Balding (2002). Approximate Bayesian computation in population genetics. Genetics 162(4), 2025–2035.
  • Binny et al. (2016) Binny, R. N., P. Haridas, A. James, R. Law, M. J. Simpson, and M. J. Plank (2016). Spatial structure arising from neighbour-dependent bias in collective cell movement. PeerJ 4, e1689.
  • Black and McKane (2012) Black, A. J. and A. J. McKane (2012). Stochastic formulation of ecological models and their applications. Trends in Ecology & Evolution 27(6), 337–345.
  • Blum et al. (2013) Blum, M. G. B., M. A. Nunes, D. Prangle, and S. A. Sisson (2013). A comparative review of dimension reduction methods in approximate Bayesian computation. Statistical Science 28(2), 189–208.
  • Böttger et al. (2015) Böttger, K., H. Hatzikirou, A. Voss-Böhme, E. A. Cavalcanti-Adam, M. A. Herrero, and A. Deutsch (2015). An emerging Allee effect is critical for tumor initiation and persistence. PLOS Computational Biology 11(9), e1004366.
  • Browning et al. (2019) Browning, A. P., P. Haridas, and M. J. Simpson (2019). A Bayesian sequential learning framework to parameterise continuum models of melanoma invasion into human skin. Bulletin of Mathematical Biology 81(3), 676–698.
  • Browning et al. (2018) Browning, A. P., S. W. McCue, R. N. Binny, M. J. Plank, E. T. Shah, and M. J. Simpson (2018). Inferring parameters for a lattice-free model of cell migration and proliferation using experimental data. Journal of Theoretical Biology 437, 251–260.
  • Browning et al. (2017) Browning, A. P., S. W. McCue, and M. J. Simpson (2017). A Bayesian computational approach to explore the optimal duration of a cell proliferation assay. Bulletin of Mathematical Biology 79(8), 1888–1906.
  • Buzbas and Rosenberg (2015) Buzbas, E. O. and N. A. Rosenberg (2015). AABC: Approximate approximate Bayesian computation for inference in population-genetic models. Theoretical Population Biology 99, 31–42.
  • Callaghan et al. (2006) Callaghan, T., E. Khain, L. M. Sander, and R. M. Ziff (2006). A stochastic model for wound healing. Journal of Statistical Physics 122(5), 909–924.
  • Cao and Grima (2018) Cao, Z. and R. Grima (2018). Linear mapping approximation of gene regulatory networks with stochastic dynamics. Nature Communications 9(1), 3305.
  • Chen and Zhang (2014) Chen, C. P. and C.-Y. Zhang (2014). Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Information Sciences 275, 314–347.
  • Chkrebtii et al. (2015) Chkrebtii, O. A., E. K. Cameron, D. A. Campbell, and E. M. Bayne (2015). Transdimensional approximate Bayesian computation for inference on invasive species models with latent variables of unknown dimension. Computational Statistics & Data Analysis 86, 97–110.
  • Codling et al. (2008) Codling, E. A., M. J. Plank, and S. Benhamou (2008). Random walk models in biology. Journal of The Royal Society Interface 5(25), 813–834.
  • Cotter et al. (2010) Cotter, S. L., M. Dashti, and A. M. Stuart (2010). Approximation of Bayesian inverse problems for PDEs. SIAM Journal on Numerical Analysis 48(1), 322–345.
  • Coveney et al. (2016) Coveney, P. V., E. R. Dougherty, and R. R. Highfield (2016). Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374(2080), 20160153.
  • DeAngelis and Grimm (2014) DeAngelis, D. L. and V. Grimm (2014). Individual-based models in ecology after four decades. F1000Prime Reports 6, 39.
  • Del Moral et al. (2006) Del Moral, P., A. Doucet, and A. Jasra (2006). Sequential Monte Carlo samplers. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(3), 411–436.
  • Drawert et al. (2017) Drawert, B., M. Griesemer, L. R. Petzold, and C. J. Briggs (2017). Using stochastic epidemiological models to evaluate conservation strategies for endangered amphibians. Journal of The Royal Society Interface 14(133), 20170480.
  • Drovandi and Pettitt (2011) Drovandi, C. C. and A. N. Pettitt (2011). Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67(1), 225–233.
  • Edelstein-Keshet (2005) Edelstein-Keshet, L. (2005). Mathematical Models in Biology. Society for Industrial and Applied Mathematics.
  • Ellison (2004) Ellison, A. M. (2004). Bayesian inference in ecology. Ecology Letters 7(6), 509–520.
  • EHT Collaboration et al. (2019) EHT Collaboration et al. (2019). First M87 Event Horizon Telescope results. VI. The shadow and mass of the central black hole. The Astrophysical Journal Letters 875(1), L6.
  • Everitt and Rowińska (2020) Everitt, R. G. and P. A. Rowińska (2020). Delayed acceptance ABC-SMC. Journal of Computational and Graphical Statistics.
  • Fearnhead and Prangle (2012) Fearnhead, P. and D. Prangle (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74(3), 419–474.
  • Fehlberg (1969) Fehlberg, E. (1969). Low-order classical Runge-Kutta formulas with step size control and their application to some heat transfer problems. Technical Report R-315, NASA.
  • Filippi et al. (2013) Filippi, S., C. P. Barnes, J. Cornebise, and M. P. H. Stumpf (2013). On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Statistical Applications in Genetics and Molecular Biology 12(1), 87–107.
  • Gelman et al. (2014) Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2014). Bayesian Data Analysis (3rd ed.). Chapman & Hall/CRC.
  • Giles (2015) Giles, M. B. (2015). Multilevel Monte Carlo methods. Acta Numerica 24, 259–328.
  • Golightly et al. (2015) Golightly, A., D. A. Henderson, and C. Sherlock (2015). Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics and Computing 25(5), 1039–1055.
  • Guindani et al. (2014) Guindani, M., N. Sepúlveda, C. D. Paulino, and P. Müller (2014). A Bayesian semiparametric approach for the differential analysis of sequence counts data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 63(3), 385–404.
  • Gunawan et al. (2005) Gunawan, R., Y. Cao, L. Petzold, and F. J. Doyle III (2005). Sensitivity analysis of discrete stochastic systems. Biophysical Journal 88(4), 2530–2540.
  • Higham (2008) Higham, D. J. (2008). Modeling and simulating chemical reactions. SIAM Review 50(2), 347–368.
  • Holden et al. (2018) Holden, P. B., N. R. Edwards, A. Ridgwell, R. D. Wilkinson, K. Fraedrich, F. Lunkeit, H. Pollitt, J.-F. Mercure, P. Salas, A. Lam, F. Knobloch, U. Chewpreecha, and J. E. Viñuales (2018). Climate–carbon cycle uncertainties and the Paris Agreement. Nature Climate Change 8(7), 609–613.
  • Iserles (2008) Iserles, A. (2008). A First Course in the Numerical Analysis of Differential Equations (2nd ed.). Cambridge University Press.
  • Jasra et al. (2019) Jasra, A., S. Jo, D. Nott, C. Shoemaker, and R. Tempone (2019). Multilevel Monte Carlo in approximate Bayesian computation. Stochastic Analysis and Applications 37(3), 346–360.
  • Jin et al. (2016) Jin, W., C. J. Penington, S. W. McCue, and M. J. Simpson (2016). Stochastic simulation tools and continuum models for describing two-dimensional collective cell spreading with universal growth functions. Physical Biology 13(5), 056003.
  • Jin et al. (2017) Jin, W., E. T. Shah, C. J. Penington, S. W. McCue, P. K. Maini, and M. J. Simpson (2017). Logistic proliferation of cells in scratch assays is delayed. Bulletin of Mathematical Biology 79(5), 1028–1050.
  • Johnston et al. (2017) Johnston, S. T., R. E. Baker, D. L. S. McElwain, and M. J. Simpson (2017). Co-operation, competition and crowding: a discrete framework linking Allee kinetics, nonlinear diffusion, shocks and sharp-fronted travelling waves. Scientific Reports 7, 42134.
  • Johnston et al. (2014) Johnston, S. T., M. J. Simpson, D. L. S. McElwain, B. J. Binder, and J. V. Ross (2014). Interpreting scratch assays using pair density dynamics and approximate Bayesian computation. Open Biology 4(9), 140097.
  • Johnston et al. (2013) Johnston, S. T., M. J. Simpson, and M. J. Plank (2013). Lattice-free descriptions of collective motion with crowding and adhesion. Physical Review E 88, 062720.
  • King et al. (2014) King, T. E., G. G. Fortes, P. Balaresque, M. G. Thomas, D. Balding, P. M. Delser, et al. (2014). Identification of the remains of King Richard III. Nature Communications 5(1), 5631.
  • Kullback and Leibler (1951) Kullback, S. and R. A. Leibler (1951). On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86.
  • Law et al. (2003) Law, R., D. J. Murrell, and U. Dieckmann (2003). Population growth in space and time: Spatial logistic equations. Ecology 84(1), 252–262.
  • Lawson et al. (2018) Lawson, B. A. J., C. C. Drovandi, N. Cusimano, P. Burrage, B. Rodriguez, and K. Burrage (2018). Unlocking data sets by calibrating populations of models to data density: A study in atrial electrophysiology. Science Advances 4(1), e1701676.
  • Lee et al. (2010) Lee, A., C. Yau, M. B. Giles, A. Doucet, and C. C. Holmes (2010). On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Journal of Computational and Graphical Statistics 19(4), 769–789.
  • Lei and Bickel (2011) Lei, J. and P. Bickel (2011). A moment matching ensemble filter for nonlinear non-Gaussian data assimilation. Monthly Weather Review 139(12), 3964–3973.
  • Lester (2020) Lester, C. (2020). Multi-level approximate Bayesian computation. ArXiv e-prints arXiv:1811.08866 [q-bio.QM].
  • Lester et al. (2017) Lester, C., C. A. Yates, and R. E. Baker (2017). Efficient parameter sensitivity computation for spatially extended reaction networks. The Journal of Chemical Physics 146(4), 044106.
  • Liang et al. (2007) Liang, C.-C., A. Y. Park, and J.-L. Guan (2007). In vitro scratch assay: a convenient and inexpensive method for analysis of cell migration in vitro. Nature Protocols 2(2), 329.
  • Liao et al. (2015) Liao, S., T. Vejchodský, and R. Erban (2015). Tensor methods for parameter estimation and bifurcation analysis of stochastic reaction networks. Journal of The Royal Society Interface 12(108), 20150233.
  • Lillacci and Khammash (2010) Lillacci, G. and M. Khammash (2010). Parameter estimation and model selection in computational biology. PLOS Computational Biology 6(3), e1000696.
  • Liu et al. (2018) Liu, Q.-H., M. Ajelli, A. Aleta, S. Merler, Y. Moreno, and A. Vespignani (2018). Measurability of the epidemic reproduction number in data-driven contact networks. Proceedings of the National Academy of Sciences 115(50), 12680–12685.
  • Malaspinas et al. (2016) Malaspinas, A.-S., M. C. Westaway, C. Muller, V. C. Sousa, O. Lao, I. Alves, et al. (2016). A genomic history of Aboriginal Australia. Nature 538(7624), 207–214.
  • Marino et al. (2008) Marino, S., I. B. Hogue, C. J. Ray, and D. E. Kirschner (2008). A methodology for performing global uncertainty and sensitivity analysis in systems biology. Journal of Theoretical Biology 254(1), 178–96.
  • Marjoram et al. (2003) Marjoram, P., J. Molitor, V. Plagnol, and S. Tavaré (2003). Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 100(26), 15324–15328.
  • Maruyama (1955) Maruyama, G. (1955). Continuous Markov processes and stochastic equations. Rendiconti del Circolo Matematico di Palermo 4(1), 48–90.
  • McLane et al. (2011) McLane, A. J., C. Semeniuk, G. J. McDermid, and D. J. Marceau (2011). The role of agent-based models in wildlife ecology and management. Ecological Modelling 222(8), 1544–1556.
  • Murray (2002) Murray, J. D. (2002). Mathematical Biology: I An Introduction. Springer, New York.
  • Nott et al. (2014) Nott, D. J., Y. Fan, L. Marshall, and S. A. Sisson (2014). Approximate Bayesian computation and Bayes’ linear analysis: toward high-dimensional ABC. Journal of Computational and Graphical Statistics 23(1), 65–86.
  • O’Dea et al. (2016) O’Dea, A., H. A. Lessios, A. G. Coates, R. I. Eytan, S. A. Restrepo-Moreno, A. L. Cione, et al. (2016). Formation of the Isthmus of Panama. Science Advances 2(8), e1600883.
  • Parno and Marzouk (2018) Parno, M. D. and Y. M. Marzouk (2018). Transport map accelerated Markov chain Monte Carlo. SIAM/ASA Journal on Uncertainty Quantification 6(2), 645–682.
  • Prangle (2016) Prangle, D. (2016). Lazy ABC. Statistics and Computing 26(1), 171–185.
  • Prescott and Baker (2020) Prescott, T. P. and R. E. Baker (2020). Multifidelity approximate Bayesian computation. SIAM/ASA Journal of Uncertainty Quantification 8(1), 114–138.
  • Press et al. (1997) Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1997). Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.
  • Pritchard et al. (1999) Pritchard, J. K., M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman (1999). Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Molecular Biology and Evolution 16(12), 1791–1798.
  • Rynn et al. (2019) Rynn, J. A. J., S. L. Cotter, C. E. Powell, and L. Wright (2019). Surrogate accelerated Bayesian inversion for the determination of the thermal diffusivity of a material. Metrologia 56(1), 015018.
  • Simpson et al. (2013) Simpson, M. J., B. J. Binder, P. Haridas, B. K. Wood, K. K. Treloar, D. L. S. McElwain, and R. E. Baker (2013). Experimental and modelling investigation of monolayer development with clustering. Bulletin of Mathematical Biology 75(5), 871–889.
  • Simpson et al. (2007) Simpson, M. J., K. A. Landman, and K. Bhaganagarapu (2007). Coalescence of interacting cell populations. Journal of Theoretical Biology 247(3), 525–543.
  • Simpson et al. (2010) Simpson, M. J., K. A. Landman, and B. D. Hughes (2010). Cell invasion with proliferation mechanisms motivated by time-lapse data. Physica A: Statistical Mechanics and its Applications 389(18), 3779–3790.
  • Sisson et al. (2018) Sisson, S. A., Y. Fan, and M. Beaumont (2018). Handbook of Approximate Bayesian Computation (1st ed.). Chapman & Hall/CRC.
  • Sisson et al. (2007) Sisson, S. A., Y. Fan, and M. M. Tanaka (2007). Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences 104(6), 1760–1765.
  • Sloan and Abbo (1999) Sloan, S. W. and A. J. Abbo (1999). Biot consolidation analysis with automatic time stepping and error control part 1: theory and implementation. International Journal for Numerical and Analytical Methods in Geomechanics 23(6), 467–492.
  • Stumpf (2014) Stumpf, M. P. H. (2014). Approximate Bayesian inference for complex ecosystems. F1000Prime Reports 6, 60.
  • Sun et al. (2016) Sun, B., J. Feng, and K. Saenko (2016). Return of frustratingly easy domain adaptation. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp.  2058–2065. AAAI Press.
  • Tavaré et al. (1997) Tavaré, S., D. J. Balding, R. C. Griffiths, and P. Donnelly (1997). Inferring coalescence times from DNA sequence data. Genetics 145(2), 505–518.
  • Taylor and Hastings (2005) Taylor, C. M. and A. Hastings (2005). Allee effects in biological invasions. Ecology Letters 8(8), 895–908.
  • Toni et al. (2009) Toni, T., D. Welch, N. Strelkowa, A. Ipsen, and M. P. Stumpf (2009). Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. Journal of The Royal Society Interface 6(31), 187–202.
  • Transtrum and Qiu (2014) Transtrum, M. K. and P. Qiu (2014). Model reduction by manifold boundaries. Physical Review Letters 113, 098701.
  • Tsoularis and Wallace (2002) Tsoularis, A. and J. Wallace (2002). Analysis of logistic growth models. Mathematical Biosciences 179(1), 21–55.
  • Uhlenbeck and Ornstein (1930) Uhlenbeck, G. E. and L. S. Ornstein (1930). On the theory of Brownian Motion. Physical Review 36(5), 823–841.
  • Vankov et al. (2019) Vankov, E. R., M. Guindani, and K. B. Ensor (2019). Filtering and estimation for a class of stochastic volatility models with intractable likelihoods. Bayesian Analysis 14, 29–52.
  • Vincenot et al. (2016) Vincenot, C. E., F. Carteni, S. Mazzoleni, M. Rietkerk, and F. Giannino (2016). Spatial self-organization of vegetation subject to climatic stress—insights from a system dynamics—individual-based hybrid model. Frontiers in Plant Science 7, 636.
  • Vo et al. (2015) Vo, B. N., C. C. Drovandi, A. N. Pettitt, and M. J. Simpson (2015). Quantifying uncertainty in parameter estimates for stochastic models of collective cell spreading using approximate Bayesian computation. Mathematical Biosciences 263, 133–142.
  • von Hardenberg et al. (2001) von Hardenberg, J., E. Meron, M. Shachak, and Y. Zarmi (2001). Diversity of vegetation patterns and desertification. Physical Review Letters 87, 198101.
  • Wang et al. (2019) Wang, Y., J. Shi, and J. Wang (2019). Persistence and extinction of population in reaction–diffusion–advection model with strong Allee effect growth. Journal of Mathematical Biology 78(7), 2093–2140.
  • Warne et al. (2017) Warne, D. J., R. E. Baker, and M. J. Simpson (2017). Optimal quantification of contact inhibition in cell populations. Biophysical Journal 113(9), 1920–1924.
  • Warne et al. (2018) Warne, D. J., R. E. Baker, and M. J. Simpson (2018). Multilevel rejection sampling for approximate Bayesian computation. Computational Statistics & Data Analysis 124, 71–86.
  • Warne et al. (2019a) Warne, D. J., R. E. Baker, and M. J. Simpson (2019a). Simulation and inference algorithms for stochastic biochemical reaction networks: from basic concepts to state-of-the-art. Journal of The Royal Society Interface 16(151), 20180943.
  • Warne et al. (2019b) Warne, D. J., R. E. Baker, and M. J. Simpson (2019b). Using experimental data and information criteria to guide model selection for reaction–diffusion problems in mathematical biology. Bulletin of Mathematical Biology 81(6), 1760–1804.
  • Warne et al. (2019c) Warne, D. J., R. E. Baker, and M. J. Simpson (2019c). A practical guide to pseudo-marginal methods for computational inference in systems biology. Journal of Theoretical Biology 496(7), 110255.
  • Warne et al. (2021) Warne, D. J., S. A. Sisson, and C. Drovandi (2021). Vector operations for accelerating expensive Bayesian computations – a tutorial guide. Bayesian Analysis (to appear).
  • Wilkinson (2009) Wilkinson, D. J. (2009). Stochastic modelling for quantitative description of heterogeneous biological systems. Nature Reviews Genetics 10(2), 122–133.
  • Witelski (1995) Witelski, T. P. (1995). Merging traveling waves for the porous-Fisher’s equation. Applied Mathematics Letters 8(4), 57–62.
  • Woods and Barnes (2016) Woods, M. L. and C. P. Barnes (2016). Mechanistic modelling and Bayesian inference elucidates the variable dynamics of double-strand break repair. PLOS Computational Biology 12(10), e1005131.
  • Zechner et al. (2012) Zechner, C., J. Ruess, P. Krenn, S. Pelet, M. Peter, J. Lygeros, and H. Koeppl (2012). Moment-based inference predicts bimodality in transient gene expression. Proceedings of the National Academy of Sciences 109(21), 8340–8345.

Appendix A Analysis of PC-SMC-ABC

Here we demonstrate that the PC-SMC-ABC method is asymptotically unbiased. Using similar arguments to Del Moral et al. (2006) and Sisson et al. (2007), we show that the weight update scheme can be interpreted as importance sampling on the joint distribution of particle trajectories, and that this joint target density admits the target posterior density as a marginal.

Given the target sequence, $\{p(\boldsymbol{\theta}_{r}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\}_{r=1}^{R}$, and approximate sequence, $\{\tilde{p}(\tilde{\boldsymbol{\theta}}_{r}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})\}_{r=1}^{R}$, with prior $p(\boldsymbol{\theta}_{0})$, $\epsilon_{r-1}>\epsilon_{r}$, and proposal kernels $\tilde{q}_{r}(\boldsymbol{\theta}_{r}\mid\tilde{\boldsymbol{\theta}}_{r})$ and $q_{r}(\tilde{\boldsymbol{\theta}}_{r}\mid\boldsymbol{\theta}_{r-1})$, we write the unnormalized weight update scheme for PC-SMC-ABC. That is,

\tilde{w}_{r}^{i}=w_{r-1}^{i}\frac{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{p(\boldsymbol{\theta}_{r-1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}, \quad (A.1)

and

w_{r}^{i}=\tilde{w}_{r}^{i}\frac{p(\boldsymbol{\theta}_{r}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})}{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}, \quad (A.2)

where $Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})$ and $\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})$ are arbitrary backwards kernels. Note that Algorithm 2 in the main manuscript is not expressed in terms of these arbitrary kernels; rather, we utilize optimal backwards kernels. To proceed, we substitute Equation (A.1) into Equation (A.2) and simplify as follows,

\displaystyle w_{r}^{i} &= w_{r-1}^{i}\frac{p(\boldsymbol{\theta}_{r}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})}{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}\times\frac{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{p(\boldsymbol{\theta}_{r-1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}\\
&= w_{r-1}^{i}\frac{p(\boldsymbol{\theta}_{r}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{p(\boldsymbol{\theta}_{r-1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1})\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}\times\frac{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})}{p(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\rho(\mathcal{D},\tilde{\mathcal{D}}_{s})\leq\epsilon_{r})}\\
&= w_{r-1}^{i}\frac{p(\boldsymbol{\theta}_{r}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{p(\boldsymbol{\theta}_{r-1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1})\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}. \quad (A.3)

Now, recursively expand the weight update sequence (Equation (A.3)) to obtain the final weight for the $i$th particle,

\displaystyle w_{R}^{i} &= \frac{p(\boldsymbol{\theta}_{1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{1})\tilde{Q}_{1}(\tilde{\boldsymbol{\theta}}_{1}^{i}\mid\boldsymbol{\theta}_{1}^{i})Q_{0}(\boldsymbol{\theta}_{0}^{i}\mid\tilde{\boldsymbol{\theta}}_{1}^{i})}{p(\boldsymbol{\theta}_{0}^{i})\tilde{q}_{1}(\boldsymbol{\theta}_{1}^{i}\mid\tilde{\boldsymbol{\theta}}_{1}^{i})q_{1}(\tilde{\boldsymbol{\theta}}_{1}^{i}\mid\boldsymbol{\theta}_{0}^{i})}\\
&\quad\times\prod_{r=2}^{R}\frac{p(\boldsymbol{\theta}_{r}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{p(\boldsymbol{\theta}_{r-1}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{r-1})\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}\\
&= \frac{p(\boldsymbol{\theta}_{R}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{R})}{p(\boldsymbol{\theta}_{0}^{i})}\prod_{r=1}^{R}\frac{\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})}{\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}\\
&= \frac{p(\boldsymbol{\theta}_{R}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{R})}{p(\boldsymbol{\theta}_{0}^{i})}\frac{\prod_{r=1}^{R}B_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\boldsymbol{\theta}_{r}^{i})}{\prod_{r=1}^{R}F_{r}(\boldsymbol{\theta}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})}, \quad (A.4)

where $F_{r}(\boldsymbol{\theta}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})=\tilde{q}_{r}(\boldsymbol{\theta}_{r}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})q_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i})$ is the composite proposal kernel and $B_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\boldsymbol{\theta}_{r}^{i})=Q_{r-1}(\boldsymbol{\theta}_{r-1}^{i}\mid\tilde{\boldsymbol{\theta}}_{r}^{i})\tilde{Q}_{r}(\tilde{\boldsymbol{\theta}}_{r}^{i}\mid\boldsymbol{\theta}_{r}^{i})$ is the composite backward kernel. We observe that Equation (A.4) is equivalent to the weight obtained from direct importance sampling on the joint space of the entire particle trajectory Del Moral et al. (2006); Sisson et al. (2007), that is,

w_{R}^{i}=\frac{\pi_{R}(\boldsymbol{\theta}_{0}^{i},\boldsymbol{\theta}_{1}^{i},\ldots,\boldsymbol{\theta}_{R}^{i})}{\pi_{0}(\boldsymbol{\theta}_{0}^{i},\boldsymbol{\theta}_{1}^{i},\ldots,\boldsymbol{\theta}_{R}^{i})}.

Here, the importance distribution, given by

\pi_{0}(\boldsymbol{\theta}_{0}^{i},\boldsymbol{\theta}_{1}^{i},\ldots,\boldsymbol{\theta}_{R}^{i})=p(\boldsymbol{\theta}_{0}^{i})\prod_{r=1}^{R}F_{r}(\boldsymbol{\theta}_{r}^{i}\mid\boldsymbol{\theta}_{r-1}^{i}),

is the process of sampling from the prior and performing a sequence of kernel transitions. Finally, we note that the target distribution admits the target ABC posterior as a marginal density, that is,

\int_{\mathbb{R}^{R}}\pi_{R}(\boldsymbol{\theta}_{0}^{i},\boldsymbol{\theta}_{1}^{i},\ldots,\boldsymbol{\theta}_{R}^{i})\,\text{d}\boldsymbol{\theta}_{0}^{i}\ldots\text{d}\boldsymbol{\theta}_{R-1}^{i}=p(\boldsymbol{\theta}_{R}^{i}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{R}).

Therefore, for any function $f(\cdot)$ that is integrable with respect to the ABC posterior measure we have,

\sum_{i=1}^{\mathcal{M}}f(\boldsymbol{\theta}_{R}^{i})w_{R}^{i}\to\int_{\mathbb{R}}f(\boldsymbol{\theta}_{R})p(\boldsymbol{\theta}_{R}\mid\rho(\mathcal{D},\mathcal{D}_{s})\leq\epsilon_{R})\,\text{d}\boldsymbol{\theta}_{R}=\mathbb{E}\left[f(\boldsymbol{\theta}_{R})\right],

as $\mathcal{M}\to\infty$; that is, the PC-SMC-ABC method is asymptotically unbiased.
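The limit above is the standard self-normalized importance sampling consistency result. A minimal numerical sketch follows, with a one-dimensional Gaussian target standing in for the ABC posterior and a wider Gaussian for the importance distribution; neither is taken from the paper's examples, and this is not the PC-SMC-ABC sampler itself.

```python
import math
import random

def target_pdf(x):
    # Standard normal, standing in for the target p(theta | rho <= eps_R)
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def proposal_pdf(x):
    # Wider normal, standing in for the joint importance distribution pi_0
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))

def self_normalized_estimate(f, n_particles, seed=0):
    rng = random.Random(seed)
    thetas = [rng.gauss(0.0, 2.0) for _ in range(n_particles)]
    raw = [target_pdf(t) / proposal_pdf(t) for t in thetas]
    total = sum(raw)
    weights = [w / total for w in raw]  # self-normalize, as in an SMC sampler
    return sum(w * f(t) for w, t in zip(weights, thetas))

# E[theta^2] = 1 under the target; the weighted sum converges to this
# expectation as the number of particles grows
est = self_normalized_estimate(lambda t: t * t, 100_000)
```

With $10^{5}$ particles the weighted estimate of $\mathbb{E}[\theta^{2}]$ is close to the true value of one, illustrating the convergence statement.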

Appendix B Example of MM-SMC-ABC with the Ornstein-Uhlenbeck process

The Ornstein-Uhlenbeck process Uhlenbeck and Ornstein (1930) (Equation (11)) is used in the main manuscript (Section 3.1) to demonstrate the PC-SMC-ABC method. Here, we replicate that demonstration for MM-SMC-ABC.

Following the main manuscript, we use Euler-Maruyama simulations Maruyama (1955) with time discretisation $\Delta t=0.01$ (Equation (13)) for the exact model, and the stationary distribution of the Ornstein-Uhlenbeck process (Equation (14)) for the approximate model. Similarly, we follow the main manuscript for the data $\mathcal{D}=[X_{T}^{1},X_{T}^{2},\ldots,X_{T}^{N}]$, which are generated with $N=1000$, $T=1$, $x_{0}=10$, $\mu=1$, $\gamma=2$, and $\sigma=2\sqrt{5}$. To better visually demonstrate MM-SMC-ABC, we infer the joint distribution of $\boldsymbol{\theta}=(\mu,D)$, where $D=\sigma^{2}/2$, using MM-SMC-ABC (Algorithm 3 in the main manuscript) with $\mathcal{M}$ particles, tuning parameter $\alpha=0.1$, and ABC threshold sequence $\epsilon_{r}=\epsilon_{r-1}/2$ for $r=1,2,\ldots,7$ with $\epsilon_{0}=6.4$.
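The data-generation step can be sketched as follows, assuming the mean-reverting form $\text{d}X_{t}=\gamma(\mu-X_{t})\,\text{d}t+\sigma\,\text{d}W_{t}$ for Equation (11) (not reproduced here). The sketch also illustrates why the stationary distribution is only an approximate model: at $T=1$ the exact transient mean, $\mu+(x_{0}-\mu)e^{-\gamma T}$, has not yet relaxed to the stationary mean $\mu$.

```python
import math
import random

def em_final_states(n, T, dt, x0, mu, gamma, sigma, seed=0):
    """Euler-Maruyama simulation of dX_t = gamma*(mu - X_t) dt + sigma dW_t,
    returning n independent samples of X_T."""
    rng = random.Random(seed)
    steps = int(round(T / dt))
    samples = []
    for _ in range(n):
        x = x0
        for _ in range(steps):
            x += gamma * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Parameters from the text: N = 1000, T = 1, x0 = 10, mu = 1, gamma = 2, sigma = 2*sqrt(5)
data = em_final_states(1000, 1.0, 0.01, 10.0, 1.0, 2.0, 2.0 * math.sqrt(5.0))
sample_mean = sum(data) / len(data)

# Exact transient mean: mu + (x0 - mu) * exp(-gamma * T), still far from mu = 1
transient_mean = 1.0 + (10.0 - 1.0) * math.exp(-2.0 * 1.0)
```

The sample mean of the simulated data agrees with the transient mean rather than the stationary mean, which is the bias the moment-matching transform must correct.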

Figure B.1 demonstrates the movement of $\tilde{\mathcal{M}}=\lfloor(1-\alpha)\mathcal{M}\rfloor$ approximate particles (blue dots) that are transformed (red dots) using $\hat{\mathcal{M}}=\lceil\alpha\mathcal{M}\rceil$ exact particles (black crosses) according to Equation (10) from the main manuscript. For this example, the speedup factor is approximately $10\times$.
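A one-dimensional sketch of a moment-matching transform in the spirit of Equation (10) is given below. Equation (10) itself is not reproduced here; this sketch simply shifts and scales the approximate particles so that their sample mean and standard deviation match those of the small exact subset, and the particle sets used are hypothetical Gaussian samples.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def moment_match(approx, exact):
    """Shift and scale approximate particles so that their sample mean and
    standard deviation match those of the exact subset (a 1D sketch only; a
    multivariate version would match sample means and covariances)."""
    ma, sa = mean(approx), std(approx)
    me, se = mean(exact), std(exact)
    return [me + (se / sa) * (x - ma) for x in approx]

# Hypothetical particle sets: many cheap approximate particles, few exact ones
rng = random.Random(1)
approx = [rng.gauss(0.0, 1.0) for _ in range(500)]  # biased approximate particles
exact = [rng.gauss(3.0, 2.0) for _ in range(50)]    # small exact subset
moved = moment_match(approx, exact)
```

By construction the transformed particles reproduce the first two sample moments of the exact subset exactly, which is what makes them a useful proposal population.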

Figure B.1: MM-SMC-ABC intermediate steps for the Ornstein-Uhlenbeck SDE example. Each panel demonstrates the SMC-ABC particles using the approximate model (blue dots), the small subset of exact particles (black crosses), and the transformed approximate particles (red dots) to form a new proposal distribution for the MM-SMC-ABC sampler.

Appendix C Derivation of approximate continuum-limit description

Here we derive the approximate continuum-limit description for our hexagonal lattice-based discrete random walk model with generalized crowding function $f:[0,1]\to[-1,1]$. We follow the method of Simpson et al. (2010) and Jin et al. (2016).

Our main modification handles a potentially negative crowding function. To deal with this case, we define two auxiliary functions,

f^{+}(C)=\begin{cases}f(C)&\text{if }f(C)\geq 0,\\ 0&\text{otherwise},\end{cases}\qquad f^{-}(C)=\begin{cases}|f(C)|&\text{if }f(C)<0,\\ 0&\text{otherwise}.\end{cases} \quad (C.1)

Note that $f(C)=f^{+}(C)-f^{-}(C)$; this identity is important later.
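The decomposition in Equation (C.1) can be sketched directly; the crowding function used below is a hypothetical Allee-type choice (it changes sign at low density) and is not one of the crowding functions from the main manuscript.

```python
def f_plus(f, c):
    # Positive part of the crowding function (Equation (C.1))
    v = f(c)
    return v if v >= 0 else 0.0

def f_minus(f, c):
    # Magnitude of the negative part of the crowding function (Equation (C.1))
    v = f(c)
    return abs(v) if v < 0 else 0.0

# Hypothetical Allee-type crowding function: negative for densities below 0.2
f = lambda c: (1.0 - c) * (c - 0.2)

# The identity f = f+ - f- holds pointwise on [0, 1]
densities = [i / 10.0 for i in range(11)]
identity_holds = all(
    abs(f(c) - (f_plus(f, c) - f_minus(f, c))) < 1e-12 for c in densities
)
```

The positive part drives proliferation while the negative part drives death events, which is how the two parts enter the conservation of probability equation below.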

We assume a mean-field condition; that is, for any two lattice sites, $(x_{1},y_{1}),(x_{2},y_{2})\in\mathbb{R}^{2}$, the occupancy probabilities are independent, which results in the property
$\mathbb{E}\left[C((x_{1},y_{1}),t)C((x_{2},y_{2}),t)\right]=\mathbb{E}\left[C((x_{1},y_{1}),t)\right]\mathbb{E}\left[C((x_{2},y_{2}),t)\right]$. Using this property, we denote $\mathcal{C}(x,y,t)=\mathbb{E}\left[C((x,y),t)\right]$ and write the conservation of probability equation that describes the change in occupancy probability of a site over a single time step,

\begin{split}\Delta\mathcal{C}(x,y,t)=&\,P_{m}\left(1-\mathcal{C}(x,y,t)\right)\hat{\mathcal{C}}(x,y,t)-P_{m}\mathcal{C}(x,y,t)\left(1-\hat{\mathcal{C}}(x,y,t)\right)\\ &+\frac{P_{p}}{6}\left(1-\mathcal{C}(x,y,t)\right)\left(\sum_{(x^{\prime},y^{\prime})\in N(x,y)}\mathcal{C}(x^{\prime},y^{\prime},t)\frac{f^{+}(\hat{\mathcal{C}}(x^{\prime},y^{\prime},t))}{1-\hat{\mathcal{C}}(x^{\prime},y^{\prime},t)}\right)\\ &-P_{p}\mathcal{C}(x,y,t)f^{-}(\hat{\mathcal{C}}(x,y,t)),\end{split} \quad (C.2)

where $\Delta\mathcal{C}(x,y,t)=\mathcal{C}(x,y,t+\tau)-\mathcal{C}(x,y,t)$,

\hat{\mathcal{C}}(x,y,t)=\frac{1}{6}\sum_{(x^{\prime},y^{\prime})\in N(x,y)}\mathcal{C}(x^{\prime},y^{\prime},t), \quad (C.3)

and

\hat{\mathcal{C}}(x^{\prime},y^{\prime},t)=\frac{1}{6}\sum_{(x^{\prime\prime},y^{\prime\prime})\in N(x^{\prime},y^{\prime})}\mathcal{C}(x^{\prime\prime},y^{\prime\prime},t). \quad (C.4)

The conservation of probability equation (Equation (C.2)) deserves some interpretation. The first term is the probability that site $(x,y)$ is unoccupied at time $t$ and becomes occupied at time $t+\tau$ due to a successful motility event that moves an agent from an occupied neighboring site into $(x,y)$. The second term is the probability that site $(x,y)$ is occupied at time $t$ and becomes unoccupied at time $t+\tau$ due to a successful motility event that moves the agent away to a neighboring site. The third term is the probability that site $(x,y)$ is unoccupied at time $t$ and becomes occupied at time $t+\tau$ due to a successful proliferation event from an occupied neighboring site. The final term is the probability that site $(x,y)$ is occupied at time $t$ and becomes unoccupied at time $t+\tau$ due to $f(\hat{\mathcal{C}}(x,y,t))<0$.

For a hexagonal lattice site, $(x,y)$, we have six immediate neighboring lattice sites, $(x_{1},y_{1})=(x,y+\delta)$, $(x_{2},y_{2})=(x,y-\delta)$, $(x_{3},y_{3})=(x+\delta\sqrt{3}/2,y+\delta/2)$, $(x_{4},y_{4})=(x-\delta\sqrt{3}/2,y+\delta/2)$, $(x_{5},y_{5})=(x+\delta\sqrt{3}/2,y-\delta/2)$, and $(x_{6},y_{6})=(x-\delta\sqrt{3}/2,y-\delta/2)$. The positions of these neighbors are shown in the schematic in Figure C.1.

Figure C.1: Schematic of the local neighborhood for the hexagonal lattice. Neighbor coordinates are $(x_{1},y_{1})=(x,y+\delta)$, $(x_{2},y_{2})=(x,y-\delta)$, $(x_{3},y_{3})=(x+\delta\sqrt{3}/2,y+\delta/2)$, $(x_{4},y_{4})=(x-\delta\sqrt{3}/2,y+\delta/2)$, $(x_{5},y_{5})=(x+\delta\sqrt{3}/2,y-\delta/2)$, and $(x_{6},y_{6})=(x-\delta\sqrt{3}/2,y-\delta/2)$.

We obtain an expression for $\mathcal{C}(x,y,t)$ at each of the six neighboring lattice points by associating $\mathcal{C}(x,y,t)$ with a continuous function and performing a Taylor expansion about the point $(x,y)$. The result is,

\displaystyle\begin{split}
\mathcal{C}(x_{1},y_{1},t)=&\,\mathcal{C}(x,y,t)+\delta\frac{\partial\mathcal{C}(x,y,t)}{\partial y}+\frac{\delta^{2}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}+\mathcal{O}(\delta^{3}),\\
\mathcal{C}(x_{2},y_{2},t)=&\,\mathcal{C}(x,y,t)-\delta\frac{\partial\mathcal{C}(x,y,t)}{\partial y}+\frac{\delta^{2}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}+\mathcal{O}(\delta^{3}),\\
\mathcal{C}(x_{3},y_{3},t)=&\,\mathcal{C}(x,y,t)+\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}+\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\
&+\frac{\delta^{2}}{2}\left[\frac{3}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\frac{\sqrt{3}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{1}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\
\mathcal{C}(x_{4},y_{4},t)=&\,\mathcal{C}(x,y,t)-\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}+\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\
&+\frac{\delta^{2}}{2}\left[\frac{3}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}-\frac{\sqrt{3}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{1}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\
\mathcal{C}(x_{5},y_{5},t)=&\,\mathcal{C}(x,y,t)+\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}-\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\
&+\frac{\delta^{2}}{2}\left[\frac{3}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}-\frac{\sqrt{3}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{1}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\
\mathcal{C}(x_{6},y_{6},t)=&\,\mathcal{C}(x,y,t)-\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}-\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\
&+\frac{\delta^{2}}{2}\left[\frac{3}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\frac{\sqrt{3}}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{1}{4}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}).
\end{split} \quad (C.5)

Therefore, we obtain an expression for \hat{\mathcal{C}}(x,y,t),

\displaystyle\hat{\mathcal{C}}(x,y,t) =\frac{1}{6}\sum_{(x^{\prime},y^{\prime})\in N(x,y)}\mathcal{C}(x^{\prime},y^{\prime},t)
\displaystyle=\frac{1}{6}\sum_{k=1}^{6}\mathcal{C}(x_{k},y_{k},t)
\displaystyle=\mathcal{C}(x,y,t)+\frac{\delta^{2}}{4}\left[\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}). (C.6)

This expression is required for the first, second, and fourth terms in the conservation equation (Equation (C.2)).
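The six-neighbor average identity in Equation (C.6) can be checked numerically on a smooth test density; the function sin(x)cos(2y), the evaluation point, and the lattice spacing below are arbitrary illustrative choices.

```python
import math

def hex_neighbours(x, y, delta):
    # the six nearest sites at distance delta on a hexagonal lattice
    s = math.sqrt(3) / 2
    return [(x, y + delta), (x, y - delta),
            (x + s * delta, y + delta / 2), (x - s * delta, y + delta / 2),
            (x + s * delta, y - delta / 2), (x - s * delta, y - delta / 2)]

def C(x, y):
    return math.sin(x) * math.cos(2 * y)        # smooth test density

def laplacian_C(x, y):
    return -5 * math.sin(x) * math.cos(2 * y)   # exact Laplacian of C

x0, y0, delta = 0.4, 0.7, 1e-2
average = sum(C(xk, yk) for xk, yk in hex_neighbours(x0, y0, delta)) / 6
expansion = C(x0, y0) + (delta ** 2 / 4) * laplacian_C(x0, y0)
assert abs(average - expansion) < delta ** 3    # residual vanishes faster than delta^2
```

The inversion symmetry of the six-neighbor stencil cancels the odd-order terms, which is why the residual is beyond the retained second-order terms.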

To deal with the third term in Equation (C.2) we require an expression for
g^{+}(\hat{\mathcal{C}}(x_{k},y_{k},t))=f^{+}(\hat{\mathcal{C}}(x_{k},y_{k},t))/(1-\hat{\mathcal{C}}(x_{k},y_{k},t)), k=1,2,\ldots,6. By combining Equation (C.5) and Equation (C.6) we obtain

\displaystyle\begin{split}\hat{\mathcal{C}}(x_{1},y_{1},t)=&\,\mathcal{C}(x,y,t)+\delta\frac{\partial\mathcal{C}(x,y,t)}{\partial y}+\frac{\delta^{2}}{4}\left[\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+3\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\ \hat{\mathcal{C}}(x_{2},y_{2},t)=&\,\mathcal{C}(x,y,t)-\delta\frac{\partial\mathcal{C}(x,y,t)}{\partial y}+\frac{\delta^{2}}{4}\left[\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+3\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\ \hat{\mathcal{C}}(x_{3},y_{3},t)=&\,\mathcal{C}(x,y,t)+\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}+\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\ &+\frac{\delta^{2}}{4}\left[\frac{5}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\sqrt{3}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{3}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\ \hat{\mathcal{C}}(x_{4},y_{4},t)=&\,\mathcal{C}(x,y,t)-\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}+\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\ &+\frac{\delta^{2}}{4}\left[\frac{5}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}-\sqrt{3}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{3}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\ \hat{\mathcal{C}}(x_{5},y_{5},t)=&\,\mathcal{C}(x,y,t)+\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}-\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\ &+\frac{\delta^{2}}{4}\left[\frac{5}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}-\sqrt{3}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{3}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}),\\ \hat{\mathcal{C}}(x_{6},y_{6},t)=&\,\mathcal{C}(x,y,t)-\frac{\delta\sqrt{3}}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial x}-\frac{\delta}{2}\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\\ &+\frac{\delta^{2}}{4}\left[\frac{5}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\sqrt{3}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}\partial{y}}+\frac{3}{2}\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]+\mathcal{O}(\delta^{3}).\end{split} (C.7)

Each expression in Equation (C.7) is of the form \hat{\mathcal{C}}(x_{k},y_{k},t)=\mathcal{C}(x,y,t)+\bar{\mathcal{C}}_{k}, where
\bar{\mathcal{C}}_{k}=\mathcal{O}(\delta). This allows us to consider the Taylor expansion of g^{+}(\hat{\mathcal{C}}(x_{k},y_{k},t)) about the density \mathcal{C}(x,y,t), that is,

g^{+}(\mathcal{C}+\bar{\mathcal{C}}_{k})=g^{+}(\mathcal{C})+\bar{\mathcal{C}}_{k}\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}+\frac{\bar{\mathcal{C}}_{k}^{2}}{2}\frac{\text{d}^{2}g^{+}(\mathcal{C})}{\text{d}{\mathcal{C}}^{2}}+\mathcal{O}(\delta^{3}). (C.8)

Using Equation (C.8), we can obtain an expression for the summation within the third term of the conservation equation (Equation (C.2)), that is,

\displaystyle\sum_{(x^{\prime},y^{\prime})\in N(x,y)}\mathcal{C}(x^{\prime},y^{\prime},t)\frac{f^{+}(\hat{\mathcal{C}}(x^{\prime},y^{\prime},t))}{1-\hat{\mathcal{C}}(x^{\prime},y^{\prime},t)}= \,\sum_{k=1}^{6}\mathcal{C}(x_{k},y_{k},t)g^{+}(\mathcal{C}(x,y,t)+\bar{\mathcal{C}}_{k})
\displaystyle= \,g^{+}(\mathcal{C}(x,y,t))\sum_{k=1}^{6}\mathcal{C}(x_{k},y_{k},t)+\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\sum_{k=1}^{6}\bar{\mathcal{C}}_{k}\mathcal{C}(x_{k},y_{k},t)
\displaystyle+\frac{\text{d}^{2}g^{+}(\mathcal{C})}{\text{d}{\mathcal{C}}^{2}}\sum_{k=1}^{6}\frac{\bar{\mathcal{C}}_{k}^{2}}{2}\mathcal{C}(x_{k},y_{k},t)+\mathcal{O}(\delta^{3}). (C.9)

After substitution of Equation (C.7) into Equation (C.9) and some tedious algebra we arrive at

\displaystyle\sum_{(x^{\prime},y^{\prime})\in N(x,y)}\mathcal{C}(x^{\prime},y^{\prime},t)\frac{f^{+}(\hat{\mathcal{C}}(x^{\prime},y^{\prime},t))}{1-\hat{\mathcal{C}}(x^{\prime},y^{\prime},t)}= \,6\mathcal{C}(x,y,t)g^{+}(\mathcal{C}(x,y,t))
\displaystyle+\frac{3\delta^{2}}{2}g^{+}(\mathcal{C}(x,y,t))\left[\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]
\displaystyle+3\delta^{2}\mathcal{C}(x,y,t)\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\left[\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{x}^{2}}+\frac{\partial^{2}\mathcal{C}(x,y,t)}{\partial{y}^{2}}\right]
\displaystyle+3\delta^{2}\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\left[\left(\frac{\partial\mathcal{C}(x,y,t)}{\partial x}\right)^{2}+\left(\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\right)^{2}\right]
\displaystyle+\frac{3\delta^{2}}{2}\mathcal{C}(x,y,t)\frac{\text{d}^{2}g^{+}(\mathcal{C})}{\text{d}{\mathcal{C}}^{2}}\left[\left(\frac{\partial\mathcal{C}(x,y,t)}{\partial x}\right)^{2}+\left(\frac{\partial\mathcal{C}(x,y,t)}{\partial y}\right)^{2}\right]
\displaystyle+\mathcal{O}(\delta^{3}). (C.10)
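Equation (C.10) can be verified numerically by evaluating the neighbor sums exactly from their definitions and comparing with the right-hand side truncated at second order; the test density and the choice g^{+}(c)=c^{2} below are illustrative assumptions only.

```python
import math

S = math.sqrt(3) / 2

def neighbours(x, y, d):
    # six hexagonal lattice neighbours at distance d
    return [(x, y + d), (x, y - d),
            (x + S * d, y + d / 2), (x - S * d, y + d / 2),
            (x + S * d, y - d / 2), (x - S * d, y - d / 2)]

def C(x, y):
    return 0.3 + 0.2 * math.sin(x) * math.cos(y)   # smooth test density in (0, 1)

def grads(x, y):
    # exact first derivatives and Laplacian of the test density
    cx = 0.2 * math.cos(x) * math.cos(y)
    cy = -0.2 * math.sin(x) * math.sin(y)
    lap = -0.4 * math.sin(x) * math.cos(y)
    return cx, cy, lap

g = lambda c: c * c        # illustrative g+(c); dg/dc = 2c, d2g/dc2 = 2
dg = lambda c: 2 * c
d2g = lambda c: 2.0

def lhs(x, y, d):
    # sum over neighbours of C * g(neighbour average), computed exactly
    total = 0.0
    for xk, yk in neighbours(x, y, d):
        c_hat = sum(C(xp, yp) for xp, yp in neighbours(xk, yk, d)) / 6
        total += C(xk, yk) * g(c_hat)
    return total

def rhs(x, y, d):
    # right-hand side of Equation (C.10), truncated at O(d^2)
    c = C(x, y)
    cx, cy, lap = grads(x, y)
    grad2 = cx * cx + cy * cy
    return (6 * c * g(c)
            + 1.5 * d * d * g(c) * lap
            + 3 * d * d * c * dg(c) * lap
            + 3 * d * d * dg(c) * grad2
            + 1.5 * d * d * c * d2g(c) * grad2)

x0, y0, d = 0.5, 0.5, 1e-2
err = abs(lhs(x0, y0, d) - rhs(x0, y0, d))
assert err < 1e-5
```

The residual is far smaller than the effect of dropping the \delta^{2} terms, consistent with the stated \mathcal{O}(\delta^{3}) remainder.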

Finally, we substitute Equation (C.6) and Equation (C.10) into Equation (C.2) to obtain

\displaystyle\Delta\mathcal{C}(x,y,t)= P_{m}\left(1-\mathcal{C}(x,y,t)\right)\left[\mathcal{C}(x,y,t)+\frac{\delta^{2}}{4}\nabla^{2}\mathcal{C}(x,y,t)\right]
\displaystyle-P_{m}\mathcal{C}(x,y,t)\left[1-\mathcal{C}(x,y,t)-\frac{\delta^{2}}{4}\nabla^{2}\mathcal{C}(x,y,t)\right]
\displaystyle+\frac{P_{p}}{6}\left(1-\mathcal{C}(x,y,t)\right)\left[6\mathcal{C}(x,y,t)g^{+}(\mathcal{C}(x,y,t))+\frac{3\delta^{2}}{2}g^{+}(\mathcal{C}(x,y,t))\nabla^{2}\mathcal{C}(x,y,t)\right.
\displaystyle+3\delta^{2}\mathcal{C}(x,y,t)\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\nabla^{2}\mathcal{C}(x,y,t)+3\delta^{2}\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\left(\nabla\mathcal{C}(x,y,t)\cdot\nabla\mathcal{C}(x,y,t)\right)
\displaystyle+\left.\frac{3\delta^{2}}{2}\mathcal{C}(x,y,t)\frac{\text{d}^{2}g^{+}(\mathcal{C})}{\text{d}{\mathcal{C}}^{2}}\left(\nabla\mathcal{C}(x,y,t)\cdot\nabla\mathcal{C}(x,y,t)\right)\right]
\displaystyle-P_{p}\mathcal{C}(x,y,t)f^{-}(\mathcal{C}(x,y,t))+\mathcal{O}(\delta^{3}). (C.11)

Here we have used the notation \nabla^{2}\mathcal{C}=\frac{\partial^{2}\mathcal{C}}{\partial{x}^{2}}+\frac{\partial^{2}\mathcal{C}}{\partial{y}^{2}}, and \nabla\mathcal{C}\cdot\nabla\mathcal{C}=(\frac{\partial\mathcal{C}}{\partial x})^{2}+(\frac{\partial\mathcal{C}}{\partial y})^{2}. After rearranging Equation (C.11), we obtain

\displaystyle\Delta\mathcal{C}(x,y,t)= \frac{P_{m}\delta^{2}}{4}\nabla^{2}\mathcal{C}(x,y,t)+P_{p}\mathcal{C}(x,y,t)\left[\left(1-\mathcal{C}(x,y,t)\right)g^{+}(\mathcal{C}(x,y,t))-f^{-}(\mathcal{C}(x,y,t))\right]
\displaystyle+P_{p}\delta^{2}H(\mathcal{C}(x,y,t))+\mathcal{O}(\delta^{3}), (C.12)

where

\displaystyle H(\mathcal{C}(x,y,t))= \frac{1}{4}g^{+}(\mathcal{C}(x,y,t))\nabla^{2}\mathcal{C}(x,y,t)+\frac{1}{2}\mathcal{C}(x,y,t)\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\nabla^{2}\mathcal{C}(x,y,t)
\displaystyle+\frac{1}{2}\frac{\text{d}g^{+}(\mathcal{C})}{\text{d}\mathcal{C}}\left(\nabla\mathcal{C}(x,y,t)\cdot\nabla\mathcal{C}(x,y,t)\right)+\frac{1}{4}\mathcal{C}(x,y,t)\frac{\text{d}^{2}g^{+}(\mathcal{C})}{\text{d}{\mathcal{C}}^{2}}\left(\nabla\mathcal{C}(x,y,t)\cdot\nabla\mathcal{C}(x,y,t)\right).

Note that f^{+}(\mathcal{C}(x,y,t))=g^{+}(\mathcal{C}(x,y,t))(1-\mathcal{C}(x,y,t)) and, by Equation (C.1), that
f(\mathcal{C}(x,y,t))=f^{+}(\mathcal{C}(x,y,t))-f^{-}(\mathcal{C}(x,y,t)). Then Equation (C.12) becomes

\displaystyle\Delta\mathcal{C}(x,y,t)=\frac{P_{m}\delta^{2}}{4}\nabla^{2}\mathcal{C}(x,y,t)+P_{p}\mathcal{C}(x,y,t)f(\mathcal{C}(x,y,t))+P_{p}\delta^{2}H(\mathcal{C}(x,y,t))+\mathcal{O}(\delta^{3}).

After dividing by the time interval, \tau, and choosing \delta^{2}=\mathcal{O}(\tau) and P_{p}=\mathcal{O}(\tau), we have the following limits,

\displaystyle\lim_{\tau\to 0}\frac{P_{m}\delta^{2}}{4\tau}=D,\quad\lim_{\tau\to 0}\frac{P_{p}}{\tau}=\lambda,\quad\lim_{\tau\to 0}\frac{\Delta\mathcal{C}(x,y,t)}{\tau}=\frac{\partial\mathcal{C}(x,y,t)}{\partial t},\quad\lim_{\tau\to 0}\frac{P_{p}\delta^{2}}{\tau}=0.

Therefore, we arrive at the continuum-limit approximation

\displaystyle\frac{\partial\mathcal{C}(x,y,t)}{\partial t}=D\nabla^{2}\mathcal{C}(x,y,t)+\lambda\mathcal{C}(x,y,t)f(\mathcal{C}(x,y,t)).

Appendix D Forward problem simulation

This section presents standard computational methods that are applied to the forward problems of the models we consider in this work.

D.1 Stochastic simulation of the discrete model

Jin et al. (2016) consider crowding functions with f:[0,1]\rightarrow[0,1]. This is not quite sufficient to enable a crowding function with a carrying capacity K<1 since motility events will cause the carrying capacity to be violated. Therefore, we take f:[0,1]\rightarrow[-1,1]. If f(\hat{C}(\ell_{s}))<0 then the site \ell_{s} is removed from the set of occupied sites at time t, denoted by \mathcal{L}(t), with probability P_{p}|f(\hat{C}(\ell_{s}))|. Given these definitions, the lattice-based random walk proceeds according to Algorithm D.1.

Algorithm D.1 Lattice-based random walk model
1:Initialize \mathcal{L}(0)\subset L with |\mathcal{L}(0)|=N and t\leftarrow 0;
2:while t<T do
3:  N\leftarrow|\mathcal{L}(t)|;
4:  for i=[1,2,\ldots,N] do
5:   Choose \ell_{s}\in\mathcal{L}(t) uniformly at random with probability 1/N;
6:   Choose \ell_{m}\in\mathcal{N}(\ell_{s}) uniformly at random with probability 1/|\mathcal{N}(\ell_{s})|;
7:   if C(\ell_{m},t)=0 then
8:     Generate u\sim\mathcal{U}(0,1);
9:     if u\leq P_{m} then
10:      Set \mathcal{L}(t)\leftarrow\left\{\ell_{m}\right\}\cup\mathcal{L}(t)\backslash\left\{\ell_{s}\right\};
11:     end if
12:   end if
13:  end for
14:  for i=[1,2,\ldots,N] do
15:   Choose \ell_{s}\in\mathcal{L}(t) uniformly at random with probability 1/N;
16:   Set \hat{C}(\ell_{s})\leftarrow\left[1/|\mathcal{N}(\ell_{s})|\right]\sum_{\ell_{s}^{\prime}\in\mathcal{N}(\ell_{s})}C(\ell_{s}^{\prime},t);
17:   Generate u\sim\mathcal{U}(0,1);
18:   if u\leq P_{p}|f(\hat{C}(\ell_{s}))| then
19:     if f(\hat{C}(\ell_{s}))\geq 0 then
20:      Choose \ell_{p}\in\left\{\ell_{s}^{\prime}\in\mathcal{N}(\ell_{s}):C(\ell_{s}^{\prime},t)=0\right\} uniformly at random with probability 1-\hat{C}(\ell_{s});
21:      Set \mathcal{L}(t)\leftarrow\left\{\ell_{p}\right\}\cup\mathcal{L}(t);
22:     else
23:      Set \mathcal{L}(t)\leftarrow\mathcal{L}(t)\backslash\left\{\ell_{s}\right\};
24:     end if
25:   end if
26:  end for
27:  t\leftarrow t+\tau
28:end while
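A minimal Python sketch of Algorithm D.1 is given below. The periodic axial-coordinate hexagonal lattice, the logistic crowding function, and the parameter values are illustrative assumptions, not the implementation used to produce the results; step 20 of the algorithm is realised by drawing a uniform neighbour and keeping it only if empty, which gives the 1-\hat{C}(\ell_{s}) placement probability.

```python
import random

def hex_neighbours(q, r, I, J):
    # six neighbours in axial coordinates on a periodic hexagonal lattice
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    return [((q + dq) % I, (r + dr) % J) for dq, dr in steps]

def crowding(c, K=5/6):
    # illustrative logistic crowding function, f: [0,1] -> [-1,1]
    return 1.0 - c / K

def step(occupied, I, J, Pm, Pp, f, rng):
    # one time step of the lattice-based random walk (cf. Algorithm D.1)
    N = len(occupied)
    for _ in range(N):  # motility events
        src = rng.choice(list(occupied))
        dst = rng.choice(hex_neighbours(*src, I, J))
        if dst not in occupied and rng.random() <= Pm:
            occupied.remove(src)
            occupied.add(dst)
    for _ in range(N):  # proliferation/death events
        site = rng.choice(list(occupied))
        nbrs = hex_neighbours(*site, I, J)
        c_hat = sum(n in occupied for n in nbrs) / len(nbrs)
        if rng.random() <= Pp * abs(f(c_hat)):
            if f(c_hat) >= 0:
                # a uniform neighbour draw, kept only if empty, realises
                # the 1 - c_hat placement probability of step 20
                target = rng.choice(nbrs)
                if target not in occupied:
                    occupied.add(target)
            else:
                occupied.remove(site)
    return occupied
```

Starting from a sparse initial occupancy, repeated calls to `step` produce logistic-style growth of the population toward the carrying capacity.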

D.2 Numerical solutions of continuum-limit differential equations

In our case, the approximate model is a deterministic continuum equation that is either a nonlinear ODE or PDE. In both cases we apply numerical schemes that automatically adapt the time step size to control the truncation error.

D.2.1 ODE numerical solutions

The Runge-Kutta-Fehlberg fourth-fifth order method (RKF45) Fehlberg (1969) is a numerical scheme in the Runge-Kutta family of methods for approximating the solution of a nonlinear ODE of the form,

\frac{\text{d}\mathcal{C}(t)}{\text{d}t}=h(t,\mathcal{C}(t)),\quad 0<t,

where h(t,\mathcal{C}(t)) is a function that satisfies certain regularity conditions, and \mathcal{C}(0) is the initial condition.

Given the approximate solution, c_{i}\approx\mathcal{C}(t_{i}), an embedded pair of Runge-Kutta methods, specifically a fourth and fifth order pair, is used to advance the solution to c_{i+1}\approx\mathcal{C}(t_{i}+\Delta t)+\mathcal{O}(\Delta t^{4}) and to estimate the truncation error, which is used to adaptively adjust \Delta t so that the error remains within some specified tolerance \tau.

The fourth and fifth order estimates are, respectively,

c_{i+1}=c_{i}+\Delta t\left(\frac{25}{216}k_{1}+\frac{1,408}{2,565}k_{3}+\frac{2,197}{4,104}k_{4}-\frac{1}{5}k_{5}\right), (D.1)

and

c^{*}_{i+1}=c_{i}+\Delta t\left(\frac{16}{135}k_{1}+\frac{6,656}{12,825}k_{3}+\frac{28,561}{56,430}k_{4}-\frac{9}{50}k_{5}+\frac{2}{55}k_{6}\right), (D.2)

where

\displaystyle k_{1} =h(t_{i},c_{i}),
\displaystyle k_{2} =h\left(t_{i}+\frac{\Delta t}{4},c_{i}+\frac{\Delta t}{4}k_{1}\right),
\displaystyle k_{3} =h\left(t_{i}+\frac{3\Delta t}{8},c_{i}+\Delta t\left[\frac{3}{32}k_{1}+\frac{9}{32}k_{2}\right]\right),
\displaystyle k_{4} =h\left(t_{i}+\frac{12\Delta t}{13},c_{i}+\Delta t\left[\frac{1,932}{2,197}k_{1}-\frac{7,200}{2,197}k_{2}+\frac{7,296}{2,197}k_{3}\right]\right),
\displaystyle k_{5} =h\left(t_{i}+\Delta t,c_{i}+\Delta t\left[\frac{439}{216}k_{1}-8k_{2}+\frac{3,680}{513}k_{3}-\frac{845}{4,104}k_{4}\right]\right),
\displaystyle k_{6} =h\left(t_{i}+\frac{\Delta t}{2},c_{i}+\Delta t\left[-\frac{8}{27}k_{1}+2k_{2}-\frac{3,544}{2,565}k_{3}+\frac{1,859}{4,104}k_{4}-\frac{11}{40}k_{5}\right]\right).

Note that the truncation error can be estimated by \epsilon=|c_{i+1}-c^{*}_{i+1}|. After each evaluation of Equation (D.1) and Equation (D.2), a new step size is determined by \Delta t\leftarrow s\Delta t, where s=\left[\tau/(2\epsilon)\right]^{1/4}. If \epsilon\leq\tau, then the solution is accepted and the new \Delta t is used for the next iteration. Alternatively, if \epsilon>\tau then the solution is rejected and a new attempt is made using the new time step. This process is repeated until the solution advances to some desired time point T. For more details and analysis of this method, see Iserles (2008).
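The complete adaptive scheme can be sketched compactly in Python; the exponential decay test problem, the tolerance, and the initial step size below are illustrative choices.

```python
import math

def rkf45(h, c0, T, tol=1e-8, dt=0.1):
    # adaptive integration of dC/dt = h(t, C) on [0, T] with the embedded
    # fourth/fifth order pair of Equations (D.1) and (D.2)
    t, c = 0.0, c0
    while t < T:
        dt = min(dt, T - t)
        k1 = h(t, c)
        k2 = h(t + dt / 4, c + dt * k1 / 4)
        k3 = h(t + 3 * dt / 8, c + dt * (3 * k1 + 9 * k2) / 32)
        k4 = h(t + 12 * dt / 13,
               c + dt * (1932 * k1 - 7200 * k2 + 7296 * k3) / 2197)
        k5 = h(t + dt,
               c + dt * (439 * k1 / 216 - 8 * k2 + 3680 * k3 / 513
                         - 845 * k4 / 4104))
        k6 = h(t + dt / 2,
               c + dt * (-8 * k1 / 27 + 2 * k2 - 3544 * k3 / 2565
                         + 1859 * k4 / 4104 - 11 * k5 / 40))
        c4 = c + dt * (25 * k1 / 216 + 1408 * k3 / 2565
                       + 2197 * k4 / 4104 - k5 / 5)
        c5 = c + dt * (16 * k1 / 135 + 6656 * k3 / 12825
                       + 28561 * k4 / 56430 - 9 * k5 / 50 + 2 * k6 / 55)
        eps = abs(c4 - c5)
        if eps <= tol:
            t, c = t + dt, c4                # accept the fourth order step
        # shrink the step on rejection, grow it on easy acceptance
        dt *= (tol / (2 * eps)) ** 0.25 if eps > 0 else 2.0
    return c

# exponential decay test problem: dC/dt = -C, exact solution exp(-t)
c_final = rkf45(lambda t, c: -c, 1.0, 1.0)
assert abs(c_final - math.exp(-1.0)) < 1e-6
```

Rejected steps repeat with the shrunken \Delta t, so each accepted step satisfies the per-step tolerance.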

D.2.2 PDE numerical solutions

In this work, we consider numerical solutions to PDEs of the form

\frac{\partial\mathcal{C}(x,t)}{\partial t}=D\frac{\partial^{2}\mathcal{C}(x,t)}{\partial{x}^{2}}+\lambda\mathcal{C}(x,t)f(\mathcal{C}(x,t)),\quad 0<t,\quad 0<x<L, (D.3)

with initial conditions

\mathcal{C}(x,0)=c(x),\quad t=0,

and Neumann boundary conditions

\frac{\partial\mathcal{C}(x,t)}{\partial x}=0,\quad x=0\text{ and }x=L.

Given the approximate solution, c_{i}^{j}\approx\mathcal{C}(x_{i},t_{j}) for i=1,2,\ldots,N, we apply a first order backward Euler discretization in time and central differences in space to yield

\displaystyle\begin{split}\frac{c_{2}^{j+1}-c_{1}^{j+1}}{\Delta x}&=0,\\ \frac{c_{i}^{j+1}-c_{i}^{j}}{\Delta t}&=D\frac{c_{i+1}^{j+1}-2c_{i}^{j+1}+c_{i-1}^{j+1}}{\Delta x^{2}}+\lambda c_{i}^{j+1}f(c_{i}^{j+1}),\quad i=2,3,\ldots,N-1,\\ \frac{c_{N}^{j+1}-c_{N-1}^{j+1}}{\Delta x}&=0,\end{split} (D.4)

where c_{i\pm 1}^{j\pm 1}\approx\mathcal{C}(x_{i}\pm\Delta x,t_{j}\pm\Delta t). The solution is stepped forward in time using fixed point iterations that are initialized by a first order forward Euler estimate.

The truncation error is estimated by

\epsilon=\frac{\Delta t}{2}\max_{1\leq i\leq N}\left|\left(\frac{\text{d}c}{\text{d}t}\right)_{i}^{j+1}-\left(\frac{\text{d}c}{\text{d}t}\right)_{i}^{j}\right|,\text{ with }\left(\frac{\text{d}c}{\text{d}t}\right)_{i}^{j+1}\approx\frac{c_{i}^{j+1}-c_{i}^{j}}{\Delta t}.

After solving the nonlinear system (Equation (D.4)) using fixed point iteration, a new step size is determined by \Delta t\leftarrow s\Delta t, where s=0.9\sqrt{\tau/\epsilon} and \tau is the truncation error tolerance Simpson et al. (2007); Sloan and Abbo (1999). If \epsilon\leq\tau, then the solution is accepted and the new \Delta t is used for the next time step. Alternatively, if \epsilon>\tau then the solution is rejected and a new attempt is made using the new time step. This process is repeated until the solution advances to some desired time point T.
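One implicit step of Equation (D.4), solved by fixed point iteration with a forward Euler predictor, can be sketched as follows; the logistic source term, grid sizes, and the omission of the adaptive step-size control are illustrative simplifications.

```python
def backward_euler_step(c, dt, dx, D, lam, f, tol=1e-10, max_iter=200):
    # one backward Euler step of Equation (D.4), solved by fixed point
    # iteration; convergence of the iteration requires dt to be small
    # relative to dx^2 / D
    N = len(c)

    def rhs(u):
        out = [0.0] * N
        for i in range(1, N - 1):
            out[i] = (D * (u[i + 1] - 2 * u[i] + u[i - 1]) / dx ** 2
                      + lam * u[i] * f(u[i]))
        return out

    u = [c[i] + dt * r for i, r in enumerate(rhs(c))]   # forward Euler predictor
    u[0], u[-1] = u[1], u[-2]                           # zero-flux boundaries
    for _ in range(max_iter):
        v = [c[i] + dt * r for i, r in enumerate(rhs(u))]
        v[0], v[-1] = v[1], v[-2]
        if max(abs(a - b) for a, b in zip(u, v)) < tol:
            return v
        u = v
    return u
```

A spatially uniform profile stays uniform (the diffusion term vanishes) and evolves according to the implicit logistic update, while a localized spike is smoothed toward its neighbours.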

Appendix E Additional results

We provide the multivariate posteriors for the results presented in the main text. In both Figure E.1 and Figure E.2, the contours of the posterior bivariate marginals generated by the PC-SMC-ABC and MM-SMC-ABC methods, at a significant reduction in computational cost, align very well with the contours of the posterior bivariate marginals obtained using SMC-ABC with the expensive discrete model alone; however, there is a clear bias in the contours of the posterior bivariate marginals generated with SMC-ABC using the continuum-limit approximation alone.

Figure E.1: Comparison of estimated posterior marginal densities for the weak Allee model. There is a distinct bias in the SMC-ABC density estimate using the continuum limit (CL) (black dashed) compared with the SMC-ABC method with the discrete model (DM) (green dashed). However, the density estimates computed using the PC-SMC-ABC (orange solid) and MM-SMC-ABC (purple solid) methods match well with a reduced computational overhead.
Figure E.2: Comparison of estimated posterior marginal densities for the scratch assay model. There is a distinct bias in the SMC-ABC density estimate using the continuum limit (CL) (black dashed) compared with the SMC-ABC method with the discrete model (DM) (green dashed). However, the density estimates computed using the PC-SMC-ABC (orange solid) and MM-SMC-ABC (purple solid) methods match well with a reduced computational overhead.

Appendix F Effect of motility rate on continuum-limit approximation

Here we demonstrate the ability (or lack thereof) of the continuum-limit approximation to capture the average behavior of the discrete model for the two examples considered in the main manuscript: the weak Allee model and the scratch assay model. Figure F.1 plots the solutions to the continuum-limit differential equation against realizations of the discrete model for both motile, P_{m}=1, and non-motile, P_{m}=0, agents.

In the case of the weak Allee model (Figure F.1(A)), the continuum limit does not match the average behavior in either case, though it does perform better when P_{m}=1 than when P_{m}=0. The remaining discrepancy when P_{m}=1 can be related to the ratio P_{p}/P_{m} and to the effect of the neighborhood radius, r (see Jin et al. (2016) for further explanation). Decreasing P_{p}/P_{m} and increasing r will correct this discrepancy. The scratch assay continuum limit captures the average behavior of the discrete model very well when P_{m}=1 (Figure F.1(B)), but does not capture the scratch closing behavior of the discrete model when P_{m}=0 (Figure F.1(C)).

Figure F.1: Continuum-limit approximations for the (A) weak Allee model and (B)-(C) the scratch assay model plotted against stochastic simulations of the discrete model. (A) For the weak Allee model, the continuum limit (black solid) does not capture the average behavior of the discrete model for motile agents, P_{m}=1, (green dotted) or non-motile agents, P_{m}=0, (orange dotted). (B)-(C) For the scratch assay model, the continuum limit (solid) is a very good match of the average behavior of the (B) discrete model (dot-dashed) for motile agents, P_{m}=1, but a very poor approximation of the average behavior of the (C) discrete model (dot-dashed) for non-motile agents, P_{m}=0, especially in the scratch region. Parameters used in the simulations are P_{p}=1/1000, \lambda=P_{p}/\tau, D=P_{m}\delta^{2}/4\tau, K=5/6, and A=1/10. The stochastic simulations are performed on an I\times J hexagonal lattice with I=80, J=68, \tau=1 and \delta=1.