This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Calculation of Epidemic Arrival Time Distributions using Branching processes

Alastair Jamieson-Lane, Bernd Blasius
(March 2020)
Abstract

The rise of the World Airline Network over the past century has lead to sharp changes in our notions of ‘distance’ and ‘closeness’ - both in terms of trade and travel, but also (less desirably) with respect to the spread of disease. When novel pathogens are discovered, countries, cities and hospitals are caught trying to predict how much time they have to prepare. In this paper, by considering the early stages of epidemic spread as a simple branching process, we derive the full probability distribution of arrival times. We are able to re-derive a number of past arrival time results (in suitable limits), and demonstrate the robustness of our approach, both to parameter values far outside the traditionally considered regime, and to errors in the parameter values used. The branching process approach provides some theoretical justification to the ‘effective distance’ introduced by Brockmann & Helbing (2013), however we also observe that when compared to real world data, the predictive power of all methods in this class is significantly lower than has been previously reported.

1 Introduction

It was said by Jules Verne in 1873 that “The world has grown smaller, since a man can now go round it ten times more quickly than a hundred years ago,” that one could travel around the world in eighty days [Verne, 1873]. One can’t help but wonder what Verne would think of the advances in technology that have occurred in the century and a half since, which now allow the circumnavigation of the globe in less than eighty hours. The World Aviation Network (WAN) has, over the past century, shrunk the globe to a fraction of its former size, allowing for vast increases in tourism, immigration, and trade. At the same time, diseases that might once have travelled at the speed of cart, ship or train now cross the world in a number of days or hours. In 2003 Severe Acute Respiratory Syndrome (SARS) was able to spread from China to Vietnam, Hong Kong and Canada within two weeks of being reported to the World Health Organization [WHO, 2003]. Similarly, the 2009 H1N1 flu pandemic first reported in Mexico was able to reach both Europe and Asia within a fortnight [WHO, 2009]. Now, in 2020 we see that coronavirus SARS-CoV-2 has spread quickly across the world [WHO, 2020], reaching all but a handful of countries withing a few short months.

Modelling of disease spread and disease arrival time is critical, both to mitigate the effects of a disease and to determine where best to apply quarantine and screening protocols so as to minimize the spread of infection. In order to model the spread of disease, given the current WAN, a number of highly detailed epidemic simulation models have been created, such as GLEAMviz[Broeck et al., 2011]. Depending on how they are set up, these models can include such factors as vaccination, incubation times, multiple susceptibility classes, non-linear responses, seasonal forcing, quarantine, and the stochastic movement of individual agents, providing a detailed and comprehensive modelling framework. Unfortunately, in many cases, such models quickly become black boxes in and of themselves - objects to be studied and analyzed only via costly simulation - allowing predictions of the future that can be observed, but seldom understood. This difficulty is further exacerbated by the difficulties of determining parameter values, often a delicate task during the initial stages of a disease outbreak when information is scarce.

Depending on the questions being asked, explicit modelling of such intricate details is necessary, however, for other questions, such as the determination of epidemic arrival time (‘AT’), it seems that simpler methods may suffice. For this reason, a number of authors [Brockmann and Helbing, 2013, Iannelli et al., 2017, Gautreau et al., 2007, Chen et al., 2018] have proposed a variety of heuristics and metrics – artificial measures of distance based on flight data from the WAN. The goal of such metrics is to predict relative arrival times for an epidemic starting in a specified location, predicting for example that an epidemic starting in Vancouver will take twice as long to reach Istanbul as it will to reach Rio de Janeiro. Absolute AT will also depend on the parameters of the particular disease in question.

A particularly elegant example of such a metric is Brockmann & Helbing’s [Brockmann and Helbing, 2013] ‘effective distance’, in which they define the direct distance between a pair of locations as

dij=1log(Pij),d_{ij}=1-\log(P_{ij}), (1)

where PijP_{ij} is the probability that a particular individual who is leaving location ii is travelling to location jj. The total effective distance between nodes aa and bb is then given as the shortest path between a given pair of nodes:

Dab=minijdij=minij[1log(Pij)],D_{ab}=\min\sum_{ij}d_{ij}=\min\sum_{ij}\left[1-\log(P_{ij})\right], (2)

where here we minimize over all possible paths from aa to bb, and dijd_{ij} denotes the effective distance along each step of this shortest path (and PijP_{ij} the corresponding probability).

Papers by Gautreau, Ianelli, et al. provide more detailed and mathematically justified metrics, by first calculating the expected time for a growing epidemic to make its first ‘jump’ [Gautreau et al., 2007], and then summing over all possible paths joining a pair of airports (in an appropriate manner) [Iannelli et al., 2017]. More recent work by Chen et al. [Chen et al., 2018] made use of linear spreading theory and the matrix exponential, in order to attain similar accuracy at reduced computational cost.

Underlying each of the above efforts, either implicitly or explicitly, is an assumption of unbound growth; the idea that in the early stages of an epidemic, disease spread is not limited by population size. Such unbound growth can, and often has been, modeled as smooth exponential growth, a continuous approximation of discrete real world populations. An alternative model for unbound growth, one often used when small populations, rare events, and probabilities are of interest, is the branching process [Kimmel and Axelrod, 2015]. The branching process is a classical tool in the study of extinction and evolution[Schaffer, 1970], and has been used to study epidemics[González et al., 2010], surnames and genealogies [Watson and Galton, 1875], and cancers [Goldie and Coldman, 1979, Coldman and Goldie, 1986]. In this paper we apply the branching process framework in order to calculate not only the mean, but also the full distribution of ATs for arbitrary networks.

In section 2 we introduce a basic ‘multi-compartment branching process’ model, and from this model derive a system of ODE’s that precisely determine the probability of epidemic arrival by a given time. In section 3 we determine the mean and variance in the AT, and explore how these results both support and contrast with the results of past papers. Section 4 explores the sensitivity of our predictions to perturbations to system parameters and network structure. In section 5 we use data from real world flight networks, and compare our predictions, along with the predictions of past authors to epidemic arrival times as observed for in the 2003 SARS epidemic, and 2009 H1N1 influenza pandemic. In both cases we observe correlation between predicted and observed results, but also significant noise. We are, unfortunately, unable to reproduce previously published results, and instead observe the observed stochasticity is both larger than previously reported, and larger than might be expected given the underlying model.

Our goal in this paper is to provide a ‘canonical’ approach to calculating arrival times; that is to say, an approach which relies on minimal assumptions and approximations, and can be trusted to provide accurate results across all of parameter space. To the extent that the method works, it can be seen as providing a foundation to past methods and heuristics. To the extent that our predictions disagree with real world data this is suggestive of either flaws in the data, or gaps in the underlying model, gaps that will require not simply better mathematics, but instead better understanding of the system under study.

2 Diffusion on a Network and the Branching Process

Let us begin by concretely defining the model; we consider a network of NN connected nodes, each node representing a location. In our case these nodes refer to particular airports in the WAN. Individuals travel from node ii to node jj at transport rate γTij\gamma T_{ij}. Here TijT_{ij} gives the relative transport rate along a given route, while γ\gamma is the global flight rate. While γ\gamma and TT can be folded together as one parameter, maintaining this distinction will prove convenient later. We select TiiT_{ii} to represent the rate at which individuals leave location ii; Tii=jiTijT_{ii}=-\sum_{j\neq i}T_{ij}. Each node contains a population of Ik(t)I_{k}(t) infected individuals, initially set to zero. This population evolves according to a continuous time branching process: in any given small time interval dtdt Ik(t)I_{k}(t) will increase by one with probability αkIk(t)dt\alpha_{k}I_{k}(t)dt and decrease by one with probability βkIk(t)dt\beta_{k}I_{k}(t)dt. This represents infection and recovery (α\alpha and β\beta respectively). Parallel with this branching process, infected individuals may travel from one location, ii, to a neighbouring location, jj, with probability γTijIi(t)dt\gamma T_{ij}I_{i}(t)dt, resulting in an increment at location jj and a decrement at location ii. We thus have what might be described as a “continuous time multitype branching process” [Kimmel and Axelrod, 2015].

Refer to caption

Figure 1: (Left) Schematic diagram of a traveller in a network; each node representing a city/airport, and edges representing a flight link in the WAN. Edges will have different weights depending on the number of passengers flying. (Right) Each infected individual in location kk can take one of three actions at any given moment of time: they may infected another individual (at infection rate αk\alpha_{k}), they may recover, removing themselves from the infectious population (at recovery rate βk\beta_{k}), or they may travel to some other location jj at transport rate γTkj\gamma T_{kj}.

We would like to determine the travel time from aa to bb. Stated formally, suppose a novel diseases originates at time t=0t=0, at node aa. We would like to know the time at which the first infectious individual arrives in location bb, that is to say the earliest time such that Ib(t)>0I_{b}(t)>0. We denote this epidemic arrival time (AT) as τab\tau_{a}^{b}. If no infected individuals ever arrives at bb we say that τab=\tau_{a}^{b}=\infty. The distribution of τab\tau_{a}^{b} will depend on the starting location aa, the target location bb, and the network parameters γ,T\gamma,T, α\alpha and β\beta. We wish to understand this dependence and do so by considering the survival function Sab(t)=(τab>t){S}_{a}^{b}(t)=\mathbb{P}(\tau_{a}^{b}>t).

Consider ‘Patient Zero,’ initially placed in location aa at time t=0t=0. Now consider the state of the system a short time later, at time t=dtt=dt. With overwhelming probability, nothing will have happened during this small time window, and the survival times will still be governed by Sab(t){S}^{b}_{a}(t). With probability dtγTa,kdt\gamma T_{a,k} patient zero will have moved to location kk, and the survival time distribution will instead be governed by Skb(t){S}^{b}_{k}(t). In the case where k=bk=b, survival is now impossible, hence Sbb(t)=0{S}^{b}_{b}(t)=0. Patient zero may also infect someone with probability dtαadt\alpha_{a}. In this case, the probability of survival until time tt is given by the probability that neither traveler arrives at bb. As both travelers are independent and identical, this probability is given by [Sab(t)]2[{S}^{b}_{a}(t)]^{2}. Finally, it is possible that the initially infected patient zero may simply recover. This happens with probability βa\beta_{a}. In this case, no infected individual will ever reach bb, and survival probability is 11 for all time.

The probability of surviving for tt units of time after dtdt is thus a superposition of the possible survival functions for the states the system could transition to. By definition the probability of surviving for tt units of time after time dtdt is also precisely equal to Sab(dt+t){S}^{b}_{a}(dt+t). Stated algebraically we thus have:

Sab(dt+t)\displaystyle{S}^{b}_{a}(dt+t)\approx (1dt(αa+βa+γkaTa,k))Sab(t)\displaystyle(1-dt(\alpha_{a}+\beta_{a}+\gamma\sum_{k\neq a}T_{a,k})){S}^{b}_{a}(t)
+γkadtTa,kSkb(t)+αadt[Skb(t)]2+βadt,\displaystyle+\gamma\sum_{k\neq a}dtT_{a,k}{S}^{b}_{k}(t)+\alpha_{a}dt[{S}^{b}_{k}(t)]^{2}+\beta_{a}dt, (3)
rearranging and taking limits we find,
limdt0Sab(dt+t)Sab(t)dt=\displaystyle\lim_{dt\rightarrow 0}\frac{{S}^{b}_{a}(dt+t)-{S}^{b}_{a}(t)}{dt}={} γkaTa,k[Skb(t)Sab(t)]\displaystyle\gamma\sum_{k\neq a}T_{a,k}\left[{S}^{b}_{k}(t)-{S}^{b}_{a}(t)\right]
αa[[Sab(t)]2Sab(t)]+βa[1Sab(t)].\displaystyle\alpha_{a}\left[[{S}^{b}_{a}(t)]^{2}-{S}^{b}_{a}(t)\right]+\beta_{a}\left[1-{S}^{b}_{a}(t)\right]. (4)
remembering that Ta,aSab(t)=baTa,kSab(t)T_{a,a}S_{a}^{b}(t)=-\sum_{b\neq a}T_{a,k}S_{a}^{b}(t), we can simplify to
S˙ab(t)=\displaystyle\dot{S}^{b}_{a}(t)= γkTa,kSkb(t)+(βaαaSab(t))(1Sab(t)),\displaystyle\gamma\sum_{k}T_{a,k}{S}^{b}_{k}(t)+\left(\beta_{a}-\alpha_{a}{S}^{b}_{a}(t)\right)\left(1-{S}^{b}_{a}(t)\right),
Sbb(t)=\displaystyle{S}^{b}_{b}(t)= 0,Sabb(0)=1.\displaystyle 0,~{}~{}~{}~{}~{}{S}^{b}_{a\neq b}(0)=1. (5)

We refer to arrival times calculated using this method as ‘branching process arrival times’ (BP AT). These equations are equivalent to those given by Goldie & Coldman [Goldie and Coldman, 1979], who model the arrival of treatment resistant cancer cells via rare mutation (although in Goldie’s case, a much smaller collection of ‘types’ of individual are considered).

It is important to note that eq. (5) does not model the internal state of the system. While normal differential equations might be constructed by modelling the internal ‘state’ of the system at time tt, and using this to model the system at t+dtt+dt, instead we merely observe that the survival function Sab(dt+t)S_{a}^{b}(dt+t) (whatever it might be), can be written as a superposition of the (as yet unknown) survival functions Skb(t)S_{k}^{b}(t), and hence derive eq. (3). In so doing, we have lent heavily upon the assumption that system parameters are constant in time; that the behavior and survival curve associated with a single traveller observed at time tt is identical to the behavior and survival curve of a traveller observed at time 0. For a discussion of how a similar approach might be taken in the context of time varying parameters, see Appendix A.

2.1 Example Networks

In order to test eq. (5), and get something of an intuitive handle on the behavior of our system, let us consider three example networks of successively increasing complexity. For each network we solve eq. (5) numerically using Matlab’s ode45 [Shampine and Reichelt, 1997] for each target node, and compare to survival times observed in exact agent based simulations, where we track individual infection, migration and recovery events precisely. Where possible simulations are conducted using Gillespie’s exact algorithm [Gillespie, 1977]. In cases where such an approach is computationally infeasible, we make use of τ\tau-leaping [Gillespie, 2001].

As our first example, consider a simple chain graph, in which infection begins at one end of the chain, and is allowed to propagate from one node to the next (fig. 3). Here we observe that arrival time scales with number of steps taken, as might be expected. Survival curves are sigmoidal, which, as we will later see, is a typical behavior for reasonable parameter values. Mean arrival times as predicted by survival curves and as observed over 5000 Gillespie simulations are found to agree.

We next consider a ‘spinning top’ graph (fig. 3); this graph is constructed in order to illustrate the effects of multiplicity of paths. Here one node of the graph is accessible through only a single possible path, while another node (at equal distance) can be accessed via a great multiplicity of paths, each with a correspondingly reduced probability.

Here we observe that the BP AT gives accurate predictions of arrival time, correctly predicting that the arrival time in the two most distant nodes is equal, regardless of the multiplicity of paths. Counter-intuitively, it is also observed that when kk (the number of paths) is greater than 1/γ1/\gamma , the mean arrival time in the outer blue nodes precedes the mean arrival time in the intermediate yellow nodes. This occurs because, while the epidemic must reach some intermediate node before passing to the outer blue node, it is also perfectly possible for a large number of intermediate nodes to be uninfected at this stage, resulting in a higher mean arrival time. In the case γ=102\gamma=10^{-2} this is precisely what occurs, and the ordering of mean arrival times inverts. The qualitative behavior of the system is thus sensitive to transport rate γ\gamma. All of this is correctly described and predicted by the BP AT; heuristic models that account only for a single shortest path (such as eq. (1) ) do not capture this effect (see fig. 3).

As a third example, in order to illustrate the possible complexity of arrival time distributions, we construct an artificial network with a wide spread of transportation, infection and recovery rates, and compare numerical solutions of equation 5 with survival curves observed over the course of 25000 Gillespie simulations[Gillespie, 1977]. In this case we observe that survival curves exhibit rich, complex behavior. Nonetheless, equation 5 correctly predicts the distribution of arrival times as observed in direct agent based simulation (figure 4) over multiple time scales.

The branching process approach gives similar accuracy over a wide variety of parameter values and network structures. A gallery of such comparisons is provided in Appendix B.

Refer to caption
Refer to caption
Figure 2: Here we consider the arrival time for an epidemic propagating down a simple chain (Top). (Left) Survival curves, calculated using eq. (5) are approximately logistic, with the exception of the first step. (Right) Mean arrival times as predicted via eq. (5) perfectly match mean arrival times as observed over the course of 5000 Gillespie simulations. Parameter values α=0.5\alpha=0.5, γ=0.1\gamma=0.1. Nodes and survival curve are colour coded so as to match the network diagram, with the site of initial infection marked in black.
Refer to caption
Refer to caption
Figure 3: (Left) The spinning top graph consists of k+4k+4 nodes; a central starting node (grey) with no incoming edges, the initial site of infection, a ‘top’ and ‘bottom’ node with no outgoing edges (blue), a single “up” node directly between the center node and the top (red), and kk “down” nodes between the center node and the bottom (yellow). Infected individuals in the central node will travel either up or down at a rate of γ\gamma - those travelling down select amongst the kk ‘down’ nodes uniformly at random. (Right) Graphs of mean AT (averaged over 150 simulations) vs either effective distance (ED), or branching process arrival time (BP AT) for the Spinning top graph. In all cases, we assume α=0.5\alpha=0.5, β=0\beta=0, k=80k=80, and γ\gamma as specified on the scatter plot. Comparison of ED and BP AT demonstrates that ED distinguishes between the ‘top’ and ‘bottom’ node of the graph, while BP correctly handles multiplicity of paths, and assigns the pair identical ATs. We also see that BP AT is responsive to changes in γ\gamma, and correctly predicts the change in expected order of arrival when γ\gamma is changed from 10210^{-2} to 10510^{-5}. Simulations for this figure are carried out via τ\tau-leaping.
Refer to caption
Refer to caption
Figure 4: (A) Here we construct a complex network, designed to highlight the potential complexity of the survival curve S(t)S(t). Patient zero is placed at the top of the graph and can transition to one of three ‘intermediate’ nodes. These intermediate nodes have either high infection rate α=4\alpha=4 or low infection rate α0.01\alpha\approx 0.01, and transition to the three ‘final’ nodes at a variety of different rates (1,1021,10^{-2} or 10410^{-4}). The third intermediate node permits recovery from infection (transition to the empty circle) at rate 0.010.01. While this network is not reflective of real world transportation networks, it does provide a useful illustration of the possible complexity of survival curves, and serves to demonstrate just how closely theory matches simulation even in these complex situations. (B) Survival curves based on 25000 Gillespie simulations of the epidemic spreading process. Lines are colour coded to match the nodes in the network. (C) We solve equation 5 and display the analytic survival curve Sab(t){S}^{b}_{a}(t) for each possible bb, keeping aa fixed. (D) We plot the difference between the observed and analytic survival curves over time. (E) We sample nn of our 25000 simulations, then calculate |error|2𝑑t\sum\int|\text{error}|^{2}dt when comparing observed and predicted survival probability. Here we integrate over time, and sum over possible arrival locations.

3 Derivation of Mean and Variance in Arrival Time

Equation 5, while accurate, is somewhat cumbersome and does not provide any closed form for values of interest such as expected arrival time and variance in arrival time. If we place no constraints on α\alpha, β\beta and γT\gamma T then it is easy to construct survival curves with complex topology such as fig. 4, and it can be shown that any valid survival curve (that is any curve of the form 0f(t)1,f(t)0,f(0)=10\leq f(t)\leq 1,f^{\prime}(t)\leq 0,f(0)=1) can be well approximated, for suitably chosen network and parameter values (see appendix C). While mathematically interesting, such results are of little use to epidemiologists. More useful results can be obtained by placing some mild constraints on parameter values based on real world understanding.

The first assumption that we make is that we are not interested in epidemics that go extinct before spreading; we care only about those epidemics that reach international prevalence. For simplicity in the following discussion, this is achieved by assuming βk=0\beta_{k}=0 for all kk. A discussion of non-zero βk\beta_{k} can be found in Appendix D.

The second assumption that we make in what follows is that epidemic dynamics (infection and recovery) take place on on the time scale of days and weeks, while international travel is a rare occurrence, the vast majority of the human population taking international flights at most once or twice per year111With some individuals travelling significantly more often, and many others never flying at all.. Stated algebraically we have γTα\gamma T\ll\alpha; travel is rarer than infection. This mimics the ‘rare mutation’ assumption made by Goldie & Coldman[Goldie and Coldman, 1979]when studying the the development of treatment resistant cancer cells, and rules out slow epidemics such as HIV.

We further assume that infection rate is equal across all locations: αk=α\alpha_{k}=\alpha. This greatly simplifies notation, and has limited impact upon our final conclusions.

Taken together, these three assumptions (βk=0,αk=α,γTijα\beta_{k}=0,\alpha_{k}=\alpha,\gamma T_{ij}\ll\alpha are sufficient to avoid the more complex survival curve dynamics, as observed in fig. 4. Instead, we find S˙ab(t)αSab(t)(1Sab(t))\dot{S}^{b}_{a}(t)\approx-\alpha{S}^{b}_{a}(t)\left(1-{S}^{b}_{a}(t)\right) and the arrival time distribution is well approximated by a logistic function:

Sab(t)\displaystyle{S}^{b}_{a}(t)\approx 11+eα(tμab)+O(γ).\displaystyle\frac{1}{1+e^{\alpha(t-\mu_{a}^{b})}}+O(\gamma). (6)

Here μab\mu_{a}^{b} is the mean arrival time (AT) for an epidemic starting in node aa and spreading to bb. Determination of μab\mu_{a}^{b} is of some considerable interest. This value is dependent upon the dynamics of the system when (1Sab(t))=O(γ)(1-{S}^{b}_{a}(t))=O(\gamma); that is to say the region where the transport γTSab(t)\gamma TS^{b}_{a}(t) is no longer overwhelmed by logistic decay in probability.

3.1 Mean arrival time

Determination of μab\mu_{a}^{b} can be seen as equivalent to the question of determining when the exponential term of eq. (6) is equal to 11. For the sake of convenience, we label this term Qab(t){Q}^{b}_{a}(t), and make use of the change of variables Sab(t)=1/(1+Qab(t)){S}^{b}_{a}(t)=1/(1+{Q}^{b}_{a}(t)). This leads to the differential equation:

Q˙ab(t)\displaystyle\dot{Q}^{b}_{a}(t)\approx αQab(t)[1+Qab(t)]2γkTa,k11+Qkb(t)\displaystyle\alpha{Q}_{a}^{b}(t)-[1+{Q}_{a}^{b}(t)]^{2}\gamma\sum_{k}T_{a,k}\frac{1}{1+{Q}_{k}^{b}(t)} (7)

Given Eq. 6, the mean arrival time μab\mu_{a}^{b} satisfies Qab(μab)=eα(μabμab)=1{Q}^{b}_{a}(\mu_{a}^{b})=e^{\alpha(\mu_{a}^{b}-\mu_{a}^{b})}=1. Because Qab(t)<1{Q}^{b}_{a}(t)<1 for t<μat<\mu_{a}, and γTα\gamma T\ll\alpha, we can safely assume αQab(t)γ(2Qab(t)+[Qab(t)]2)\alpha{Q}_{a}^{b}(t)\gg\gamma(2{Q}_{a}^{b}(t)+[{Q}_{a}^{b}(t)]^{2}) in our region of interest, hence allowing the approximation [1+Qab(t)]21[1+{Q}_{a}^{b}(t)]^{2}\approx 1. Because jTij=0\sum_{j}T_{ij}=0, we can restate eq. (7) as:

Q˙ab(t)\displaystyle\dot{Q}^{b}_{a}(t)\approx γkTa,kQkb(t)1+Qkb(t)+αQab(t).\displaystyle\gamma\sum_{k}T_{a,k}\frac{{Q}_{k}^{b}(t)}{1+{Q}_{k}^{b}(t)}+\alpha{Q}_{a}^{b}(t). (8)

At this stage it may be tempting to make the further approximation

Q˙ab(t)\displaystyle\dot{Q}^{b}_{a}(t)\approx (γT+αI)Qkb(t),\displaystyle(\gamma T+\alpha I){Q}_{k}^{b}(t)\,, (9)
using the matrix exponential eM=I+M+M2/2!+e^{M}=I+M+M^{2}/2!+... we solve to find
Qab(t)\displaystyle{Q}^{b}_{a}(t)\approx e(γT+αI)tQab(0).\displaystyle e^{(\gamma T+\alpha I)t}{Q}_{a}^{b}(0)\,. (10)

Here we observe that eq. (9) and (10) bear a striking resemblance to the calculation of the expected number of infections at bb, given an initial epidemic starting at aa, as studied by Chen et al. [Chen et al., 2018]. Under this interpretation of Qab(t){Q}_{a}^{b}(t) we are effectively approximating our arrival time as the time at which the expected number of infected individuals in our target node, bb, reaches one.

In cases where a single path of length dd dominates the spread of our epidemic from aa to bb we can make the further approximation:

Qab(t)=1=\displaystyle{Q}_{a}^{b}(t)=1= eαt[I+γTt1!+γ2T2t22!+]a,beαt[Td]a,bγdtdd!\displaystyle e^{\alpha t}\left[I+\frac{\gamma Tt}{1!}+\frac{\gamma^{2}T^{2}t^{2}}{2!}+...\right]_{a,b}\approx e^{\alpha t}[T^{d}]_{a,b}\frac{\gamma^{d}t^{d}}{d!} (11)
αt\displaystyle\alpha t\approx log(γdTa,bdtdd!).\displaystyle-\log\left(\frac{\gamma^{d}T^{d}_{a,b}t^{d}}{d!}\right). (12)
Choosing to ignore a number of small terms, we find
\displaystyle\approx dlogγlog(Ti,j)=logγ+logTi,j,\displaystyle-d\log{\gamma}-\log\left(\prod T_{i,j}\right)=-\sum\log{\gamma}+\log T_{i,j}, (13)

where here Ti,jT_{i,j} are the transport rates along each link of the most probable path epidemic path. We thus reconstruct the results of shortest path methodology similar to Gautreau’s original study [Gautreau et al., 2007] and the work of Brockmann & Helbing [Brockmann and Helbing, 2013]. More detailed discussion of this matrix exponential approach and its relation to previous work is provided in Appendix E. Survival curves as predicted by equation 10 are given in figure 5.

While the approximation Q/(1+Q)QQ/(1+Q)\approx Q is convenient in that it allows us to linearize the equation, more accurate results can be obtained by retaining 1/(1+Q)1/(1+Q) and following on from equation 8 directly. Applying integrating factors we find,

eαtQab(t)\displaystyle e^{-\alpha t}{Q}^{b}_{a}(t)\approx 0teατγkTa,kQkb(τ)1+Qkb(τ)dτ.\displaystyle\int_{0}^{t}e^{-\alpha\tau}\gamma\sum_{k}T_{a,k}\frac{{Q}_{k}^{b}(\tau)}{1+{Q}_{k}^{b}(\tau)}d\tau. (14)
Substituting the ansatz Qkb(τ)=eα(τμkb)Q_{k}^{b}(\tau)=e^{\alpha(\tau-\mu_{k}^{b})} for ka,bk\neq a,b, along with Qbb(t)1+Qbb(t)=1Sbb(t)=1\frac{Q_{b}^{b}(t)}{1+Q_{b}^{b}(t)}=1-S_{b}^{b}(t)=1, we find
eαtQab(t)\displaystyle e^{-\alpha t}{Q}^{b}_{a}(t)\approx γTa,b0teατ𝑑τ+kbγTa,k0teατeα(τμkb)1+eα(τμkb)𝑑τ.\displaystyle\gamma T_{a,b}\int_{0}^{t}e^{-\alpha\tau}d\tau+\sum_{k\neq b}\gamma T_{a,k}\int_{0}^{t}e^{-\alpha\tau}\frac{e^{\alpha(\tau-\mu_{k}^{b})}}{1+e^{\alpha(\tau-\mu_{k}^{b})}}d\tau. (15)
Multiplying through by eαte^{\alpha t} and integrating provides
Qab(t)\displaystyle{Q}^{b}_{a}(t)\approx γTa,beαt1α+kbγTa,keα(tμkb)αlog(eαt+eαμkb1+eαμkb).\displaystyle\gamma T_{a,b}\frac{e^{\alpha t}-1}{\alpha}+\sum_{k\neq b}\gamma T_{a,k}\frac{e^{\alpha(t-\mu_{k}^{b})}}{-\alpha}\text{log}\left(\frac{e^{-\alpha t}+e^{-\alpha\mu_{k}^{b}}}{1+e^{-\alpha\mu_{k}^{b}}}\right). (16)

Our approach in this equation is to explicitly solve for Qab(t)Q_{a}^{b}(t) while treating all other survival curves as logistic. Because eα(μabμkb)e^{\alpha(\mu_{a}^{b}-\mu_{k}^{b})} will be non-negligible only when μkb<μab\mu_{k}^{b}<\mu_{a}^{b}, it is possible to calculate arrival time from smallest μkb\mu_{k}^{b} up to largest, using knowledge of the previously derived μ\mu to inform calculation of later μ\mu. In many cases eq. (16) is dominated by a single term, allowing us to make the approximation Qab(μab)=1γTa,keα(μabμkB)μkb{Q}^{b}_{a}(\mu_{a}^{b})=1\approx\gamma T_{a,k}e^{\alpha(\mu_{a}^{b}-\mu_{k}^{B})}\mu_{k}^{b} for some kk. Taking logs of both sides we find (μabμkb)=log(γTa,kμkb)/α[log(γ)+log(Ta,k)]/α(\mu_{a}^{b}-\mu_{k}^{b})=-\log(\gamma T_{a,k}\mu_{k}^{b})/\alpha\approx-[\log(\gamma)+\log(T_{a,k})]/\alpha. Summing over multiple such steps reconstructs a shortest path type metric, similar to eq. (13). Even in cases where shortest path approximations do not apply, Newton’s method can be easily applied to solve Qab(μab)=1{Q}^{b}_{a}(\mu_{a}^{b})=1, allowing us to determine all μab\mu_{a}^{b} for any given bb.

Eq. 16 can thus be used to accurately approximate Qab(t)Q_{a}^{b}(t) and Sab(t)S_{a}^{b}(t) without resorting to computationally expensive numerical ODE solvers; this is particularly helpful in regions of parameter space where high sensitivity near t=0t=0 render more direct numerical methods unstable.

A comparison of the various approaches described in this section is given in Fig. 5. Numeric solutions of equation 5 give the best results (when using high accuracy ODE solvers), while predictions based on simple matrix exponentiation (Eq. 10) are the least accurate. All three approaches are highly correlated with mean arrival times as observed in full agent based simulations however. When we are interested in relative rather than absolute arrival time, any of the three approaches will suffice, and matrix exponential based approaches are the fastest (see appendix E for details on efficient computation). When more accuracy, or knowledge of full probability distributions, is required, we recommend using either equation 5 or 16.

Refer to caption

Figure 5: Comparison of arrival times arrival times as observed in 5000 direct Gillespie simulations of epidemic spread (A) to survival times as predicted using either: numerical solutions Eq. 5 (panel B), logistic curves centered according to μ\mu Eq. 16 (panel E), or logistic curves with Q(t)Q(t) as defined by simple matrix exponentiation (Eq.10, panel G). In the right hand column we give the difference between predicted and observed survival probabilities for each method. (D) Comparison of mean AT as observed in 5000 simulations with the mean AT as predicted by the three numeric methods shown in the adjoining panes.

3.2 Variance

Another value of practical interest is the variability in arrival time; recognising the difference between 52±352\pm 3 days and 52±3052\pm 30 days allows us to understand how precise we expect arrival predictions to be. When γTα\gamma T\ll\alpha, we can use standard logistic distribution results [Balakrishnan, 2013] to determine var(τab)=π2s23=π23α2\text{var}(\tau_{a}^{b})=\frac{\pi^{2}s^{2}}{3}=\frac{\pi^{2}}{3\alpha^{2}}, where ss is the ‘scale parameter’ of the logistic distribution, in our case equal to 1/α1/\alpha. In plain language, we find that variance is high for slowly growing epidemics, and low for fast growing epidemics, where the infected population spends shorter periods of time at intermediate levels.

This determination of variance leads naturally to the question of co-variance; if our epidemic arrives three days earlier than expected in Paris, will it also inevitably arrives three days early in Istanbul or São Paulo? Or are the two arrival times relatively independent? This question is of particular importance because, generally speaking, we do not know when an epidemic has started, and hence can only ever compare arrival times in different cities relative to one another.

Unfortunately, by its nature Sab(t){S}^{b}_{a}(t) contains minimal information about the state of the system; calculation of such state information using probability generating function techniques [Miller, 2018] would require us to take infinitely many derivatives of Sab(t){S}^{b}_{a}(t) with respect to our initial conditions Sab(0){S}^{b}_{a}(0). It would appear that full co-variance and correlation information is thus inaccessible.

One avenue that we can take is to examine the variance in arrival time given a larger starting population - for example p=50p=50, with all individuals starting at location aa. Such an assumption effectively removes the large levels of variance associated with the epidemics “initial take off” time (the time taken to grow from p=1p=1 to p=50p=50), and leaves only the variation associated with the spreading process throughout our network (and subsequent take off time in the various locations the epidemic emigrates to). Given that we typically do not get to observe the true epidemic start data in practice, such an assumption is likely to better reflect real world data.

The survival time distribution for a starting population of pp is simply [Sab(t)]p[S_{a}^{b}(t)]^{p}. It can be shown (see appendix F) that for an initial population pp, the arrival time has variance

var(τ)\displaystyle\text{var}(\tau) =π23α2k=1p11α2k2\displaystyle=\frac{\pi^{2}}{3\alpha^{2}}-\sum_{k=1}^{p-1}\frac{1}{\alpha^{2}k^{2}} (17)

In the limit k=1(αk)2=π2/6α2\sum_{k=1}^{\infty}(\alpha k)^{-2}=\pi^{2}/6\alpha^{2}.

Of further note, we also observe that as pp increases the pthp^{th} power of the logistic curve ((1+e(tμ))p(1+e^{(t-\mu)})^{-p}) approaches the Gumbel distribution (figure 6) as observed by Gautreau et al. [Gautreau et al., 2007] in their original study of the epidemic arrival time process. The Gumbel distribution can be loosely thought of as an exponential waiting time with exponentially increasing rate parameter (corresponding to the growing population), and has cumulative distribution function of the form exp[e(tμ)β]\exp[e^{-(t-\mu)\beta}]. The continuous population assumption implicit in this result is valid when p100p\geq 100, but causes noticeable discrepancies for p5p\leq 5.

Refer to caption
Refer to caption
Figure 6: Here we compare the Gumbel distribution (the survival curve identified by Gautreau et al. when modelling population as continuous [Gautreau et al., 2008]), and the exact survival curve assuming a initial population p=50p=50, as discussed in section 3.2. (Left) Survival curve for Gumbel distribution (exp[e(tμ)]\exp[-e^{(t-\mu)}]) or logistic like arrival times with p=50p=50 ((1+e(tμ))p(1+e^{(t-\mu)})^{-p}) (Right) Probability density function for the same. Here we compare only shape, hence, in this example the curves have different μ\mu parameter; this acts only to slide the mean value so as to make the curves more comparable.

4 Robustness and Limitation

So far we have explored the branching process model in what may be considered close to optimal conditions; we have assumed that TT, α\alpha and γ\gamma are exactly known, and that the susceptible population of each node is large enough so as not to limit epidemic growth and spread. Each of the above assumptions may be violated in one context or another, and for this reason it is critical to understand how far each can be stretched. Such understanding sheds light not only on the BP AT itself, but also on past distance metrics, which we have shown to rely on a similar theoretical foundations.

The first, and most likely assumption to be violated in practice is the notion that we know α\alpha and γ\gamma. While γ\gamma can be determined to some reasonable level of accuracy via commercial flight data, α\alpha is a number that will vary from illness to illness and may be known only to a limited degree of accuracy, particularly in the early stages of a pandemic. Fortunately the model turns out to be robust against variations in both α\alpha and γ\gamma – even when varying these parameters by a factor of five compared to the ‘true’ values used in simulations, predicted ATs remain highly correlated with those observed in simulation (see figure 7). When using incorrect α\alpha the constant of proportionality between τi\tau_{i} and observed ATs is no longer equal to one, suggesting that the BP model robustly estimates ατi\alpha\tau_{i}, such that the relative time of arrival at any two locations is largely independent of α\alpha and γ\gamma, but that incorrect estimates of α\alpha will lead to corresponding inaccuracy in the absolute values of τi\tau_{i}; if α\alpha is estimated a factor of two too high, then τi\tau_{i} will be a factor of two too low.

Refer to caption

Figure 7: Calculating the BP AT using incorrect growth and transport parameters leads to suboptimal results. Here we compare the Predicted mean AT (according to eq. (5)) with mean observed AT (averaged over 5000 agent based Gillespie simulations) in the case of using α\alpha and γ\gamma which are either a factor of five too high or to low compared to their true values (α=0.5,γ=103\alpha=0.5,\gamma=10^{-3}). Data is coloured according to the minimum number of flights from origin to target. While errors change the magnitude of predicted AT (note varying axis scales), the effect on relative AT predictions is minor in eight out of the nine cases considered.

Another systematic source of error is inaccuracies in flight network data. These can be produced due to uncounted or unregistered flights (or alternative forms of transportation), out of date data, or as a direct result of changes to individual flight plans as a result of the epidemic itself. In order to test robustness to noise in available flight data, we compare the results of full Gillespie simulation using flight network TT, to ATs predicted using the perturbed matrix F^\hat{F}, where T^ij=TijeN(0,ξ)\hat{T}_{ij}=T_{ij}e^{N(0,\xi)}. Here ξ\xi is a constant representing noise level. As can be seen in Fig. 9, the resulting errors are modest.

Refer to caption

Figure 8: Comparing the results of Gillespie simulations with the predictions made using a perturbed version of the original random network. We are still able to accurately predict AT, even in the case of significant perturbations to TT, indicating that BP AT is robust even when link weights are increased or decreased by an order of magnitude. As previously, nodes are colour coded based on the number of steps to the initial infection site. Note that here, after noise is applied to calculate T^\hat{T}, we demand symmetry, and recalculate diagonal entries so as to preserve mass. Similar results are observed for scale-free networks.

Refer to caption

Figure 9: When populations are very small, the exponential growth assumption that the BP AT relies upon no longer gives accurate predictions. Here we plot expected arrival time according to the BP model, against the average results of 5000 Gillespie simulations on a random graph of 135 nodes, where the susceptible population in each node is 18.2±7.518.2\pm 7.5. We assume infection rate α=0.5\alpha=0.5, and recovery rate β=0\beta=0 (for small populations, β>0\beta>0 generally leads to extinction). The dotted black line gives the ‘1:1’ line that would be expected of accurate predictions. The dashed red line gives the line of best fit, as used to calculate R2R^{2}.

While we have thus far considered arrival times in the context of the global aviation network, effective distance measures have also been considered on a much smaller scale in order to study the spread of antibiotic resistance between wards within a hospital [Friso Coerts, 2018]. In this case, assuming unbound growth becomes problematic, as the susceptible population in each node is rather small; a single ward might contain (for example) 103010-30 beds, a far cry from the tens or hundreds of thousands found when each node represents an entire city. When the local susceptible population is small, the epidemic is frequently no longer in an exponential growth phase by the time it spreads to a neighbouring node; ‘branching random walks’ are no longer independent because it is impossible for person AA to infect someone who has already been infected by person BB. Population saturation inevitably leads to less accurate predictions, see figure 9.

In practice, for the SIR type models the parameter window where this concern is relevant is relatively narrow; infections which grow quickly enough to violate our assumptions are liable to burn through the entire local population and drive themselves to extinction before spreading between nodes. For SIS and SI type models, or any infection which is expected to reach a local endemic equilibrium, it is important to determine whether exponential growth is a good approximation before making use of the BP AT, or any distance metric that relies upon the same underlying unbound growth assumption.

5 Real World Data

Finally, while comparison to simulations provides an effective test bed, useful in terms of repeatability and certainty of data, we are, generally speaking more interested in the performance of models as they apply to the real world. Using flight data provided by Dirk Brockmann (private correspondence) we compare observed and predicted AT for both SARS and H1N1 outbreaks (see figure 10). We consider predictions made by both the branching process method described here (Eq. 5) as well as Brockmann & Helbing’s effective distance metric (Eq. 2, [Brockmann and Helbing, 2013]).

Ideally, in order to calculate the BP AT distribution, we would prefer to have access to the full transport matrix TijT_{ij}, the probability that an individual in location ii will travel to jj on any given day. Unfortunately such data is not available, and can not realistically be obtained. Instead, we have access to Fi,jF_{i,j}; a matrix containing the total number of passengers travelling from ii to jj during a given time period. In order to approximate Ti,jT_{i,j} we normalize FF, hence calculating the probability, Pi,jP_{i},j, to fly from ii to jj, conditioned on the assumption that we board some plane in airport ii, and multiply by a constant γ\gamma, representing the probability of boarding some flight on a given day. In the real world, we might assume γ\gamma to vary significantly from place to place, based on economic factors, prevalence of tourism, and the size of the population that any given airport is servicing. For the time being we approximate γ\gamma, and rely on the models low sensitivity to γ\gamma to prevent difficulties.

We find that both BP AT and effective distance arrival time (‘ED AT’) methods give qualitatively similar results (see fig. 10), and that these results would appear in many cases plausible: for example, in the case of SARS, we observe that predicted arrival times in both South Korea and Hong Kong are low, while the arrival time in the USA is predicted to be marginally higher. This result is entirely plausible given airline traffic between the respective countries. In practice, predictions for both BP AT and ED AT do not match observed arrival times: SARS is reported to have arrived in the USA 5454 days after the first reports in China (11/16/2002), and in South Korea a full 160160 after these first reports. For SARS, Arrival times as reported from real world data correlate at best weakly with predictions(note the small R2<0.3R^{2}<0.3 in fig. 10 C,D), indicating either inaccuracy in the data or alternatively some complexity in the real world process not accounted for in current models. R2R^{2} values are somewhat better when predicting H1N1 (R20.55R^{2}\approx 0.55).

We find no method (including ED) is able to reproduce the very high R2R^{2} values reported in Brockmann & Helbing’s original paper, and that we are unable to recreate their figures (Figures 2D and 2E in the original paper), instead observing a far broader scatter. It is unclear if the observed discrepancies is due to differences in implementation of the algorithm, differences in the underlying data, or some other cause. While it is possible that inaccuracies in WAN data are to blame for these discrepancies, this explanation seems unlikely; as demonstrated in figure 9, arrival time predictions are generally stable, even to substantial changes in FF. If we consider the effective distance metric (Eq 2), increasing Di,jD_{i,j} by 55, as is needed to correctly predict SARS late arrival time in South Korea, would require a Pi,jP_{i,j} two orders of magnitude smaller than our current value. Such drastic changes appear implausible.

Refer to caption

Figure 10: Comparison of distance metrics to arrival times in real world epidemics for the 2009 H1N1 epidemic (top) and the 2003 SARS epidemic (bottom). (A) We attempt to reconstruct Figure 2D of Brockmann & Helbing (2013) [Brockmann and Helbing, 2013]; we find that numerous details of the general shape are reconstructed (for example the ‘branch’ on the right hand side mid way up, and the cloud of seperated points to the left at the top), but that the figure as presented in the original paper is slanted significantly closer to the diagonal. (B) For suitably chosen α\alpha and γ\gamma values, the BP method improves upon R2R^{2}, but is still limited by the accuracy of the underlying network, and the inherently noisy details of real world epidemiology. (C & D) In case of the SARS epidemic we were not able to reproduce the extremely tight correlation previously reported [Brockmann and Helbing, 2013]. Once again, use of the BP metric yields improvements in R2R^{2} compared to our reconstruction the ED metric. The three marked nodes in the bottom images corresponding to (from top to bottom) South Korea, Hong Kong, and the United States. As might be expected for distance measure based on flight data, both Korea and Hong Kong are ‘close’ to China according to both metrics, while the USA is further away. This would appear to be in contrast to previously published results [Brockmann and Helbing, 2013], which appear to indicate that the United States is closest to China, while Korea is rather distant. Data provided by Dirk Brockmann (private communication).

6 Conclusions

Better understanding the spread of epidemics through the WAN allows for both real time forecasting (as has been used during the current COVID-19 pandemic), and also the possibility of network design, making changes to the WAN network so as to slow epidemic spread.

In studying the question of epidemic arrival time, a variety of models have been used, from the the most intricate agent based simulations[Broeck et al., 2011] to the intuitively appealing ‘distance’ based models [Brockmann and Helbing, 2013], and a number of analytic approaches in between [Chen et al., 2018, Iannelli et al., 2017, Gautreau et al., 2008]. While each of these models approaches the question of epidemic arrival time from a different view point, one common thread is the assumption of exponential growth; in the early stages of an epidemic, spread is governed by unbound epidemic growth, and rare transportation events. Population saturation, detailed viral dynamics, and a travellers tendency to return to their port of origin are ignored in all but the most detailed of models.

Given this basic premise, we were able to formulate the problem of epidemic arrival times in term of ‘Branching Processes’ [Kimmel and Axelrod, 2015, Goldie and Coldman, 1979], and calculate the full probability distribution of possible arrival times explicitly. These predictions match perfectly to corresponding simulations. If we further assume that air travel is rare and infection and recovery are common, it is possible to re-derive many of the results of past papers, and in some cases improve upon them. We are also able to give predictions in regions of parameter space where past methods are known to fail, and show that theoretical predictions provide a reasonable match to simulations, even when parameter values are altered significantly.

Unfortunately, when comparing to real world data we observe that the predictive powers of ED metrics may be significantly lower than previously reported [Brockmann and Helbing, 2013]. When compared to real world date, ED and BP like methods predict roughly 50% of variance in the cases of H1N1, and only 20% of variance when compared to SARS-2003, significantly lower than would be predicted given the modest intrinsic variance of our models. While the Branching Process Arrival Time makes very few assumptions and gives exact results given the underlying model assumed, these findings are suggestive of a gap in our knowledge- either in the data available for epidemic arrival times or (more likely) in the model itself. It seems likely that future efforts would be best directed towards determining what real world factors are currently unaccounted for, and which of these are most critical.

7 Acknowledgements and Data Availability

We gratefully acknowledge Dirk Brockmann for stimulating conversation, and for providing historical network and epidemic data. Full code for simulation based figures is available on github [Jamieson-Lane, 2020].

Appendix A Time Varying Parameters and flight networks

Construction of equation 5 relies heavily upon the assumption that the process is memoryless and does not keep track of time.

In order to see the importance of this Markov assumptions, suppose we take equation 5 and replace α\alpha with some piecewise constant function α(t)\alpha(t); both in our simulation, and in the ODE. As can be seen in figure 11 (top panels), simply replacing α\alpha with α(t)\alpha(t) leads a rapid divergence of Sab(t)S_{a}^{b}(t) and our observed survival curve. Under such naive assumptions, our calculated Sab(t)S_{a}^{b}(t) is no longer monotone decreasing, and hence can no longer be considered a survival curve in any sense of the word. This breakdown comes about because even though Sab(t)S_{a}^{b}(t) does not track possible states of the population directly, it must (inevitably) encode these states implicitly. When α\alpha changes midway through the process, this encoding is rendered invalid, and Sab(t)S_{a}^{b}(t) no longer behaves in a sensible manner. Without the Markov assumption, Sab(dt+t)S_{a}^{b}(dt+t) is no longer a superposition of Skb(t)S_{k}^{b}(t).

Calculation of survival curves when dealing with time varying parameters is possible however. To do this, we expand the one dimensional Sab(t)S_{a}^{b}(t) to the two dimensional Sab(t,ρ)S_{a}^{b}(t,\rho). We define Sab(t,ρ)S_{a}^{b}(t,\rho) to be the probability that an infected traveller, initially observed at position aa at time ρ\rho has no decedents arriving at bb before time tt. By definition Sab(t,ρ)=1S_{a}^{b}(t,\rho)=1 whenever tρt\leq\rho, and Sbb(t,t)=0S_{b}^{b}(t,t)=0. Using similar arguments to equation 3 we find that Sab(t,ρ)S_{a}^{b}(t,\rho) is a superposition of Sab(t,ρ+dρ)S_{a}^{b}(t,\rho+d\rho). Taking limits and converting into a differential equation we find:

Sab(t,ρ)ρ=T(ρ)Sab(t,ρ)+[β(ρ)α(ρ)Sab(t,ρ)]×[1Sab(t,ρ)].-\frac{\partial S_{a}^{b}(t,\rho)}{\partial\rho}=T(\rho)S_{a}^{b}(t,\rho)+[\beta(\rho)-\alpha(\rho)S_{a}^{b}(t,\rho)]\times[1-S_{a}^{b}(t,\rho)]. (18)

Combining equation 18 with the initial conditions given at t=ρt=\rho, it is possible to determine Sab(t,0)S_{a}^{b}(t,0) (the survival curve of interest) for any given time tt by solving a simple ODE in the ρ\rho direction (see figure 11, left panels). This approach accounts for time varying parameters whenever α\alpha,β\beta,γ\gamma and TT vary independently from the epidemic course. In the case where parameters are dependent on epidemic spread (for example, border closures in response to observed spread of disease), more complicated methods are needed. Because eq. (18) must be solved independently for each tt this approach is more computationally costly than the approach taken for Markov systems.

Refer to caption
Figure 11: (A) Survival curves observed over the course of 12,00012,000 simulations, for piecewise constant αk\alpha_{k}. The values of αk\alpha_{k} change when t=5,10,15..t=5,10,15.., indicated by the dashed grey lines. βk=0\beta_{k}=0, γ=0.063\gamma=0.063. (B) Sab(t)S_{a}^{b}(t) as calculated using equation 5 by simply substituting in the time varying αk(t)\alpha_{k}(t) in for αk\alpha_{k}. This approach violates our Markov assumption and gives non-physical results. (C) Survival curves Sab(t,0)S_{a}^{b}(t,0), calculated using equation 18. By correctly accounting for time variation in αk\alpha_{k} we are able to match observed survival curves exactly. (D) Schematic diagram showing the flow of information in equation 18; solution curves trace back from our initial conditions at t=ρt=\rho to our survival curve of interest at ρ=0\rho=0. The invalid region t<ρt<\rho is greyed over.

Appendix B Gallery

Here we give a variety of figures, exploring the possible behavior of arrival time distributions in a variety of networks and parameter regimes. In fig. 13 and 13 we examine a variety of scale free networks, varying network size, and parameter values. In fig. 15 we demonstrate the robustness of eq. (5) to unusual parameter values by consider the (unrealistic) case of a fast moving traveller who is non-infectious. We observing tight agreement between analytic predictions and observed survival time in simulations. Finally, in 15 we consider arrival times for travellers on a 30 by 30 mesh.

In all cases where simulation is feasible, we observe strong agreement in arrival probabilities between analytic and simulation based results, for all tt values sampled.

Refer to caption
Figure 12: Here we simulate a scale free network as produced by Mathew George’s SFNG function [Mathew George, 2020]. Here we set N=30N=30 nodes, infection rate α=0.5\alpha=0.5 and recovery rate β=0.1\beta=0.1. (Left) Observed survival curves given 15001500 simulations. Simulation done using the exact Gillespie algorithm [Gillespie, 1977]. (Right) Survival curves as calculated using eq. (5).
Refer to caption
Refer to caption
Figure 13: (Left) Survival curves as calculated using equation (5), for a larger scale-free network. N=3000,γ=0.001,α=0.5±0.11,β=0.1±0.03N=3000,\gamma=0.001,\alpha=0.5\pm 0.11,\beta=0.1\pm 0.03. (Right) Analytic Survival curves for N=3000,γ=105,α=0.5,β=0.1N=3000,\gamma=10^{-5},\alpha=0.5,\beta=0.1. In both cases we observe ‘logistic’ decay from 11, along with infinite survival time in cases where the epidemic goes extinct, which occurs with probability β1/(β1+α1)\beta_{1}/(\beta_{1}+\alpha_{1}). For small γ\gamma, time of arrival is determined predominantly by the number of steps. Here we present only analytic results, as simulations are prohibitively expensive to run.
Refer to caption
Figure 14: The analytic survival curve gives valid predictions even far from the usual parameter regime; here we consider a traveller on a random network with, γ=10,β=0.05,α=0\gamma=10,\beta=0.05,\alpha=0, that is to say, a fast moving traveller that eventually decays, but does not duplicate. Arrival time distributions as observed in 1500015000 simulations perfectly match survival probabilities as calculated numerically using eq. (5). Unlike the γ1\gamma\ll 1 domain, where survival probabilities decay approximately logistically, here we observe roughly exponential decay. Also in contrast to the γ1\gamma\ll 1 case, different locations plateau at different levels; representing different probabilities that the traveller will ever arrive.
Refer to caption
Refer to caption
Figure 15: (Left) Analytic survival time curves, for a 30×3030\times 30 lattice, with periodic boundary conditions. N=900,γ=0.001N=900,\gamma=0.001, and epidemic parameters vary by node: α=0.5±0.11,β=0.1±0.03\alpha=0.5\pm 0.11,\beta=0.1\pm 0.03. (Right) Analytic Survival curves for 20×2020\times 20 lattice, N=400,γ=0.001N=400,\gamma=0.001, epidemic parameters constant across all nodes, α=0.5,β=0.1\alpha=0.5,\beta=0.1. In both cases we observe ‘logistic’ decay from 11. Mean arrival time scales with distance from the initial site of infection. When epidemic parameters are homogeneous, symmetry results in many survival curves overlapping perfectly (the survival curve for ‘one step north and one step east’ is the same as ‘one step south, one step west’). For variable parameter values, this symmetry is broken and we see a smooth distribution of arrival times.

Appendix C Construction of Arbitrary Survival Curves

One of the more concerning results of eq. (5) is the implication that it is possible to construct networks with arbitrary survival curves. In this section we sketch a method for constructing such a network. The purpose of this exercise is to demonstrate that for arbitrary networks, the probability distribution of arrival times may have arbitrary complexity, and hence that any analytically tractable results require further assumptions to be made on α,β\alpha,\beta and TT. More detail would be needed in order to make the argument rigorous, the work here is only intended as a proof of concept.

Suppose we are given some function f(t)f(t) such that f(0)=1,f(t)0,f(t)0f(0)=1,f^{\prime}(t)\leq 0,f(t)\geq 0. We assume that f(t)f^{\prime}(t) is well defined for all time. Our goal is to create a network such that S0N(t)S_{0}^{N}(t) approximates f(t)f(t) as closely as possible. Consider a simple “diamond” network as depicted in figure 16. In this network, node zero has a directed edge connecting it to nodes 11 to N1N-1, and nodes 11 to N1N-1 have a single outgoing edge, directed towards the ‘final’ node NN. We assume βk=0\beta_{k}=0 for all intermediary nodes. We take a travel rate γ=1\gamma=1.

For these “intermediate” nodes, the survival time is governed by the equation:

S˙kN(t)=Tk,NSkN(t)αkSkN(t)+αk[SkN(t)]2,\dot{S}_{k}^{N}(t)=-T_{k,N}S_{k}^{N}(t)-\alpha_{k}S_{k}^{N}(t)+\alpha_{k}[S_{k}^{N}(t)]^{2}, (19)

which permits solutions of the form:

SkN(t)=αk+Tk,Nαk+(αk+2Tk,N)e(αk+Tk,N)(tμk).S_{k}^{N}(t)=\frac{\alpha_{k}+T_{k,N}}{\alpha_{k}+(\alpha_{k}+2T_{k,N})e^{(\alpha_{k}+T_{k,N})(t-\mu_{k})}}. (20)

Here μk\mu_{k} is selected such that (αk+2Tk,N)eμk(αk+Tk,N)=Tk,N(\alpha_{k}+2T_{k,N})e^{-\mu_{k}(\alpha_{k}+T_{k,N})}=T_{k,N}. Because both αk\alpha_{k} and Tk,NT_{k,N} are free parameters, it is possible to select them so as to construct a logistic function with arbitrary mean and scale parameters. This allows us to approximate arbitrary step functions.

We now wish to select the transition rates T0,kT_{0,k} such that S0N(t)f(t)S_{0}^{N}(t)\approx f(t). We select α0=0\alpha_{0}=0 such that the initial infection at node zero does not replicate, and instead simply ‘jumps’ to one of our N1N-1 intermediary nodes, or recovers. S0N(t)S_{0}^{N}(t) is governed by the equation:

S˙0N(t)=T0,k[SkN(t)S0N(t)]βk[1S0N(t)]\dot{S}_{0}^{N}(t)=\sum T_{0,k}\left[S_{k}^{N}(t)-S_{0}^{N}(t)\right]\beta_{k}\left[1-S_{0}^{N}(t)\right] (21)

If we select T0,kT_{0,k} and βk\beta_{k} very large, then our initial infection will linger in the starting node for only a short time, and

S0N(t)T0,kSkN(t)+βkT0,k+βk,S_{0}^{N}(t)\approx\frac{\sum T_{0,k}S_{k}^{N}(t)+\beta_{k}}{\sum T_{0,k}+\beta_{k}}, (22)

That is to say, S0N(t)S_{0}^{N}(t) is a sum of step functions, and a constant. Transition parameters are selected such that βk/(T0,k+βk)\beta_{k}/(\sum T_{0,k}+\beta_{k}) is equal to f()f(\infty) and T0,k/(T0,k+βk)=f((μk+μk+1)/2)f((μk+μk1)/2)T_{0,k}/(\sum T_{0,k}+\beta_{k})=f((\mu_{k}+\mu_{k+1})/2)-f((\mu_{k}+\mu_{k-1})/2). In this way f(t)f(t) can be approximated using a series of (approximate) step functions. A demonstration of this is given in figure 16. More sophisticated algorithms would presumably be able to approximate ff using fewer nodes, but for our purposes this simple approach will suffice.

Refer to caption
Refer to caption
Figure 16: (Top) We generate a random piecewise linear decreasing function, and then approximate it using the survival curve of a network. Here we use analytic expressions for our stepwise function as given in eq. (20). Direct integration of eq. (5) runs into numerical difficulties due to the exceptionally small Tk,NT_{k,N} involved. (Bottom) A schematic diagram of the diamond network used to match our randomly generated survival curve. Infection rate in our central layer, along with transport rate to the ‘target’ node is selected such that QkN(t)Q_{k}^{N}(t) for each of our intermediary nodes approximates a step function, with a wide array of ‘step times’. Transport rates from our starting node to intermediate nodes are selected such that Q0N(t)Q_{0}^{N}(t) is (approximately) a superposition of these many step functions, with appropriate weight given to each such that |Q0N(t)f(t)||Q_{0}^{N}(t)-f(t)| is small.

Appendix D Survival curves for non-zero β\beta

Throughout the main text, we frequently made use of the simplifying assumption β=0\beta=0; an unrealistic assumption in real world context. Suppose we wish to consider non-zero β\beta, in the context of global epidemic spread. We assume, as previously, that travel is rare, and also that we interest ourselves only in diseases such that R0>1R_{0}>1, hence γβ<α\gamma\ll\beta<\alpha. This leads to the equation

S˙ab(t)\displaystyle\dot{S}^{b}_{a}(t)\approx (βaαaSab(t))(1Sab(t)),\displaystyle\left(\beta_{a}-\alpha_{a}{S}^{b}_{a}(t)\right)\left(1-{S}^{b}_{a}(t)\right), (23)
and hence,
Sab(t)\displaystyle{S}^{b}_{a}(t)\approx 1+βae(αaβa)(tμab)1+αae(αaβa)(tμab),\displaystyle\frac{1+\beta_{a}e^{(\alpha_{a}-\beta_{a})(t-\mu_{a}^{b})}}{1+\alpha_{a}e^{(\alpha_{a}-\beta_{a})(t-\mu_{a}^{b})}}, (24)

that is to say, in the limit of small γ\gamma, the arrival time distribution can be approximated using a logistic function. As in the main text μab\mu_{a}^{b} is some mean arrival time constant determined by aa,bb, and γT\gamma T. In the limit tt\rightarrow\infty Sab(t)βa/αa{S}^{b}_{a}(t)\rightarrow\beta_{a}/\alpha_{a}, that is to say, the probability that τab=\tau_{a}^{b}=\infty is precisely the probability of early epidemic extinction in our branching process. Unfortunately, this non-zero probability of ‘infinite’ arrival time leads to E(τab)=E(\tau_{a}^{b})=\infty; not an especially informative result. It is thus useful to instead consider the arrival time conditional on τab<\tau_{a}^{b}<\infty:

Sab(t)=\displaystyle{S}^{b}_{a}(t)= Sab(t)Sab()Sab(0)Sab(),\displaystyle\frac{{S}^{b}_{a}(t)-{S}^{b}_{a}(\infty)}{{S}^{b}_{a}(0)-{S}^{b}_{a}(\infty)}, (25)
\displaystyle\approx 11+e(αaβa)(tμb).\displaystyle\frac{1}{1+e^{(\alpha_{a}-\beta_{a})(t-\mu_{b})}}. (26)

At this stage we have an ‘ideal’ logistic distribution [Balakrishnan, 2013], with mean value μab\mu_{a}^{b} (currently unknown), and scale parameter 1/(αaβa)1/(\alpha_{a}-\beta_{a}). Hence, when making predictions concerning real world epidemics that have not gone extinct in their early stages, it is useful to remap our variables such that we imagine βk=0\beta_{k}=0 and αk\alpha_{k} is the net grow rate of our disease (formerly labeled αkβk\alpha_{k}-\beta_{k}).

Appendix E Comparison to, and improvement upon past exponential growth methods.

In this appendix we will discuss the Matrix Exponential method, as used by Chen et al. [Chen et al., 2018] (referred to by them as ‘linear spreading theory’). We begin by introducing the method itself, and its connection to the Branching Process approach studied in the main text. We then explain some of the computational details necessary for fast calculation, and the circumstances under which the approximation is and is not accurate.

Let us begin by describing the method itself. Suppose, at time t=0t=0, we have a population of infected individuals 𝐏(0){\bf P}(0) – that is to say, there are Pk(0)P_{k}(0) infected individuals located at node kk. These individuals travel between nodes at transport rate γT\gamma T, and infect others at some rate α\alpha. This rate is assumed to be constant; we assume that the total population is large enough such that population constraints are not relevant on the timescale we are interested in. Under this unbound growth assumption, the expected number of infectious individuals if governed by the linear equation:

𝐏˙(t)=\displaystyle\dot{\bf P}(t)= γT𝐏(t)+α𝐏(t).\displaystyle\gamma T^{\top}{\bf P}(t)+\alpha{\bf P}(t). (27)
This equation permits solutions of the form
𝐏(t)=\displaystyle{\bf P}(t)= exp[(γT+αI)t]𝐏(0).\displaystyle\exp[(\gamma T^{\top}+\alpha I)t]{\bf P}(0). (28)

Here, we make use of the matrix exponential; that is to say expM=I+M+M2/2!+M3/3!+\exp M=I+M+M^{2}/2!+M^{3}/3!+... . Here we use the transposed transition matrix, TT^{\top}, because here we consider the flow of infected individuals forward through the network, as opposed to the flow of arrival probability backwards through the network (as we do in the main text).

Suppose we start with an initial population of 11 in node aa. If we wish calculate the deterministic time when the expected population in node bb reaches 11, we must solve 𝐏b(t)=1{\bf P}_{b}(t)=1. Written slightly differently,

1=\displaystyle 1= δbexp[(γT+αI)t]δa,\displaystyle\delta_{b}^{\top}\exp[(\gamma T^{\top}+\alpha I)t]\delta_{a}, (29)

where δa,δb\delta_{a},\delta_{b} are indicator vectors, equal to 11 at index aa and bb respectively, and equal to zero elsewhere. This equation is equivalent to equation 7 of Chen et al [Chen et al., 2018]. Here we have reached the equation via a direct appeal to unbound population growth, while Chen et al instead describe their formalism as a linearisation of a full SIR model.
Eq. (29) can also be reached via an appeal to branching processes. By making the change of variables SQ{S}\rightarrow{Q} as we do in the main text, suitable approximations lead to:

Qab(t)\displaystyle{Q}^{b}_{a}(t)\approx e(γT+αI)tQab(0),\displaystyle e^{(\gamma T+\alpha I)t}{Q}_{a}^{b}(0), (30)

eq. (10) of the main text. The expected arrival time at node aa is precisely the time when Qab(t)=1{Q}^{b}_{a}(t)=1, that is to say

1\displaystyle 1 =δae(γT+αI)tδb.\displaystyle=\delta_{a}^{\top}e^{(\gamma T+\alpha I)t}\delta_{b}. (31)

The two methods give the same results (under the influence of suitably powerful approximations).

Before moving on to discuss the accuracy of these results, let us first discuss some computational difficulties. Eq. (29) is transcendental and can thus only be solved approximately, using some suitable numerical scheme. In their paper, Chen et al. approach this difficulty by identifying the minimal number of steps dd required to get from aa to bb, and truncating the polynomial expansion of their matrix exponential after this many steps:

δbe(γT+αI)tδa=\displaystyle\delta_{b}^{\top}e^{(\gamma T^{\top}+\alpha I)t}\delta_{a}= eαt[δbδa+δbγTtδa++δbγdTdtdd!δa+]\displaystyle e^{\alpha t}[\delta_{b}^{\top}\delta_{a}+\delta_{b}^{\top}\gamma T^{\top}t\delta_{a}+...+\delta_{b}^{\top}\frac{\gamma^{d}T^{\top d}t^{d}}{d!}\delta_{a}+...] (32)
1\displaystyle 1\approx eαtδbγdTdtdd!δa.\displaystyle e^{\alpha t}\delta_{b}^{\top}\frac{\gamma^{d}T^{\top d}t^{d}}{d!}\delta_{a}. (33)

All terms before the dthd^{th} term are zero, all terms after are higher powers of γ\gamma and thus assumed to be small. Chen et al solve eq. (33) using the Lambert-W function [Leonhard Euler, 1783, Weisstein, 2020], a nonlinear function originally constructed to solve equations of the form yey=xye^{y}=x. In cases where we have a single “most probable” path from aa to bb, we can make the approximation:

αt\displaystyle\alpha t\approx log(δbγdTdtdd!δa),\displaystyle-\log\left(\delta_{b}^{\top}\frac{\gamma^{d}T^{\top d}t^{d}}{d!}\delta_{a}\right), (34)
\displaystyle\approx dlogγlog(Ti,j),\displaystyle-d\log{\gamma}-\log\left(\prod T_{i,j}\right), (35)
\displaystyle\approx logγ+logTi,j.\displaystyle-\sum\log{\gamma}+\log T_{i,j}. (36)

Here Ti,jT_{i,j} are the transition rates along the steps in our shortest path. Such an estimate neglects terms of magnitude dlog(t/d)d\log(t/d), along with contributions from all paths except the shortest. This result mirrors those of both Gautreau et al. [Gautreau et al., 2007], as well as Brockmann et al. [Brockmann and Helbing, 2013] (with the notable difference being that logγ-\log\gamma is replaced by 11 when using the effective distance metric).

An alternative method to solve Eq (29) (not used by Chen et al) is the iteration scheme:

1=\displaystyle 1= δaeγTtneαtn+1δb,\displaystyle\delta_{a}^{\top}e^{\gamma Tt_{n}}e^{\alpha t_{n+1}}\delta_{b}, (37)
αtn+1=\displaystyle\alpha t_{n+1}= log(δaeγTtnδb).\displaystyle-\log\left(\delta_{a}^{\top}e^{\gamma Tt_{n}}\delta_{b}\right). (38)

Here we make use of the fact that αtn+1\alpha t_{n+1} varies faster than γTtn\gamma Tt_{n}; hence, if we have even a moderately accurate approximation tnt_{n}, eγTtne^{\gamma Tt_{n}} will be close to the correct value eγTte^{\gamma Tt}. In contrast eαtn+1e^{\alpha t_{n+1}} varies quickly: demanding tn+1t_{n+1} to solve eq (38), we quickly approach true solutions of eq. (31).

This iteration scheme requires us to calculate large matrix exponentials repeatedly, an operation which is generically computationally expensive, but can be made significantly less costly by first computing the eigenvector decomposition T=VDV1T=VDV^{-1}. Here VV is a matrix containing the eigenvectors of MM, while DD is a diagonal matrix containing the corresponding eigenvalues. This allows us to instead compute the matrix exponential via elementwise exponentials of the diagonal elements:

αtn+1=log((δaV)eγDtn(V1δb))\displaystyle\alpha t_{n+1}=-\log\left((\delta_{a}^{\top}V)e^{\gamma Dt_{n}}(V^{-1}\delta_{b})\right) (39)

Matlab code to implement this iteration scheme, approximating transport times to node bb from all starting points is given by:

%%Set up
[V,D_original]  =eig(T);
N=size(T,1);
b=7; %Example start/end points.
delta_b= zeros(N,1);
delta_b(b)=1; %%Set up our initial vectors.

D= gamma * diag(D_original); %D is a vector of length N containing eigenvalues.
RightVect= V\delta_b;

Tk=-ones(N,1);

for(aaa=1:N)
    guessT= (-log(gamma))/alpha
    deltaT=5;
    LeftVect= V(aaa,:);
    while(abs(deltaT)>10^-3)
         expGammaT= LeftVect*(exp(D*guessT).*RightVect);
         deltaT= -log(expGammaT)/alpha
    end
    Tk(aaa)=guessT;
end

The computational cost of the above code is overwhelmingly dominated by eig(M), and hence the runtime scales like O(n3)O(n^{3}), similar to the Floyd–Warshall algorithm as might be used to identify shortest paths when calculating the effective distance. It is possible to calculate the time taken to arrive in each location from a fixed starting location aa by using the transposed transition matrix, TT^{\top}, rather than TT.

Now, in all uses of such methods, it is important to note that the deterministic time when the expected population reaches 11 (as calculated using Chen et al’s method), and the expected time when the stochastically varying population reaches one are not equivalent concepts. This is best illustrated by considering the extreme case α=β=0\alpha=\beta=0, where we observe that δaeγTteαtδb<1\delta_{a}^{\top}e^{\gamma Tt}e^{\alpha t}\delta_{b}<1 for all tt. Taken at face value this would erroneously imply that the arrival time has an expected value of infinity; in practice this merely illustrates the difficulties of using such an approximation for diseases with low α\alpha (such as HIV). In general, eq. (10) is observed to be a good approximation when αγT\alpha\gg\gamma T, and performs poorly when α1\alpha\ll 1.

Appendix F Variance in Arrival Time

Here we provide the algebraic details for calculating variances in arrival time, assuming a starting population greater than 1.

Recall that τab\tau_{a}^{b} denotes the arrival time at bb, assuming an epidemic starts at aa, and has mean μab\mu_{a}^{b}. The survival function of Sab(t)=P(τab>t)S_{a}^{b}(t)=P(\tau_{a}^{b}>t). In what follows, we suppress subscripts and superscripts, considering some generic τ\tau.

E(τ)=0S𝑑t\displaystyle E(\tau)=\int_{0}^{\infty}Sdt =μ\displaystyle=\mu (40)
Assuming the mean value μ0\mu\gg 0
S˙\displaystyle\dot{S} =αS(1S)\displaystyle=\alpha S(1-S) (41)
0SkSk+1dt\displaystyle\int_{0}^{\infty}S^{k}-S^{k+1}dt =0Sk1S˙/α𝑑t\displaystyle=\int_{0}^{\infty}S^{k-1}\dot{S}/\alpha dt (42)
10S^k1/α𝑑S\displaystyle\int_{1}^{0}\hat{S}^{k-1}/\alpha dS =[Skαk]10=1/αk\displaystyle=\left[\frac{S^{k}}{\alpha k}\right]_{1}^{0}=-1/\alpha k (43)
0Sk+1𝑑t\displaystyle\implies\int_{0}^{\infty}S^{k+1}dt =0S^k𝑑t1αk=μk=1N1αk\displaystyle=\int_{0}^{\infty}\hat{S}^{k}dt-\frac{1}{\alpha k}=\mu-\sum_{k=1}^{N}\frac{1}{\alpha k} (44)

That is to say, if the expected arrival given an initial population of 11 is μ\mu, then the arrival time assuming K+1K+1 individuals is μ1N1/αk\mu-\sum_{1}^{N}1/\alpha k. This makes sense, given that the expected time of transition form kk to k+1k+1 individuals, for a branching rate of α\alpha is precisely equal to 1/αk1/\alpha k.

Next, we need to determine the expected value of E(τ2)E(\tau^{2}) for a starting population of NN. For a starting population of 1, we have E(τ2)=μ2+π2/3α2E(\tau^{2})=\mu^{2}+\pi^{2}/3\alpha^{2} [Balakrishnan, 2013]. For larger populations:

02t(SkSk+1)𝑑t\displaystyle\int_{0}^{\infty}2t(S^{k}-S^{k+1})dt =02tSk1S˙/α𝑑t\displaystyle=\int_{0}^{\infty}2tS^{k-1}\dot{S}/\alpha dt (45)
S=1/(1+exp[α(tμ)])\displaystyle S=1/(1+\exp[\alpha(t-\mu)]) t=log(1SS)/α+μ\displaystyle\implies t=\log\left(\frac{1-S}{S}\right)/\alpha+\mu (46)
2α210log(1SS)Sk1𝑑S\displaystyle\frac{2}{\alpha^{2}}\int_{1}^{0}\log\left(\frac{1-S}{S}\right)S^{k-1}dS =2α2Hk1k\displaystyle=\frac{2}{\alpha^{2}}\frac{H_{k-1}}{k} (47)
where HkH_{k} is the kth harmonic numberi=1k1i and H0=0.\displaystyle k^{th}\text{ harmonic number}\sum_{i=1}^{k}\frac{1}{i}\text{ and }H_{0}=0.
2α0tSk1𝑑S\displaystyle\frac{2}{\alpha}\int_{0}^{\infty}tS^{k-1}d{S} =2α2Hk1k2μαk\displaystyle=\frac{2}{\alpha^{2}}\frac{H_{k-1}}{k}-\frac{2\mu}{\alpha k} (48)
02tSN+1𝑑t\displaystyle\int_{0}^{\infty}2tS^{N+1}dt =μ2+π23α2+k=1N[2α2Hk1k2μαk]\displaystyle=\mu^{2}+\frac{\pi^{2}}{3\alpha^{2}}+\sum_{k=1}^{N}\left[\frac{2}{\alpha^{2}}\frac{H_{k-1}}{k}-\frac{2\mu}{\alpha k}\right] (49)

The variance is thus given as:

E(τ2)E(τ)2\displaystyle E(\tau^{2})-E(\tau)^{2} =μ2+π23s2+k=1N[2s2Hk1k2μsk](μab1N1/sk)2\displaystyle=\mu^{2}+\frac{\pi^{2}}{3s^{2}}+\sum_{k=1}^{N}\left[\frac{2}{s^{2}}\frac{H_{k-1}}{k}-\frac{2\mu}{sk}\right]-\left(\mu_{a}^{b}-\sum_{1}^{N}1/sk\right)^{2} (50)
=π23s2+k=1N[2s2Hk1k](1N1/sk)2\displaystyle=\frac{\pi^{2}}{3s^{2}}+\sum_{k=1}^{N}\left[\frac{2}{s^{2}}\frac{H_{k-1}}{k}\right]-\left(\sum_{1}^{N}1/sk\right)^{2} (51)
=π23s2+1s2k=1N[2Hk1HNk]=π23s21s2k2\displaystyle=\frac{\pi^{2}}{3s^{2}}+\frac{1}{s^{2}}\sum_{k=1}^{N}\left[\frac{2H_{k-1}-H_{N}}{k}\right]=\frac{\pi^{2}}{3s^{2}}-\sum\frac{1}{s^{2}k^{2}} (52)

References

  • [Balakrishnan, 2013] Balakrishnan, N. (2013). Handbook of the Logistic Distribution. CRC Press. Google-Books-ID: iaieM_3lcHQC.
  • [Brockmann and Helbing, 2013] Brockmann, D. and Helbing, D. (2013). The Hidden Geometry of Complex, Network-Driven Contagion Phenomena. Science, 342(6164):1337–1342.
  • [Broeck et al., 2011] Broeck, W. V. d., Gioannini, C., Gonçalves, B., Quaggiotto, M., Colizza, V., and Vespignani, A. (2011). The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale. BMC Infectious Diseases, 11(1):37.
  • [Chen et al., 2018] Chen, L. M., Holzer, M., and Shapiro, A. (2018). Estimating epidemic arrival times using linear spreading theory. Chaos (Woodbury, N.Y.), 28(1):013105.
  • [Coldman and Goldie, 1986] Coldman, A. J. and Goldie, J. H. (1986). A stochastic model for the origin and treatment of tumors containing drug-resistant cells. Bulletin of Mathematical Biology, 48(3):279–292.
  • [Friso Coerts, 2018] Friso Coerts (2018). Network analysis as a tool to improve infection prevention & outbreak management. Master’s degree Medical Pharmaceutical Drug Innovation, University of Groningen, Groningen.
  • [Gautreau et al., 2007] Gautreau, A., Barrat, A., and Barthelemy, M. (2007). Arrival Time Statistics in Global Disease Spread. Journal of Statistical Mechanics: Theory and Experiment, 2007(09):L09001–L09001. arXiv: 0707.3047.
  • [Gautreau et al., 2008] Gautreau, A., Barrat, A., and Barthelemy, M. (2008). Global disease spread: statistics and estimation of arrival times. Journal of Theoretical Biology, 251(3):509–522. arXiv: 0801.1846.
  • [Gillespie, 1977] Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry, 81(25):2340–2361.
  • [Gillespie, 2001] Gillespie, D. T. (2001). Approximate accelerated stochastic simulation of chemically reacting systems. The Journal of Chemical Physics, 115(4):1716–1733.
  • [Goldie and Coldman, 1979] Goldie, J. H. and Coldman, A. J. (1979). A mathematic model for relating the drug sensitivity of tumors to their spontaneous mutation rate. Cancer Treatment Reports, 63(11-12):1727–1733.
  • [González et al., 2010] González, M. V., Puerto, I. M., Martínez, R., Molina, M., Mota, M., and Ramos, A., editors (2010). Workshop on Branching Processes and Their Applications. Lecture Notes in Statistics - Proceedings. Springer-Verlag, Berlin Heidelberg.
  • [Iannelli et al., 2017] Iannelli, F., Koher, A., Brockmann, D., Hoevel, P., and Sokolov, I. M. (2017). Effective Distances for Epidemics Spreading on Complex Networks. Physical Review E, 95(1):012313. arXiv: 1608.06201.
  • [Jamieson-Lane, 2020] Jamieson-Lane, A. (2020). alastair-JL/EpidemicSpread. original-date: 2020-06-22T23:17:54Z.
  • [Kimmel and Axelrod, 2015] Kimmel, M. and Axelrod, D. (2015). Branching Processes in Biology. Interdisciplinary Applied Mathematics. Springer-Verlag, New York, 2 edition.
  • [Leonhard Euler, 1783] Leonhard Euler (1783). ”De serie Lambertina Plurimisque eius insignibus proprietatibus. Acta Acad. Scient. Petropol., 2:29–51.
  • [Mathew George, 2020] Mathew George (2020). B-A Scale-Free Network Generation and Visualization.
  • [Miller, 2018] Miller, J. C. (2018). A primer on the use of probability generating functions in infectious disease modeling. Infectious Disease Modelling, 3:192–248.
  • [Schaffer, 1970] Schaffer, H. E. (1970). Survival of Mutant Genes as a Branching Process. In Kojima, K.-i., editor, Mathematical Topics in Population Genetics, Biomathematics, pages 317–336. Springer, Berlin, Heidelberg.
  • [Shampine and Reichelt, 1997] Shampine, L. F. and Reichelt, M. W. (1997). The MATLAB ODE Suite. SIAM Journal on Scientific Computing, 18(1):1–22.
  • [Verne, 1873] Verne, J. (1873). Around the World in Eighty Days. Porter & Coates. Google-Books-ID: sd06RZq4mN4C.
  • [Watson and Galton, 1875] Watson, H. W. and Galton, F. (1875). On the Probability of the Extinction of Families. The Journal of the Anthropological Institute of Great Britain and Ireland, 4:138–144. Publisher: [Royal Anthropological Institute of Great Britain and Ireland, Wiley].
  • [Weisstein, 2020] Weisstein, E. W. (2020). Lambert W-Function. Library Catalog: mathworld.wolfram.com Publisher: Wolfram Research, Inc.
  • [WHO, 2003] WHO (2003). WHO | Situation Updates - SARS.
  • [WHO, 2009] WHO (2009). WHO | Situation updates - Avian influenza.
  • [WHO, 2020] WHO (2020). Novel Coronavirus (2019-nCoV) situation reports.