Linkage disequlibrium under recurrent bottlenecks

E. Schaper^a,¹¹1Present address: Institute of Computational Science, ETH, CH-8092 Zürich, Switzerland and Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland, A. Eriksson^b, M.Rafajlovic^a, S. Sagitov^c, and B. Mehlig^a
^aDepartment of Physics, University of Gothenburg, SE-41296 Gothenburg, Sweden
^bDepartment of Zoology, University of Cambridge, Cambridge, UK
^cMathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-41296 Gothenburg, Sweden

Abstract

Understanding patterns of selectively neutral genetic variation is essential in order to model deviations from neutrality, caused for example by different forms of selection. Best understood is neutral genetic variation at a single locus, but additional insights can be gained by investigating genetic variation at multiple loci. The corresponding patterns of variation reflect linkage disequilibrium and provide information about the underlying multi-locus gene genealogies. The statistical properties of two-locus genealogies have been intensively studied for populations of constant census size, as well as for simple demographic histories such as exponential population growth, and single bottlenecks. By contrast, the combined effect of recombination and sustained demographic fluctuations is poorly understood. Addressing this issue, we study a two-locus Wright-Fisher model of a population subject to recurrent bottlenecks. We derive coalescent approximations for the covariance of the times to the most recent common ancestor at two loci. We find, first, that an effective population-size approximation describes the numerically observed linkage disequilibrium provided that recombination occurs either much faster or much more slowly than the population size changes. Second, when recombination occurs frequently between bottlenecks but rarely within bottlenecks, we observe long-range linkage disequilibrium. Third, we show that in the latter case, a commonly used measure of linkage disequilibrium, $\sigma^{2}_{d}$ (closely related to $\hat{r}^{2}$ ), fails to capture long-range linkage disequilibrium because constituent terms, each reflecting long-range linkage disequilibrium, cancel. Fourth, we analyse a limiting case in which long-range linkage disequilibrium can be described in terms of a Xi-coalescent process allowing for simultaneous multiple mergers of ancestral lines.

Keywords: Recurrent bottlenecks, linkage disequilibrium, recombination, gene histories, single nucleotide polymorphism, Kingman’s coalescent, Xi-coalescent

I Introduction

Genetic variation at single neutral loci has been investigated in great detail for population models under different demographic processes, such as population expansions, single bottlenecks, or genetic hitchhiking caused by nearby selective sweeps (see for example (Eriksson et al., 2008) for a review of such models). All of of these models, either with non-overlapping generations, such as the Wright-Fisher model (Fisher, 1930/1999; Wright, 1931), or with continuous reproduction and mortality, such as the Moran model Moran (1958), have in common the underlying assumption of random mating.

Real biological populations exhibit abundance fluctuations on both short and long time scales, caused by e. g. environmental and ecological changes. Such size fluctuations in the form of repeated bottlenecks are characteristic of populations expanding into new territories: examples include the human out-of-Africa scenario (Liu et al., 2006; Ramachandran et al., 2005), the accompanying expansion of the parasite Plasmodicum falciparum causing severe malaria (Tanabe et al., 2010), and the recolonization by the marine snail Littorina saxatilis of Sweden’s west coast archipelago (Johannesson, 2003). It is common practice to accommodate such fluctuations in the theory by using an effective population size instead of the census population size (see (Ewens, 1982) for a review of different measures of the effective population size).

Recent research has highlighted the importance of two competing time scales in the context of effective population-size approximations: the coalescent time scale (which is inversely proportional to the population size and reflects the time to the most recent common ancestor (MRCA) in populations with constant population size), and the time scale of the population-size fluctuations. Clearly, relatively slow demographic fluctuations can be ignored and the effective population size can be approximated by the initial population size Sjödin et al. (2005). In the opposite case of very frequent demographic fluctuations, it has been argued (Wright, 1938; Crow and Kimura, 1970) that genetic variation of a population with varying population size is well described in terms of a population with constant population size $N_{\rm eff}$ , given by the harmonic average of the population size $N_{\tau}$ in generation $\tau$ :

N_{\rm eff}=\lim_{T\rightarrow\infty}\Bigl{(}\frac{1}{T}\sum_{\tau=0}^{T}\frac{1}{N_{\tau}}\Bigr{)}^{-1}\,\,.

(1)

See (Sjödin et al., 2005; Jagers and Sagitov, 2004; Wakeley and Sargsyan, 2009) for recent developments of this concept.

Thus, for both fast and slow demographic fluctuations, the statistical properties of gene genealogies agree with those of the constant population-size model. By contrast, when both time scales are of the same order, it has been shown (Nordborg and Krone, 2003; Kaj and Krone, 2003; Sjödin et al., 2005; Eriksson et al., 2010) that the distribution of total branch lengths in samples of one-locus gene genealogies does not in general agree with that predicted by the standard coalescent approximation. Especially, when subject to periodic fluctuations, the total branch length of gene genealogies may exhibit a maximum for periods matching the coalescent time scale (Eriksson et al., 2010). In summary, the effect of population-size fluctuations upon genetic variation of a single locus is well understood.

But how do population-size fluctuations affect multi-locus patterns of genetic variation on the same chromosome? Such patterns are influenced by recombination. Genetic recombination introduces a new time scale which is inversely proportional to the probability of recombination $r$ between a pair of loci in a single generation.

Genetic recombination plays an important role in shaping empirically observed multi-locus patterns of genetic variation in biological populations. Measures of linkage disequilibrium (LD) quantify the degree of association of genetic variation at pairs of loci on the same chromosome. Common measures of LD, such as $\hat{r}^{2}$ Hill and Robertson (1968), and its approximation $\sigma^{2}_{d}$ Ohta and Kimura (1971); McVean (2002), depend upon the frequencies of alleles at two loci. These measures are thus closely related to the covariance of the times (i. e. the number of generations) to the MRCA of the underlying gene genealogies (McVean, 2002).

In order to illustrate this concept, we show in Fig. 1 our results of the Wright-Fisher dynamics (grey lines) of a population experiencing recurrent bottlenecks, with random durations and separations between bottlenecks. Panels a and b show results for two different sets of parameters. Details are given in Section II. The single-locus properties of a similar model were studied by Sjödin et al. (2005) and also by Eriksson et al. (2010). Each grey line in Fig. 1 shows the covariance of the times to the MRCA for two chromosomes at two loci, as a function of genetic distance along the chromosomes, corresponding to a single realisation of the sequence of bottlenecks, $\mathcal{D}$ (averaged over all pairs of loci the same distance apart). The red lines are the averages of the covariances within each panel. In panel a, the bottlenecks happen frequently and have a short duration; in this case, the single-locus properties are expected to be in good agreement with those of a population with the constant effective population size, given by the harmonic time average, Eq. (1), of the population size $N_{\tau}$ . For such populations, the coalescent approximation predicts (Griffiths, 1981; Hudson, 1983, 1990)

\text{cov}[t_{a(ij)},t_{b(ij)}]=x_{\rm eff}^{2}\frac{Rx_{\rm eff}+18}{(Rx_{\rm eff})^{2}+13Rx_{\rm eff}+18},

(2)

where $t_{a(ij)}$ and $t_{b(ij)}$ are the times to the MRCA of two loci (called $a$ and $b$ ), in a sample of two chromosomes (denoted by $i$ and $j$ ). Moreover, $R=2N_{0}r$ is the scaled recombination rate, and $x_{\rm eff}=N_{\rm eff}/N_{0}$ is the effective population size relative to $N_{0}$ , the population size at the present time. Units of time are chosen so that $\tau=\lfloor tN_{0}\rfloor$ , and $\lfloor tN_{0}\rfloor$ is the largest integer not larger than $tN_{0}$ . Effective population-size approximations, according to Eq. (2), are shown as dashed lines in Fig. 1. In Fig. 1a we observe good agreement between Eq. (2) and the average covariance of the times to the MRCA obtained numerically. However, in Fig. 1b, Eq. (2) agrees with our numerically obtained covariance only for short genetic distances. For large genetic distances, by contrast, the numerical results show that the covariance decreases much more slowly than expected according to Eq. (2). Thus, the results shown in Fig. 1b imply long-range LD in two-locus gene genealogies.

The examples shown in Fig. 1 open many questions in relation to multi-locus gene-genealogies. What are the conditions for the effective population-size approximation to be valid in the multi-locus case? Why does it fail when these conditions are not met? How significant are deviations of the exact result from the effective population-size approximation? Why does long-range linkage appear in some cases? How large are fluctuations around the covariance of the coalescent times, averaged over an ensemble of gene-genealogies and over different demographic histories? What is the significance of the fluctuations around such averages for data analysis?

The aim of this paper is to provide answers to the above questions, by computing the covariance of the times to the MRCA under a model of recurrent bottlenecks introduced in Section II. Our analysis enables us to qualitatively and quantitatively determine the effects of fluctuating population size on the two-locus statistics in terms of the time scales of population-size fluctuations, of coalescence, and of recombination. Using both a numerical and an analytical approach, we estimate the range of validity of the coalescent effective population-size approximation for the two-locus case. We find that the coalescent effective population-size approximation inevitably fails for large recombination rates: the failure is sometimes minor (as in the case shown in Fig. 1a) and sometimes significant (as in the case shown in Fig. 1b). By taking different limits of the parameters of the model, we provide both qualitative and quantitative understanding of how the constant effective population-size approximation may fail. Finally, when bottlenecks are severe, we show that gene genealogies can be approximated by those of a constant-sized population with simultaneous multiple pairwise coalescent events (the so-called Xi-coalescent described by Schweinsberg (2000) and Möhle and Sagitov (2001)). Last but not least, we demonstrate that $\sigma^{2}_{d}$ is surprisingly little affected by the long-range covariances.

II Model

We use a Wright-Fisher model of a population of chromosomes, to trace the ancestry of $L$ loci on a pair of chromosomes backwards in time, until the most recent common ancestor of each locus is found. In each generation $\tau$ , a set of parents is chosen randomly from the previous generation. Genetic recombination of a pair of chromosomes occurs independently with probability $r$ between each adjacent pair of loci.

To investigate the role of population-size fluctuations, we consider a model of recurrent bottlenecks in which the population size can take one of two values, $N_{0}$ or $xN_{0}$ . Here $xN_{0}$ is the population size in the bottlenecks, and $0<x<1$ . The probability of changing the population size, going one generation back in time, depends on the current population size. The switching probability is $p$ and $q$ when the current population size is $N_{0}$ and $xN_{0}$ , respectively. Hence, the expected durations of the high and the low population-size phases are $1/p$ and $1/q$ generations, respectively. The population size in the first generation is taken to be $N_{0}$ . We note that for the one-locus case, such a model has been investigated by Sjödin et al. (2005) and also by Eriksson et al. (2010).

Fig. 2 illustrates population-size fluctuations in this recurrent bottleneck model (panels a and b) and examples of gene genealogies of two loci (called $a$ and $b$ ) in a sample of two chromosomes (panels c and d). In panels c and d, generations with low population size ( $xN_{0}$ ) are marked with yellow, otherwise the population size is $N_{0}$ . Each chromosome is represented by a pair of lines (red and blue lines correspond to loci $a$ and $b$ , respectively). In the generations where a common ancestor is found for a pair of ancestral lines, or a recombination between two loci occurs, the chromosomes are represented by circles instead of lines. The MRCA of a locus is shown as a filled circle. In some cases recombination causes the ancestry of one locus to become associated with a chromosome that lacks direct descendants in the sample (grey circles). The ancestries of such segments of DNA are irrelevant to the gene genealogy of the sample, and these ancestral lines are not traced further.

III Covariance of the times to the MRCA for two individuals

As Fig. 1 shows, the covariance of the times to the MRCA at two loci (called $a$ and $b$ ) depends on the particular sequence of bottlenecks in the history of the population. Let $\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]$ denote the covariance conditional on the demographic history $\mathcal{D}$ . In the following, we average the conditional covariance over random demographic histories to obtain (see red lines in Fig. 1):

	$\displaystyle\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$	$\displaystyle=\langle\langle t_{a(ij)}t_{b(ij)}\|\mathcal{D}\rangle-\langle t_{a(ij)}\|\mathcal{D}\rangle\langle t_{b(ij)}\|\mathcal{D}\rangle\rangle$
		$\displaystyle=\langle t_{a(ij)}t_{b(ij)}\rangle-\langle\langle t_{a(ij)}\|\mathcal{D}\rangle^{2}\rangle\,\,,$		(3)

where $\langle\dots|\mathcal{D}\rangle$ denotes the expectation conditional on the particular demographic history $\mathcal{D}$ . In the second equality we have used that the expected times to the MRCA are the same for both loci. Note that the averaged conditional covariance is not the same as the unconditional covariance of the times to the MRCA for the full process (given by $\langle t_{a(ij)}t_{b(ij)}\rangle-\langle t_{a(ij)}\rangle^{2}$ ).

We now derive an approximate expression for $\langle{\text{cov}}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$ using the coalescent approximation (Kingman, 1982), valid in the limit of large population sizes. Thus, in the following calculations, we assume $N_{0}\gg 1$ , and $xN_{0}\gg 1$ . As is usual in coalescent calculations, it is convenient to scale the rates $p$ and $q$ in the Wright-Fisher model by a suitable representative of the population size. Defining the rates $\lambda=pN_{0}$ , and $\lambda_{\text{B}}=qxN_{0}$ allows us to express the probability for two lines to coalesce between two consecutive bottlenecks as $(1+\lambda)^{-1}$ . Correspondingly, the probability for two lines to coalesce during a single bottleneck is given by $(1+\lambda_{\rm B})^{-1}$ . The units of time are chosen so that $\tau=\lfloor tN_{0}\rfloor$ , we take $R=2N_{0}r$ as the scaled recombination rate as mentioned in the introduction. In these units we denote the population size at time $t$ by $N(t)$ , indicating that in the limit of $N_{0}\rightarrow\infty$ , $t$ becomes a continuous variable.

We find the following expression for the term $\langle\langle t_{a(ij)}|\mathcal{D}\rangle\langle t_{b(ij)}|\mathcal{D}\rangle\rangle\equiv\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle$ , occurring in Eq. (III):

\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle=\frac{\lambda_{\rm B}(2x\lambda+\lambda_{\rm B}+3)+x\lambda(x\lambda+x+2)+2}{(\lambda+\lambda_{\rm B}+1)(\lambda+\lambda_{\rm B}+2)}\,\,.

(4)

Details of the derivation of this result are summarised in Appendix A.

In order to evaluate the remaining term in Eq. (III), $\langle t_{a(ij)}t_{b(ij)}\rangle$ , we adapt the method described by Eriksson and Mehlig (2004) for calculating the covariance of the times to the MRCA at two loci, to the present model of recurrent bottlenecks. As can be seen in Fig. 2c, d, there are only a small number of possible combinations of ancestral lines in gene genealogies of two loci for two chromosomes. Thus, we can write down a Markov process for how states of the ancestral lines change along the gene genealogy. The corresponding graph is shown in Fig. 3, where the vertices represent states (combinations of ancestral lines), and the edges represent transitions between the states (the transition rates $w_{ji}$ from $i$ to $j$ are shown along the edges from $i$ to $j$ ). The vertices labeled by a prime correspond to states in a bottleneck.

Following Eriksson and Mehlig (2004) we note that the expectation $\langle t_{a(ij)}t_{b(ij)}\rangle$ is determined by a six-dimensional sub-graph of the graph shown in Fig. 3, consisting of the vertices $1,2,3,1^{\prime},2^{\prime}$ , and $3^{\prime}$ . Let $\bf{M}$ be the corresponding $6\times 6$ transition matrix. Its off-diagonal elements are given by the transition rates, $w_{ji}$ , from state $i$ to state $j$ . The diagonal elements $M_{ii}$ are equal to the negative sum of the rates of all edges leaving node $i$ in the graph, i. e. $M_{ii}=-\sum_{j\neq i}M_{ji}$ . At the present time ( $t=0$ ), the system is in state $1$ in the graph. This is represented by the vector

\displaystyle\mbox{{\boldsymbol{$v$}}}=\begin{bmatrix}1&0&0&0&0&0\end{bmatrix}^{\sf T},

(5)

where $\sf T$ denotes the transpose. It is convenient to combine the transition rates $w_{ji}$ into vectors and matrices as follows:

$u$	$\displaystyle=\begin{bmatrix}1&0&0&x^{-1}&0&0\end{bmatrix},$
$\displaystyle\bf{Q}$	$\displaystyle=\begin{bmatrix}0&2&2&0&0&0\\ 0&0&0&0&2x^{-1}&2x^{-1}\end{bmatrix},$
$\displaystyle\bf{K}$	$\displaystyle=\begin{bmatrix}-(\lambda+1)&\lambda_{\text{B}}x^{-1}\\ \lambda&-(\lambda_{\text{B}}+1)x^{-1}\end{bmatrix},$	(6)
$c$	$\displaystyle=\begin{bmatrix}1&x^{-1}\end{bmatrix}\,.$

In terms of these vectors and matrices we can write Eriksson and Mehlig (2004)

\displaystyle\langle t_{a(ij)}t_{b(ij)}\rangle=\int_{0}^{\infty}dt_{1}\,t_{1}^{2}\,{\mbox{{\boldsymbol{$u$}}}}\,{\rm e}^{{\bf M}\,t_{1}}{\mbox{{\boldsymbol{$v$}}}}+\int_{0}^{\infty}dt_{1}\int_{t_{1}}^{\infty}dt_{2}\,t_{1}t_{2}\,{\mbox{{\boldsymbol{$c$}}}}\,{\rm e}^{{\bf K}\left(t_{2}-t_{1}\right)}\,{\bf Q}\,{\rm e}^{{\bf M}\,t_{1}}{\mbox{{\boldsymbol{$v$}}}},

(7)

where $t_{1}$ and $t_{2}$ are the first and second coalescent events, respectively (i. e. $\min(t_{a(ij)},t_{b(ij)})$ and $\max(t_{a(ij)},t_{b(ij)})$ ). The first term corresponds to a common MRCA of loci $a$ and $b$ (i. e. a transition from states $1$ or $1^{\prime}$ to state $5$ ), and the second to different MRCA (transitions from states $2$ or $3$ to state $4$ , or from states $2^{\prime}$ or $3^{\prime}$ to state $4^{\prime}$ , followed by a transition to state $5$ ). The matrix $\bf{K}$ describes the change in the population size due to recurrent bottlenecks, whereas the vector $c$ contains the coalescent rates in each population-size regime. Both $\bf{M}$ and $\bf{K}$ have negative real eigenvalues. Hence, the integrals in Eq. (7) can be evaluated in terms of matrix inverses Eriksson and Mehlig (2004):

\displaystyle\langle t_{a(ij)}t_{b(ij)}\rangle=2\,{\mbox{{\boldsymbol{$u$}}}}\,(-{\bf M})^{-3}\,{\mbox{{\boldsymbol{$v$}}}}+2\,{\mbox{{\boldsymbol{$c$}}}}\,(-{\bf K})^{-2}\left\{(-{\bf K})^{-1}\,{\bf Q}+{\bf Q}\,(-{\bf M})^{-1}\right\}(-{\bf M})^{-2}\,{\mbox{{\boldsymbol{$v$}}}}.

(8)

Combining Eqs. (4) and (8) yields

\displaystyle\langle\mbox{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle=\frac{R^{3}C_{3}+R^{2}C_{2}+RC_{1}+C_{0}}{R^{4}D_{4}+R^{3}D_{3}+R^{2}D_{2}+RD_{1}+D_{0}}\,\,,

(9)

where the coefficients $C_{i}$ and $D_{i}$ are functions of parameters $x$ , $\lambda$ and $\lambda_{\rm B}$ given in Appendix C.

We now discuss three special limits of this result. First, when the time to the first bottleneck is much longer than the expected times to the MRCA for two chromosomes at two loci in the large population-size regime (i. e. $\lambda\ll 1$ ), the subsequent bottlenecks are irrelevant to the mean covariance. In this case, Eq. (9) reduces to Eq. (2), the expression valid in the case of constant population size with $x_{\text{eff}}=1$ .

Second, assume that the population-size fluctuations are rapid compared to all other processes (this case is described by taking the limit $\lambda\rightarrow\infty$ and $\lambda_{\rm B}\rightarrow\infty$ in such a way that the ratio $\lambda/\lambda_{\rm B}$ is kept constant). Keeping only the leading order of $\lambda$ and $\lambda_{\text{B}}$ in the numerator and the denominator of (9) yields Eq. (2), with $x_{\text{eff}}=(x\lambda+\lambda_{\text{B}})/(\lambda+\lambda_{\text{B}})$ . This is the harmonic time average (1) of $N_{\tau}/N_{0}$ in the recurrent bottleneck model.

Third, we consider the case of severe bottlenecks ( $x\ll 1$ ). Because the expected duration of a bottleneck is given by $x\lambda_{\text{B}}^{-1}$ , this regime implies that bottlenecks are typically of short duration. Such demographic histories can occur during range expansions, where small groups of animals repeatedly colonize new areas (examples of this kind are given in the introduction). This case can be treated analytically by taking the leading order of $x^{-1}$ in the numerator and denominator of Eq. (9). We find:

\displaystyle\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle\approx\frac{R^{2}A_{2}+RA_{1}+A_{0}}{R^{2}B_{2}+RB_{1}+B_{0}},

(10)

where $A_{i}$ and $B_{i}$ are functions of parameters $\lambda$ and $\lambda_{\rm B}$ given in Appendix C. Note that this function reaches a plateau for large values of $R$ :

	$\displaystyle\langle\mbox{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$	$\displaystyle\approx\frac{A_{2}}{B_{2}}$		(11)
	$\displaystyle=$	$\displaystyle\frac{2\lambda_{\rm B}(1+\lambda_{\rm B})\lambda}{(1+\lambda_{\rm B}+\lambda)(2+\lambda_{\rm B}+\lambda)(9(2+\lambda)+\lambda_{\rm B}(27+8\lambda+\lambda_{\rm B}(10+\lambda_{\rm B}+\lambda)))}\,.$

This expression implies long-range linkage disequilibrium since the right hand side does not depend upon $R$ .

IV Comparison of the coalescent calculations to the Wright-Fisher simulations

In order to further illustrate the role of the relevant time scales of genetic drift and recombination in shaping LD, we compare the full coalescent result, Eq. (9), and the different limiting cases considered in Section III, to the average covariance calculated from the Wright-Fisher simulations. The comparisons are shown in Fig. 4.

The parameters used in Fig. 4a correspond to rapid population-size fluctuations (the second example described in the previous section). As can be seen in Fig. 4a, the agreement between the numerical result (red line) and the approximation (2) (dashed line) is good for a wide range of recombination rates. A small disagreement appears at large values of $R$ , more precisely at $R\approx 100$ . This discrepancy is expected, as at such large recombination rates, the population-size fluctuations are no longer rapid compared to the process of recombination. In summary, in the case of rapid population-size fluctuations, the results of the Wright-Fisher simulations are well approximated by Eq. (2) as long as genetic recombination does not occur too frequently.

Fig. 4b shows results for parameters corresponding to severe bottlenecks (the third example described in the previous section). As we noted already in the introduction, this case exhibits long-range linkage disequilibrium which cannot be accurately described by the effective population-size approximation (2). We observe that the average covariance curve can be divided into three regions where the curve behaves qualitatively differently. First, for very small recombination rates, the covariance can be approximated using the harmonic average effective population size. This is expected since, in this region, the population-size fluctuations are fast compared to the process of recombination. Second, the constant effective population-size approximation breaks down when $R\approx\lambda=10$ . For recombination rates larger than this value, by contrast, recombination occurs frequently between bottlenecks but rarely within, and multiple coalescent events during bottlenecks become frequent. Since the bottlenecks are short in this regime, this may lead to long-range LD. In this regime, the covariance is approximated by Eq. (10). We observe that, for a large range of recombination rates, Eq. (10) is in excellent agreement with both the full coalescent result, Eq. (9), and the simulations. Third, we see that the agreement between Eq. (10) and the full coalescent result breaks down when recombination events can no longer be ignored in the bottlenecks (i. e. when $R$ is of the same order as the rate of leaving a bottleneck, $\lambda_{\text{B}}x^{-1}$ , or larger). For still larger recombination rates, only the full coalescent result agrees with the Wright-Fisher simulations. The slight deviations between the Wright-Fisher simulations and the coalescent result (9), visible in Fig. 4, are discussed in the concluding section.

V The effect of recurrent bottlenecks upon $\sigma^{2}_{d}$

In the previous sections we have shown how sustained population-size fluctuations in the form of recurrent bottlenecks give rise to long-range linkage disequilibrium, measured by the covariance (III) of the times to the MRCA. An important question is how such population-size fluctuations affect more common measures of linkage disequilibrium such as, for example, $\hat{r}^{2}$ , and its approximation, $\sigma^{2}_{d}$ (Ohta and Kimura, 1971; McVean, 2002). In this section we discuss the effect of recurrent bottlenecks upon $\sigma^{2}_{d}$ . While the covariance of the times to the MRCA is obtained by comparing two chromosomes, the measure $\sigma_{d}^{2}$ is computed in a large sample of chromosomes. In the following we consider the limit of infinite sample size. In this limit, and in terms of the expectations and covariances conditional on the demographic history $\mathcal{D}$ , Eq. (9) in McVean (2002) becomes:

\displaystyle\sigma^{2}_{d}=\left\langle\frac{\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]-2\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ik)}]+\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]}{\langle t_{a(ij)}|\mathcal{D}\rangle^{2}+\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]}\right\rangle\,.

(12)

As before, $a$ and $b$ denote two loci, and $i$ , $j$ , $k$ , and $l$ refer to four different chromosomes in a large sample. The main properties of this measure are determined by how the numerator depends on the recombination rate McVean (2002). In order to simplify the analysis, we therefore focus on the expected value of the numerator, i. e.

\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle-2\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ik)}]\rangle+\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]\rangle.

The covariance $\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$ is given by Eq. (9). The covariances $\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ik)}]\rangle$ and $\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]\rangle$ can be calculated in the same way as Eq. (9) was obtained, but starting from different initial conditions (McVean, 2002; Eriksson and Mehlig, 2004). In our Markov representation, this corresponds to taking ${\mbox{{\boldsymbol{$v$}}}}=[0\ 1\ 0\ 0\ 0\ 0]^{T}$ and ${\mbox{{\boldsymbol{$v$}}}}=[0\ 0\ 1\ 0\ 0\ 0]^{T}$ , respectively, in Eq. (7).

Fig. 5 shows the relation between the covariances $\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$ (blue lines), $\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ik)}]\rangle$ (red lines), and $\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]\rangle$ (green lines). As can be seen, in a region of low values of $R$ each covariance is well approximated by the corresponding constant effective population-size approximation, as explained in Section III. Note that in the case shown in panel b, all three covariances exhibit a plateau at the same level, in approximately the same range of $R$ values. Thus, the linear combination of covariances in the numerator of $\sigma^{2}_{d}$ cancel an enhancement relative to the effective population-size approximation. This is the reason why the plateau, present in each covariance, does not show in the numerator of $\sigma^{2}_{d}$ . In other words, the information about the sustained population-size fluctuations is not preserved in $\sigma_{d}^{2}$ .

VI Discussion

The aim of this paper was to provide an understanding on how sustained random population-size fluctuations influence gene-history correlations. Our conclusions are based on both analytical and numerical calculations, which we find to agree well. Using the particular population-size model, depicted in Fig. 2a, we have derived an exact result for the covariance of the times to the MRCA of two loci, Eq. (9). We have discussed three particular limits of our result. First, if the expected times to the MRCA of two loci are less than the expected time to the most recent bottleneck, our model reduces to the constant population-size model with the effective population size equal to the population size at the present time. Second, if the population size fluctuates much faster than the remaining two processes (coalescence and recombination), the coalescent effective population-size approximation again works well, but with the effective population size given by the harmonic average (1). These two cases are consistent with earlier findings of investigations of single-locus properties of populations which exhibit population-size variations (Sjödin et al., 2005; Eriksson et al., 2010). In the third case, bottlenecks are severe (large difference in population sizes in and between bottlenecks) and the time between bottlenecks is assumed to be brief. In this case, the result of the simulations depends on the relation between the time scale of recombination and the time scale of population-size changes. When recombination is the slowest process (i. e. when the recombination rate is sufficiently small), the coalescent effective population-size approximation (with the effective population size given by the harmonic average (1)) is a good approximation (this is essentially the same as in the second case). Conversely, when recombination is the fastest process, the covariance is approximately $1/R$ (same as in the first case). Finally, when recombination is intermediate, slow enough that recombination events during bottlenecks are rare, but fast enough to decorrelate gene genealogies between bottlenecks, the covariance exhibits a plateau. The plateau corresponds to an enhanced covariance of the times to the MRCA, with respect to that expected in the effective constant population-size case. Thus, in this case, pairs of distant loci are expected to exhibit a high degree of linkage.

These conclusions rely on analysing covariances averaged over different demographic histories. This raises the question how typical such averages are. In other words, how large are the fluctuations around the average? Fig. 1 shows that in the case of severe bottlenecks, the fluctuations around the mean covariance are much higher than, for example, in the case of fast population-size fluctuations. This can be explained as follows: in the case of severe bottlenecks, the times to the MRCA of both loci are determined by the time to the bottleneck that hosts two pairwise mergers (for very strong bottlenecks, this is simply the time to the first bottleneck).

The coalescent approximations employed in this paper assume large population sizes. While we generally find very good agreement between the coalescent approximations and the Wright-Fisher dynamics, we observe some deviations, in particular for severe bottlenecks and large recombination rates. The coalescent approximations assume that the time between two recombination events is exponentially distributed. However, this is only accurate in the limit of small recombination rates, $r$ . Because the relevant parameter is $R=2N_{0}r$ , by increasing $N_{0}$ we reach a better agreement between the corresponding analytical and numerical results in the range of values of $R$ shown in Figs. 1 and 4. We have made additional simulations to confirm that this is the case (not shown).

It is worth mentioning that the result for the case of severe bottlenecks ( $x\ll 1$ ), Eq. (10), can be understood in terms of the so-called Xi-coalescent approximation. Xi-coalescents form a broad family of gene-genealogical models allowing for simultaneous multiple pairwise coalescent events (mergers). The Kingman coalescent is a special case, allowing only for pairwise mergers. See (Schweinsberg, 2000; Möhle and Sagitov, 2001) for detailed descriptions of the family of Xi-coalescents. In the case of severe bottlenecks ( $x\ll 1$ ), coalescent events during a bottleneck may appear as simultaneous multiple mergers (as pointed out also by Birkner et al. (2009)). We show in Appendix C how the Markov process with simultaneous multiple mergers is obtained in this case, and compute the corresponding transition rates $w_{ji}$ . This provides an alternative way of deriving Eq. (10). More importantly, it provides insight into why the plateau forms. It turns out that the plateau arises as a direct consequence of simultaneous multiple mergers. We note that this means that long-range gene-history correlations are also expected in other situations where simultaneous multiple mergers are important. Examples are populations with strongly skewed reproduction laws, or populations subject to selective sweeps.

We conclude with the observation that $\sigma^{2}_{d}$ , a measure of LD, fails to show the plateaus present in its constituent covariances (this was observed already in Eriksson and Mehlig (2004) for the case of a single, recent bottleneck). Because of the close link between $\sigma^{2}_{d}$ and $\hat{r}^{2}$ , a common measure of LD McVean (2002), this casts doubt on the suitability of such measures for characterising LD (another example is the measure $HR^{2}$ (Sabatti and Risch, 2002)), in populations that may have been subject to recent population bottlenecks and range expansions. A more accurate approach, especially for detecting long-range LD, may be to estimate the covariance of the times to the MRCA directly. For example, simulations show that the covariance of the number of mutations in small windows (e. g. a few hundred nucleotides long) can be used to estimate the covariance of the times to the MRCA Eriksson and Mehlig (2004). However, it remains to investigate which observables are most suitable for detecting long-range dependencies in the underlying gene genealogies for more general demographic histories.

Acknowledgements. Support by Swedish Research Council grants, the Göran Gustafsson stiftelse, and by the Centre for Theoretical Biology at the University of Gothenburg are gratefully acknowledged. AE was supported by a Philip Leverhume Award and a Biotechnology and Biological Sciences Research Council grant (BB/H005854/1).

References

Birkner et al. (2009) Birkner, M., Blath, J., Möhle, M., Steinrücken, M., Tams, J., 2009. A modified lookdown construction for the xi-fleming-viot process with mutation and populations with recurrent bottlenecks. ALEA 6, 25–61.
Crow and Kimura (1970) Crow, J. F., Kimura, M., 1970. An introduction to population genetics theory. Harper & Row, London, discussion of linkage equilibrium on p. 50.
Eriksson et al. (2008) Eriksson, A., Fernstrom, P., Mehlig, B., Sagitov, S., 2008. An accurate model for genetic hitchhiking. Genetics 178 (1), 439–451.
Eriksson and Mehlig (2004) Eriksson, A., Mehlig, B., 2004. Gene-history correlation and population structure. Physical Biology 1, 220–228.
Eriksson et al. (2010) Eriksson, A., Mehlig, B., Rafajlovic, M., Sagitov, S., 2010. The total branch length of sample genealogies in populations of variable size. Genetics 186, 601–611
Ewens (1982) Ewens, W., 1982. The concept of the effective population size. Theor. Popul. Biol. 21, 373–378.
Fisher (1930/1999) Fisher, R. A., 1930/1999. The Genetical Theory of Natural Selection, variorum Edition. Oxford University Press.
Griffiths (1981) Griffiths, R. C., 1981. Neutral 2-locus multiple allele models with recombination. Theor. Pop. Biol. 19, 169–186.
Hill and Robertson (1968) Hill, W. G., Robertson, A., 1968. Linkage disequilibrium in finite populations. TAG Theoretical and Applied Genetics 38, 226–231, 10.1007/BF01245622.
URL http://dx.doi.org/10.1007/BF01245622
Hudson (1983) Hudson, R. R., 1983. Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23, 183–201.
Hudson (1990) Hudson, R. R., 1990. Gene genealogies and the coalescent process. Oxford Surveys in Evolutionary Biology 7, 1–44.
Jagers and Sagitov (2004) Jagers, P., Sagitov, S., 2004. Convergence to the coalescent in populations of substantially varying size. Journal of Applied Probability 41 (1), 368–378.
Johannesson (2003) Johannesson, K., 2003. Evolution in littorina: ecology matters. Journal of Sea Research 49 (2), 107–117.
Kaj and Krone (2003) Kaj, I., Krone, S., 2003. The coalescent process in a population with stochastically varying size. J. Appl. Prob. 40, 33–48.
Kingman (1982) Kingman, J. F. C., 1982. The coalescent. Stochastic Processes and their Applications 13, 235–248.
Liu et al. (2006) Liu, H., Prugnolle, F., Manica, A., Balloux, F., Aug 2006. A geographically explicit genetic model of worldwide human-settlement history. Am. J. Hum. Genet. 79 (2), 230–7.
McVean (2002) McVean, G., 2002. A genealogical interpretation of linkage disequilibrium. Genetics 162, 987 – 991.
Möhle and Sagitov (2001) Möhle, M., Sagitov, S., 2001. A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 156–2.
Moran (1958) Moran, P. A. P., 1958. Random processes in genetics. Proc. Cambridge Philos. Soc. 54, 60–71.
Nordborg and Krone (2003) Nordborg, M., Krone, S., 2003. Modern Developments in Population Genetics: The Legacy of Gustave Malécot. Oxford University Press, Oxford, pp. 194–232.
Ohta and Kimura (1971) Ohta, T., Kimura, M., 1971. Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68 (4), 571–580.
Ramachandran et al. (2005) Ramachandran, S., Deshpande, O., Roseman, C., Rosenberg, N., Feldman, M., Cavalli-Sforza, L., Nov. 2005. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proceedings of the National Academy of Sciences of the United States of America 102 (44), 15942–15947.
Sabatti and Risch (2002) Sabatti, C., Risch, N., Apr 2002. Homozygosity and linkage disequilibrium. Genetics 160 (4), 1707–19.
Sagitov (2003) Sagitov, S., 2003. Convergence to the coalescent with simultanous multiple mergers. J. Appl. Probab 36, 1116–1125.
Sagitov et al. (2011) Sagitov, S., Mehlig, B., Rafajlovic, M., Eriksson, A., 2011. Coalescent algorithms for populations with skewed reproduction, recurrent bottlenecks and selective sweeps. In preparation.
Schweinsberg (2000) Schweinsberg, J., 2000. Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1–55.
Sjödin et al. (2005) Sjödin, P., Kaj, I., Krone, S., Lascoux, M., Nordborg, M., 2005. On the meaning and existence of an effective population size. Genetics 169 (2), 1061–1070.
Tanabe et al. (2010) Tanabe, K., Mita, T., Jombart, T., Eriksson, A., Horibe, S., Palacpac, N., Ranford-Cartwright, L., Sawai, H., Sakihama, N., Ohmae, H., Nakamura, M., Ferreira, M. U., Escalante, A. A., Prugnolle, F., Björkman, A., Färnert, A., Kaneko, A., Horii, T., Manica, A., Kishino, H., Balloux, F., Jul 2010. Plasmodium falciparum accompanied the human expansion out of Africa. Curr Biol 20 (14), 1283–9.
Wakeley and Sargsyan (2009) Wakeley, J., Sargsyan, O., 2009. Extensions of the coalescent effective population size. Genetics 181, 341–345.
Wright (1931) Wright, S., 1931. Evolution in Mendelian populations. Genetics 16, 97–159.
Wright (1938) Wright, S., 1938. Size of a population and breeding structure in relation to evolution. Science 87, 430–431.

Appendix A Calculation of $\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle$

In this Appendix we calculate $\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle$ using a recursion. The resulting expression for $\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle$ was used in evaluating Eq. (III). In the Wright-Fisher model, let $\xi$ and $\xi_{\text{B}}$ be the expected times to the MRCA (i. e. $N_{0}\langle t_{a(ij)}|\mathcal{D}\rangle$ , as in Section III) for bottleneck sequences starting outside and in the bottleneck, respectively. Furthermore, let $y=N_{0}^{-1}$ and $y_{\text{B}}=(xN_{0})^{-1}$ (note that $yy_{\text{B}}^{-1}=x$ ). Furthermore, let $\eta$ and $\eta_{\text{B}}$ be independent stochastic variables which are unity with probability $p$ and $q$ , respectively, and are zero otherwise. We have

	$\displaystyle\xi$	$\displaystyle=1+(1-y)[(1-\eta)\xi^{\prime}+\eta\xi_{\text{B}}^{\prime}]\,\,,$
	$\displaystyle\xi_{\text{B}}$	$\displaystyle=1+(1-y_{\text{B}})[(1-\eta_{\text{B}})\xi_{\text{B}}^{\prime}+\eta_{\text{B}}\xi^{\prime}]\,\,,$		(13)

where $\xi^{\prime}$ and $\xi_{\text{B}}^{\prime}$ have the same distribution as $\xi$ and $\xi_{\text{B}}$ , respectively, but are statistically independent. Taking the expected value of both sides, one obtains a linear system of equations for $\langle\xi\rangle$ and $\langle\xi_{\text{B}}\rangle$ . Solving this system yields

	$\displaystyle\langle\xi\rangle$	$\displaystyle=\frac{p(1-y)+q(1-y_{\text{B}})+y_{\text{B}}}{yy_{\text{B}}+py_{\text{B}}(1-y)+qy(1-y_{\text{B}})}\,\,,$		(14)
	$\displaystyle\langle\xi_{\text{B}}\rangle$	$\displaystyle=\frac{p(1-y)+q(1-y_{\text{B}})+y}{yy_{\text{B}}+py_{\text{B}}(1-y)+qy(1-y_{\text{B}})}\,\,.$		(15)

In order to calculate $\langle t_{a(ij)}|\mathcal{D}\rangle^{2}$ , we use Eq. (A) to write:

	$\displaystyle\xi^{2}$	$\displaystyle=1+2(1-y)[(1-\eta)\xi^{\prime}+\eta\xi_{\text{B}}^{\prime}]+(1-y)^{2}(1-\eta)^{2}\xi^{\prime 2}+(1-y)^{2}\eta^{2}\xi_{\text{B}}^{\prime 2}\,\,,$
	$\displaystyle\xi_{\text{B}}^{2}$	$\displaystyle=1+2(1-y_{\text{B}})[(1-\eta_{\text{B}})\xi_{\text{B}}^{\prime}+\eta_{\text{B}}\xi^{\prime}]+(1-y_{\text{B}})^{2}(1-\eta_{\text{B}})^{2}\xi_{\text{B}}^{\prime 2}+(1-y_{\text{B}})^{2}\eta_{\text{B}}^{2}\xi^{\prime 2}\,\,.$		(16)

Note that the terms containing $\xi^{\prime}\xi_{\text{B}}^{\prime}$ are absent because they contain factors $\eta(1-\eta)$ or $\eta_{\text{B}}(1-\eta_{\text{B}})$ , and $\eta$ and $\eta_{\text{B}}$ are either zero or unity. Taking the expected value of both sides of these equations, and using the property of independence, one again obtains a linear system that can be solved for $\langle\xi^{2}\rangle$ and $\langle\xi_{\text{B}}^{2}\rangle$ :

	$\displaystyle\langle\xi^{2}\rangle$	$\displaystyle=1+2(1-y)[(1-p)\langle\xi\rangle+p\langle\xi_{\text{B}}\rangle]+(1-y)^{2}(1-p)\langle\xi^{2}\rangle+(1-y)^{2}p\langle\xi_{\text{B}}^{2}\rangle\,\,,$
	$\displaystyle\langle\xi_{\text{B}}^{2}\rangle$	$\displaystyle=1+2(1-y_{\text{B}})[(1-q)\langle\xi_{\text{B}}\rangle+q\langle\xi\rangle]+(1-y_{\text{B}})^{2}(1-q)\langle\xi_{\text{B}}^{2}\rangle+(1-y_{\text{B}})^{2}q\langle\xi^{2}\rangle\,\,.$		(17)

To leading order in $N_{0}^{-1}$ , the solution to Eq. (17) is:

\displaystyle\langle\xi^{2}\rangle

\displaystyle=N_{0}^{2}\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle=N_{0}^{2}\frac{\lambda_{\text{B}}(\lambda_{\text{B}}+3+2\lambda x)+\lambda x(\lambda x+x+2)+2}{\left(\lambda_{\text{B}}+\lambda+1\right)\left(\lambda_{\text{B}}+\lambda+2\right)}\,\,.

(18)

Eq. (18) corresponds to the result (4) given in Section III. Alternatively, Eq. (18) can be derived using Eq. (20) in Eriksson et al. (2010).

Appendix B Coefficients $C_{i},D_{i},A_{i},B_{i}$ in formulae (9) and (10)

In this Appendix we list the coefficients appearing in Eqs. (9) and (10). The coefficients appearing in Eq. (9) are given by:

$\displaystyle C_{0}=$	$\displaystyle 36x^{5}\left(\lambda_{\rm B}+\lambda+3\right)\left(\lambda_{\rm B}+\lambda+6\right)(\lambda_{\rm B}(\lambda_{\rm B}\left(\lambda_{\rm B}+2x\lambda+\lambda+4\right)$
	$\displaystyle+\lambda(x((x+2)\lambda+x+6)+1)+5)+x\lambda(x(\lambda+1)(\lambda+3)+2)+2)\,\,,$
$\displaystyle C_{1}=$	$\displaystyle 2x^{5}(\lambda_{\rm B}(\lambda_{\rm B}(\lambda_{\rm B}(\lambda_{\rm B}(\lambda_{\rm B}+3x(\lambda+9)+2\lambda+13)+(3x(x+2)+1)\lambda^{2}$
	$\displaystyle+(x(55x+76)+29)\lambda+252x+59)+x^{3}\lambda(\lambda+1)(\lambda+27)$
	$\displaystyle+2x^{2}\lambda(\lambda(3\lambda+58)+213)+x(\lambda(\lambda(3\lambda+80)+370)+801)+2\lambda(8\lambda+55)+119)$
	$\displaystyle+x^{3}\lambda(\lambda+1)(\lambda+21)(2\lambda+9)+x^{2}\lambda(\lambda(\lambda(3\lambda+76)+452)+903)$
	$\displaystyle+x(\lambda(\lambda(31\lambda+226)+647)+1044)+(3\lambda+4)(5\lambda+27))$
	$\displaystyle+x(\lambda(x^{2}(\lambda+1)(\lambda+3)(\lambda+6)(\lambda+15)+x(5\lambda+18)(\lambda(3\lambda+16)+33)$
	$\displaystyle+44\lambda+270)+468)+18(\lambda+2))\,\,,$
$\displaystyle C_{2}=$	$\displaystyle x^{5}(\lambda_{\rm B}(\lambda_{\rm B}(x\lambda_{\rm B}(3\lambda_{\rm B}+2x(4\lambda+9)+4(\lambda+7))+x(x^{2}\lambda(7\lambda+39)$
	$\displaystyle+10x(\lambda(\lambda+7)+9)+\lambda(\lambda+25)+89)+4\lambda)$
	$\displaystyle+x(2x^{3}\lambda(\lambda+1)(\lambda+9)+x^{2}\lambda(\lambda(8\lambda+65)+129)$
	$\displaystyle+2x(\lambda+2)^{2}(\lambda+18)+\lambda(9\lambda+55)+116)+4\lambda)$
	$\displaystyle+x(x(\lambda(2x^{2}(\lambda+1)(\lambda+3)(\lambda+6)+x(\lambda(\lambda(\lambda+22)+83)+102)$
	$\displaystyle+4\lambda^{2}+42\lambda+96)+72)+26(\lambda+2)))\,\,,$
$\displaystyle C_{3}=$	$\displaystyle x^{7}\left(\lambda_{\rm B}+\lambda+2\right)\left(\lambda_{\rm B}\left(\lambda_{\rm B}+2x\lambda+3\right)+x\lambda(x\lambda+x+2)+2\right)\,\,,$	(19)

$\displaystyle D_{0}=$	$\displaystyle 36x^{5}\left(\lambda_{\rm B}+\lambda+1\right){}^{2}\left(\lambda_{\rm B}+\lambda+2\right)\left(\lambda_{\rm B}+\lambda+3\right)\left(\lambda_{\rm B}+\lambda+6\right)\,\,,$
$\displaystyle D_{1}=$	$\displaystyle 2x^{5}(\lambda_{\rm B}+\lambda+1)(\lambda_{\rm B}+\lambda+2)(\lambda_{\rm B}(\lambda_{\rm B}(13\lambda_{\rm B}+13(x+2)\lambda+27x+130)$
	$\displaystyle+13(2x+1)\lambda^{2}+157(x+1)\lambda+9(19x+39))$
	$\displaystyle+13x(\lambda+1)(\lambda+3)(\lambda+6)+9(\lambda+2)(3\lambda+13))\,\,,$
$\displaystyle D_{2}=$	$\displaystyle x^{5}(\lambda_{\rm B}+\lambda+1)(\lambda_{\rm B}+\lambda+2)(\lambda_{\rm B}(\lambda_{\rm B}(2\lambda_{\rm B}+4x\lambda+39x+2\lambda+20)$
	$\displaystyle+x(2x(\lambda(\lambda+8)+9)+4\lambda^{2}+86\lambda+247)+16\lambda+54)$
	$\displaystyle+2x^{2}(\lambda+1)(\lambda+3)(\lambda+6)+13x(\lambda+2)(3\lambda+13)+18(\lambda+2))\,\,,$
$\displaystyle D_{3}=$	$\displaystyle x^{6}\left(\lambda_{\rm B}+\lambda+1\right)\left(\lambda_{\rm B}+\lambda+2\right){}^{2}\left(3\lambda_{\rm B}+x(3\lambda+13)+13\right)\,\,,$
$\displaystyle D_{4}=$	$\displaystyle x^{7}\left(\lambda_{\rm B}+\lambda+1\right)\left(\lambda_{\rm B}+\lambda+2\right){}^{2}\,\,.$	(20)

The coefficients appearing in Eq. (10) are given by:

$\displaystyle A_{0}=$	$\displaystyle 18(\lambda_{\rm B}+1)\left(\lambda_{\rm B}+\lambda+3\right)\left(\lambda_{\rm B}+\lambda+6\right)\left(\lambda_{\rm B}\left(\lambda_{\rm B}+\lambda+3\right)+2\right)\,\,,$
$\displaystyle A_{1}=$	$\displaystyle\left(\lambda_{\rm B}+1\right)\Bigl{(}\lambda_{\rm B}\bigl{(}\lambda_{\rm B}(\lambda_{\rm B}^{2}+2(\lambda+6)\lambda_{\rm B}+\lambda(\lambda+27)+47)+\lambda(15\lambda+83)+72\bigr{)}+18(\lambda+2)\Bigr{)}\,\,,$
$\displaystyle A_{2}=$	$\displaystyle 2\lambda_{\rm B}(\lambda_{\rm B}+1)\lambda\,\,,$
$\displaystyle B_{0}=$	$\displaystyle 18\left(\lambda_{\rm B}+\lambda+1\right){}^{2}\left(\lambda_{\rm B}+\lambda+2\right)\left(\lambda_{\rm B}+\lambda+3\right)\left(\lambda_{\rm B}+\lambda+6\right)\,\,,$
$\displaystyle B_{1}=$	$\displaystyle\left(\lambda_{\rm B}+\lambda+1\right)\left(\lambda_{\rm B}+\lambda+2\right)\Bigl{(}\lambda_{\rm B}\bigl{(}13\lambda_{\rm B}\left(\lambda_{\rm B}+2(\lambda+5)\right)+\lambda(13\lambda+157)+351\bigr{)}+9(\lambda+2)(3\lambda+13)\Bigr{)}\,\,,$
$\displaystyle B_{2}=$	$\displaystyle\left(\lambda_{\rm B}+\lambda+1\right)\left(\lambda_{\rm B}+\lambda+2\right)\left(\lambda_{\rm B}\left(\lambda_{\rm B}\left(\lambda_{\rm B}+\lambda+10\right)+8\lambda+27\right)+9(\lambda+2)\right)\,\,.$	(21)

Appendix C Severe bottlenecks: connection to the Xi-coalescent

In this Appendix, we turn our attention to a special case of population-size fluctuations: recurrent severe bottlenecks, that is $x\rightarrow 0$ . As we now show, single-locus gene genealogies in this limit are well approximated by the Xi-coalescent approximation (see also Sagitov et al. (2011)).

Consider fixed values of $\lambda~{\rm and}~\lambda_{\rm B}$ and take the limit of $x\rightarrow 0$ . Recall that the rate of leaving a bottleneck is given by $x^{-1}\lambda_{\rm B}$ . Thus, as $x$ is being decreased, and $\lambda_{\rm B}$ is kept constant, the time between coalescent events hosted by a single bottleneck becomes shorter, ultimately leading to a failure of the Kingman coalescent approximation. In the limit of $x\rightarrow 0$ , multiple pairwise mergers in a bottleneck appear as a single simultaneous multiple merger. It turns out that one-locus gene genealogies in this case are well approximated by the Xi-coalescent approximation.

We show here that, in the case of severe bottlenecks, applying the Xi-coalescent approximation yields Eq. (10) for the mean covariance $\langle\text{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$ . Our method for calculating the term $\langle t_{a(ij)}t_{b(ij)}\rangle$ is described in the main text (see Eq. (7) in Section III). But in the Xi-coalescent approximation, described in this appendix, the Markov process is different than the one described in Section III. The corresponding graph is shown in Fig. 6. It consists of the same five states $1,\ldots,5$ shown in Fig. 3, but in Fig. 6 the states in bottlenecks are omitted, since the time system spends in a single bottleneck is short. The corresponding transition rates, $w_{ji}$ , from state $i$ to $j$ , are listed in Fig. 6. In the following, we show how the rates $w_{ji}$ can be derived using the Xi-coalescent approximation. These rates determine the matrix $\bf M$ , and vectors $u$ , and $Q$ , as well as $K$ and $c$ , appearing in Eq. (7).

C.1 Formulae for $w_{ji}$ under the Xi-coalescent approximation

In a severe bottleneck, $l$ incoming lines are allowed to coalesce almost instantaneously into $b\leq l-1$ outgoing lines. Note that in Kingman’s coalescent case one has $b=l-1$ . In the Xi-coalescent, by contrast, $l$ lines are partitioned into $b$ families, such that $k_{i}$ families are of size $i=1,\ldots,l$ (see an illustration in Fig. 7). By construction, the following condition must be satisfied:

l=\sum_{i=1}^{l}ik_{i},~b=\sum_{i=1}^{l}k_{i}\,\,.

(22)

In our model, the collision rate $\phi_{\{l;k_{1},\ldots,k_{l}\}}$ of $l$ lines colliding into a particular partition $\{l;k_{1},\ldots,k_{l}\}$ , such that Eq. (22) is satisfied, is given by:

\phi_{\{l;k_{1},\ldots,k_{l}\}}={1}_{\{b=l-1\}}+\lambda\Xi_{\{l;k_{1},\ldots,k_{l}\}}

(23)

where the first term stands for the Kingman coalescent, and the second term corresponds to the contribution from multiple mergers which occur at a rate $\lambda$ . Given the probability, $C_{lb}$ , that during a bottleneck $l$ lines collide into $b$ lines, $\Xi_{\{l;k_{1},\ldots,k_{l}\}}$ can be calculated according to:

\Xi_{\{l;k_{1},\ldots,k_{l}\}}=C_{lb}p_{\{l;k_{1},\ldots,k_{l}\}}\,\,.

where $p_{\{l;k_{1},\ldots,k_{l}\}}$ is the probability of observing a particular partition $\{l;k_{1},\ldots,k_{l}\}$ of $l$ lines. As shown by Kingman (1982), it is given by:

p_{\{l;k_{1},\ldots,k_{l}\}}=\frac{(l-b)!b!(b-1)!}{l!(l-1)!}\prod_{i=1}^{l}(i!)^{k_{i}}\,\,.

The probability $C_{lb}$ is calculated as follows. Assume that $l$ lines enter a bottleneck. The probability that the bottleneck hosts $l-b$ coalescent events can be obtained in two steps. First, the total coalescent time needed to arrive from $l$ to $b$ lines should be less than or equal to the duration of the bottleneck. Second, the arrival at a state with $b-1$ lines must occur after the end of the bottleneck. For simplicity, we choose here to measure time in units of the population size during the bottleneck so that the coalescent rate is unity as in the standard coalescent case. In these units, the time in the bottleneck is exponentially distributed, with the parameter $\lambda_{\rm B}$ . Accordingly, the following expression for $C_{lb}$ is obtained:

C_{lb}=\frac{\lambda_{\rm B}}{\binom{b}{2}+\lambda_{\rm B}}\prod_{i=b+1}^{l}\frac{\binom{i}{2}}{\binom{i}{2}+\lambda_{\rm B}}\,\,.

The rate $\phi_{\{l;k_{1},\ldots,k_{l}\}}$ , given in Eq. (23), is conditional on a particular partition. Thus the total collision rate of $l$ lines into any of partitions of type $\{l;k_{1},\ldots,k_{l}\}$ is given by:

\phi_{\{l;k_{1},\ldots,k_{l}\}}^{\rm tot}=\binom{l}{2}{1}_{\{b=l-1\}}+\lambda C_{lb}p_{\{l;k_{1},\ldots,k_{l}\}}S_{\{l;k_{1},\ldots,k_{l}\}}\,\,,

(24)

where $S_{\{l;k_{1},\ldots,k_{l}\}}$ denotes the number of possible ways of collisions of $l$ lines into a partition $\{l;k_{1},\ldots,k_{l}\}$ , such that restrictions in Eq. (22) hold. It is given by Sagitov (2003):

S_{\{l;k_{1},\ldots,k_{l}\}}=\frac{l!}{\prod_{i=1}^{l}(i!)^{k_{i}}k_{i}!}\,\,.

It follows from the latter expression that in the case of the standard coalescent, one has $S_{\{l;l-2,1,0\ldots,0\}}=\binom{l}{2}$ , as expected.

The graph corresponding to the Markov process in the limit described in the beginning of this appendix, consists of five states $1,\ldots,5$ (see Fig. 6). We now show how the corresponding transition rates between states ${1,\ldots,5}$ can be derived from Eq. (24).

We observe that a collision of type ${\{2;0,1\}}$ describes a transition from either state $1$ or $4$ , to $5$ . It follows:

w_{51}=w_{54}=\phi_{\{2;0,1\}}^{\rm tot}\,\,.

(25)

State $2$ consists of three chromosomal lines. A collision of a particular pair of lines, among the three lines, results in a transition from $2$ to $1$ , while a collision of either of the two remaining pairs of lines results in a transition from $2$ to $4$ . Because a collision of a pair of lines among three lines is of type $\{3;1,1,0\}$ , we obtain the corresponding transition rates:

	$\displaystyle w_{12}=\frac{1}{3}\phi_{\{3;1,1,0\}}^{\rm tot}\,\,,$		(26)
	$\displaystyle w_{42}=\frac{2}{3}\phi_{\{3;1,1,0\}}^{\rm tot}\,\,.$		(27)

A collision of all three lines of state $2$ leads to a transition from state $2$ to $5$ at the rate:

w_{52}=\phi_{\{3;0,0,1\}}^{\rm tot}\,\,.

(28)

Now consider transitions from state $3$ . It consists of four ancestral lines. We analyse first a collision of a single pair of lines, that is a collision of type $\{4;2,1,0,0\}$ . There are in total six different ways to pair lines: four choices describe a transition from state $3$ to $2$ , and the remaining two lead to a transition from $3$ to $4$ . Thus, we have:

\displaystyle w_{23}=\frac{2}{3}\phi_{\{4;2,1,0,0\}}^{\rm tot}\,\,.

(29)

Further, there are three possibilities for simultaneous collisions of two pairs of lines. Two possibilities result in a transition from $3$ to $1$ , and one leads to a transition from $3$ to $5$ (see Fig. 2d). It is also possible to obtain a collision of three lines, in which case the transition from $3$ to $4$ is observed. Further, a collision of all four lines results in a transition from $3$ to $5$ . Thus, we obtain the following transition rates:

	$\displaystyle w_{13}=\frac{2}{3}\phi_{\{4;0,2,0,0\}}^{\rm tot}\,\,,$		(30)
	$\displaystyle w_{43}=\frac{1}{3}\phi_{\{4;2,1,0,0\}}^{\rm tot}+\phi_{\{4;1,0,1,0\}}^{\rm tot}\,\,,$		(31)
	$\displaystyle w_{53}=\frac{1}{3}\phi_{\{4;0,2,0,0\}}^{\rm tot}+\phi_{\{4;0,0,0,1\}}^{\rm tot}\,\,.$		(32)

The remaining non-vanishing rates, $w_{21}=R$ , and $w_{32}=R/2$ , describe recombination transitions from state $1$ to $2$ , and from $2$ to $3$ .

In Tab. 1, we summarise the formulae for $p_{\{l;k_{1},\ldots,k_{l}\}}$ , $S_{\{l;k_{1},\ldots,k_{l}\}}$ , $C_{lb}$ , and $\phi_{\{l;k_{1},\ldots,k_{l}\}}^{\rm tot}$ , for $l=2,3,4$ lines. Using the formulae in this table, the transition rates under the Xi-coalescent approximation can be calculated explicitly, in terms of the parameters $\lambda,\lambda_{\rm B}$ , and $x$ .

C.2 Obtaining Eq. (10) under the Xi-coalescent approximation

The transition rates $w_{ji}$ , obtained in the previous subsection, can be used for calculating $\langle t_{a(ij)}t_{b(ij)}\rangle$ according to the method explained in Section III. This calculation requires the elements of the $3\times 3$ matrix ${\bf M}$ . We find that the non-zero off-diagonal elements of $\bf M$ are given by:

$\displaystyle M_{12}$	$\displaystyle=1+\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1}\,\,,$
$\displaystyle M_{13}$	$\displaystyle=4\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1}\,\,,$
$\displaystyle M_{21}$	$\displaystyle=2M_{32}=R\,\,,$
$\displaystyle M_{23}$	$\displaystyle=4+4\lambda\lambda_{\rm B}\bigl{(}(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1}\,\,.$	(33)

and the diagonal elements are given by:

M_{ii}=-\sum_{j\neq i}M_{ji},~{\rm for}~i=1,2,3\,\,.

(34)

Further:

$u$	$\displaystyle=\begin{bmatrix}&1+\lambda(1+\lambda_{\rm B})^{-1}\\ &3\lambda\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1}\\ &2\lambda(9+\lambda_{\rm B})\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1}\end{bmatrix}^{T}\,\,,$
$Q$	$\displaystyle=\begin{bmatrix}&0\\ &2(1+\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1})\\ &2(1+\lambda\lambda_{\rm B}(7+\lambda_{\rm B})\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1})\end{bmatrix}^{T}\,\,,$
$\displaystyle K$	$\displaystyle=-1\,\,,$
$\displaystyle c$	$\displaystyle=1\,\,.$	(35)

Note that $\bf M$ , $u$ , $Q$ , $K$ and $c$ have dimensions different from those in Section III. The reason is that in the limit $x\rightarrow 0$ , the case described in this appendix, the states in bottlenecks are omitted. Combining Eq. (7) with Eqs. (33)-(35) yields Eq. (10). This shows that the covariance of the times to the MRCA in the case of severe bottlenecks can be derived using the Xi-coalescent approximation.

\psfrag{a}{\bf a}\psfrag{b}{\bf b}\psfrag{c}{\bf c}\psfrag{d}{\bf d}\psfrag{cov1}{\raisebox{14.22636pt}{\hskip-28.45274pt{$\mbox{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]$}}}\psfrag{R}{$R$}\includegraphics[angle={0},width=426.79134pt,clip]{figs/fig1.eps}

Figure 1: The covariance of the times to the MRCA at two loci, in a sample of two chromosomes in a population subject to repeated bottlenecks (details in Section II). (a) Rapid population-size fluctuations. Fisher-Wright simulations for ten random population-bottleneck sequences with

\lambda=100

x=0.1

\lambda_{\rm B}=10

, and

N_{0}=10^{5}

(grey lines). Each grey line is obtained by first generating a random sequence of bottlenecks, and then averaging over an ensemble of

1000

gene genealogies. The red line shows the covariance averaged over demographic histories. The dashed line shows the result of the effective population-size approximation, Eq. (2). (b) Same, but for severe bottlenecks. Fisher-Wright simulations for fifteen randomly generated sequences of bottlenecks, with parameters

\lambda=10

x=5\cdot 10^{-4}

\lambda_{\rm B}=10

, and

N_{0}=10^{6}

. Averaging is done over an ensemble of

100

gene genealogies.

\psfrag{a}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf a}}}\psfrag{b}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf b}}}\psfrag{c}{\raisebox{0.0pt}{\hskip 2.84544pt{\large\bf c}}}\psfrag{d}{\raisebox{0.0pt}{\hskip-5.69046pt{\large\bf d}}}\psfrag{x1}{\raisebox{5.69054pt}{\hskip-18.49411pt{\hbox{\pagecolor{white}\large${\rm Exp}(\lambda)$}}}}\psfrag{x2}{\raisebox{5.69054pt}{\hskip-18.49411pt{\hbox{\pagecolor{white}\large${\rm Exp}(\lambda_{\rm x})$}}}}\psfrag{t}{\raisebox{0.0pt}{\hskip 76.82234pt{\large$t$}}}\psfrag{1}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf 1}}}\psfrag{2}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf 2}}}\psfrag{3}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf 3}}}\psfrag{4}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf 4}}}\psfrag{5}{\raisebox{0.0pt}{\hskip 0.0pt{\large\bf 5}}}\psfrag{N(t)}{\raisebox{-2.84526pt}{\hskip-8.5359pt{\begin{sideways}\large$N(t)$\end{sideways}}}}\psfrag{x N}{\raisebox{0.0pt}{\hskip 0.0pt{\large$xN_{0}$}}}\psfrag{N}{\raisebox{0.0pt}{\hskip-2.84544pt{\large$N_{0}$}}}\psfrag{Locus a}{\raisebox{0.0pt}{\hskip 0.0pt{\large Locus $a$}}}\psfrag{Locus b}{\raisebox{0.0pt}{\hskip 0.0pt{\large Locus $b$}}}\psfrag{MRCA of locus a}{\raisebox{0.0pt}{\hskip 0.0pt{\large MRCA of locus $a$}}}\psfrag{MRCA of locus b}{\raisebox{0.0pt}{\hskip 0.0pt{\large MRCA of locus $b$}}}\psfrag{Chromosome}{\raisebox{0.0pt}{\hskip 0.0pt{\large Chromosome}}}\psfrag{Ancestral line}{\raisebox{0.0pt}{\hskip 0.0pt{\large Ancestral line}}}\psfrag{material not ancestral}{\raisebox{0.0pt}{\hskip 0.0pt{\large Material not ancestral}}}\psfrag{to loci a and b}{\raisebox{0.0pt}{\hskip 0.0pt{\large to loci $a$ and $b$}}}\psfrag{Period of low }{\raisebox{0.0pt}{\hskip 0.0pt{\large Period of low}}}\psfrag{population-size}{\raisebox{0.0pt}{\hskip 0.0pt{\large population size}}}\includegraphics[angle={0},width=512.1496pt,clip]{figs/fig2.eps}

Figure 2: Panels a and b show two realisations of the population-size curve,

N(t)

, backwards in time (

t=0

denotes the present time). Initially, the population size is

N_{0}

. Going backwards in time, the population size randomly jumps between two values,

N_{0}

and

xN_{0}

(

x<1

), with the transition rates

\lambda

(from

N_{0}

xN_{0}

) and

\lambda_{x}\equiv x^{-1}\lambda_{\rm B}

(from

xN_{0}

N_{0}

). Panels c and d show schematically corresponding ancestral histories of two loci (blue and red empty circles correspond to two loci, called

a

and

b

) subject to genetic recombination in a sample of two chromosomes. The yellow background depicts a time during which a population was subject to a bottleneck. Two joint circles depict two loci in the same chromosome. States

1,\ldots,5

denote the possible states of the system (they are explained in detail in Fig. 3). Grey circles denote genetic material not ancestral to the sampled loci. Blue and red filled circles indicate that the corresponding loci have found their most recent common ancestor. Note that bottlenecks can host multiple coalescent events (mergers). In the case of severe bottlenecks such multiple mergers appear as if instantaneous on the time scale of the gene genealogy. An example is shown in panel d: an almost instantaneous transition from state

3

to state

5

\psfrag{24}{\raisebox{0.0pt}{\hskip 5.69046pt{\Large$1$}}}\psfrag{25}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$1$}}}\psfrag{26}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$1$}}}\psfrag{27}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$R$}}}\psfrag{28}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$4$}}}\psfrag{6}{\raisebox{2.84526pt}{\hskip-17.07182pt{\Large$R/2$}}}\psfrag{7}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$2$}}}\psfrag{8}{\raisebox{0.0pt}{\hskip-5.69046pt{\Large$2$}}}\psfrag{18}{\raisebox{0.0pt}{\hskip 5.69046pt{\Large$\lambda$}}}\psfrag{21}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$\lambda$}}}\psfrag{23}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$\lambda$}}}\psfrag{16}{\raisebox{2.84526pt}{\hskip 2.84544pt{\Large$\lambda$}}}\psfrag{19}{\raisebox{0.0pt}{\hskip-2.84544pt{\Large$\lambda_{\rm x}$}}}\psfrag{20}{\raisebox{2.84526pt}{\hskip-8.5359pt{\Large$\lambda_{\rm x}$}}}\psfrag{22}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$\lambda_{\rm x}$}}}\psfrag{17}{\raisebox{2.84526pt}{\hskip-2.84544pt{\Large$\lambda_{\rm x}$}}}\psfrag{9}{\raisebox{0.0pt}{\hskip-5.69046pt{\Large$x^{-1}$}}}\psfrag{10}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$x^{-1}$}}}\psfrag{11}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$x^{-1}$}}}\psfrag{12}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$R$}}}\psfrag{13}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$4x^{-1}$}}}\psfrag{14}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$R/2$}}}\psfrag{15}{\raisebox{2.84526pt}{\hskip 0.0pt{\Large$2x^{-1}$}}}\psfrag{29}{\raisebox{2.84526pt}{\hskip-17.07182pt{\Large$2x^{-1}$}}}\includegraphics[angle={0},width=199.16928pt,clip]{figs/full_diag_rates.eps}

group state

{\bf 1},{\bf 1^{\prime}}

a_{i}b_{i}

a_{j}b_{j}

a_{i}b_{j}

a_{j}b_{i}

{\bf 2},{\bf 2^{\prime}}

a_{i}\circ

\circ b_{i}

a_{j}b_{j}

a_{i}b_{i}

a_{j}\circ

\circ b_{j}

a_{i}\circ

\circ b_{j}

a_{j}b_{i}

a_{i}b_{j}

a_{j}\circ

\circ b_{i}

{\bf 3},{\bf 3^{\prime}}

a_{i}\circ

\circ b_{i}

a_{j}\circ

\circ b_{j}

{\bf 4},{\bf 4^{\prime}}

a_{i}\bullet

a_{j}\bullet

\bullet b_{i}

\bullet b_{j}

{\bf 5}

\bullet\bullet

Figure 3: Left: a graph showing the states and transition rates determining the ancestral history of two loci in a sample of two chromosomes, under the population model introduced in Section II. States where the population is in a bottleneck are marked with a prime. The final state is denoted by

5

(in this state it does not matter whether the population is in a bottleneck or not). Arrows indicate transitions between states. The corresponding transition rates from state

i

j

w_{ji}

, are displayed next to the lines. Note that

\lambda_{\rm x}\equiv\lambda_{\rm B}x^{-1}

. Right: a table of possible states of the system. Two loci considered are denoted by

a

and

b

, and the corresponding chromosomes are indicated by

i

and

j

. Empty circles denote genetic material not ancestral to sampled loci, and full circles denote the MRCA of a locus.

\psfrag{a}{\bf a}\psfrag{b}{\bf b}\psfrag{c}{\bf c}\psfrag{d}{\bf d}\psfrag{cov1}{\raisebox{14.22636pt}{\hskip-28.45274pt{$\langle\mbox{cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle$}}}\psfrag{R}{$R$}\includegraphics[angle={0},width=426.79134pt,clip]{figs/fig4.eps}

Figure 4: The covariance of the times to the MRCA of two loci averaged over random population-size histories. (a) The red line shows the average covariance corresponding to

\lambda=100

x=0.1

\lambda_{\rm B}=10

, and

N_{0}=10^{5}

, determined numerically from Fisher-Wright simulations (same as in Fig. 1a). The solid lines show our exact result, Eq. (9), and the dashed lines show the coalescent effective population-size approximation, Eq. (2). The numerical result deviates from the effective population-size approximation when the recombination time-scale is the smallest (

R>100

). (b) The same as in a, but for the short bottleneck case:

\lambda=10

x=5\cdot 10^{-4}

\lambda_{\rm B}=10

, and

N_{0}=10^{6}

. The dashed-dotted line denotes the result of Eq. (10), corresponding to the Xi-coalescent approximation.

\psfrag{a}{\bf a}\psfrag{b}{\bf b}\psfrag{var(tij)}{\raisebox{14.22636pt}{\hskip-17.07182pt{$\langle{\rm cov}_{\mathcal{D}}[t_{a},t_{b}]\rangle$}}}\psfrag{R}{$R$}\psfrag{0.01}{\raisebox{0.0pt}{\hskip 0.0pt{\small$10^{-2}$}}}\psfrag{1}{\small$10^{0}$}\psfrag{100}{\raisebox{0.0pt}{\hskip 0.0pt{\small$10^{2}$}}}\psfrag{10000}{\raisebox{0.0pt}{\hskip 0.0pt{\small$10^{4}$}}}\psfrag{1e+06}{\raisebox{0.0pt}{\hskip 0.0pt\small{$10^{6}$}}}\psfrag{-6}{\raisebox{0.0pt}{\hskip-11.38092pt{\small$10^{-6}$}}}\psfrag{-4}{\raisebox{0.0pt}{\hskip-11.38092pt{\small$10^{-4}$}}}\psfrag{-2}{\raisebox{0.0pt}{\hskip-11.38092pt{\small$10^{-2}$}}}\psfrag{0}{\raisebox{0.0pt}{\hskip-11.38092pt{\small$10^{0}$}}}\includegraphics[angle={0},width=426.79134pt,clip]{figs/fig_var_n_inf.eps}

Figure 5: The relation between the covariances

\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ij)}]\rangle

(blue lines),

\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(ik)}]\rangle

(red lines), and

\langle{\rm cov}_{\mathcal{D}}[t_{a(ij)},t_{b(kl)}]\rangle

(green lines). In panels a and b, the values of the parameters

\lambda,\lambda_{\rm B}

, and

x

, are the same as in Fig. 1a and Fig. 1b, respectively. Exact results are shown as solid lines, whereas results obtained within the effective population-size approximation are shown as dashed lines.

Refer to caption — Figure 6: Left: a graph showing the states of the system in the limit $x\rightarrow 0$ , and possible transitions between them. States $1,\ldots,5$ are explained in Fig. 3. Note that in this graph three colored arrows appear, denoting the simultaneous multiple mergers. They appear in this case because of the short time the system spends in a single bottleneck (see Fig. 2b). By contrast, they are forbidden in the constant population-size case. Right: exact formulae for the transition rates, $w_{ji}$ , from $i$ to $j$ (the corresponding entries of the matrix $\bf M$ , and vectors $u$ , and $Q$ are calculated using these rates) in terms of the parameters $\lambda$ and $\lambda_{\rm B}$ .

\psfrag{l1}{\raisebox{-8.53581pt}{\hskip 56.9055pt{\Large$l=4$}}}\psfrag{l2}{\raisebox{-8.53581pt}{\hskip-28.45274pt{\Large$b=1$}}}\psfrag{a}{\raisebox{11.38109pt}{\hskip 5.69046pt{\Large{\bf a}}}}\psfrag{b}{\raisebox{11.38109pt}{\hskip 5.69046pt{\Large{\bf b}}}}\psfrag{present}{\raisebox{-5.69046pt}{\Large{present}}}\psfrag{t}{\raisebox{0.0pt}{\hskip 0.0pt{\Large$t$}}}\psfrag{N(t)}{\raisebox{17.07164pt}{\hskip-5.69046pt{\Large$N(t)$}}}\psfrag{N}{\raisebox{0.0pt}{\hskip-14.22636pt{\Large{$N_{0}$}}}}\psfrag{xN}{\raisebox{0.0pt}{\hskip-14.22636pt{\Large{$xN_{0}$}}}}\includegraphics[angle={0},width=284.52756pt,clip]{figs/part_lines.eps}

Figure 7: (a) Realisation of a population-size history curve,

N(t)

. (b) Partition of ancestral lines. Shown is a collision of

l=4

lines (at the entrance of the bottleneck, indicated by the yellow background), into

b=1

line. When the bottleneck is short, the three pairwise mergers during the bottleneck appear as a single instantaneous event.

$C_{21}=\frac{1}{1+\lambda_{\rm B}}$	$p_{\{2;0,1\}}=1$	$S_{\{2;0,1\}}=1$	$\phi_{\{2;0,1\}}^{\rm tot}=1+\frac{\lambda}{1+\lambda_{\rm B}}$
$C_{31}=\frac{3}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})}$	$p_{\{3;0,0,1\}}=1$	$S_{\{3;0,0,1\}}=1$	$\phi_{\{3;0,0,1\}}^{\rm tot}=\frac{3\lambda}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})}$
$C_{32}=\frac{3\lambda_{\rm B}}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})}$	$p_{\{3;1,1,0\}}=\frac{1}{3}$	$S_{\{3;1,1,0\}}=3$	$\phi_{\{3;1,1,0\}}^{\rm tot}=3+\frac{3\lambda\lambda_{\rm B}}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})}$
$C_{41}=\frac{18}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$	$p_{\{4;0,0,0,1\}}=1$	$S_{\{4;0,0,0,1\}}=1$	$\phi_{\{4;0,0,0,1\}}^{\rm tot}=\frac{18\lambda}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$
$C_{42}=\frac{18\lambda_{\rm B}}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$	$p_{\{4;1,0,1,0\}}=\frac{1}{6}$	$S_{\{4;1,0,1,0\}}=4$	$\phi_{\{4;1,0,1,0\}}^{\rm tot}=\frac{12\lambda\lambda_{\rm B}}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$
	$p_{\{4;0,2,0,0\}}=\frac{1}{9}$	$S_{\{4;0,2,0,0\}}=3$	$\phi_{\{4;0,2,0,0\}}^{\rm tot}=\frac{6\lambda\lambda_{\rm B}}{(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$
$C_{43}=\frac{6\lambda_{\rm B}}{(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$	$p_{\{4;2,1,0,0\}}=\frac{1}{6}$	$S_{\{4;2,1,0,0\}}=6$	$\phi_{\{4;2,1,0,0\}}^{\rm tot}=6+\frac{6\lambda\lambda_{\rm B}}{(3+\lambda_{\rm B})(6+\lambda_{\rm B})}$

Table 1: Formulae necessary to explicitly calculate the transition rates

w_{ji}

, listed in the graph in Fig. 6, according to Eqs. (25)-(32).

	$\displaystyle w_{12}$	$\displaystyle=1+\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1}$
	$\displaystyle w_{13}$	$\displaystyle=4\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1}$
	$\displaystyle w_{21}$	$\displaystyle=2w_{32}=R$
	$\displaystyle w_{23}$	$\displaystyle=4(1+\lambda\lambda_{\rm B}\bigl{(}(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1})$
	$\displaystyle w_{42}$	$\displaystyle=2(1+\lambda\lambda_{\rm B}\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1})$
	$\displaystyle w_{43}$	$\displaystyle=2(1+\lambda\lambda_{\rm B}(7+\lambda_{\rm B})\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1})$
	$\displaystyle w_{51}$	$\displaystyle=w_{54}=1+\lambda(1+\lambda_{\rm B})^{-1}$
	$\displaystyle w_{52}$	$\displaystyle=3\lambda\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})\bigr{)}^{-1}$
	$\displaystyle w_{53}$	$\displaystyle=2\lambda(9+\lambda_{\rm B})\bigl{(}(1+\lambda_{\rm B})(3+\lambda_{\rm B})(6+\lambda_{\rm B})\bigr{)}^{-1}$

Linkage disequlibrium under recurrent bottlenecks

Abstract

I Introduction

II Model

III Covariance of the times to the MRCA for two individuals

IV Comparison of the coalescent calculations to the Wright-Fisher simulations

V The effect of recurrent bottlenecks upon σd2\sigma^{2}_{d}

VI Discussion

References

Appendix A Calculation of ⟨⟨ta​(i​j)|𝒟⟩2⟩\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle

Appendix B Coefficients Ci,Di,Ai,BiC_{i},D_{i},A_{i},B_{i} in formulae (9) and (10)

Appendix C Severe bottlenecks: connection to the Xi-coalescent

C.1 Formulae for wj​iw_{ji} under the Xi-coalescent approximation

C.2 Obtaining Eq. (10) under the Xi-coalescent approximation

V The effect of recurrent bottlenecks upon $\sigma^{2}_{d}$

Appendix A Calculation of $\langle\langle t_{a(ij)}|\mathcal{D}\rangle^{2}\rangle$

Appendix B Coefficients $C_{i},D_{i},A_{i},B_{i}$ in formulae (9) and (10)

C.1 Formulae for $w_{ji}$ under the Xi-coalescent approximation