This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Simulating the spread of COVID-19 with cellular automata: A new approach

Sourav Chowdhury  Suparna Roychowdhury  Indranath Chaudhuri email: chowdhury95sourav@gmail.comemail: suparna@sxccal.eduemail: indranath@sxccal.edu Department of Physics, St. Xavier’s College (Autonomous)
30 Mother Teresa Sarani, Kolkata-700016, West Bengal, India
Abstract

Between the years 2020 to 2022, the world was hit by the pandemic of COVID-19 giving rise to an extremely grave situation. Many people suffered and died from this disease. Also the global economy was badly hurt due to the consequences of various intervention strategies (like social distancing, lockdown) which were applied by different countries to control this pandemic. There are multiple speculations that humanity will again face such pandemics in the future. Thus it is very important to learn and gain knowledge about the spread of such infectious diseases and the various factors which are responsible for it. In this study, we have extended our previous work (Chowdhury et.al., 2022) on the probabilistic cellular automata (CA) model to reproduce the spread of COVID-19 in several countries by modifying its earlier used neighbourhood criteria. This modification gives us the liberty to adopt the effect of different restrictions like lockdown and social distancing in our model. We have done some theoretical analysis for initial infection and simulations to gain insights into our model. We have also studied the data from eight countries for COVID-19 in a window of 876 days and compared it with our model. We have developed a proper framework to fit our model on the data for confirmed cases of COVID-19 and have also re-checked the goodness of the fit with the data of the deceased cases for this pandemic. Our model is compared with other well known CA models and the ODE based SEIR model. This model fits well with different peaks of COVID-19 data for all the eight countries and can be possibly generalized for a global prediction. Our study shows that the rate of disease spread depends both on infectivity of a disease and social restrictions. Also, it shows an overall decrement in mortality rate with time due to COVID-19 as more and more people get infected as well as vaccinated. Our minimal model with modified neighbourhood condition can easily quantify the degree of social restrictions. It is statistically concluded that the overall degree of social restrictions is above the mean when we considered all eight countries. Finally to conclude, this study has given us various important insights about this pandemic which can help in preparing for combating epidemics in future situations.

1 Introduction

Epidemics and pandemics have been hampering human civilizations from centuries. In recent times, the COVID-19 disease had infected people of almost all countries globally. There are some claims that world will face a pandemic again in the near future [1, 2]. Thus it is very important to study and understand their nature. Mathematical models help us to simulate their behaviour and can give us many hidden details.

COVID-19 is caused by SARS-COV-2 virus. Globally, the number of infected cases and deaths from this disease is extremely high [3]. In India, this pandemic left a very serious toll on human life and economy. Due to rapid mutations, many variants of SARS-COV-2 have been found worldwide. Delta and Omicron are some commonly known mutations [4]. Recently, many reports have been published about the worrying situation of COVID-19 in China [5, 6].

Epidemics and pandemics have been modeled through different mathematical and computational techniques. In the year 1927, Kermack and McKendrick established a dynamical model for epidemics known as the SIR model, which is a backbone for many current models [7]. This model consists of nonlinear differential equations. Many variants of this model like SEIR, SIRS and SEIAR have been developed throughout the year [8, 9, 10, 11, 12, 13, 14]. These nonlinear differential equations based models have been extensively used to study COVID-19 pandemic [15, 16, 17, 18, 19, 20, 21, 22]. Cellular automata is a spatio-temporal model which is represented with a collection of colored cells. The shape of these cells can be of any type like square and triangular [23]. Each of the color of the cells represent a state of the system. All the cells of the CA model evolve simultaneously and each cell evolves according to a predefined algorithm. This algorithm not only governs the interaction of a cell with neighbourhood cells but also controls evolution of the whole system. The states of the model and the neighbourhood condition depends on the system. In modeling epidemics, the four main states which have been considered are susceptible, exposed, infectious and recovered (or removed). Also, the Moore’s and the Neumann’s neighbourhood conditions are examples of trivial interaction methods which are used for CA to model epidemics. [24, 25, 26, 27]

CA has not only been used for diseases which spread by contact like influenza but also has been used in many vector-borne diseases like chikungunya, dengue [23, 28, 29, 30, 31, 32, 24, 33]. Mostly in literature a disease spread is modeled in a square lattice where each lattice cell represents a person or a fraction of the population. However, there are some studies where a disease spread is modeled in a triangular or in a hexagonal lattice. In these models, the total population is divided into many sub-populations like Susceptible, Exposed, Infectious and Removed (SEIR) or various successors of it. In CA models, a neighbourhood condition is used to represent spatial interactions between cells. To study epidemics and pandemics, Moore’s neighbourhood condition and Neumann’s neighbourhood condition are the most widely used spatial interaction criteria. However these neighbourhood conditions lead to a serious issue which is called neighbourhood saturation [34]. This issue can be eliminated by taking a modified neighbourhood condition for example, Mikler et. al. proposed a global neighbourhood criteria to eliminate this issue [34, 35].

Recently, cellular automata (CA) models have been used by many authors to study COVID-19 pandemic [36, 37, 38, 39, 40, 41, 42, 43]. In these studies CA has been mostly used to predict the future behaviour of the spread COVID-19 by studying its past behaviour. There are studies which also suggest various intervention and vaccination strategies to control this pandemic. Some advanced studies which have used the Genetic Algorithm (GA) and proposed various methods to optimize the CA model to fit the COVID-19 data [38, 44, 40]. In these studies, different types of neighbourhood condition have been used. A particular neighbourhood condition called the rr-neighbourhood condition is used widely in literature [37, 38]. In this model a person can interact upto the rrth neighbours with equal probabilities and thus increasing the range of disease spread beyond nearest neighbours.

In this study, we have defined a generalized neighbourhood condition. We have analyzed and studied this neighbourhood condition in detail to get a clear picture of its behaviour. We have applied this condition in our CA model and have done some analytical estimates. Various simulations have been done on this system after defining a suitable algorithm. We have also compared our model with other current models used in literature. Finally, we have studied COVID-19 data from eight different countries in relevance to our model and analyzed the concurrence between the two.

This paper is organized as follows: In Sec. 2, we have defined our neighbourhood condition. An initial analysis on this neighbourhood condition has been done in Sec. 2.2. In Sec. 3 and Sec. 4, probability of infection has been calculated and theoretical analysis of our CA model is described respectively. In Sec. 5, we have defined an algorithm for this CA model and shown different results of our simulations. Our model is compared to other models in Sec. 6 and 7. The detailed study on COVID-19 data using our CA model is described in Sec. 8. Finally we have listed the concluding remarks in Sec. 9.

2 Model description

In this article, we have studied the SEIR model by cellular automata with a modified neighbourhood condition. In this section, we mainly discuss our neighbourhood condition in detail and also its features which will affect a real situation. All the other important assumptions for this model are similar to Chowdhury et. al (2022) [40] where the choice of periodic boundary condition is of special importance in this study.

2.1 Neighbourhood condition

Suppose the lattice size for this CA model is N×NN\times N. Then with respect to any (i,j)(i,j) cell, the lattice can be divided into several layers (by considering periodic boundary condition) as shown in Fig. 1.

Refer to caption
Figure 1: Different layers of neighbourhoods corresponding to the cell, (i,j)(i,j) of a lattice. [40].

Here \ell represents the layer number. Any layer of number \ell has 88\ell number of cells. With respect to any cell, indexed by (i,j)(i,j), of a N×NN\times N lattice, the total number of layers can be calculated by

L=N12L=\frac{N-1}{2} (1)

when NN is odd. If, NN is even and N1N\gg 1 then Eq.1 is approximately true. The main assumptions for this modified neighbourhood condition, similar to Chowdhury et. al (2022) and references there in [40] are

  • The distance between any two cells is proportional to the layer number of any one cell with respect to the other cell. Hence, dd\propto\ell. For, simplicity we can assume d=d=\ell.

  • Probability of interaction (pint(d)p_{\scriptscriptstyle{\text{int}}}(d)) between two cells is inversely proportional to the nnth power of the distance. Thus, pint(d)1dnp_{\scriptscriptstyle{\text{int}}}(d)\propto\dfrac{1}{d^{n}}. Here nn= degree exponent and it can have any positive or negative values.

Thus probability of interaction (pint(d)p_{\scriptscriptstyle{\text{int}}}(d)) can be written as,

pint(d)=1dnd1dn=1n=1L1n=1Annp_{\scriptscriptstyle{\text{int}}}(d)=\frac{\frac{1}{d^{n}}}{\sum_{d}\frac{1}{d^{n}}}=\frac{\frac{1}{\ell^{n}}}{\sum_{\ell=1}^{L}\frac{1}{\ell^{n}}}=\frac{1}{A_{n}\ell^{n}} (2)

where, An==1L1nA_{n}=\sum_{\ell=1}^{L}\frac{1}{\ell^{n}}.

2.2 Average distance of interaction (d\langle d\rangle)

In modeling disease spread, the interaction between people is very important. In real world, interaction between two persons will depend on the distance between them. As distance increases, chance of interaction between two persons decrease [45, 46]. Also at the time of COVID-19 pandemic, interactions between people were controlled by various restrictions like lockdown and social distancing. These facts motivate us to assume the power law form of the interaction probability as shown in Eq. 2. Here degree exponent nn can tune the interaction probability as per the requirements of different strictness of restrictions on social interactions and models.

In this section, we have defined the average distance of interaction and discuss its dependence on the degree exponent nn. The average distance of interaction (d\langle d\rangle) and the variance of the interaction distance (σd2\sigma^{2}_{d}) can be deduced from Eq.2 as,

d==1Lpint()=1An=1L1n1\langle d\rangle=\sum_{\ell=1}^{L}\ell p_{\scriptscriptstyle{\text{int}}}(\ell)=\frac{1}{A_{n}}\sum_{\ell=1}^{L}\frac{1}{\ell^{n-1}} (3)
σd2=d2d2=1An=1L1n2(1An=1L1n1)2\sigma_{d}^{2}=\langle d^{2}\rangle-\langle d\rangle^{2}=\frac{1}{A_{n}}\sum_{\ell=1}^{L}\frac{1}{\ell^{n-2}}-\left(\frac{1}{A_{n}}\sum_{\ell=1}^{L}\frac{1}{\ell^{n-1}}\right)^{2} (4)
Refer to caption
(a)
Refer to caption
(b)
Figure 2: Average distance of interaction (d\langle d\rangle) (a) and variance on the distance (σd2\sigma_{d}^{2}) (b) are plotted as a function of nn. Here we have considered L=50L=50.

Fig. 2(a) shows that the average distance of interaction (d\langle d\rangle) falls very rapidly when degree exponent (nn) changes from negative to positive. Also, Fig. 2(b) shows that variance (σd2\sigma_{d}^{2}) has a very sharp peak for nn which have slightly higher value than zero. Numerically it is found that σd2\sigma_{d}^{2} has a peak at n0.3519n\approx 0.3519 for L=50L=50. So, the nn values around 0.0 are very significant because of the high values of σd2\sigma_{d}^{2}. When nn is positive then d\langle d\rangle quickly saturates to one and for negative values of nn, d\langle d\rangle quickly saturates to maximum layer number (LL). Variance of the interaction distance, σd2\sigma_{d}^{2} rapidly saturates to zero for both positive and negative values of nn. As, d\langle d\rangle decreases with increase in nn, nn can be defined as ‘social confinement’ [40]. Also, we want to mention that a high negative value incorporates that a person will mainly interact with the distant persons which is physically not possible. Thus very high positive or negative values of the degree exponent is physically irrelevant.

3 Probability of infection

Probability of infection of any susceptible cell at (i,j)(i,j) (QI(i,j)Q_{\text{I}}(i,j)) states the total probability of infection of this susceptible cell due to the infectious cells around it. To calculate QI(i,j)Q_{\text{I}}(i,j) it is needed to define some probabilities which are given below:

  • Probability of infection per contact (qq): This defines the probability of infection when a susceptible cell interacts with an infectious cell. This probability depends on the infectivity of the virus of a disease and the immunity of a person. If a virus is more infectious than its previous mutations, the probability of infection per contact will increase. Also, if a population gains immunity from a disease, then the probability of infection per contact will decrease. For COVID-19, the immunity is mainly achieved by vaccination.

  • Probability that an infectious person is at a distance dd is given by (pI(d)p_{\scriptscriptstyle{\text{I}}}(d)): pI(d)p_{\scriptscriptstyle{\text{I}}}(d) can be represented as,

    pI(d)=Number of infectious cells in layerTotal number of cells in layer=NI()8p_{\scriptscriptstyle{\text{I}}}(d)=\frac{\text{Number of infectious cells in layer}\ \ell}{\text{Total number of cells in layer}\ \ell}=\frac{N_{\text{I}}(\ell)}{8\ell} (5)

Hence from these probabilities the probability of infection (QI(i,j)Q_{I}(i,j)) of any susceptible cell at (i,j)(i,j) can be derived as,

QI=dqpint(d)pI(d)=q=1Lpint()pI().Q_{\text{I}}=\sum_{d}qp_{\scriptscriptstyle{\text{int}}}(d)p_{\scriptscriptstyle{\text{I}}}(d)=q\sum_{\ell=1}^{L}p_{\scriptscriptstyle{\text{int}}}(\ell)p_{\scriptscriptstyle{\text{I}}}(\ell). (6)

Substituting Eq. 2 and Eq. 5 in Eq. 6 we get,

QI=q8An=1LNI()n+1Q_{\text{I}}=\frac{q}{8A_{n}}\sum_{\ell=1}^{L}\frac{N_{\text{I}}(\ell)}{\ell^{n+1}} (7)

So, it is noted that the probability of infection (QI(i,j)Q_{\text{I}}(i,j)) is highly depended on the degree exponent nn. If n1n\gg 1 then probability of infection of any susceptible cell depends on the number nearest infectious cells only.

4 Dependence of initial infection growth on various free parameters

In this section, we have tried to find the initial infection growth of our model and its dependence on various free parameters. Here we have assumed that initially there are no removed persons in the region, which means R(0)=0R\!\left(0\right)=0.

4.1 Initial infection growth with one initial infection

Let, the total size of the lattice is N×NN\times N. Also, assuming that at time, t=0t=0 there is only one infectious person in the region. This infectious person has 88\ell number of susceptible neighbours at a distance \ell, which can be understood from Fig. 1. In other words, there are 88\ell susceptible persons who are placed at a distance \ell from the infectious person. Thus the probability of infection for each of these 88\ell susceptible persons is

QI()=q8An1n+1Q_{\text{I}}(\ell)=\frac{q}{8A_{n}}\frac{1}{\ell^{n+1}} (8)

Hence, on average, the number of newly infected cases between these 88\ell susceptible persons at t=1t=1 is,

E(1)=8(q8An1n+1)=qAn1nE_{\ell}(1)=8\ell\left(\frac{q}{8A_{n}}\frac{1}{\ell^{n+1}}\right)=\frac{q}{A_{n}}\frac{1}{\ell^{n}} (9)

Thus considering all distances, the average number of newly infected cases at t=1t=1 are,

E(1)==1LE(1)=qAn=1L1n=qAnAn=qE(1)=\sum_{\ell=1}^{L}E_{\ell}(1)=\frac{q}{A_{n}}\sum_{\ell=1}^{L}\frac{1}{\ell^{n}}=\frac{q}{A_{n}}A_{n}=q (10)

So, with one initial infectious case, the average number of newly infected cases at t=1t=1 is a constant and have a value equal to qq, which is the probability of infection per contact and is independent of nn.

4.2 New infected cases with multiple initial infections

Let at t=0t=0, there are nIn_{\scriptscriptstyle{\text{I}}} number of infectious cells and which are represented by i1,i2,,inIi_{1},i_{2},...,i_{n_{\scriptscriptstyle{\text{I}}}} and the distance between any two infectious cells ipi_{p} and iqi_{q} is represented by pq\ell_{pq}. To estimate the new infections, we will consider each infectious person one by one. An infectious person can infect everyone except the already infected person. Thus, the average number of susceptible persons who can only be infected by i1i_{1} is,

Ei1(1)=qq8An(112n+1+113n+1++11nIn+1)E_{i_{1}}(1)=q-\frac{q}{8A_{n}}\left(\frac{1}{\ell_{12}^{n+1}}+\frac{1}{\ell_{13}^{n+1}}+...+\frac{1}{\ell_{1n_{\scriptscriptstyle{\text{I}}}}^{n+1}}\right) (11)

As an infectious person cannot infect another infectious person, we have subtracted those probabilities from qq. Considering all infectious cases, we can write,

Ei1(1)=qq8An(112n+1+113n+1++11nIn+1)E_{i_{1}}(1)=q-\frac{q}{8A_{n}}\left(\frac{1}{\ell_{12}^{n+1}}+\frac{1}{\ell_{13}^{n+1}}+...+\frac{1}{\ell_{1n_{\scriptscriptstyle{\text{I}}}}^{n+1}}\right)
Ei2(1)=qq8An(121n+1+123n+1++12nIn+1)E_{i_{2}}(1)=q-\frac{q}{8A_{n}}\left(\frac{1}{\ell_{21}^{n+1}}+\frac{1}{\ell_{23}^{n+1}}+...+\frac{1}{\ell_{2n_{\scriptscriptstyle{\text{I}}}}^{n+1}}\right)
.
.
.
EinI(1)=qq8An(1nI1n+1+1nI2n+1++1nInI1n+1)E_{i_{n_{\scriptscriptstyle{\text{I}}}}}(1)=q-\frac{q}{8A_{n}}\left(\frac{1}{\ell_{n_{\scriptscriptstyle{\text{I}}}1}^{n+1}}+\frac{1}{\ell_{n_{\scriptscriptstyle{\text{I}}}2}^{n+1}}+...+\frac{1}{\ell_{n_{\scriptscriptstyle{\text{I}}}n_{\scriptscriptstyle{\text{I}}}-1}^{n+1}}\right)
E(1)=nIq2q8An(i,j=1i<jnI1ijn+1)E(1)=n_{\scriptscriptstyle{\text{I}}}q-\frac{2q}{8A_{n}}\left(\sum\limits_{\begin{subarray}{c}i,j=1\\ i<j\end{subarray}}^{n_{\scriptscriptstyle{\text{I}}}}\frac{1}{\ell_{ij}^{n+1}}\right)

As ij=ji\ell_{ij}=\ell_{ji}, every term in the summation repeats twice. Hence the new infected cases at time t=1t=1 are,

E(1)=nIq2q8An(i,j=1i<jnI1ijn+1)E(1)=n_{\scriptscriptstyle{\text{I}}}q-\frac{2q}{8A_{n}}\left(\sum_{\begin{subarray}{c}i,j=1\\ i<j\end{subarray}}^{n_{\scriptscriptstyle{\text{I}}}}\frac{1}{\ell_{ij}^{n+1}}\right) (12)

The summation of Eq. 12 consists of C2nI{}^{n_{\scriptscriptstyle{\text{I}}}}\!C_{2} number of terms. If nIN2n_{\scriptscriptstyle{\text{I}}}\ll N^{2}, C2nI{}^{n_{\scriptscriptstyle{\text{I}}}}\!C_{2} number of relative terms in Eq. 12 are negligible. Thus we can write,

E(1)nIq.E(1)\approx n_{\scriptscriptstyle{\text{I}}}q. (13)

Next we will look at another method of estimating average number of newly infected cases with randomly distributed initial infections. We note that the relative terms in Eq. 12 are difficult to estimate as they can have different values for different simulation as a function of relative distances. However, we can adopt an average approach to this. Thus from Eq. 11, the average new infected cases for i1i_{1} can be written as,

Ei1(1)¯=qq8Anj=2nI11jn+1¯=qq8Anj=2nI11jn+1¯\overline{E_{i_{1}}(1)}=q-\frac{q}{8A_{n}}\overline{\sum_{j=2}^{n_{I}}\frac{1}{\ell_{1j}^{n+1}}}=q-\frac{q}{8A_{n}}\sum_{j=2}^{n_{I}}\overline{\frac{1}{\ell_{1j}^{n+1}}} (14)

Probability that an infectious person will be randomly placed in layer \ell with respect to i1i_{1} is 8N21\frac{8\ell}{N^{2}-1}. Here, we take N21N^{2}-1 instead of N2N^{2} as one place is occupied by i1i_{1}. Hence,

11jn+1¯=1j=1L11jn+181jN21=8N211j=1L11jn=8AnN21\overline{\frac{1}{\ell_{1j}^{n+1}}}=\sum_{\ell_{1j}=1}^{L}\frac{1}{\ell_{1j}^{n+1}}\frac{8\ell_{1j}}{N^{2}-1}=\frac{8}{N^{2}-1}\sum_{\ell_{1j}=1}^{L}\frac{1}{\ell_{1j}^{n}}=\frac{8A_{n}}{N^{2}-1} (15)

All nI1n_{\scriptscriptstyle{\text{I}}}-1 terms in the sum given in Eq. 14 have the same average as given in Eq. 15. Hence, from Eq. 14 we can write,

Ei1(1)¯=qqN21(nI1)\overline{E_{i_{1}}(1)}=q-\frac{q}{N^{2}-1}\left(n_{\scriptscriptstyle{\text{I}}}-1\right) (16)

Hence, for all infectious cases,

E(1)¯=nIEi1(1)¯=nIqqnI(nI1)N21\overline{E(1)}=n_{\scriptscriptstyle{\text{I}}}\overline{E_{i_{1}}(1)}=n_{\scriptscriptstyle{\text{I}}}q-\frac{qn_{\scriptscriptstyle{\text{I}}}\left(n_{\scriptscriptstyle{\text{I}}}-1\right)}{N^{2}-1} (17)

4.3 Interpreting the basic reproduction number

Let, τR\tau_{\scriptscriptstyle{\text{R}}} represent the mean infectious period. If initial infectious cases, nIN2n_{\scriptscriptstyle{\text{I}}}\ll N^{2}, the new infected cases, E(1)nIqE(1)\approx n_{\scriptscriptstyle{\text{I}}}q (Eq. 13) after unit time steps. So, one infectious person will spread approximately qq number of infections. As the infectious period is τR\tau_{\scriptscriptstyle{\text{R}}} one infectious person will spread qτRq\tau_{\scriptscriptstyle{\text{R}}} number of infections in this period. This quantity is equivalent to the basic reproduction number (R0R_{0}).

5 Algorithm and simulation of the CA model

In this section, we have discussed about the algorithm of our CA model and shown multiple simulation results.

5.1 Algorithm

This CA model consists of four sub-populations: susceptible (SS), exposed (EE), infectious (II) and removed (RR) (recovered+dead). At any time tt, each cell of the lattice belongs to any one of these sub-populations. For simulations, these sub-populations are represented with values 0, 1, 2 and 3 respectively. The algorithm for the CA model is same as described in Chowdhury et. al (2022) [40].

Suppose, the lattice size is N×NN\times N and at time tt the number of cells belonging to different sub-populations like exposed, infectious and removed are NE=E(t)N_{\text{E}}=E\left(t\right), NI=I(t)N_{\text{I}}=I\left(t\right), and NR=R(t)N_{\text{R}}=R\left(t\right) respectively. Thus the number of cells belonging to the susceptible sub-population at time tt is S(t)=N2NENINRS(t)=N^{2}-N_{\text{E}}-N_{\text{I}}-N_{\text{R}}. Any susceptible cell at position (i,j)(i,j) can be infected with a probability QI(i,j,t)Q_{\text{I}}\left(i,j,t\right) as described in Eq. 7. These newly infected cells at time tt will be moved to the exposed sub-population at t+1t+1. Also, the exposed and infectious cells will move to the infectious and removed sub-populations respectively after the latency period (τI\tau_{\scriptscriptstyle{\text{I}}}) and infectious period (τR\tau_{\scriptscriptstyle{\text{R}}}) for the disease. Therefore at time tt, if a cell is exposed (or infectious), it will move to the infectious (or removed) sub-population at time t+τIt+\tau_{\scriptscriptstyle{\text{I}}} (or t+τRt+\tau_{\scriptscriptstyle{\text{R}}}).

5.2 Simulated results

In this section, we have discussed various simulations and results of our model. To start with, all the free parameters of our simulation are lattice size (N×NN\times N), sample size (SS), probability of infection per contact (qq), degree exponent (nn), mean latency period of the disease (τI\tau_{\scriptscriptstyle{\text{I}}}), and mean infectious period of the disease (τR\tau_{\scriptscriptstyle{\text{R}}}).

For our simulations, we have assumed the values of τI\tau_{\scriptscriptstyle{\text{I}}} and τR\tau_{\scriptscriptstyle{\text{R}}} as τI=8days\tau_{\scriptscriptstyle{\text{I}}}=8~{}\text{days} and τR=18days\tau_{\scriptscriptstyle{\text{R}}}=18~{}\text{days}. Also we have assumed that the initially there is only one infectious cell in the whole lattice and all other cells are susceptible. Hence, the initial condition for the CA model with a N×NN\times N lattice is I(0)=1I\left(0\right)=1, E(0)=0E\left(0\right)=0, R(0)=0R\left(0\right)=0, S(0)=N21S\left(0\right)=N^{2}-1.

At first, we have checked the dependence of the model on the two free parameters, sample size (SS) and lattice size (N×NN\times N). We have checked this through the time evolution of the fraction of the infectious cases (i(t)=I(t)N2i(t)=\frac{I(t)}{N^{2}}). The results are shown in the Fig. 3.

Refer to caption
(a)
Refer to caption
(b)
Figure 3: Plots of the variations of i(t)i(t) for different sample and lattice sizes. (a) Plot of the variations of i(t)i(t) for different sample sizes when lattice size fixed at 101×101101\times 101. (b) Plot of the variations of i(t)i(t) for different lattice sizes when sample size fixed at 50.

Fig. 3(a) shows the variation of i(t)i(t) which is averaged out for different sample sizes when q=0.3q=0.3, n=2n=2 and lattice size is fixed at 101×101101\times 101. It is found that there is no major deviation happens in i(t)i(t) by changing sample size.

Fig. 3(b) shows the variation of i(t)i(t) for different lattice sizes. Here, we have again considered q=0.3q=0.3, n=2n=2 and also the sample size as S=50S=50. It is found that if the initial fraction of the infection cases (i(0)i(0)) is same then the evolution of i(t)i(t) is approximately independent of the lattice size. Here we have checked the results with different combinations of the parameters for both of these cases. Thus we can conclude, the model is approximately sample and size independent. For future simulations we have fixed the lattice size at N×N=101×101N\times N=101\times 101 and sample size at S=50S=50.

Refer to caption
(a)
Refer to caption
(b)
Figure 4: Plots of i(t)i(t) for different values of qq and nn (a) Plot of i(t)i(t) for different values of qq for n=2.0n=2.0. (b) Plot of i(t)i(t) for different values of nn for q=0.3q=0.3.

Fig. 4 shows how the the fraction of the infectious cases (i(t)i(t)) varies with time for different qq values and nn values. In Fig.4(a) it is shown how i(t)i(t) varies for different qq when n=2n=2. Here, we can see that growth of i(t)i(t) is very fast for high qq values. This is reasonable because high qq means when a susceptible people cell with an infectious cell then there is a very high chance that the susceptible people will be infected. Thus the disease spread is faster for high qq values than the low qq values.

To simulate Fig. 4(b) we have fixed qq at q=0.3q=0.3. It can be seen from this figure that when nn increases the peak of the fraction of the infectious cases (i(t)i(t)) falls significantly and disease spread become slower.

Refer to caption Refer to caption Refer to caption Refer to caption
Refer to caption Refer to caption Refer to caption Refer to caption
Figure 5: Top row shows the i(t)i(t) plots with 1σ1\sigma interval for different values of nn and bottom row shows the spatial distribution of the disease spread at a intermediate time for the corresponding values of nn.

To understand the reason behind the slower disease spread for higher nn we have plotted the time variation of i(t)i(t) with the spatial distribution of the disease spread at a intermediate time for different values of nn (Fig. 5). Here, the spatial distribution for a particular nn is chosen randomly from 50 samples. It is shown that the disease spread is more clustered for higher values of nn. This is the reason for slower disease spread. Also, Fig. 4(b) shows that for n=0n=0 the peak of i(t)i(t) is lower than the n=0.5n=0.5. The reason behind this is the variance σd2\sigma_{d}^{2}. For this figure, n=0.5n=0.5 have the highest σd2\sigma_{d}^{2} value amongst all nn values which have been used for this plot (Fig. 4(b)).

6 Comparison with the continuous SEIR model

In this section, we have compared our model with the continuous SEIR model. The model is described by the following equations:

dsdt=βsi\frac{ds}{dt}=-\beta si (18)
dedt=βsiσe\frac{de}{dt}=\beta si-\sigma e (19)
didt=σeγi\frac{di}{dt}=\sigma e-\gamma i (20)
drdt=γi\frac{dr}{dt}=\gamma i (21)

β\beta= Infection rate.

σ\sigma= Latency period.

γ\gamma= Infectious period.

In this model, β\beta, σ\sigma and γ\gamma are the free parameters. By definition we have assumed σ=1τI=0.125day1\sigma=\frac{1}{\tau_{\scriptscriptstyle{\text{I}}}}=0.125~{}day^{-1} and γ=1τR=0.0556day1\gamma=\frac{1}{\tau_{\scriptscriptstyle{\text{R}}}}=0.0556~{}day^{-1}. We have generated various simulated data from the stochastic model for different qq values with nn= 0.0, 0.5, 1.0, 1.5, 2.0, 2.5 and 3.0. Now for a particular qq value, we have fitted the continuous model on the simulated data corresponding to these nn values to find the best fit result for this particular qq. To achieve this, the sum of squared errors (SSESSE) of i(t)i(t) has been minimized by varying the free parameter β\beta using an optimization algorithm, pattern search (PS).

SSE=k(ikik^)2SSE=\sum_{k}\left(i_{k}-\hat{i_{k}}\right)^{2} (22)

ii= Fraction of the infectious cases of the simulated data from our model.

i^\hat{i}= Fraction of the infectious cases from the continuous SEIR model.

Refer to caption Refer to caption Refer to caption Refer to caption
Figure 6: Best fit curves of the continuous model to stochastic data for q=0.2q=0.2, q=0.3q=0.3, q=0.4q=0.4, and q=0.5q=0.5 (left to right order).
qq nn β\beta SSESSE
0.2 2.0 0.1554 3.90E-02
0.3 2.5 0.1544 5.73E-02
0.4 3.0 0.1367 7.94E-02
0.5 3.0 0.1512 8.48E-02
0.6 3.0 0.1677 9.11E-02
Table 1: Pair of qq and nn values for which continuous model fit well on the CA model data. The optimized values of β\beta of the continuous model and corresponding SSESSE values are also shown here.

Tab. 1 shows different qq values and corresponding nn values for which continuous model fit well on our CA model data. Also, this table shows the optimized β\beta values of the continuous model and SSESSE values of the respective cases. In Fig.6 shows the best fit results which are given in Tab. 1. From these results we can say that continuous model fits better on our model for higher nn.

7 Comparison with other neighbourhood conditions

In our model, we have introduced a different neighbourhood condition (refer to Sec. 2 for a detailed discussion) than those favored in the current literature. In this context, we discuss and compare our neighbourhood condition with other neighbourhood conditions in this section.

7.1 Moore’s neighbourhood condition

Refer to caption
Figure 7: This figure shows Moore’s neighbourhood condition where the blue cells represent nearest neighbourhood of the cell, (i,j)(i,j).

In Moore’s neighbourhood condition any cell can interact only with those neighbours which belongs to the first layer [25]. In our model when n>3n>3, d1\langle d\rangle\approx 1 and σd2\sigma_{d}^{2} decreases with increasing nn. Thus our model with degree exponent n3n\gg 3 will give a similar result like a model with Moore’s neighbourhood condition. Fig.8 shows the comparison between Moore’s neighbourhood condition and our model for n=10n=10.

Refer to caption
Figure 8: Comparison of the model with Moore’s neighbourhood condition and our model for n=10n=10.

7.2 rr-neighbourhood

Another one of the most popular neighbourhood conditions, which has been used in many CA models to study with the COVID-19 pandemic, is the rr-neighbourhood condition [36, 37, 38]. In this neighbourhood condition, a cell can interact any other cell in a radius of rr. Thus the probability of interaction of a cell with any other cells upto the rrth layer is assumed to be equal. Fig. 9 shows this neighbourhood condition for r=3r=3.

Refer to caption
Figure 9: rr-neighbourhood condition for r=3r=3. Here, blue cells represent nearest neighbourhood of the cell, (i,j)(i,j).

Thus the interaction probability of a cell to another cell of layer \ell, where r\ell\leq r is

pint()=8nr21,r.p_{\scriptscriptstyle{\text{int}}}(\ell)=\frac{8\ell}{n_{r}^{2}-1},\qquad\ell\leq r. (23)

Here, 88\ell is the total number of cells in layer \ell and nrn_{r} is the total number of cells at one side of the rrth layer which have value nr=2r+1n_{r}=2r+1. Thus for a rr-neighbourhood model, the probability of interaction varies linearly with layer number (\ell) when r\ell\leq r and goes to zero for >r\ell>r. Hence if we take the value of the degree exponent as, n=1n=-1 for r\ell\leq r and n3n\gg 3 when >r\ell>r in our model (referred to Eq. 2) then it gives the similar result as for the rr-neighbourhood model.

Thus in our neighbourhood condition, we have introduced a discontinuity at the layer =r\ell=r for defining our probability of interaction to reproduce the results of the rr-neighbourhood condition. Thus, (pintp_{\scriptscriptstyle{\text{int}}}) is mathematically expressed as,

pint()1n,n=1d.p_{\scriptscriptstyle{\text{int}}}(\ell)\propto\frac{1}{\ell^{n}},~{}n=-1\qquad\ell\leq d. (24)
pint()1n,n3>d.p_{\scriptscriptstyle{\text{int}}}(\ell)\propto\frac{1}{\ell^{n}},~{}n\gg 3\qquad\ell>d. (25)
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 10: Comparison of different rr-neighbourhood models with our model. (a) r=3r=3, (b) r=5r=5 and (c) r=Lr=L, LL is the total number of layers in a N×NN\times N lattice.

Fig. 10(a) and 10(b) shows the comparison between the rr-neighbourhood model and our model for r=3r=3 and r=5r=5 respectively. Also for a N×NN\times N lattice, if we choose r=Lr=L, where LL denotes the total number of layers in the lattice and given by Eq. 1 then one cell can interact with any other cell of the lattice. The comparison of the results for rr-neighbourhood model with r=Lr=L and our model is shown in Fig. 10(c).

8 COVID-19 data and our model

Here, we have discussed the results obtained from fitting of our model on multiple peaks of COVID-19 data for different countries. Here, we have considered a total of eight countries: Italy, France, Germany, Brazil, US, India, South Africa and Japan. The arrangement and number of peaks in daily data of new cases varies with respect to different countries. Also, the size of the peaks differ for different countries. Thus to fit the model in COVID-19 data we have made following assumptions:

  1. 1.

    We have explored only those part of COVID-19 data where there has been a sharp increase of daily new cases and have ignored those parts where daily data for new cases remains low for a long time.

  2. 2.

    Every peak of this disease spread was considered separately for fitting. When a particular peak other than the first peak is modeled, the effective susceptible population that has been considered includes both new susceptible people and the people who had been infected but have again become susceptible due to lack of sustainable immunity against COVID-19.

  3. 3.

    Each peak is considered as a new disease spread in a population. It is like a fresh start. So, it does not have any relation with the disease spread previous to it. The peaks of COVID-19 for a particular country have different features and reasons of occurrence. The number of infected persons as well as the time span for these peaks are different. Also, there could be multiple reasons for their incidence like mutation in the SARS-COV-2 virus or decrement of social restrictions. CA models have been made with finite lattices and we need to fix its size throughout the study. However, the number of effective susceptible people (susceptible persons who take part in the disease dynamics) in these peaks are different due to the variation of the total number of infections between peaks. Thus it will be difficult to model all the peaks together by a single CA model.

We have taken data from the COVID-19 data repository of Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) [47]. This repository contains time series data of confirmed cases and deaths for different countries. However, the time series data for recovered cases is discontinued at a very early stage. Our model needs removed cases (Recovered and dead both) data, which was not available in this database. Thus we have made further assumptions which are enlisted below:

  1. 1.

    Number of deaths (D(t)D(t)) and removed cases (R(t)R(t)) at any time tt is related by,

    D(t)=k(t)R(t)D(t)=k(t)R(t) (26)

    Here k(t)k(t) is a function of time tt and has a value between 0 and 1.

  2. 2.

    Number of deaths during the COVID-19 pandemic not only depended on the fatality of the virus but also on the condition of the medical infrastructure. New variants of SARS-COV-2 virus appeared at different times as well as the medical supplies and the healthcare facilities were not always the same during this entire COVID period. Thus it is a better proposal to keep kk dynamic with time. However, to preserve the simplicity of our model, we have assumed that during a particular peak in a country, kk is constant.

Here the model is fitted to the confirmed cases data. Since we have considered the peaks separately, the confirmed cases data for a particular peak required proper scaling before fitting. We suppose that any one chosen peak starts at time T1T_{1} and ends at T2T_{2}. Thus to scale the confirmed cases data in the range T1T_{1} to T2T_{2}, we have to subtract the confirmed cases data in this range with the confirmed cases at time T11T_{1}-1. So the scaled confirmed cases for a particular peak PP can be written as,

ICP(t)=IC(t)IC(T11)T1tT2I_{C}^{P}(t)=I_{C}(t)-I_{C}(T_{1}-1)\qquad T_{1}\leq t\leq T_{2} (27)

Also the confirmed cases for peak PP (ICP(t)I_{C}^{P}(t)) has been normalized as,

iCP(t)=ICP(t)ICP(T2)T1tT2i_{C}^{P}(t)=\frac{I_{C}^{P}(t)}{I_{C}^{P}(T_{2})}\qquad T_{1}\leq t\leq T_{2} (28)

The time scale of iCP(t)i_{C}^{P}(t) has also been changed from T1tT2T_{1}\leq t\leq T_{2} to 0tT2T10\leq t\leq T_{2}-T_{1}.

These scaled and normalized confirmed cases, iC(t)i_{C}(t) in 0tT0\leq t\leq T (omitting the superscript and assuming T2T1=TT_{2}-T_{1}=T) is fitted with the normalized total infected cases, itot(t)i_{tot}(t) in the same timescale. Here the total infected cases (from our model) is normalized as,

itot(t)=Itot(t)Itot(T)0tTi_{tot}(t)=\frac{I_{tot}(t)}{I_{tot}(T)}\qquad 0\leq t\leq T (29)

We have used Genetic Algorithm (GA) to optimize the following Sum of Squared Errors (SSE) as given in Eq. 30 to fit our model to the data.

SSE=t=0T(iC(t)itot(t))2SSE=\sum_{t=0}^{T}(i_{C}(t)-i_{tot}(t))^{2} (30)

Here,

iC(t)i_{C}(t)= Fraction of confirmed cases for a particular peak.

itot(t)i_{tot}(t)= Fraction of total cases for that peak simulated from the model.

For fitting, we have assumed a squared lattice of size N×N=101×101N\times N=101\times 101. Our model has four free parameters (q,n,τI,τRq,n,\tau{\scriptscriptstyle{\text{I}}},\tau_{\scriptscriptstyle{\text{R}}}) and needs three initial conditions (E(0),I(0),R(0)E(0),I(0),R(0)). We have assumed that, E(0)=0E(0)=0, R(0)=0R(0)=0 and taken I(0)I(0) as a free parameter. Thus, Eq. 30 is optimized with five free parameters q,n,τI,τRq,n,\tau_{\scriptscriptstyle{\text{I}}},\tau_{\scriptscriptstyle{\text{R}}} and I(0)I(0).

To compare the model results with the data representing the number of deaths, we need a slightly different scaling than the one which is described in Eq. 27. According to our model, an infectious person will be moved to the removed compartment after τR\tau_{\scriptscriptstyle{\text{R}}} days.Thus, for 0t<τR0\leq t<\tau_{\scriptscriptstyle{\text{R}}}, the removed cases, R(t)R(t) as well as deaths, D(t)D(t) is zero. Suppose DP(t)D^{P}(t) denotes the number of deaths at any time tt for a particular peak PP which is ranged from T1tT2T_{1}\leq t\leq T_{2}. We have assumed each peak separately, however the number of deaths will not be zero in the range T1t<T1+τRT_{1}\leq t<T_{1}+\tau_{\scriptscriptstyle{\text{R}}}. Number of deaths in this range depends on the number of infectious cases (or, active cases, both are same in our model) at t<T1t<T_{1}, which are not zero generally. Thus to compare our model to the data representing deaths, we have to omit number of deaths in the range T1t<T1+τRT_{1}\leq t<T_{1}+\tau_{\scriptscriptstyle{\text{R}}}. Thus the proper scaling for the deceased cases is,

DP(t)=D(t)D(T1+τR1)T1+τRtT2D^{P}(t)=D(t)-D(T_{1}+\tau_{\scriptscriptstyle{\text{R}}}-1)\qquad T_{1}+\tau_{\scriptscriptstyle{\text{R}}}\leq t\leq T_{2} (31)

If we re-scale the time axis, then the above relation will have a form,

DP(t)=D(t)D(τR1)τRtTD^{P}(t)=D(t)-D(\tau_{\scriptscriptstyle{\text{R}}}-1)\qquad\tau_{\scriptscriptstyle{\text{R}}}\leq t\leq T (32)

Thus, in this paper, we have fitted our model to the data by considering each peak separately by minimizing Eq. 30. We have used GA to optimize Eq. 30. We have run GA multiple times to get best fit parameter values. Then we have re-checked these parameters values by comparing the variation of deaths (simulated) with the deceased cases data.

In Fig. 13, we show our model is fitted to the confirmed cases data of six peaks (with proper scaling) of Italy. Also, we have used best fit values of the parameters to simulate the variations of the fraction of death cases for these peaks and have compared them with the scaled deceased cases data of Italy in Fig. 14.

Refer to caption
(a)
Refer to caption
(b)
Figure 11: All fitted plots for five different peaks are combined and plotted against the whole data of Italy with proper re-scaling.
[Uncaptioned image] [Uncaptioned image]
[Uncaptioned image] [Uncaptioned image]
[Uncaptioned image] [Uncaptioned image]
[Uncaptioned image] [Uncaptioned image]
Refer to caption Refer to caption
Refer to caption Refer to caption
Refer to caption Refer to caption
Figure 12: Left column shows model fitting to the confirmed cases data of different countries. Simulated variations of deaths for different countries are shown in right column.
Refer to caption Refer to caption Refer to caption
Refer to caption Refer to caption Refer to caption
Figure 13: Model fitting to the confirmed cases data of Italy for five different peaks. Peaks are counted from left to right and top to bottom.
Refer to caption Refer to caption Refer to caption
Refer to caption Refer to caption Refer to caption
Figure 14: Model fitting to the death data of Italy for five different peaks. Peaks are counted from left to right and top to bottom.

In Fig. 11, we have combined all the fitted plots and re-scaled them to plot against the whole data of Italy from 22/01/2020 to 15/06/2022. Fig. 11(a) represents the plot for confirmed cases data as well as our model. Fig. 11(b) represents the same for number of deaths.

In Fig. 12, we have shown variations of confirmed cases as well as deaths with time for different countries. In confirmed cases plots, there are some intermediate parts which are not fitted. These regions in-between the peaks, the daily number of new cases are very low and tending to almost zero in comparison to the earlier days. This leads to an almost flat region in the curve for the cumulative number of infected people. Our fit incorporates this behaviour as an end to the falling part of each of the peaks. We have judiciously chosen not to fit small regions only when the number of infected people reach the base level. Also, in death plots, there are some more parts which are not under the simulated regions. This is because of the delay (τR\tau_{\scriptscriptstyle{\text{R}}}) in death cases, as mentioned earlier. Thus for each peak initial τR1\tau_{\scriptscriptstyle{\text{R}}}-1 number of points are omitted for simulation, as described in Eq. 32.

Country Parameters Peak number
1st 2nd 3rd 4th 5th 6th
Italy qq 0.36985 0.21206 0.24585 0.16389 0.35715 0.13025
nn 1.06260 2.82571 2.95011 2.79717 2.73188 2.21768
τI\tau_{\scriptscriptstyle{\text{I}}} 3 4 7 5 5 1
τR\tau_{\scriptscriptstyle{\text{R}}} 4 17 9 20 10 12
I(0)I(0) 111 151 362 227 162 254
Germany qq 0.20273 0.10106 0.16616 0.17170 0.18498 0.11204
nn 1.09335 2.81062 1.12124 2.79799 1.59799 2.31692
τI\tau_{\scriptscriptstyle{\text{I}}} 3 7 9 4 2 5
τR\tau_{\scriptscriptstyle{\text{R}}} 14 36 14 20 10 20
I(0)I(0) 96 228 397 144 99 203
France qq 0.37132 0.16421 0.11439 0.17635 0.25612 0.14976
nn 1.57481 0.45996 0.93457 2.79593 2.65914 2.85849
τI\tau_{\scriptscriptstyle{\text{I}}} 4 9 8 4 3 1
τR\tau_{\scriptscriptstyle{\text{R}}} 7 18 12 18 12 17
I(0)I(0) 114 176 170 196 218 481
US qq 0.19558 0.11801 0.17678 0.17359
nn 2.86244 2.60631 2.92009 2.17452
τI\tau_{\scriptscriptstyle{\text{I}}} 5 5 5 2
τR\tau_{\scriptscriptstyle{\text{R}}} 11 18 21 19
I(0)I(0) 156 97 92 166
Brazil qq 0.15814 0.10295 0.27827
nn 0.15893 2.17049 2.36806
τI\tau_{\scriptscriptstyle{\text{I}}} 3 8 5
τR\tau_{\scriptscriptstyle{\text{R}}} 8 16 11
I(0)I(0) 11 62 135
India qq 0.13576 0.12967 0.30007
nn 1.15718 0.93383 1.84286
τI\tau_{\scriptscriptstyle{\text{I}}} 8 3 2
τR\tau_{\scriptscriptstyle{\text{R}}} 11 15 12
I(0)I(0) 29 26 45
Japan qq 0.19671 0.17511 0.12692 0.14812 0.23149 0.28547
nn 1.44642 2.76158 1.58285 1.19748 2.09850 2.69654
τI\tau_{\scriptscriptstyle{\text{I}}} 1 4 8 8 6 2
τR\tau_{\scriptscriptstyle{\text{R}}} 13 23 19 16 21 6
I(0)I(0) 123 132 73 200 96 42
SA qq 0.17435 0.21300 0.15079 0.22935 0.29409
nn 1.15340 1.29217 1.60608 2.43431 1.63147
τI\tau_{\scriptscriptstyle{\text{I}}} 8 7 4 2 3
τR\tau_{\scriptscriptstyle{\text{R}}} 14 11 12 24 7
I(0)I(0) 64 67 22 71 118
Table 2: Best fit parameter values of different peaks of different countries.

Tab. 2 shows all the best fit parameter values to the confirmed cases data (scaled) of different peaks for various countries. The fitted values of a particular parameter are not only varied for different countries but also varied between peaks. The distribution of the best-fit values of the parameters are represented with box plots and shown in Fig. 15. In the box plot, the middle red line represents the median value of the distribution. Upper and lower edge of the box represents third and first quartile values of the distribution respectively, whereas the length of the box represents inter-quartile range (IQR). Also the upper and lower whiskers represent the values which are lain above and below the third and first quartile values respectively and have a length of 1.5*IQR. Outliers are represented with red cross. The median values of qq, nn, τI\tau_{\scriptscriptstyle{\text{I}}}, τR\tau_{\scriptscriptstyle{\text{R}}} and I(0)I(0) are 0.17511, 2.1705, 4, 14 and 123 respectively. From fig. 15(b), we can see that the values of nn is closely distributed to 3.0.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 15: Box plots of the best-fit parameter values of different peaks of different countries. (a) Box plot of qq, (b) box plot of nn, (c) box plots of τI\tau_{\scriptscriptstyle{\text{I}}} and τR\tau_{\scriptscriptstyle{\text{R}}}, (d) box plot of initial infectious cases (I(0)I(0)).
Country Peak number
1st 2nd 3rd 4th 5th 6th
Italy 0.14365 0.02494 0.02080 0.00890 0.00259 0.00223
Germany 0.04838 0.03428 0.01427 0.00614 0.00654 0.00128
France 0.15821 0.01343 0.01644 0.00392 0.00114 0.00112
US 0.01831 0.01463 0.01314 0.00502
Brazil 0.03332 0.02784 0.00550
India 0.01540 0.01307 0.00431
Japan 0.05050 0.01076 0.01883 0.01773 0.00338 0.00187
SA 0.02473 0.03896 0.02562 0.00831 0.00538
Table 3: Best fit values of kk for different peaks of different countries.

The values of kk (as described in equation. 26) which are estimated to compare our model to the deceased cases data for different peaks of various countries are shown in Tab. 3. This table shows an overall decrement of kk values as peak number increases for a country. This means, the death rate due to COVID-19 is decreased with time and it is true for all countries.

9 Conclusions

To conclude, we summarize the main results and features of this work with a short discussion. Our main aim, as stated earlier, is to build a generalized probabilistic cellular automata model with the help of basic SEIR model to study the spread of epidemics, with reference to COVID-19. In this study, our main motive is to show that merging a simple model like SEIR model with a CA model with modified neighbourhood condition can produced very good results for a pandemic which is versatile in many different fronts. We have tried to show that a simple model like the one that we have adopted has been able to take into account the effects produce by the different factors of this pandemic like infectivity of the virus, social restrictions, immunity due to vaccination, re-infection and mortality rate to a large extent. Here we have described a method to fit the COVID-19 data for different peaks with different intensities and time periods. We have compared our model with some general models and tried to understand their differences. We have also fitted our model to the data for total infected cases of eight countries and tried to understand the relevance of this model in the spread of infection of COVID-19 in the different countries widely. To achieve this, we have taken the following steps:

  • A square lattice of the size N×NN\times N is considered for our study. We have chosen different values of NN like 101, 201, 301.

  • We have built a neighbourhood criteria and made some assumptions which are described in Sec. 2. According to this neighbourhood criteria, the distance between any two cells (dd) is proportional to the number of layers (ll) between them. Also, the probability of interaction between two cells, separated by distance dd, pint(d)1dnp_{\scriptscriptstyle{\text{int}}}(d)\propto\frac{1}{d^{n}}, where nn is power law exponent.

  • The average distance of interaction (d\langle d\rangle) is calculated by considering the above mentioned neighbourhood condition (described in Sec. 2.2). average distance of interaction falls very rapidly to one for positive values of nn and saturates to a value equal to the maximum layer number (LL) for negative values of nn. Also, in this section we have reasoned about the irrelevance of large positive and negative values of nn.

  • In Sec. 3 and 4, QI(i,j)Q_{\text{I}}(i,j), which is the probability of infection for any cell (i,j)(i,j), has been defined and a mathematical analysis based on this quantity QIQ_{\text{I}} has been undertaken. We find that initially, if there are nIn_{\scriptscriptstyle{\text{I}}} number of initial infectious cases, the newly infected cases will be nIq\sim n_{\scriptscriptstyle{\text{I}}}q, where qq represents infectivity of the disease. Also, we have defined an equivalent quantity for the basic reproduction number (R0R_{0}) as, qτRq\tau_{\scriptscriptstyle{\text{R}}}, where τR\tau_{\scriptscriptstyle{\text{R}}} represents infectious period of the disease.

  • A suitable epidemic algorithm is defined for our model which is described in Sec. 5.1.

We have run model simulations and compared our results with various other models, which are listed below:

  • We have simulated our model with different lattice and sample sizes (discussed in Sec. 5.2). However, we could not find any significant variations in the results by changing lattice and sample sizes. Thus throughout our study, we have considered lattice size N×N=101×101\equiv N\times N=101\times 101 and sample size S=50S=50.

  • There are two parameters of our model which mainly control the rate of the disease spread or in other words sharpness of the infectious cases (or active cases) curve. These two parameters are: qq, which is referred as infectivity of the disease and nn, which is defined as ‘social confinement’. Thus if qq decrease or nn increase, the rate of the disease spread will decrease and the curves of the infectious cases will be flatten out. These results are portrayed in Fig. 4 of Sec. 5.2.

  • In Sec. 5.2, we have shown the effect of nn on the spatial distribution of disease spread. As nn increases that is the parameter controls ‘social confinement’, the disease spreads in small pockets in clustered form.

  • We have seen that our model has been able to reproduce the basic features of the SEIR model quite well with some discrepancies as shown in Fig. 6.

  • We have also compared our model with other CA models having different neighbourhood conditions. We have shown that for n3n\gg 3 our model behaves exactly similar to the models with Moore’s neighbourhood condition. If we assume nn as a step function of layer number (\ell) instead of choosing nn is a constant, our model exhibits similar results as the CA models consisting different rr-neighbourhood conditions. (detailed discussion in Sec. 7)

Finally, our model is used to study the time variation of COVID-19 cases in different countries for understanding this pandemic properly. We have used data of COVID-19 from eight countries, namely Italy, Germany, France, US, Brazil, India, Japan and South Africa, to obtain a fit of our model. Procedures and results of this study are described below:

  • We have fitted each of the peaks in the COVID-19 data separately to get the best fit parameters for our model.

  • The data of COVID-19 for different countries is obtained from JHU CSSE data repository. Before analyzing, a proper scaling has been done on this data. Our model is fitted with the data of confirmed cases for COVID-19. To get the best fit parameters, we have minimized the sum of squared error (SSE) between the CA model and data by applying a genetic algorithm (GA), as discussed in Sec. 8.

  • The parameter values obtained from the fitting is then re-checked with the data for deceased cases. Our model can generate the data for removed cases (\equiv recovered+dead). However, the data for recovered cases is not fully available in JHU CSSE data repository. We have assumed that for a particular peak, deceased cases vary linearly with removed cases, where kk is the proportionality constant and have values between 0 and 1. This assumption help us to generate the data for deceased cases from our model and compare with the real data (detailed discussion in Sec. 8).

  • The results of the fitting to the data for confirmed cases and comparison to the data for deceased cases of Italy are shown in Fig. 13, 14, 11 and 12. Also, the optimized parameter values for different peaks of the different countries are listed in Tab. 2.

  • These plots and the table show that the rate of the spread of COVID-19 mainly depends on two parameters qq and nn.

  • The data for deceased cases and simulated results show deviations in some situations, as seen in Fig. 12. The main reason behind this is the assumption of the linear relationship between the deceased cases and the removed cases for a particular peak. This zeroth order model with a linear relationship can be seen to be valid for most situations.

  • We have drawn box plots (Fig. 15) for fitting parameters to show their distribution. An important thing to note that the median value of nn is at 2.1705, which is above the central value, n=1.5n=1.5 of our chosen range. Thus theoretically we can say that the degree of ‘social confinement’ in different countries are above average. Effect of various restrictions like social-distancing, and lockdown may be causes of this.

  • Also Tab. 3 shows that the value of kk decreases (not always monotonically) as COVID-19 pandemic proceeds. This means the death rate decreases with time

In this work, we have analyzed COVID-19 data from a different perspective. In future, we aim to compare and validate this stochastic model with physically motivated models. We hope our work will help to understand and forecast future pandemic situations.

10 Acknowledgment

One of the authors (S. C.) would like to acknowledge the financial support lent by the University Grant Commission (UGC), Government of India, in the form of CSIR-UGC NET-JRF stipend. Another author (S. R. C.) would like to thank St. Xavier’s College (Autonomous), Kolkata for providing financial support in the form of ‘Intramural Research Grant for R&D Projects’. S. C. would also like to thank Mr. Saswata Das and Mr. Bappaditya Manna for providing support during this work. Finally, all the authors would like to acknowledge St. Xavier’s College (Autonomous), Kolkata as their host institution.

References

  • [1] Marco Marani, Gabriel G. Katul, William K. Pan, and Anthony J. Parolari. Intensity and frequency of extreme novel epidemics. Proceedings of the National Academy of Sciences, 118(35):e2105482118, 2021. DOI: 10.1073/pnas.2105482118.
  • [2] Camilo Mora, Tristan McKenzie, Isabella M Gaw, Jacqueline M Dean, Hannah von Hammerstein, Tabatha A Knudson, Renee O Setter, Charlotte Z Smith, Kira M Webster, Jonathan A Patz, et al. Over half of known human pathogenic diseases can be aggravated by climate change. Nature climate change, 12(9):869–875, 2022. DOI: 10.1038/s41558-022-01426-1.
  • [3] Live update of COVID-19 situation in different countries- Worldometers. https://www.worldometers.info/coronavirus/.
  • [4] Tracking SARS-CoV-2 variants (WHO). https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/.
  • [5] Hannah Ritchie, Nectar Gan, Simone McCarthy, Selina Wang, and Mengchen Zhang. Leaked notes from chinese health officials estimate 250 million covid-19 infections in december: reports. CNN International, Dec, 23, 2022.
  • [6] James Glanz, Mara Hvistendahl, and Agnes Chang. How deadly was china’s covid wave? The New York Times, Feb, 15, 2023.
  • [7] William Ogilvy Kermack, A. G. McKendrick, and Gilbert Thomas Walker. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 115(772):700, 1927. DOI: 10.1098/rspa.1927.0118.
  • [8] Syafruddin Side and Mohd Salmi Md Noorani. A sir model for spread of dengue fever disease (simulation for south sulawesi, indonesia and selangor, malaysia). World Journal of Modelling and Simulation, 9(2):96–105, 2013.
  • [9] Mevin B. Hooten, Jessica Anderson, and Lance A. Waller. Assessing north american influenza dynamics with a statistical sirs model. Spatial and Spatio-temporal Epidemiology, 1(2):177–185, 2010. DOI: 10.1016/j.sste.2010.03.003.
  • [10] Daisuke Furushima, Shoko Kawano, Yuko Ohno, and Masayuki Kakehashi. Estimation of the basic reproduction number of novel influenza a (h1n1) pdm09 in elementary schools using the sir model. The Open Nursing Journal, 11:64, 2017. DOI: 10.2174/1874434601711010064.
  • [11] Tianmu Chen, Yuanxiu Huang, Ruchun Liu, Zhi Xie, Shuilian Chen, and Guoqing Hu. Evaluating the effects of common control measures for influenza a (h1n1) outbreak at school in china: A modeling study. PLOS ONE, 12(5):1–20, 2017. DOI: 10.1371/journal.pone.0177672.
  • [12] Christian L Althaus. Estimating the reproduction number of ebola virus (ebov) during the 2014 outbreak in west africa. PLoS currents, 6:ecurrents.outbreaks.91afb5e0f279e7f29e7056095255b288, 2014. DOI: 10.1371/currents.outbreaks.91afb5e0f279e7f29e7056095255b288.
  • [13] T. Berge, J.M.-S. Lubuma, G.M. Moremedi, N. Morris, and R. Kondera-Shava. A simple mathematical model for ebola in africa. Journal of Biological Dynamics, 11(1):42–74, 2017. DOI: 10.1080/17513758.2016.1229817.
  • [14] K. Glass, Y. Xia, and B.T. Grenfell. Interpreting time-series analyses for continuous-time biological models measles as a case study. Journal of Theoretical Biology, 223(1):19 – 25, 2003. DOI: 10.1016/s0022-5193(03)00031-6.
  • [15] Brody H. Foy, Brian Wahl, Kayur Mehta, Anita Shet, Gautam I. Menon, and Carl Britto. Comparing covid-19 vaccine allocation strategies in india: A mathematical modelling study. International Journal of Infectious Diseases, 103:431, 2021. DOI: 10.1016/j.ijid.2020.12.075.
  • [16] Namrata Soni, Jyoti Bhola, Ashutosh Yadav, Ishita Srivastva, and Utcarsh Mathur. A mathematical reflection of covid-19 and vaccination acceptance in india. 8:150, 2021. DOI: 10.21276/apjhs.2021.8.3.27.
  • [17] Ting-Yu Lin, Sih-Han Liao, Chao-Chih Lai, Eugenio Paci, and Shao-Yuan Chuang. Effectiveness of non-pharmaceutical interventions and vaccine for containing the spread of covid-19: Three illustrations before and after vaccination periods. Journal of the Formosan Medical Association, 120:S46, 2021. DOI: 10.1016/j.jfma.2021.05.015.
  • [18] Simon A Rella, Yuliya A Kulikova, Emmanouil T Dermitzakis, and Fyodor A Kondrashov. Rates of sars-cov-2 transmission and vaccination impact the fate of vaccine-resistant strains. Scientific Reports, 11(1):15729, 2021. DOI: 10.1038/s41598-021-95025-3.
  • [19] Sourav Chowdhury, Suparna Roychowdhury, and Indranath Chaudhuri. Universality and herd immunity threshold: Revisiting the sir model for covid-19. International Journal of Modern Physics C, 32(10):2150128, 2021. DOI: 10.1142/S012918312150128X.
  • [20] Sourav Chowdhury, Suparna Roychowdhury, and Indranath Chaudhuri. A robust prediction from a minimal model of covid-19–can we avoid the third wave? arXiv preprint arXiv:2112.08794, 2021.
  • [21] André F. Steklain, Ahmed Al-Ghamdi, and Euaggelos E. Zotos. Using chaos indicators to determine vaccine influence on epidemic stabilization. Phys. Rev. E, 103:032212, Mar 2021. DOI: 10.1103/PhysRevE.103.032212.
  • [22] Jorge Duarte, Cristina Januário, Nuno Martins, Jesús Seoane, and Miguel AF Sanjuán. Controlling infectious diseases: the decisive phase effect on a seasonal vaccination strategy. arXiv preprint arXiv:2102.08284, 2021.
  • [23] Gerardo Ortigoza, Fred Brauer, and Iris Neri. Modelling and simulating chikungunya spread with an unstructured triangular cellular automata. Infectious Disease Modelling, 5:197–220, 2020. DOI: 10.1016/j.idm.2019.12.005.
  • [24] Puspa Eosina, Taufik Djatna, and Helda Khusun. A cellular automata modeling for visualizing and predicting spreading patterns of dengue fever. TELKOMNIKA, 14(1):228, 2016. DOI: 10.12928/TELKOMNIKA.v14i1.2404.
  • [25] S. Hoya White, A. Martín del Rey, and G. Rodríguez Sánchez. Modeling epidemics using cellular automata. Applied Mathematics and Computation, 186(1):193–202, 2007. DOI: 10.1016/j.amc.2006.06.126.
  • [26] Gui-Quan Sun, Zhen Jin, Li-Peng Song, Amit Chakraborty, and Bai-Lian Li. Phase transition in spatial epidemics using cellular automata with noise. Ecological research, 26:333–340, 2011. 10.1007/s11284-010-0789-9.
  • [27] B. Cissé, S. El Yacoubi, and A. Tridane. Impact of neighborhood structure on epidemic spreading by means of cellular automata approach. Procedia Computer Science, 18:2603–2606, 2013. 2013 International Conference on Computational Science.
  • [28] A. Holko, M. Medrek, Z. Pastuszak, and K. Phusavat. Epidemiological modeling with a population density map-based cellular automata simulation system. Expert Systems with Applications, 48:1–8, 2016. DOI: 10.1016/j.eswa.2015.08.018.
  • [29] Khaled M Khalil, M Abdel-Aziz, Taymour T Nazmy, and Abdel-Badeeh M Salem. An agent-based modeling for pandemic influenza in egypt. In Handbook on Decision Making, volume 33, pages 205–218. Springer, 2012. DOI: 10.1007/978-3-642-25755-1_11.
  • [30] Senthil Athithan, Vidya Prasad Shukla, and Sangappa Ramachandra Biradar. Epidemic spread modeling with time variant infective population using pushdown cellular automata. Journal of Computational Environmental Sciences, 2014:769064, 2014. DOI: 10.1155/2014/769064.
  • [31] Sheng Bin, Gengxin Sun, and Chih-Cheng Chen. Spread of infectious disease modeling and analysis of different factors on spread of infectious disease based on cellular automata. International Journal of Environmental Research and Public Health, 16(23), 2019. DOI: 10.3390/ijerph16234683.
  • [32] Murali Krishna Enduri and Shivakumar Jolad. Dynamics of dengue disease with human and vector mobility. Spatial and Spatio-temporal Epidemiology, 25:57–66, 2018. DOI: 10.1016/j.sste.2018.03.001.
  • [33] Emily Burkhead and Jane Hawkins. A cellular automata model of ebola virus dynamics. Physica A: Statistical Mechanics and its Applications, 438:424–435, 2015. DOI: 10.1016/j.physa.2015.06.049.
  • [34] Armin R. Mikler, Sangeeta Venkatachalam, and Kaja Abbas. Modeling infectious diseases using global stochastic cellular automata. Journal of Biological Systems, 13(04):421–439, 2005. DOI: 10.1142/S0218339005001604.
  • [35] Henrique Fabricio Gagliardi and Domingos Alves. Small-world effect in epidemics using cellular automata. Mathematical Population Studies, 17(2):79–90, 2010. DOI: 10.1080/08898481003689486.
  • [36] P.H.T. Schimit. A model based on cellular automata to estimate the social isolation impact on covid-19 spreading in brazil. Computer Methods and Programs in Biomedicine, 200:105832, 2021. DOI: 10.1016/j.cmpb.2020.105832.
  • [37] P.K. Jithesh. A model based on cellular automata for investigating the impact of lockdown, migration and vaccination on covid-19 dynamics. Computer Methods and Programs in Biomedicine, 211:106402, 2021. DOI: 10.1016/j.cmpb.2021.106402.
  • [38] Sayantari Ghosh and Saumik Bhattacharya. A data-driven understanding of covid-19 dynamics using sequential genetic algorithm based probabilistic cellular automata. Applied Soft Computing, 96:106692, 2020. DOI: 10.1016/j.asoc.2020.106692.
  • [39] L. L. Lima and A. P. F. Atman. Impact of mobility restriction in covid-19 superspreading events using agent-based model. PLOS ONE, 16(3):1–17, 03 2021. DOI: 10.1371/journal.pone.0248708.
  • [40] Sourav Chowdhury, Suparna Roychowdhury, and Indranath Chaudhuri. Cellular automata in the light of covid-19. The European Physical Journal Special Topics, 231(18-20):3619–3628, 2022. DOI: 10.1140/epjs/s11734-022-00619-1.
  • [41] Yusra Bibi Ruhomally, Maheshsingh Mungur, Abdel Anwar Hossen Khoodaruth, Vishwamitra Oree, and Muhammad Zaid Dauhoo. Assessing the impact of contact tracing, quarantine and red zone on the dynamical evolution of the covid-19 pandemic using the cellular automata approach and the resulting mean field system: A case study in mauritius. Applied Mathematical Modelling, 111:567–589, 2022. DOI: 10.1016/j.apm.2022.07.008.
  • [42] Ihor Kosovych, Igor Cherevko, Denys Nevinskyi, and Yaroslav Vyklyuk. Simulation of various distribution restrictions of covid-19 using cellular automata. In 2022 12th International Conference on Advanced Computer Information Technologies (ACIT), pages 58–61, 2022. DOI: 10.1109/ACIT54803.2022.9913172.
  • [43] Puspa Eosina, Aniati Murni Arymurthy, and Adila Alfa Krisnadhi. A non-uniform continuous cellular automata for analyzing and predicting the spreading patterns of covid-19. Big Data and Cognitive Computing, 6(2), 2022. DOI: 10.3390/bdcc6020046.
  • [44] Larissa M. Fraga, Gina M. B. de Oliveira, and Luiz G. A. Martins. Multistage evolutionary strategies for adjusting a cellular automata-based epidemiological model. In 2021 IEEE Congress on Evolutionary Computation (CEC), pages 466–473, 2021. DOI: 10.1109/CEC45853.2021.9504738.
  • [45] George Grekousis. Spatial analysis methods and practice: describe–explore–explain through GIS. Cambridge University Press, 2020.
  • [46] A. Stewart Fotheringham. Spatial structure and distance-decay parameters. Annals of the Association of American Geographers, 71(3):425–436, 1981.
  • [47] Ensheng Dong, Hongru Du, and Lauren Gardner. An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases, 20(5):533–534, 2020. DOI: 10.1016/S1473-3099(20)30120-1.